clojure split string into different size chunks

I'm trying to split a string into n chunks of variable sizes.
As input I have a seq of the sizes of the different chunks:
(10 6 12)
And a string:
"firstchunksecondthirdandlast"
I would like to split the string using the sizes as so:
("firstchunk" "second" "thirdandlast")
As a newbie I still have a hard time wrapping my head around the most idiomatic way to do this.

Here are two ways to do this:
One version uses reduce, which is often a good fit when you need to carry some kind of state along (here: the part of the string that has not been split yet). The reduce returns a pair of [remaining-string chunks], so you need one more function call (second) on its result to get the chunks in the form you want.
;; Simply take second as a result:
(let [s "firstchunksecondthirdandlast"]
(reduce
(fn [[s xs] len]
[(subs s len)
(conj xs (subs s 0 len))])
[s []]
[10 6 12]))
The other version first builds up the start-end indices and then uses destructuring to get them out of the sequence:
(let [s "firstchunksecondthirdandlast"]
(mapv
(fn [[start end]]
(subs s start end))
;; Build up the start-end indices:
(partition 2 1 (reductions + (cons 0 [10 6 12])))))
Note that neither of these is robust: both throw ugly errors (a StringIndexOutOfBoundsException from subs) if the string is too short. So you should be much more defensive and use some asserts.
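As a rough sketch of what "more defensive" could look like: the version below checks the total length up front and throws an ex-info with a readable message instead of letting subs fail. The name split-by-sizes and the shape of the error data are assumptions made for illustration, not part of the answer above.

```clojure
;; Hypothetical helper: split s into chunks of the given sizes,
;; failing early with a descriptive error if s is too short.
(defn split-by-sizes
  [s sizes]
  (let [total (reduce + sizes)]
    (when (> total (count s))
      (throw (ex-info "String is shorter than the requested chunks"
                      {:needed total :available (count s)})))
    ;; bounds is the running sum of sizes, e.g. (0 10 16 28)
    (let [bounds (reductions + 0 sizes)]
      ;; pair each boundary with the next one to get [start end)
      (mapv (fn [start end] (subs s start end))
            bounds
            (rest bounds)))))

(split-by-sizes "firstchunksecondthirdandlast" [10 6 12])
;; => ["firstchunk" "second" "thirdandlast"]
```

Characters beyond the last chunk are silently ignored here; whether that should also be an error depends on the use case.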

Here is my go at the problem (still a beginner with the language), it uses an anonymous function and recursion until the chunks list is empty. I have found this pattern useful when wanting to accumulate results until a condition is met.
str-orig chunks-orig [] sets the initial arguments for the anonymous function: the full string, full list of chunks and an empty vec to collect results into.
(defn split-chunks [str-orig chunks-orig]
  ((fn [str chunks result]
     (if-let [len (first chunks)]
       (recur
         (subs str len)
         (rest chunks)
         (conj result (subs str 0 len)))
       result))
   str-orig chunks-orig []))
(split-chunks "firstchunksecondthirdandlast" '(10 6 12))
; ["firstchunk" "second" "thirdandlast"]

Related

How to really shuffle sequence in Clojure?

(defn shuffle-letters
[word]
(let [letters (clojure.string/split word #"")
shuffled-letters (shuffle letters)]
(clojure.string/join "" shuffled-letters)))
But if you put in "test" you can get "test" back sometimes.
How to modify the code to be sure that output will never be equal to input.
I feel embarrassed: I can solve it easily in Python, but Clojure is so different to me...
Thank you.
P.S. I think we can close the topic now... The loop is in fact all I needed...

You can use loop. When the shuffled letters are the same as the original, recur back up to the start of the loop:
(defn shuffle-letters [word]
(let [letters (clojure.string/split word #"")]
(loop [] ; Start a loop
(let [shuffled-letters (shuffle letters)]
(if (= shuffled-letters letters) ; Check if they're equal
(recur) ; If they're equal, loop and try again
(clojure.string/join "" shuffled-letters)))))) ; Else, return the joined letters
There are many ways this could be written, but I think this is as plain as it gets. You could also get rid of the loop and make shuffle-letters itself recursive, though that would lead to unnecessary work (the word would be split again on every attempt). You could also use letfn to create a local recursive function, but at that point, loop would likely be cleaner.
Things to note though:
Obviously, if you try to shuffle something like "H" or "HH", it will get stuck and loop forever since no amount of shuffling will cause them to differ. You could do a check ahead of time, or add a parameter to loop that limits how many times it tries.
This will actually make your shuffle less random. If you disallow it from returning the original string, you're reducing the amount of possible outputs.
The call to split is unnecessary. You can just call vec on the string:
(defn shuffle-letters [word]
(let [letters (vec word)]
(loop []
(let [shuffled-letters (shuffle letters)]
(if (= shuffled-letters letters)
(recur)
(clojure.string/join "" shuffled-letters))))))
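The "check ahead of time" idea from the notes above could be sketched like this. The name shuffle-letters-safe and the choice to return nil for inputs that cannot be reordered are assumptions for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical variant: returns nil when no differing shuffle exists
;; (fewer than two distinct characters), so the loop cannot hang.
(defn shuffle-letters-safe [word]
  (let [letters (vec word)]
    (when (> (count (distinct letters)) 1)
      (loop []
        (let [shuffled (shuffle letters)]
          (if (= shuffled letters)
            (recur)                      ; same order as the input, try again
            (str/join shuffled)))))))

(shuffle-letters-safe "HH") ;; => nil
```

With at least two distinct characters, some reordering always differs from the input, so the loop terminates with probability 1.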

Here's another solution (using transducers):
(defn shuffle-strict [s]
(let [letters (seq s)
xform (comp (map clojure.string/join)
(filter (fn[v] (not= v s))))]
(when (> (count (into #{} letters)) 1)
(first (eduction xform (iterate shuffle letters))))))
(for [_ (range 20)]
(shuffle-strict "test"))
;; => ("etts" "etts" "stte" "etts" "sett" "tste" "tste" "sett" "ttse" "sett" "ttse" "tset" "stte" "ttes" "ttes" "stte" "stte" "etts" "estt" "stet")
(shuffle-strict "t")
;; => nil
(shuffle-strict "ttttt")
;; => nil
We basically create a lazy list of possible shuffles, and then we take the first of them to be different from the input. We also make sure that there are at least 2 different characters in the input, so as not to hang (we return nil here since you don't want to have the input string as a possible result).

If you want your function to return a sequence:
(defn my-shuffle [input]
(when (-> input set count (> 1))
(->> input
(iterate #(apply str (shuffle (seq %))))
(remove #(= input %)))))
(->> "abc" my-shuffle (take 5))
;; => ("acb" "cba" "bca" "acb" "cab")
(->> "bbb" my-shuffle (take 5))
;; => ()

Is it bad practice to try and keep track of iterations while using reduce/map in Clojure?

So being new to Clojure and functional programming in general, I sometimes (to quote a book) "feel like your favourite tool has been taken from you". Trying to get a better grasp on this stuff I'm doing string manipulation problems.
So knowing the functional paradigm is all about recursion (and other things) I've been using tail recursive functions to do things I'd normally do with loops, then trying to implement using map or reduce. For those more experienced, does this sound like a sane thing to do?
I'm starting to get frustrated because I'm running into problems where I need to keep track of the index of each character when iterating over strings but that's proving difficult because reduce and map feel "isolated". I can't increment a value while a string is being reduced...
Is there something I'm missing; a function for exactly this.. Or can this specific case just not be implemented using these core functions? Or is the way I'm going about it just wrong and un-functional-like which is why I'm stuck?
Here's an example I'm having:
This function takes five separate strings then using reduce, builds a vector containing all the characters at position char-at in each string. How could you change this code so that char-at (in the anonymous function) gets incremented after each string gets passed? This is what I mean by it feels "isolated" and I don't know how to get around this.
(defn new-string-from-five
"This function takes a character at position char-at from each of the strings to make a new vector"
[five-strings char-at]
(reduce (fn [result string]
(conj result (get-char-at string char-at)))
[]
five-strings))
Old :
"abc" "def" "ghi" "jkl" "mno" -> [a d g j m] (always taken from index 0)
Modified :
"abc" "def" "ghi" "jkl" "mno" ->[a e i j n] (index gets incremented and loops back around)

I don't think there's anything insane about writing string manipulation functions to get your head around things, though it's certainly not the only way. I personally found Clojure for the Brave and True, 4clojure, and the Clojurians Slack channel most helpful when learning Clojure.
On your question, probably the most common thing to do would be to add an index to your initial collection (in this case a string) using map-indexed
user=> (map-indexed vector [9 9 9])
([0 9] [1 9] [2 9])
So for your example
(defn new-string-from-five
"This function takes a character at position char-at from each of the strings to make a new vector"
[five-strings char-at]
(reduce (fn [result [string-idx string]]
(conj result (get-char-at string (+ string-idx char-at))))
[]
(map-indexed vector five-strings)))
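The question's expected output ([a e i j n]) also wraps the index back around, which the function above does not do. A minimal sketch of that behaviour, assuming get-char-at is equivalent to nth, every string is non-empty, and using the made-up name chars-on-diagonal:

```clojure
;; Hypothetical helper: take the character at (start + position-of-string),
;; wrapping around with mod when the index runs past the end of a string.
;; Assumes every string is non-empty (mod by zero would throw).
(defn chars-on-diagonal [strings start]
  (into []
        (map-indexed (fn [idx s]
                       (nth s (mod (+ idx start) (count s)))))
        strings))

(chars-on-diagonal ["abc" "def" "ghi" "jkl" "mno"] 0)
;; => [\a \e \i \j \n]
```

This uses the transducer arity of map-indexed (Clojure 1.7+) so the result is built directly into a vector.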
But how would I build map-indexed? Well
Non-lazily:
(defn map-indexed' [f coll]
(loop [idx 0
res []
rest-coll coll]
(if (empty? rest-coll)
res
(recur (inc idx) (conj res (f idx (first rest-coll))) (rest rest-coll)))))
Lazily (recommend not trying to understand this yet):
(defn map-indexed' [f coll]
(letfn [(map-indexed'' [idx f coll]
(if (empty? coll)
'()
(lazy-seq (conj (map-indexed'' (inc idx) f (rest coll)) (f idx (first coll))))))]
(map-indexed'' 0 f coll)))

You can use reductions:
(defn new-string-from-five
[five-strings]
(->> five-strings
(reductions
(fn [[res i] string]
[(get-char-at string i) (inc i)])
[nil 0])
rest
(mapv first)))
But in this case, I think map, mapv or map-indexed is cleaner. E.g.
(map-indexed
(fn [i s] (get-char-at s i))
["abc" "def" "ghi" "jkl" "mno"])

How to make reduce more readable in Clojure?

A reduce call has its f argument first. Visually speaking, this is often the biggest part of the form.
e.g.
(reduce
(fn [[longest current] x]
(let [tail (last current)
next-seq (if (or (not tail) (> x tail))
(conj current x)
[x])
new-longest (if (> (count next-seq) (count longest))
next-seq
longest)]
[new-longest next-seq]))
[[][]]
col))
The problem is, the val argument (in this case [[][]]) and col argument come afterward, below, and it's a long way for your eyes to travel to match those with the parameters of f.
It would look more readable to me if it were in this order instead:
(reduceb val col
(fn [x y]
...))
Should I implement this macro, or am I approaching this entirely wrong in the first place?

You certainly shouldn't write that macro, since it is easily written as a function instead. I'm not super keen on writing it as a function, either, though; if you really want to pair the reduce with its last two args, you could write:
(-> (fn [x y]
...)
(reduce init coll))
Personally when I need a large function like this, I find that a comma actually serves as a good visual anchor, and makes it easier to tell that two forms are on that last line:
(reduce (fn [x y]
...)
init, coll)
Better still is usually to not write such a large reduce in the first place. Here you're combining at least two steps into one rather large and difficult step, by trying to find all at once the longest decreasing subsequence. Instead, try splitting the collection up into decreasing subsequences, and then take the largest one.
(defn decreasing-subsequences [xs]
(lazy-seq
(cond (empty? xs) []
(not (next xs)) (list xs)
:else (let [[x & [y :as more]] xs
remainder (decreasing-subsequences more)]
(if (> y x)
(cons [x] remainder)
(cons (cons x (first remainder)) (rest remainder)))))))
Then you can replace your reduce with:
(apply max-key count (decreasing-subsequences xs))
Now, the lazy function is not particularly shorter than your reduce, but it is doing one single thing, which means it can be understood more easily; also, it has a name (giving you a hint as to what it's supposed to do), and it can be reused in contexts where you're looking for some other property based on decreasing subsequences, not just the longest. You can even reuse it more often than that, if you replace the > in (> y x) with a function parameter, allowing you to split up into subsequences based on any predicate. Plus, as mentioned it is lazy, so you can use it in situations where a reduce of any sort would be impossible.
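The generalization mentioned above, with the comparison pulled out into a parameter, might look roughly like this. The names split-runs and break? are invented for the sketch; break? receives two neighbouring elements and returns true when a new run should start between them.

```clojure
;; Hypothetical generalization of decreasing-subsequences:
;; (break? x y) decides whether a run ends between neighbours x and y.
(defn split-runs [break? xs]
  (lazy-seq
    (cond (empty? xs) []
          (not (next xs)) (list xs)
          :else (let [[x & [y :as more]] xs
                      remainder (split-runs break? more)]
                  (if (break? x y)
                    (cons [x] remainder)
                    (cons (cons x (first remainder)) (rest remainder)))))))

;; Same behaviour as the function above: break whenever the next value is larger.
(split-runs (fn [x y] (> y x)) [3 2 1 5 4])
;; => ((3 2 1) (5 4))

;; Strictly increasing runs: break whenever the next value is not larger.
(apply max-key count (split-runs >= [1 2 3 2 5]))
;; => (1 2 3)
```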
Speaking of ease of understanding, as you can see I misunderstood what your function is supposed to do when reading it. I'll leave as an exercise for you the task of converting this to strictly-increasing subsequences, where it looked to me like you were computing decreasing subsequences.

You don't have to use reduce or recursion to get the descending (or ascending) sequences. Here we are returning all the descending sequences in order from longest to shortest:
(def in [3 2 1 0 -1 2 7 6 7 6 5 4 3 2])
(defn descending-sequences [xs]
(->> xs
(partition 2 1)
(map (juxt (fn [[x y]] (> x y)) identity))
(partition-by first)
(filter ffirst)
(map #(let [xs' (mapcat second %)]
(take-nth 2 (cons (first xs') xs'))))
(sort-by (comp - count))))
(descending-sequences in)
;;=> ((7 6 5 4 3 2) (3 2 1 0 -1) (7 6))
(partition 2 1) gives every possible comparison and partition-by allows you to mark out the runs of continuous decreases. At this point you can already see the answer and the rest of the code is removing the baggage that is no longer needed.
If you want the ascending sequences instead then you only need to change the > to a <:
;;=> ((-1 2 7) (6 7))
If, as in the question, you only want the longest sequence then put a first as the last function call in the thread last macro. Alternatively replace the sort-by with:
(apply max-key count)
For maximum readability you can name the operations:
(defn greatest-continuous [op xs]
(let [op-pair? (fn [[x y]] (op x y))
take-every-second #(take-nth 2 (cons (first %) %))
make-canonical #(take-every-second (apply concat %))]
(->> xs
(partition 2 1)
(partition-by op-pair?)
(filter (comp op-pair? first))
(map make-canonical)
(apply max-key count))))

I feel your pain...they can be hard to read.
I see 2 possible improvements. The simplest is to write a wrapper similar to the Plumatic Plumbing defnk style:
(fnk-reduce { :fn (fn [state val] ... <new state value>)
:init []
:coll some-collection } )
so the function call has a single map arg, where each of the 3 pieces is labelled & can come in any order in the map literal.
Another possibility is to just extract the reducing fn and give it a name. This can be either internal or external to the code expression containing the reduce:
(let [glommer (fn [state value] (into state value)) ]
(reduce glommer #{} some-coll))
or possibly
(defn glommer [state value] (into state value))
(reduce glommer #{} some-coll)
As always, anything that increases clarity is preferred. If you haven't noticed already, I'm a big fan of Martin Fowler's idea of Introduce Explaining Variable refactoring. :)
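Applied to the reduce from the question, extracting and naming the reducing function could look something like the sketch below. The names extend-run and longest-increasing-run are made up, and peek replaces last only because current is always a vector here.

```clojure
;; Hypothetical named reducing fn: state is [longest-run-so-far current-run].
(defn extend-run
  [[longest current] x]
  (let [tail     (peek current)                  ; last element of a vector, or nil
        next-run (if (or (nil? tail) (> x tail))
                   (conj current x)              ; still strictly increasing
                   [x])]                         ; start a new run
    [(if (> (count next-run) (count longest)) next-run longest)
     next-run]))

(defn longest-increasing-run [coll]
  (first (reduce extend-run [[] []] coll)))

(longest-increasing-run [1 2 0 3 4 5 1])
;; => [0 3 4 5]
```

The reduce call itself now fits on one line, so the init value and the collection sit right next to the function they feed.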

I will apologize in advance for posting a longer solution to something where you wanted more brevity/clarity.
We are in the new age of clojure transducers and it appears a bit that your solution was passing the "longest" and "current" forward for record-keeping. Rather than passing that state forward, a stateful transducer would do the trick.
(def longest-decreasing
  (fn [rf]
    (let [longest (volatile! [])
          current (volatile! [])
          tail (volatile! nil)]
      (fn
        ([] (rf))
        ([result] (transduce identity rf result))
        ([result x] (do (if (or (nil? @tail) (< x @tail))
                          (if (> (count (vswap! current conj (vreset! tail x)))
                                 (count @longest))
                            (vreset! longest @current))
                          (vreset! current [(vreset! tail x)]))
                        @longest))))))
Before you dismiss this approach, realize that it just gives you the right answer and you can do some different things with it:
(def coll [2 1 10 9 8 40])
(transduce longest-decreasing conj coll) ;; => [10 9 8]
(transduce longest-decreasing + coll) ;; => 27
(reductions (longest-decreasing conj) [] coll) ;; => ([] [2] [2 1] [2 1] [2 1] [10 9 8] [10 9 8])
Again, I know that this may appear longer, but the potential to compose this with other transducers might be worth the effort (not sure if my arity-1 implementation breaks that??)

I believe that iterate can be a more readable substitute for reduce. For example here is the iteratee function that iterate will use to solve this problem:
(defn step-state-hof [op]
(fn [{:keys [unprocessed current answer]}]
(let [[x y & more] unprocessed]
(let [next-current (if (op x y)
(conj current y)
[y])
next-answer (if (> (count next-current) (count answer))
next-current
answer)]
{:unprocessed (cons y more)
:current next-current
:answer next-answer}))))
current is built up until it becomes longer than answer, in which case a new answer is created. Whenever the condition op is not satisfied we start again building up a new current.
iterate itself returns an infinite sequence, so needs to be stopped when the iteratee has been called the right number of times:
(def in [3 2 1 0 -1 2 7 6 7 6 5 4 3 2])
(->> (iterate (step-state-hof >) {:unprocessed in
                                  :current (vec (take 1 in))})
     (drop (- (count in) 1))
     first
     :answer)
;;=> [7 6 5 4 3 2]
Often you would use a drop-while or take-while to short-circuit as soon as the answer has been obtained. We could do that here; however, no short-circuiting is required, as we know in advance that the inner function of step-state-hof needs to be called (- (count in) 1) times. That is one less than the count because it is processing two elements at a time. Note that first is forcing the final call.

I wanted this order for the form:
reduce
val, col
f
I was able to figure out that this technically satisfies my requirements:
> (apply reduce
(->>
[0 [1 2 3 4]]
(cons
(fn [acc x]
(+ acc x)))))
10
But it's not the easiest thing to read.
This looks much simpler:
> (defn reduce< [val col f]
(reduce f val col))
nil
> (reduce< 0 [1 2 3 4]
(fn [acc x]
(+ acc x)))
10
(< is shorthand for "parameters are rotated left"). Using reduce<, I can see what's being passed to f by the time my eyes get to the f argument, so I can just focus on reading the f implementation (which may get pretty long). Additionally, if f does get long, I no longer have to visually check the indentation of the val and col arguments to determine that they belong to the reduce symbol way farther up. I personally think this is more readable than binding f to a symbol before calling reduce, especially since fn can still accept a name for clarity.
This is a general solution, but the other answers here provide many good alternative ways to solve the specific problem I gave as an example.

Given a clojure vector, iteratively remove 1 element

I'm trying to build a set of functions to compare sentences to one another. So I wrote a function called split-to-sentences that takes an input like this:
"This is a sentence. And so is this. And this one too."
and returns:
["This is a sentence" "And so is this" "And this one too."]
What I am struggling with is how to iterate over this vector and get the items that aren't the current value. I tried nosing around with drop and remove but haven't quite figured it out.
I guess one thing I could do is use first and rest in the loop and conj the previous value to the output of rest.

(remove #{current-value} sentences-vector)

Just use filter:
(filter #(not= current-value %) sentences-vector)

I believe you may want something like this function:
(defn without-each [x]
(map (fn [i] (concat (subvec x 0 i) (subvec x (inc i))))
(range (count x))))
Use it like this:
>>> (def x ["foo" "bar" "baz"])
>>> (without-each x)
==> (("bar" "baz") ("foo" "baz") ("foo" "bar"))
The returned elements are lazily concatenated, which is why they are not vectors. This is desirable, since true vector concatenation (e.g. (into a b)) is O(n).
Because subvec uses sharing with the original sequence this should not use an excessive amount of memory.
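If the goal is to compare each sentence against all the others, it may help to keep the current item next to its "rest". A small sketch along the same lines, with the invented name each-with-others and assuming the input is a vector (subvec requires one):

```clojure
;; Hypothetical helper: pairs every element with a lazy seq of the other elements.
;; Works by position, so duplicate sentences are handled independently.
(defn each-with-others [v]
  (map-indexed (fn [i x]
                 [x (concat (subvec v 0 i) (subvec v (inc i)))])
               v))

(each-with-others ["foo" "bar" "baz"])
;; => (["foo" ("bar" "baz")] ["bar" ("foo" "baz")] ["baz" ("foo" "bar")])
```

Unlike removing by value, this drops exactly one element per step even when the same sentence appears more than once.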

The trick is to pass your sentences twice into the reduce function...
(def sentences ["abcd" "efg" "hijk" "lmnop" "qrs" "tuv" "wx" "y&z"])
(reduce
(fn [[prev [curr & foll]] _]
(let [aren't-current-value (concat prev foll)]
(println aren't-current-value) ;use it here
[(conj prev curr) foll]))
[[] sentences]
sentences)
...once to see the following ones, and once to iterate.

You might consider using subvec or pop because both operate very quickly on vectors.

Cleaning up Clojure function

Coming from imperative programming languages, I am trying to wrap my head around Clojure in hopes of using it for its multi-threading capability.
One of the problems from 4Clojure is to write a function that generates a list of Fibonacci numbers of length N, for N > 1. I wrote a function, but given my limited background, I would like some input on whether or not this is the best Clojure way of doing things. The code is as follows:
(fn fib [x] (cond
(= x 2) '(1 1)
:else (reverse (conj (reverse (fib (dec x))) (+ (last (fib (dec x))) (-> (fib (dec x)) reverse rest first))))
))

The most idiomatic "functional" way would probably be to create an infinite lazy sequence of fibonacci numbers and then extract the first n values, i.e.:
(take n some-infinite-fibonacci-sequence)
The following link has some very interesting ways of generating Fibonacci sequences along those lines:
http://en.wikibooks.org/wiki/Clojure_Programming/Examples/Lazy_Fibonacci
Finally here is another fun implementation to consider:
(defn fib [n]
(let [next-fib-pair (fn [[a b]] [b (+ a b)])
fib-pairs (iterate next-fib-pair [1 1])
all-fibs (map first fib-pairs)]
(take n all-fibs)))
(fib 6)
=> (1 1 2 3 5 8)
It's not as concise as it could be, but demonstrates quite nicely the use of Clojure's destructuring, lazy sequences and higher order functions to solve the problem.

Here is a version of Fibonacci that I like very much (I took the implementation from the clojure wikibook: http://en.wikibooks.org/wiki/Clojure_Programming)
(def fib-seq (lazy-cat [0 1] (map + (rest fib-seq) fib-seq)))
It works like this: Imagine you already have the infinite sequence of Fibonacci numbers. If you take the tail of the sequence and add it element-wise to the original sequence you get the (tail of the tail of the) Fibonacci sequence
0 1 1 2 3 5 8 ...
1 1 2 3 5 8 ...
-----------------
1 2 3 5 8 13 ...
thus you can use this to calculate the sequence. You need two initial elements [0 1] (or [1 1] depending on where you start the sequence) and then you just map over the two sequences adding the elements. Note that you need lazy sequences here.
I think this is the most elegant and (at least for me) mind stretching implementation.
Edit: The fib function is
(defn fib [n] (nth fib-seq n))
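Since the original question asks for a list of the first N numbers starting at 1 1 rather than the nth number, the same sequence can be sliced with take. This is only a sketch: first-fibs is a made-up name, and +' is swapped in for + as an assumption so that large values promote to BigInt instead of overflowing.

```clojure
;; Same self-referential definition as above, but with the auto-promoting +'
;; so that values beyond the range of a long do not throw.
(def fib-seq
  (lazy-cat [0 1] (map +' (rest fib-seq) fib-seq)))

;; Hypothetical helper: first n Fibonacci numbers, starting from 1 1.
(defn first-fibs [n]
  (take n (rest fib-seq)))

(first-fibs 6)
;; => (1 1 2 3 5 8)
```

Because fib-seq is a top-level def, every element that has been realized stays in memory for the life of the var.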

Here's one way of doing it that gives you a bit of exposure to lazy sequences, although it's certainly not really an optimal way of computing the Fibonacci sequence.
Given the definition of the Fibonacci sequence, we can see that it's built up by repeatedly applying the same rule to the base case of '(1 1). The Clojure function iterate sounds like it would be good for this:
user> (doc iterate)
-------------------------
clojure.core/iterate
([f x])
Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects
So for our function we'd want something that takes the values we've computed so far, sums the two most recent, and returns a list of the new value and all the old values.
(fn [[x y & _ :as all]] (cons (+ x y) all))
The argument list here just means that x and y will be bound to the first two values from the list passed as the function's argument, a list containing all arguments after the first two will be bound to _, and the original list passed as an argument to the function can be referred to via all.
Now, iterate will return an infinite sequence of intermediate values, so for our case we'll want to wrap it in something that'll just return the value we're interested in; lazy evaluation will stop the entire infinite sequence being evaluated.
(defn fib [n]
(nth (iterate (fn [[x y & _ :as all]] (cons (+ x y) all)) '(1 1)) (- n 2)))
Note also that this returns the result in the opposite order to your implementation; it's a simple matter to fix this with reverse of course.
Edit: or indeed, as amalloy says, by using vectors:
(defn fib [n]
(nth (iterate (fn [all]
(conj all (->> all (take-last 2) (apply +)))) [1 1])
(- n 2)))

See Christophe Grand's Fibonacci solution in Programming Clojure by Stu Halloway. It is the most elegant solution I have seen.
(defn fibo [] (map first (iterate (fn [[a b]] [b (+ a b)]) [0 1])))
(take 10 (fibo))
Also see
How can I generate the Fibonacci sequence using Clojure?
