Removing duplicates in clojure

I have a sequence and am trying to remove duplicates.
case 1:
(vec (into #{} [1 1 2 2 3 3])) ; => [1 2 3]
case 2:
(distinct [1 1 2 2 3 3]) ; => [1 2 3]
Both cases produce the result, so which one is better to use?
What's the difference between them?

As for differences, jas covered most of them in his comment:
distinct is lazy
distinct with no arguments is a transducer
distinct maintains order
As for which one is preferred: distinct, for the reasons above, but also because it states what you need. I forget which Lisp book talked about this (it might have been Let Over Lambda), but when given the choice between two similar functions, prefer the one that's more specific. distinct says that you want distinct elements. (into #{} xs) also produces distinct values, but it leaves your intent open to misreading: someone could easily think you wanted a set for some other reason, and a set makes no promise about element order. distinct narrows down the why.
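The three differences listed above can be checked at the REPL. This is only a small sketch; the name xs and the sample data are made up for illustration and are not from the question.

```clojure
;; Illustrative data (an assumption): duplicates in a non-sorted order.
(def xs [3 1 3 2 1 2])

;; distinct keeps the first occurrence of each element, in input order.
(distinct xs)                          ; => (3 1 2)

;; Called without a collection, distinct returns a transducer.
(into [] (distinct) xs)                ; => [3 1 2]

;; distinct is lazy, so it can consume an infinite sequence.
(take 3 (distinct (cycle [1 2 3])))    ; => (1 2 3)

;; A set also drops duplicates, but its iteration order is unspecified,
;; so only set equality is checked here.
(= (set xs) #{1 2 3})                  ; => true
```

If a vector is needed at the end, (into [] (distinct) xs) builds one while keeping the order of first occurrence.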

Related

Clojure Core function argument positions seem rather confusing. What's the logic behind it?

For me, as a new Clojurian, some core functions seem rather counter-intuitive and confusing when it comes to argument order/position. Here's an example:
> (nthrest (range 10) 5)
=> (5 6 7 8 9)
> (take-last 5 (range 10))
=> (5 6 7 8 9)
Perhaps there is some rule/logic behind it that I don't see yet?
I refuse to believe that the Clojure core team made so many brilliant technical decisions and forgot about consistency in function naming/argument ordering.
Or should I just remember it as it is?
Thanks
Slightly offtopic:
rand and rand-int vs. random-sample: another example where function naming seems inconsistent, but that's a rather rarely used function, so it's not a big deal.

There is an FAQ on Clojure.org for this question: https://clojure.org/guides/faq#arg_order
What are the rules of thumb for arg order in core functions?
Primary collection operands come first. That way one can write -> and its ilk, and their position is independent of whether or not they have variable arity parameters. There is a tradition of this in OO languages and Common Lisp (slot-value, aref, elt).
One way to think about sequences is that they are read from the left, and fed from the right:
<- [1 2 3 4]
Most of the sequence functions consume and produce sequences. So one way to visualize that is as a chain:
map <- filter <- [1 2 3 4]
and one way to think about many of the seq functions is that they are parameterized in some way:
(map f) <- (filter pred) <- [1 2 3 4]
So, sequence functions take their source(s) last, and any other parameters before them, and partial allows for direct parameterization as above. There is a tradition of this in functional languages and Lisps.
Note that this is not the same as taking the primary operand last. Some sequence functions have more than one source (concat, interleave). When sequence functions are variadic, it is usually in their sources.
Adapted from comments by Rich Hickey.
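A short sketch of how the two conventions play out with the threading macros and with partial. The sample data and the name inc-odds are made up for illustration.

```clojure
;; Collection functions take the collection first, so -> threads through them.
(-> {:a 1}
    (assoc :b 2)
    (update :a inc)
    (dissoc :b))                ; => {:a 2}

;; Sequence functions take their source last, so ->> threads through them.
(->> [1 2 3 4]
     (filter odd?)
     (map inc))                 ; => (2 4)

;; Source-last also allows direct parameterization with partial,
;; mirroring the (map f) <- (filter pred) <- [1 2 3 4] chain.
(def inc-odds (comp (partial map inc) (partial filter odd?)))
(inc-odds [1 2 3 4])            ; => (2 4)
```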

Functions that work with seqs usually have the actual seq as the last argument
(map, filter, remove, etc.).
Accessing and "changing" individual elements takes the collection as the first argument: conj, assoc, get, update.
That way, you can use the (->>) macro with a collection consistently,
as well as create transducers consistently.
Only rarely does one have to resort to (as->) to change the argument order. And if you have to do so, it might be an opportunity to check whether your own functions follow that convention.
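As a rough illustration of the last two points, here is the same kind of pipeline written as a transducer, followed by an as-> form that mixes both argument positions. The names xf and v are placeholders chosen for this sketch.

```clojure
;; Seq functions called without their source return transducers,
;; which compose with comp and run left to right.
(def xf (comp (filter odd?) (map inc)))
(into [] xf (range 10))         ; => [2 4 6 8 10]

;; as-> binds a name, so the value can go in any argument position.
(as-> (range 10) v
  (filter odd? v)               ; source last
  (vec v)
  (conj v :end)                 ; collection first
  (count v))                    ; => 6
```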

For some functions (especially functions that are "seq in, seq out"), the args are ordered so that one can use partial as follows:
(ns tst.demo.core
  (:use tupelo.core tupelo.test))
(dotest
  (let [dozen      (range 12)
        odds-1     (filterv odd? dozen)
        filter-odd (partial filterv odd?)
        odds-2     (filter-odd dozen)]
    (is= odds-1 odds-2
      [1 3 5 7 9 11])))
For other functions, Clojure often follows the ordering of "biggest-first", or "most-important-first" (usually these have the same result). Thus, we see examples like:
(get <map> <key>)
(get <map> <key> <default-val>)
This also shows that any optional values must, by definition, be last (in order to use "rest" args). This is common in most languages (e.g. Java).
For the record, I really dislike using partial functions, since they have user-defined names (at best) or are used inline (more common). Consider this code:
(let [dozen   (range 12)
      odds    (filterv odd? dozen)
      evens-1 (mapv (partial + 1) odds)
      evens-2 (mapv #(+ 1 %) odds)
      add-1   (fn [arg] (+ 1 arg))
      evens-3 (mapv add-1 odds)]
  (is= evens-1 evens-2 evens-3
    [2 4 6 8 10 12]))
Also, I personally find it really annoying to parse code that uses partial, as with evens-1, especially for user-defined functions, or even standard functions that are not as simple as +.
This is especially so if partial is used with 2 or more args.
For the 1-arg case, the function literal seen for evens-2 is much more readable to me.
If 2 or more args are present, please make a named function: either a local one (as shown for evens-3) or a regular (defn some-fn ...) global function.

What syntax do the core.logic matche and defne pattern matching constructs use?

Some core.logic constructs (matcha, matche, matchu, defne, fne) take pattern matching expressions as their body and can be used like this:
(run* [q]
  (fresh [a o]
    (== a [1 2 3 4 5])
    (matche [a]
      ([ [1 2 . [3 4 5] ]]
       (== q "first"))
      ([ [1 2 3 . [4 5] ]]
       (== q "second"))
      ([ [1 . _] ]
       (== q "third")))))
;=> ("first"
;    "second"
;    "third")
(example from Logic-Starter wiki)
But I can't find a specification of the pattern matching syntax in the core.logic documentation. What is this syntax? Maybe I can find it in some miniKanren docs or books?
What is the difference between matched variables prefixed with ? and those without it?
Are there any other destructuring constructs in addition to lists with . (similar to & in Clojure)?
Will [_ _] match only sequences with two elements?
Is it possible to destructure maps?

I'll do my best to answer here. The information comes from Ambrose Bonnaire-Sergeant's notes, which are the only place I could find with any real documentation on the subject. My suspicion is that a lot of the syntax can be found buried in the research papers upon which core.logic is based, but as those are 270-page dissertations I didn't think they'd make a great reference.
What is the difference between matched variables prefixed with ? and those without it?
Variables prefixed with ? are implicitly declared instead of needing to be declared as an argument to fresh. In all other regards they behave the same.
Are there any other destructuring constructs in addition to lists with . (similar to & in Clojure)?
No, there are no other missing pieces of magical syntax for destructuring.
Will [_ _] match only sequences with two elements?
Yes.
Is it possible to destructure maps?
Not really. There's a good long-form article on the subject you can read here, but essentially maps and map-like structures have not historically been a subject of focus for the solvers that provide the theoretical underpinnings for core.logic and its ilk. If you're interested in logic solving on maps, probably the best tool you have at your disposal is featurec. To quote the documentation:
(featurec x fs)
Ensure that a map contains at least the key-value
pairs in the map fs. fs must be partially instantiated - that is, it
may contain values which are logic variables to support feature
extraction.

Is there a more idiomatic way to get N random elements of a collection in Clojure?

I’m currently doing this: (repeatedly n #(rand-nth (seq coll))) but I suspect there might be a more idiomatic way, for 2 reasons:
I’ve found that there’s frequently a more concise and expressive alternative to using short anonymous functions, e.g. partial
the docstring for repeatedly says “presumably with side effects”, implying that it’s not intended to be used to produce values
I suppose I could figure out a way to use reduce but that seems like it would be tricky and less efficient, as it would have to process the entire collection, since reduce is not lazy.

An easy solution, though not optimal for big collections, could be:
(take n (shuffle coll))
It has the "advantage" of not repeating elements. You could also implement a lazy shuffle, but that involves more code.
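To make the difference between the two approaches concrete, here is a minimal sketch. The function names sample-with-replacement and sample-without-replacement are hypothetical, made up for this example; they are not core functions.

```clojure
;; Hypothetical helper: the question's approach. Elements may repeat.
(defn sample-with-replacement
  "Returns a lazy seq of n elements picked at random from coll."
  [n coll]
  (let [v (vec coll)]            ; convert once so rand-nth has indexed access
    (repeatedly n #(rand-nth v))))

;; Hypothetical helper: the shuffle approach. No position is used twice,
;; and at most (count coll) elements are returned.
(defn sample-without-replacement
  "Returns up to n elements of coll in random order."
  [n coll]
  (take n (shuffle coll)))

(sample-with-replacement 5 (range 10))
(sample-without-replacement 5 (range 10))
```

Note that shuffle realizes and copies the whole collection, which is why the shuffle version is less attractive for big inputs, and that rand-nth throws on an empty collection.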

I know it's not exactly what you're asking - but if you're doing a lot of sampling and statistical work, you might be interested in Incanter ([incanter "1.5.2"]).
Incanter provides the function sample, which provides options for sample size, and replacement.
(require '[incanter.stats :refer [sample]])
(sample [1 2 3 4 5 6 7] :size 5 :replacement false)
; => (1 5 6 2 7)

Can somebody explain the behavior of "conj"?

(conj (drop-last "abcde") (last "abcde"))
returns (\e \a \b \c \d)
I am confused. In the doc of conj, I notice
The 'addition' may happen at different 'places' depending on the concrete type.
Does it mean that for LazySeq, the place to add the new item is the head?
How can I get (\a \b \c \d \e) as the result?

"The 'addition' may happen at different 'places' depending on the concrete type."
This refers to the behavior of Clojure's persistent collections that incorporate the addition in the most efficient way with respect to performance and the underlying implementation.
Vectors always add to the end of the collection:
user=> (conj [1 2 3] 4)
[1 2 3 4]
With Lists, conj puts the item at the front of the list, as you've noticed:
user=> (conj '(1 2 3) 4)
(4 1 2 3)
So, yes, a LazySeq is treated like a List with respect to its concrete implementation.
How can I get (\a \b \c \d \e) as the result?
There's a number of ways, but you could easily create a vector from your LazySeq:
(conj (vec (drop-last "abcde"))
      (last "abcde"))

It is important to realize that conj simply delegates to the implementation of cons on the IPersistentCollection interface in Clojure's Java implementation. Therefore, it can behave differently depending on the data structure it is given.
The intent behind conj is that it will always add an item to the data structure in the way that is most efficient.
For lists the most efficient spot to put it is the front. For vectors the most efficient spot to put it is at the end.
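A quick REPL sketch of where conj adds for a few common types, plus two ways to get the characters from the question back in order. The concat variant is an addition of this sketch, not something the answers above propose.

```clojure
(conj [1 2 3] 4)                  ; => [1 2 3 4]    vector: added at the end
(conj '(1 2 3) 4)                 ; => (4 1 2 3)    list: added at the front
(conj (map inc [0 1 2]) 4)        ; => (4 1 2 3)    lazy seq: added at the front
(conj {:a 1} [:b 2])              ; => {:a 1, :b 2} map: entry added

;; Convert to a vector first, so conj appends:
(conj (vec (drop-last "abcde")) (last "abcde"))   ; => [\a \b \c \d \e]

;; Or stay with seqs and append with concat:
(concat (drop-last "abcde") [(last "abcde")])     ; => (\a \b \c \d \e)
```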

What's the one-level sequence flattening function in Clojure?

What's the one-level sequence flattening function in Clojure? I am using apply concat for now, but I wonder if there is a built-in function for that, either in standard library or clojure-contrib.

My general first choice is apply concat. Also, don't overlook (for [subcoll coll, item subcoll] item) -- depending on the broader context, this may result in clearer code.
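For comparison, a small sketch of both forms on the same input, with flatten shown only to contrast one-level and full flattening. The name nested and its contents are made up for illustration.

```clojure
;; Illustrative data (an assumption): one element is nested two levels deep.
(def nested [[1 2] [3 [4 5]] [6]])

;; Both remove exactly one level of nesting.
(apply concat nested)                       ; => (1 2 3 [4 5] 6)
(for [subcoll nested, item subcoll] item)   ; => (1 2 3 [4 5] 6)

;; flatten, by contrast, removes every level.
(flatten nested)                            ; => (1 2 3 4 5 6)
```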

There's no standard function. apply concat is a good solution in many cases. Or you can equivalently use mapcat seq.
The problem with apply concat is that it fails when anything other than a collection or sequence is at the first level:
(apply concat [1 [2 3] [4 [5]]])
=> IllegalArgumentException Don't know how to create ISeq from: java.lang.Long...
Hence you may want to do something like:
(defn flatten-one-level [coll]
  (mapcat #(if (sequential? %) % [%]) coll))
(flatten-one-level [1 [2 3] [4 [5]]])
=> (1 2 3 4 [5])
As a more general point, the lack of a built-in function should not usually stop you from defining your own :-)

I use apply concat too - I don't think there's anything else in the core.
flatten works on multiple levels (and is defined via a tree-walk, not in terms of repeated single-level expansion).
See also Clojure: Semi-Flattening a nested Sequence, which has a flatten-1 from clojure mvc (and which is much more complex than I expected).
Update to clarify laziness:
user=> (take 3 (apply concat (for [i (range 1e6)] (do (print i) [i]))))
012345678910111213141516171819202122232425262728293031(0 1 2)
You can see that it evaluates the argument 32 times. This is chunking for efficiency; it is otherwise lazy (it doesn't evaluate the whole list). For a discussion of chunking, see the comments at the end of http://isti.bitbucket.org/2012/04/01/pipes-clojure-choco-1.html
