Scala: find all chains in a list

Say I have a list of items:
Seq(A, B, B, B, B, G, G, S, S, S, B, A, G)
And I want to find all the chains and get a sequence of them like so:
Seq(Seq(A), Seq(B, B, B, B), Seq(G, G), Seq(S, S, S), Seq(B), Seq(A), Seq(G))
I want to maintain order, and use a custom comparison function to decide if two objects are the "same". I'm thinking a fold or a scan may be what I need, but I'm having trouble coming up with the exact case. I'm using Scala.
EDIT: I've modified the answer from that similar question to get this:
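def collapse(input: Seq[Stmt]): Seq[Seq[Stmt]] =
  if (input.isEmpty) Seq.empty // base case: without it, input.head throws on an empty Seq
  else {
    val (l, r) = input.span(_.getClass == input.head.getClass)
    l +: collapse(r) // +: prepends to a Seq (:: is only defined on List)
  }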

Cleaner solution:
def pack[T](input: List[T]): List[List[T]] =
  input.foldRight(Nil: List[List[T]]) { (e, accu) =>
    accu match {
      // e continues the run at the front of the accumulator
      case (cur @ (h :: _)) :: rest if e == h => (e :: cur) :: rest
      // empty accumulator or a different element: start a new run
      case _ => List(e) :: accu
    }
  }
Not using any library functions (ugly):
def pack[T](input: List[T]): List[List[T]] = {
def packWithPrevious(remaining: List[T])(previous: List[T]): List[List[T]] =
remaining match {
case List() => List(previous)
case head :: tail =>
val nextIter = packWithPrevious(tail)(_)
previous match {
case List() => nextIter(List(head))
case prevHead :: _ =>
if (head != prevHead)
previous :: nextIter(List(head))
else
nextIter(head :: previous)
}
}
packWithPrevious(input)(List())
}
scala> val s = List('A', 'B', 'B', 'B', 'B', 'G', 'G', 'S', 'S', 'S', 'B', 'A', 'G')
s: List[Char] = List(A, B, B, B, B, G, G, S, S, S, B, A, G)
scala> pack(s)
res2: List[List[Char]] = List(List(A), List(B, B, B, B), List(G, G), List(S, S, S), List(B), List(A), List(G))
Source: https://github.com/izmailoff/scala-s-99/blob/master/src/main/scala/s99/p09/P09.scala
Test: https://github.com/izmailoff/scala-s-99/blob/master/src/test/scala/s99/p09/P09Suite.scala

Similar to the existing answers, but I find using a partial function directly in foldLeft a clean solution:
val s = Seq("A", "B", "B", "B", "B", "G", "G", "S", "S", "S", "B", "A", "G")
s.foldLeft(Seq[Seq[String]]()) {
case (Seq(), item) => Seq(Seq(item))
case (head::tail, item) if head.contains(item) => (item +: head) +: tail
case (seq, item) => Seq(item) +: seq
}.reverse
res0: Seq[Seq[String]] = List(List(A), List(B, B, B, B), List(G, G), List(S, S, S), List(B), List(A), List(G))

Consider the following solution (the fold runs over seq.tail, because the head is already in the initial accumulator):
seq.tail.foldLeft(List(List(seq.head))) { case (acc, item) =>
  if (acc.head.head == item) (item :: acc.head) :: acc.tail else List(item) :: acc
}.reverse
seq may be empty, so:
seq.drop(1).foldLeft(List(seq.headOption.toList)) { case (acc, item) =>
  if (acc.head.head == item) (item :: acc.head) :: acc.tail else List(item) :: acc
}.reverse

I thought groupBy would be helpful here, but my solution got slightly awkward:
val seq = Seq("A", "B", "B", "B", "B", "G", "G", "S", "S", "S", "B", "A", "G")
val parts = {
var lastKey: Option[(Int, String)] = None
seq.groupBy(s => {
lastKey = lastKey.map((p: (Int, String)) =>
if (p._2.equalsIgnoreCase(s)) p else (p._1 + 1, s)) orElse Some((0, s))
lastKey.get
}).toSeq.sortBy(q => q._1).map(q => q._2) // map, not flatMap: flatMap would flatten the groups back into one sequence
}
(using equalsIgnoreCase as an example of a comparison function)
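The question asks for a custom comparison function, which most of the answers above hard-code as ==. A minimal sketch of one way to pass it in as a parameter, assuming consecutive elements are compared against the first element of their run; the names chains and same are illustrative and do not come from any of the answers:

```scala
import scala.annotation.tailrec

// Splits `input` into runs of consecutive elements that `same` treats as equal
// to the first element of the run. `chains` and `same` are hypothetical names.
def chains[T](input: List[T])(same: (T, T) => Boolean): List[List[T]] = {
  @tailrec
  def loop(rest: List[T], acc: List[List[T]]): List[List[T]] = rest match {
    case Nil => acc.reverse
    case head :: tail =>
      // span takes the longest prefix of the tail that matches the run's first element
      val (run, remaining) = tail.span(same(head, _))
      loop(remaining, (head :: run) :: acc)
  }
  loop(input, Nil)
}

val letters = List("A", "b", "B", "G", "g", "S", "B")
val grouped = chains(letters)(_ equalsIgnoreCase _)
// grouped == List(List("A"), List("b", "B"), List("G", "g"), List("S"), List("B"))
```

Because the helper is tail-recursive and only ever prepends to the accumulator, it should also cope with long inputs without growing the stack.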

Related

Scala process lists of string and produce Map["Combination",Count of Lists with that combination]

I have a Seq[List[String]]. Example :
Vector(
["B","D","A","P","F"],
["B","A","F"],
["B","D","A","F"],
["B","D","A","T","F"],
["B","A","P","F"],
["B","D","A","P","F"],
["B","A","F"],
["B","A","F"],
["B","A","F"],
["B","A","F"]
)
I would like to get the count of different combinations (like "A","B") in a Map[String,Int], where the key (String) is the element combination and the value (Int) is the number of lists containing that combination. If "A", "B" and "F" each appear in all 10 records, then instead of having "A", 10 and "B", 10 and "F", 10, I would like to consolidate them into ""A","B","F"", 10.
Sample (not all combinations included) result for the above Seq[List[String]]
Map(
""A","B","F"" -> 10,
""A","B","D"" -> 4,
""A","B","P"" -> 2,
...
...
..
)
I would appreciate any Scala code or solution that produces this output.
Assuming that lists containing the same elements in a different order count as one group (e.g. BAF and ABF fall into the same group), the solution would be:
//define the data
val a = Seq(
List("B","D","A","P","F"),
List("B","A","F"),
List("B","D","A","F"),
List("B","D","A","T","F"),
List("B","A","P","F"),
List("B","D","A","P","F"),
List("B","A","F"),
List("B","A","F"),
List("B","A","F"),
List("A","B","F")
)
//sort each list so that B,A,F is counted the same as A,B,F,
//and likewise for any other lists that differ only in order
val b = a.map(_.sorted)
//group by identity, and then count the length
b.groupBy(identity).collect{case (x, y) => (x, y.length)}
the output will be like this:
res1: scala.collection.immutable.Map[List[String],Int] = HashMap(List(A, B, F, P) -> 1, List(A, B, D, F) -> 1, List(A, B, D, F, T) -> 1, List(A, B, D, F, P) -> 2, List(A, B, F) -> 5)
to understand more about how Scala's groupBy identity works, you can go to this post
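As a small illustration of the groupBy(identity) idiom used above, here is a sketch on made-up data (the names words, buckets and counts are assumptions for this example only):

```scala
val words = List("a", "b", "a", "c", "a", "b")

// groupBy(identity) uses each element as its own key, so equal elements share a bucket.
val buckets: Map[String, List[String]] = words.groupBy(identity)
// buckets == Map("a" -> List("a", "a", "a"), "b" -> List("b", "b"), "c" -> List("c"))

// The size of each bucket is the number of occurrences of that key.
val counts: Map[String, Int] = buckets.map { case (key, group) => (key, group.length) }
// counts == Map("a" -> 3, "b" -> 2, "c" -> 1)
```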
The format of your vector is not valid Scala syntax; I think you mean something like this:
val items = Seq(
Seq("B", "D", "A", "P", "F"),
Seq("B", "A", "F"),
Seq("B", "D", "A", "F"),
Seq("B", "D", "A", "T", "F"),
Seq("B", "A", "P", "F"),
Seq("B", "D", "A", "P", "F"),
Seq("B", "A", "F"),
Seq("B", "A", "F"),
Seq("B", "A", "F"),
Seq("B", "A", "F")
)
It sounds like what you are trying to accomplish is two group-by operations. First, you would like to get all combinations from each list, then count how often each combination occurs across the lists, and then, for combinations that occur with the same frequency, group again and merge them together.
For this you will need the following function to perform the double reduction after the double groupby.
Steps:
Collect all the combinations. For each list in items, we compute the combinations of every size from 1 to group.length, sort each one so that element order does not matter, and keep the distinct ones. The inner flatMap merges the per-size results, and the outer flatMap merges them across all the lists in your vector, giving a Seq[Seq[String]].
The first groupMapReduce counts how often each combination appears: each occurrence is mapped to 1 and the ones are summed up.
The combinations are grouped again, but this time by their number of occurrences. So if "A" and "B" both occur 10 times, they will be grouped together.
The final map reduces each accumulated group to a single key.
val combos = items.flatMap(group => (1 to group.length).flatMap(i => group.combinations(i).map(_.sorted)).distinct) // Seq[Seq[String]]
.groupMapReduce(identity)(_ => 1)(_ + _) // Map[Seq[String], Int]
.groupMapReduce(_._2)(v => Seq(v))(_ ++ _) // Map[Int, Seq[(Seq[String], Int)]]
.map { case (total, groups) => (groupReduction(groups), total)} // reduction function to determine how you want to double reduce these groups.
I've defined this double reduction function as follows. It converts a group like Seq("A","B") into ""A","B"", and if Seq("A","B") has the same count as another group Seq("C"), the groups are concatenated as ""A","B","C"".
def groupReduction(groups: Seq[(Seq[String], Int)]): String = {
  // wrap each element in double quotes, then join elements and groups with commas
  groups.map(_._1.map(v => "\"" + v + "\"").sorted.mkString(",")).sorted.mkString(",")
}
The combination sizes can be restricted to the groups of interest by adjusting the (1 to group.length) range. If it is limited to (3 to 3), the groups would be:
List(List(B, D, P), List(A, D, P), List(D, F, P)): 2
List(List(A, B, F)): 10
List(List(B, D, F), List(A, D, F), List(A, B, D)): 4
List(List(A, F, P), List(B, F, P), List(A, B, P)): 3
List(List(B, D, T), List(A, F, T), List(B, F, T), List(A, D, T), List(A, B, T), List(D, F, T)): 1
As you can see in your example, `List(B, D, F)` and `List(A, D, F)` are also associated with your second line "A,B,D".
Here it is:
scala> def count(seq: Seq[Seq[String]]): Map[Seq[String], Int] =
| seq.flatMap(_.toSet.subsets.filter(_.nonEmpty)).groupMapReduce(identity)(_ => 1)(_ + _)
| .toSeq.sortBy(-_._1.size).foldLeft(Map.empty[Set[String], Int]){ case (r, (p, i)) =>
| if(r.exists{ (q, j) => i == j && p.subsetOf(q)}) r else r.updated(p, i)
| }.map{ case(k, v) => (k.toSeq, v) }
def count(seq: Seq[Seq[String]]): Map[Seq[String], Int]
scala> count(Seq(
| Seq("B", "D", "A", "P", "F"),
| Seq("B", "A", "F"),
| Seq("B", "D", "A", "F"),
| Seq("B", "D", "A", "T", "F"),
| Seq("B", "A", "P", "F"),
| Seq("B", "D", "A", "P", "F"),
| Seq("B", "A", "F"),
| Seq("B", "A", "F"),
| Seq("B", "A", "F"),
| Seq("B", "A", "F")
| ))
val res1: Map[Seq[String], Int] =
HashMap(List(F, A, B) -> 10,
List(F, A, B, P, D) -> 2,
List(T, F, A, B, D) -> 1,
List(F, A, B, D) -> 4,
List(F, A, B, P) -> 3)
As you can see, "A, B, D" and "A, B, P" are dropped from the result, since they are subsets of "ABDF" and "ABPDF" with the same counts ...

Split the list when duplicate is found scala

I have a list of elements in Scala and I am looking for a way to split the list when a duplicate is found.
For example: List(x,y,z,e,r,y,g,a) would be converted to List(List(x,y,z,e,r),List(y,g,a))
or List(x,y,z,x,y,z) to List(x,y,z), List(x,y,z)
and List(x,y,z,y,g,x) to List(x,y,z), List(y,g,x)
Is there a more efficient way than iterating and checking every element separately?
Quick and dirty O(n) using O(n) additional memory:
import scala.collection.mutable.HashSet
import scala.collection.mutable.ListBuffer
val list = List("x", "y", "z", "e", "r", "y", "g", "a", "x", "m", "z")
val result = new ListBuffer[ListBuffer[String]]()
var partition = new ListBuffer[String]()
// elements of the current partition, kept in a set so each lookup is O(1)
val seen = new HashSet[String]()
list.foreach { i =>
  if (seen.contains(i)) {
    result += partition
    partition = new ListBuffer[String]()
    seen.clear()
  }
  partition += i
  seen += i
}
if (partition.nonEmpty) {
  result += partition
}
result
ListBuffer(ListBuffer(x, y, z, e, r), ListBuffer(y, g, a, x, m, z))
This solution comes with a few caveats:
I'm not making a claim as to 'performance', though I think it's better than O(n^2), which is the brute-force.
This is assuming you are splitting when you find a duplicate, where 'duplicate' means 'something that exists in the previous split'. I cheat a little by only checking the last segment. The reason is that I think it clarifies how to use foldLeft a little, which is a natural way to go about this.
Everything here is reversed, but maintains order. This can be easily corrected, but adds an additional O(n) (cumulative) call, and may not actually be needed (depending on what you're doing with it).
Here is the code:
import scala.collection.immutable.ListSet
def partition(ls: List[String]): List[ListSet[String]] = {
ls.foldLeft(List(ListSet.empty[String]))((partitionedLists, elem:String) => {
if(partitionedLists.head.contains(elem)) {
ListSet(elem) :: partitionedLists
} else {
(partitionedLists.head + elem) :: partitionedLists.tail
}
})
}
partition(List("x","y","z","e","r","y","g","a"))
// res0: List[scala.collection.immutable.ListSet[String]] = List(ListSet(r, e, z, y, x), ListSet(a, g, y))
I'm using ListSet to get both the benefits of a Set and ordering, which is appropriate to your use case.
foldLeft is a function that takes an accumulator value (in this case the List(ListSet.empty[String])) and modifies it as it moves through the elements of your collection. If we structure that accumulator, as done here, to be a list of segments, by the time we're done it will have all the ordered segments of the original list.
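To make the accumulator idea concrete, here is a minimal sketch of the same fold using plain lists and restoring the original order at the end. The name splitOnRepeat is hypothetical, and contains on a List is linear, so this favours clarity over speed:

```scala
// Hypothetical helper: starts a new segment whenever an element
// already occurs in the current segment.
def splitOnRepeat(ls: List[String]): List[List[String]] = {
  // Accumulator: segments built so far, newest first, each stored in reverse order.
  val reversed = ls.foldLeft(List(List.empty[String])) { (segments, elem) =>
    if (segments.head.contains(elem)) List(elem) :: segments // start a new segment
    else (elem :: segments.head) :: segments.tail            // extend the current one
  }
  // Undo both reversals to restore the original order.
  reversed.map(_.reverse).reverse
}

val parts = splitOnRepeat(List("x", "y", "z", "e", "r", "y", "g", "a"))
// parts == List(List("x", "y", "z", "e", "r"), List("y", "g", "a"))
```

Note that with this seed an empty input yields List(List()) rather than an empty list; filter out empty segments if that matters for your use.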
One statement tail-recursive version (but not very efficient because of the contains on the list)
var xs = List('x','y','z','e','r','y','g','a')
def splitAtDuplicates[A](splits: List[List[A]], right: List[A]): List[List[A]] =
if (right.isEmpty)// done
splits.map(_.reverse).reverse
else if (splits.head contains right.head) // need to split here
splitAtDuplicates(List()::splits, right)
else // continue building current sublist
splitAtDuplicates((right.head :: splits.head)::splits.tail, right.tail)
Speed it up with a Set to track what we've seen so far:
def splitAtDuplicatesOptimised[A](seen: Set[A],
splits: List[List[A]],
right: List[A]): List[List[A]] =
if (right.isEmpty)
splits.map(_.reverse).reverse
else if (seen(right.head))
splitAtDuplicatesOptimised(Set(), List() :: splits, right)
else
splitAtDuplicatesOptimised(seen + right.head,
(right.head :: splits.head) :: splits.tail,
right.tail)
You will basically need to iterate with a look-up table. I can offer the following immutable, functional, tail-recursive implementation.
import scala.collection.immutable.HashSet
import scala.annotation.tailrec
val list = List("x","y","z","e","r","y","g","a", "x", "m", "z", "ll")
def splitListOnDups[A](list: List[A]): List[List[A]] = {
@tailrec
def _split(list: List[A], cList: List[A], hashSet: HashSet[A], lists: List[List[A]]): List[List[A]] = {
list match {
case a :: Nil if hashSet.contains(a) => List(a) +: (cList +: lists)
case a :: Nil => (a +: cList) +: lists
case a :: tail if hashSet.contains(a) => _split(tail, List(a), hashSet, cList +: lists)
case a :: tail => _split(tail, a +: cList, hashSet + a, lists)
}
}
_split(list, List[A](), HashSet[A](), List[List[A]]()).reverse.map(_.reverse)
}
def splitListOnDups2[A](list: List[A]): List[List[A]] = {
@tailrec
def _split(list: List[A], cList: List[A], hashSet: HashSet[A], lists: List[List[A]]): List[List[A]] = {
list match {
case a :: Nil if hashSet.contains(a) => List(a) +: (cList +: lists)
case a :: Nil => (a +: cList) +: lists
case a :: tail if hashSet.contains(a) => _split(tail, List(a), HashSet[A](a), cList +: lists) // the new segment already contains a
case a :: tail => _split(tail, a +: cList, hashSet + a, lists)
}
}
_split(list, List[A](), HashSet[A](), List[List[A]]()).reverse.map(_.reverse)
}
splitListOnDups(list)
// List[List[String]] = List(List(x, y, z, e, r), List(y, g, a), List(x, m), List(z, ll))
splitListOnDups2(list)
// List[List[String]] = List(List(x, y, z, e, r), List(y, g, a, x, m, z, ll))

How to sum/combine each value in List of List in scala

Given the following Scala List:
val l = List(List("a1", "b1", "c1"), List("a2", "b2", "c2"), List("a3", "b3", "c3"))
How can I get:
List("a1a2a3","b1b2b3","c1c2c3")
Is it possible to use zipped.map(_ + _) on a list that contains more than two lists, or is there another way to solve this?
You can use the .transpose method :
scala> val l = List(List("a1", "b1", "c1"), List("a2", "b2", "c2"), List("a3", "b3", "c3"))
l: List[List[String]] = List(List(a1, b1, c1), List(a2, b2, c2), List(a3, b3, c3))
scala> l.transpose
res0: List[List[String]] = List(List(a1, a2, a3), List(b1, b2, b3), List(c1, c2, c3))
and then map over the outer list, creating each String using mkString :
scala> l.transpose.map(_.mkString)
res1: List[String] = List(a1a2a3, b1b2b3, c1c2c3)
Another solution:
scala> val l = List(List("a1", "b1", "c1"), List("a2", "b2", "c2"), List("a3", "b3", "c3"))
scala> l.reduce[List[String]]{ case (acc, current) => acc zip current map { case (a, b) => a + b } }
res2: List[String] = List(a1a2a3, b1b2b3, c1c2c3)
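A short sketch checking that the two approaches above agree on the example data (the names rows, viaTranspose and viaReduce are illustrative). As far as I know they differ on ragged input: zip silently truncates to the shorter list, while transpose rejects inner lists of different lengths, so treat the equivalence as holding for rectangular input only.

```scala
val rows = List(List("a1", "b1", "c1"), List("a2", "b2", "c2"), List("a3", "b3", "c3"))

// Approach 1: turn rows into columns, then concatenate each column.
val viaTranspose: List[String] = rows.transpose.map(_.mkString)

// Approach 2: fold the rows pairwise, concatenating position by position.
val viaReduce: List[String] =
  rows.reduce[List[String]] { case (acc, row) => acc.zip(row).map { case (a, b) => a + b } }

// Both give List("a1a2a3", "b1b2b3", "c1c2c3") for this rectangular input.
```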

Scala - Combine two lists in an alternating fashion

How do I merge two lists in Scala so that the resulting list contains the elements of both lists in alternating order?
Input:
val list1 = List("Mary", "a", "lamb")
val list2 = List("had", "little")
Output:
List("Mary", "had", "a", "little", "lamb")
What you're looking for is usually called "intersperse" or "intercalate" and there are a few ways to do it:
def intersperse[A](a : List[A], b : List[A]): List[A] = a match {
case first :: rest => first :: intersperse(b, rest)
case _ => b
}
You can also use scalaz
import scalaz._
import Scalaz._
val lst1 = ...
val lst2 = ...
lst1 intercalate lst2
Edit: You can also do the following:
lst1.zipAll(lst2,"","") flatMap { case (a, b) => Seq(a, b) }
Come to think of it, I believe the last solution is my favorite since it's most concise while still being clear. If you're already using Scalaz, I'd use the second solution. The first is also very readable however.
And just to make this answer more complete, adding @Travis Brown's solution that is generic:
list1.map(List(_)).zipAll(list2.map(List(_)), Nil, Nil).flatMap(Function.tupled(_ ::: _))
val list1 = List("Mary", "a", "lamb")
val list2 = List("had", "little")
def merge1(list1: List[String], list2: List[String]): List[String] = {
  if (list1.isEmpty) list2
  else list1.head :: merge1(list2, list1.tail) // swap the lists on each step
}
def merge2(list1: List[String], list2: List[String]): List[String] = list1 match {
  case List() => list2
  case head :: tail => head :: merge2(list2, tail) // swap the lists on each step
}
merge1(list1, list2)
merge2(list1, list2)
//> List[String] = List(Mary, had, a, little, lamb)
list1.zipAll(list2,"","").flatMap(_.productIterator.toList).filter(_ != "")
You could do something like this:
def alternate[A]( a: List[A], b: List[A] ): List[A] = {
def first( a: List[A], b: List[A] ): List[A] = a match {
case Nil => Nil
case x :: xs => x :: second( xs, b )
}
def second( a: List[A], b: List[A] ): List[A] = b match {
case Nil => Nil
case y :: ys => y :: first( a, ys )
}
first( a, b )
}
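The recursive versions above build the result on the call stack. A tail-recursive sketch of the same alternating idea, which appends whatever remains of the longer list once the other runs out; the name interleave is hypothetical:

```scala
import scala.annotation.tailrec

// Hypothetical name; alternates elements and appends whatever is left of the longer list.
def interleave[A](a: List[A], b: List[A]): List[A] = {
  @tailrec
  def loop(cur: List[A], other: List[A], acc: List[A]): List[A] = cur match {
    case Nil => acc.reverse ::: other          // current side exhausted: append the rest
    case x :: xs => loop(other, xs, x :: acc)  // take one element, then swap sides
  }
  loop(a, b, Nil)
}

val merged = interleave(List("Mary", "a", "lamb"), List("had", "little"))
// merged == List("Mary", "had", "a", "little", "lamb")
```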

How to merge tuples by same elements in Scala

For example, if I have the following tuples:
(1, "a", "l")
(1, "a", "m")
(1, "a", "n")
I want to merge them like this:
(1, "a", List("l", "m", "n"))
In my case, the lists are a result from an inner join using Slick.
So, the first and second elements (1 and "a") should be the same.
If somebody knows how to merge like that when using Slick, please let me know.
Or, more generally, how to merge tuples into inner lists by their shared elements:
(1, "a", "l")
(1, "a", "m")
(1, "b", "n")
(1, "b", "o")
// to like this
List( (1, "a", List("l", "m")), (1, "b", List("n", "o")) )
How about:
val l = ??? // Your list
val groups = l groupBy { case (a, b, c) => (a,b) }
val tups = groups map { case ((a,b), l) => (a,b,l.map(_._3)) }
tups.toList
You could try foldRight
val l = List((1, "a", "l"), (1, "a", "m"), (1, "a", "n"), (1, "b", "n"), (1, "b", "o"))
val exp = List((1, "a", List("l", "m", "n")), (1, "b", List("n", "o")))
val result = l.foldRight(List.empty[(Int, String, List[String])]) {
(x, acc) =>
val (n, s1, s2) = x
acc match {
case (n_, s1_, l_) :: t if (n == n_ && s1 == s1_) =>
(n_, s1_, (s2 :: l_)) :: t
case _ =>
(n, s1, List(s2)) :: acc
}
}
println(result)
println(result == exp)
Update
If the input list is not sorted:
val result = l.sorted.foldRight(...)
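Sorting changes the order of the rows, and the groupBy answer returns a Map whose iteration order is not tied to the input. If the order of first appearance matters, one possible sketch is to group first and then walk the distinct keys in input order; mergeByKey is a hypothetical name:

```scala
// Hypothetical helper: groups rows by their first two fields,
// keeping the keys in order of first appearance.
def mergeByKey[A, B, C](rows: List[(A, B, C)]): List[(A, B, List[C])] = {
  val grouped = rows.groupBy { case (a, b, _) => (a, b) }
  // distinct keeps the first occurrence of each key, in input order
  val keys = rows.map { case (a, b, _) => (a, b) }.distinct
  keys.map { case (a, b) => (a, b, grouped((a, b)).map(_._3)) }
}

val rows = List((1, "b", "n"), (1, "a", "l"), (1, "b", "o"), (1, "a", "m"))
val mergedRows = mergeByKey(rows)
// mergedRows == List((1, "b", List("n", "o")), (1, "a", List("l", "m")))
```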