I have a Seq[List[String]]. Example:
Vector(
["B","D","A","P","F"],
["B","A","F"],
["B","D","A","F"],
["B","D","A","T","F"],
["B","A","P","F"],
["B","D","A","P","F"],
["B","A","F"],
["B","A","F"],
["B","A","F"],
["B","A","F"]
)
I would like to get the count of the different combinations (like "A","B") in a Map[String,Int], where the key (String) is the combination of elements and the value (Int) is the number of Lists containing that combination. If "A", "B" and "F" all appear in all 10 records, then instead of having "A" -> 10, "B" -> 10 and "F" -> 10, I would like to consolidate them into ""A","B","F"" -> 10.
Sample result (not all combinations included) for the above Seq[List[String]]:
Map(
""A","B","F"" -> 10,
""A","B","D"" -> 4,
""A","B","P"" -> 2,
...
...
..
)
I would appreciate any Scala code or solution that produces this output.
Assuming that lists containing the same elements in a different order count as one group (e.g. B,A,F and A,B,F fall into the same group), the solution would be:
// define the data
val a = Seq(
  List("B","D","A","P","F"),
  List("B","A","F"),
  List("B","D","A","F"),
  List("B","D","A","T","F"),
  List("B","A","P","F"),
  List("B","D","A","P","F"),
  List("B","A","F"),
  List("B","A","F"),
  List("B","A","F"),
  List("A","B","F")
)
// sort each list so that B,A,F is counted the same as A,B,F,
// and likewise for any other lists that differ only in order
val b = a.map(_.sorted)
// group by identity, then count the size of each group
b.groupBy(identity).collect { case (x, y) => (x, y.length) }
The output looks like this:
res1: scala.collection.immutable.Map[List[String],Int] = HashMap(List(A, B, F, P) -> 1, List(A, B, D, F) -> 1, List(A, B, D, F, T) -> 1, List(A, B, D, F, P) -> 2, List(A, B, F) -> 5)
To understand more about how Scala's groupBy(identity) works, you can go to this post.
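As a rough illustration of what groupBy(identity) does here, the following is a minimal sketch on a smaller input. The object name GroupByIdentityDemo and the three sample rows are made up for this example, and the groupMapReduce variant assumes Scala 2.13 or later.

```scala
object GroupByIdentityDemo {
  // Three hypothetical rows; the first two differ only in element order
  val rows: Seq[List[String]] = Seq(
    List("B", "A", "F"),
    List("A", "B", "F"),
    List("B", "D", "A", "F")
  )

  // Sorting makes the element order irrelevant
  val sortedRows: Seq[List[String]] = rows.map(_.sorted)

  // groupBy(identity) uses each (sorted) list itself as the key,
  // so equal lists end up in the same bucket
  val grouped: Map[List[String], Seq[List[String]]] = sortedRows.groupBy(identity)

  // The count per key is just the size of each bucket
  val counts: Map[List[String], Int] = grouped.map { case (key, bucket) => key -> bucket.length }

  // Scala 2.13+ only: the same counts without building the buckets
  val countsOnePass: Map[List[String], Int] =
    sortedRows.groupMapReduce(identity)(_ => 1)(_ + _)
}
```

Both counts and countsOnePass should contain List(A, B, F) -> 2 and List(A, B, D, F) -> 1.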
The format of your vector is not valid Scala syntax; I think you mean something like this:
val items = Seq(
  Seq("B", "D", "A", "P", "F"),
  Seq("B", "A", "F"),
  Seq("B", "D", "A", "F"),
  Seq("B", "D", "A", "T", "F"),
  Seq("B", "A", "P", "F"),
  Seq("B", "D", "A", "P", "F"),
  Seq("B", "A", "F"),
  Seq("B", "A", "F"),
  Seq("B", "A", "F"),
  Seq("B", "A", "F")
)
It sounds like what you are trying to accomplish is two group-by operations. First, you would like to get all combinations from each list, then count how often each combination occurs across the lists, and then, for combinations that occur with the same frequency, do another group-by and merge those together.
For this you will need the following function to perform the double reduction after the double groupby.
Steps:
Collect all the combinations. For each list inside items, we compute every combination of its elements, which yields a Seq[Seq[String]] in which each Seq[String] is a unique combination. The inner flatMap is needed because the (1 to group.length) operation would otherwise produce a Seq of Seq[Seq[String]]. The outer flatMap then merges the combinations from all the lists in your vector into a single Seq[Seq[String]].
The first groupMapReduce counts how often each combination appears: every occurrence is mapped to 1 and the ones are summed up, which gives the frequency of each combination.
The combinations are then grouped again, this time by their number of occurrences. So if "A" and "B" both occur 10 times, they will be grouped together.
The final map reduces each of the accumulated groups to a single key.
val combos = items
  .flatMap(group => (1 to group.length).flatMap(i => group.combinations(i).map(_.sorted)).distinct) // Seq[Seq[String]]
  .groupMapReduce(identity)(_ => 1)(_ + _) // Map[Seq[String], Int]
  .groupMapReduce(_._2)(v => Seq(v))(_ ++ _) // Map[Int, Seq[(Seq[String], Int)]]
  .map { case (total, groups) => (groupReduction(groups), total) } // reduction function that decides how these groups are merged
I've defined this double reduction function as follows. It converts a group like Seq("A","B") into ""A","B"", and if Seq("A","B") has the same count as another group Seq("C"), the groups are concatenated into ""A","B","C"".
def groupReduction(groups: Seq[(Seq[String], Int)]): String = {
  // wrap each element in double quotes, join each group, then join all groups
  groups.map(_._1.map(v => "\"" + v + "\"").sorted.mkString(",")).sorted.mkString(",")
}
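To make the intermediate stages of this pipeline more concrete, here is a small sketch that names each stage separately. The object name PipelineStages, the stage names all, freq and byCount, and the two-row input are invented for this illustration, and it assumes Scala 2.13 or later for groupMapReduce.

```scala
object PipelineStages {
  // Two tiny rows, invented for this illustration
  val items: Seq[Seq[String]] = Seq(Seq("B", "A"), Seq("A"))

  // Stage 1: every non-empty combination of every row, each one sorted.
  // The per-row distinct ensures a combination is counted once per row.
  val all: Seq[Seq[String]] =
    items.flatMap(group => (1 to group.length).flatMap(i => group.combinations(i).map(_.sorted)).distinct)

  // Stage 2: combination -> number of rows that contain it
  val freq: Map[Seq[String], Int] =
    all.groupMapReduce(identity)(_ => 1)(_ + _)

  // Stage 3: count -> all combinations that share that count
  val byCount: Map[Int, Seq[(Seq[String], Int)]] =
    freq.groupMapReduce(_._2)(v => Seq(v))(_ ++ _)
}
```

With this input, freq should map List(A) to 2 and both List(B) and List(A, B) to 1, so byCount ends up with one combination under key 2 and two combinations under key 1 (in no guaranteed order).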
The combination sizes of interest can be adjusted in the (1 to group.length) clause. If it is limited to (3 to 3), the groups would be:
List(List(B, D, P), List(A, D, P), List(D, F, P)): 2
List(List(A, B, F)): 10
List(List(B, D, F), List(A, D, F), List(A, B, D)): 4
List(List(A, F, P), List(B, F, P), List(A, B, P)): 3
List(List(B, D, T), List(A, F, T), List(B, F, T), List(A, D, T), List(A, B, T), List(D, F, T)): 1
As you can see in your example, `List(B, D, F)` and `List(A, D, F)` occur just as often as your second line, "A,B,D", so they end up in the same group.
Here it is:
scala> def count(seq: Seq[Seq[String]]): Map[Seq[String], Int] =
| seq.flatMap(_.toSet.subsets.filter(_.nonEmpty)).groupMapReduce(identity)(_ => 1)(_ + _)
| .toSeq.sortBy(-_._1.size).foldLeft(Map.empty[Set[String], Int]){ case (r, (p, i)) =>
| if(r.exists{ (q, j) => i == j && p.subsetOf(q)}) r else r.updated(p, i)
| }.map{ case(k, v) => (k.toSeq, v) }
def count(seq: Seq[Seq[String]]): Map[Seq[String], Int]
scala> count(Seq(
| Seq("B", "D", "A", "P", "F"),
| Seq("B", "A", "F"),
| Seq("B", "D", "A", "F"),
| Seq("B", "D", "A", "T", "F"),
| Seq("B", "A", "P", "F"),
| Seq("B", "D", "A", "P", "F"),
| Seq("B", "A", "F"),
| Seq("B", "A", "F"),
| Seq("B", "A", "F"),
| Seq("B", "A", "F")
| ))
val res1: Map[Seq[String], Int] =
HashMap(List(F, A, B) -> 10,
List(F, A, B, P, D) -> 2,
List(T, F, A, B, D) -> 1,
List(F, A, B, D) -> 4,
List(F, A, B, P) -> 3)
As you can see, "A, B, D" and "A, B, P" are removed from the result, since they are subsets of "ABDF" and "ABFP" respectively, which have the same counts.
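One caveat, offered tentatively: the lambda `(q, j) => ...` passed to exists above relies on Scala 3's parameter untupling, so that snippet will likely not compile as written under Scala 2.13. The sketch below is an assumed 2.13-compatible rewrite of the same idea using `case` patterns. The object name MaximalCombos and the local names freq, kept and absorbed are mine, and unlike the original it also sorts each key so that the output is easier to compare.

```scala
object MaximalCombos {
  def count(seq: Seq[Seq[String]]): Map[Seq[String], Int] = {
    // 1. Count in how many rows each non-empty subset occurs
    //    (exponential in the row length, so only suitable for short rows)
    val freq: Map[Set[String], Int] =
      seq.flatMap(_.toSet.subsets().filter(_.nonEmpty))
        .groupMapReduce(identity)(_ => 1)(_ + _)

    // 2. Visit larger subsets first and keep a subset only if no
    //    already-kept superset has exactly the same count
    freq.toSeq
      .sortBy { case (subset, _) => -subset.size }
      .foldLeft(Map.empty[Set[String], Int]) { case (kept, (subset, n)) =>
        val absorbed = kept.exists { case (bigger, m) => m == n && subset.subsetOf(bigger) }
        if (absorbed) kept else kept.updated(subset, n)
      }
      // Assumption: sorted keys are acceptable; the original keeps Set order
      .map { case (subset, n) => (subset.toSeq.sorted, n) }
  }
}
```

For the ten rows from the question, MaximalCombos.count should return the same five groups as res1 above, with each key in alphabetical order.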
I have a list of Strings as below:
val a = listOf("G", "F", "E", "D", "C", "B", "A")
I will get another list from the server. For example:
val b = listOf("A", "G", "C")
The list from the server may contain fewer or more elements, but it will not contain any elements that are not in the first list.
So, after sorting, the output should be like:
// G, C, A
You are not trying to sort; you are trying to filter:
fun filterByServer(server: List<String>, local: List<String>)
    = local.filter { value -> server.contains(value) }
filter takes a predicate; in this case, the predicate checks whether the local value is contained in the server list.
You can use map and sorted to achieve that easily, provided that a does not contain duplicates:
val a = listOf("G", "F", "E", "D", "C", "B", "A")
val b = listOf("A", "G", "C")
val there = b.map{ v -> a.indexOf(v)}.sorted().map{v -> a[v]}
println(there)
Output: [G, C, A]
A shorter alternative, as pointed out by jsamol in a comment:
val there = b.sortedBy { a.indexOf(it) }
You can create a custom comparator based on the indices of letters in the a list. Then use the List.sortedWith function to sort the b list.
e.g.
val a = listOf("G", "F", "E", "D", "C", "B", "A")
val b = listOf("A", "G", "C")
val indexedA: Map<String, Int> = a.mapIndexed { index, s -> s to index }.toMap()
val comparator = object : Comparator<String> {
    override fun compare(s1: String, s2: String): Int {
        val i1 = indexedA[s1]
        val i2 = indexedA[s2]
        return if (i1 == null || i2 == null) throw RuntimeException("unable to compare $s1 to $s2")
        else i1.compareTo(i2)
    }
}
val c = b.sortedWith(comparator)
println(c)
As an optimization, I've converted the list a to a map from letter to index, so each lookup avoids scanning the list.
If I understand the requirement, what we're really doing here is filtering the first list, according to whether its elements are in the second. So another approach would be to do that explicitly.
val a = listOf("G", "F", "E", "D", "C", "B", "A")
val b = listOf("A", "G", "C").toSet() // or just setOf(…)
val there = a.filter{ it in b }
There's a bit more processing in creating the set, but the rest is faster and scales better, as there's no sorting or scanning, and checking for presence in a set is very fast.
(In fact, that would work fine if b were a list; but that wouldn't perform as well for big lists.)