Cleanest way to map 2 Keys to the same value in Scala - regex

I have a map where two kinds of strings have to map to the same value. For example, the key has to be either exactly "Test 1" or exactly "Test 1 Extra", and both are mapped to the value 1:
val result = Map(
  "Test 1" -> 1,
  "Test 1 Extra" -> 1,
  "Test 2" -> 2,
  "Test 2 Extra" -> 2,
  "Test 3" -> 3,
  "Test 3 Extra" -> 3
)
The above is a bit unwieldy, especially if there were more similar key-value pairs. I imagine there's an easier way to do this, possibly with regular expressions to account for the "Extra" suffix that may be in the input?

Just as an inspiration:
case class DoubleKey[X](x1: X, x2: X) {
  def ->[A](a: A): List[(X, A)] = List(x1, x2).map((_, a))
}
def twice(extra: String)(s: String) = DoubleKey(s, s + extra)
val withExtra = twice(" Extra") _
val res = List(
  withExtra("Test 1") -> 1,
  withExtra("Test 2") -> 2,
  withExtra("Test 3") -> 3
).flatten.toMap
println(res)
No regular expressions needed.
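For completeness, the same table can also be built with a plain flatMap that emits both key variants per entry, no helper class or regex needed. A minimal sketch (`base` is just an illustrative name):

```scala
// Each base entry expands into two keys: "Test N" and "Test N Extra".
val base = List("Test 1" -> 1, "Test 2" -> 2, "Test 3" -> 3)

val result: Map[String, Int] = base.flatMap { case (key, value) =>
  List(key -> value, s"$key Extra" -> value)
}.toMap
```

This keeps the single-source-of-truth list short while still producing the full six-entry map.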

Related

How do I group items of sorted stream with SubFlows?

Could you explain how to use the new groupBy in akka-streams? The documentation seems quite unhelpful. groupBy used to return (T, Source), but it no longer does. Here is my example (I mimicked the one from the docs):
Source(List(
  1 -> "1a", 1 -> "1b", 1 -> "1c",
  2 -> "2a", 2 -> "2b",
  3 -> "3a", 3 -> "3b", 3 -> "3c",
  4 -> "4a",
  5 -> "5a", 5 -> "5b", 5 -> "5c",
  6 -> "6a", 6 -> "6b",
  7 -> "7a",
  8 -> "8a", 8 -> "8b",
  9 -> "9a", 9 -> "9b"
))
  .groupBy(3, _._1)
  .map { case (aid, raw) =>
    aid -> List(raw)
  }
  .reduce[(Int, List[String])] { case (l, r) =>
    (l._1, l._2 ::: r._2)
  }
  .mergeSubstreams
  .runForeach { case (aid, items) =>
    println(s"$aid - ${items.length}")
  }
This simply hangs. Perhaps it hangs because the number of substreams is lower than the number of unique keys. But what should I do if I have an infinite stream? I'd like to group until the key changes.
In my real stream the data is always sorted by the value I'm grouping by. Perhaps I don't need groupBy at all?
A year later: Akka Stream Contrib has an AccumulateWhileUnchanged stage that does this:
libraryDependencies += "com.typesafe.akka" %% "akka-stream-contrib" % "0.9"
and:
import akka.stream.contrib.AccumulateWhileUnchanged
source.via(new AccumulateWhileUnchanged(_._1))
You could also achieve it using statefulMapConcat, which will be a bit less expensive given that it does no sub-materialisations (but you have to live with the shame of using vars):
source.statefulMapConcat { () =>
  var prevKey: Option[Int] = None
  var acc: List[String] = Nil

  { case (newKey, str) =>
    prevKey match {
      case Some(`newKey`) | None =>
        prevKey = Some(newKey)
        acc = str :: acc
        Nil
      case Some(oldKey) =>
        val accForOldKey = acc.reverse
        prevKey = Some(newKey)
        acc = str :: Nil
        (oldKey -> accForOldKey) :: Nil
    }
  }
}.runForeach(println)
If your stream data is always sorted, you can leverage it for grouping this way:
val source = Source(List(
  1 -> "1a", 1 -> "1b", 1 -> "1c",
  2 -> "2a", 2 -> "2b",
  3 -> "3a", 3 -> "3b", 3 -> "3c",
  4 -> "4a",
  5 -> "5a", 5 -> "5b", 5 -> "5c",
  6 -> "6a", 6 -> "6b",
  7 -> "7a",
  8 -> "8a", 8 -> "8b",
  9 -> "9a", 9 -> "9b"
))
source
  // group elements into sliding pairs;
  // the last window may hold a single element, not a pair
  .sliding(2, 1)
  // when the two keys in a pair differ, split into a new substream
  .splitAfter { pair =>
    (pair.headOption, pair.lastOption) match {
      case (Some((key1, _)), Some((key2, _))) => key1 != key2
      case _                                  => false
    }
  }
  // then keep only the first element of each pair
  // to reconstruct the original stream, now grouped by the sorted key
  .mapConcat(_.headOption.toList)
  // then fold each substream into a single element
  .fold(0 -> List.empty[String]) {
    case ((_, values), (key, value)) => key -> (value +: values)
  }
  // merge it back and dump the results
  .mergeSubstreams
  .runWith(Sink.foreach(println))
At the end you'll get these results:
(1,List(1c, 1b, 1a))
(2,List(2b, 2a))
(3,List(3c, 3b, 3a))
(4,List(4a))
(5,List(5c, 5b, 5a))
(6,List(6b, 6a))
(7,List(7a))
(8,List(8b, 8a))
(9,List(9a))
But compared to groupBy, you're not limited by the number of distinct keys.
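The underlying "group while the key is unchanged" idea is independent of Akka; on plain Scala collections it can be sketched with a foldRight (`groupConsecutive` is a made-up name for illustration):

```scala
// Group consecutive elements sharing the same key, preserving order.
def groupConsecutive[K, V](items: List[(K, V)]): List[(K, List[V])] =
  items.foldRight(List.empty[(K, List[V])]) {
    // same key as the group we are building: prepend the value
    case ((k, v), (prevK, vs) :: rest) if k == prevK => (k, v :: vs) :: rest
    // key change (or empty accumulator): start a new group
    case ((k, v), acc)                               => (k, List(v)) :: acc
  }

val grouped = groupConsecutive(List(1 -> "1a", 1 -> "1b", 2 -> "2a", 3 -> "3a", 3 -> "3b"))
// grouped == List((1, List("1a", "1b")), (2, List("2a")), (3, List("3a", "3b")))
```

This only works for sorted input, for the same reason the splitAfter approach does: equal keys must be contiguous.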
I ended up implementing a custom stage:
class GroupAfterKeyChangeStage[K, T](keyForItem: T ⇒ K, maxBufferSize: Int) extends GraphStage[FlowShape[T, List[T]]] {

  private val in = Inlet[T]("GroupAfterKeyChangeStage.in")
  private val out = Outlet[List[T]]("GroupAfterKeyChangeStage.out")

  override val shape: FlowShape[T, List[T]] =
    FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler with OutHandler {

      private val buffer = new ListBuffer[T]
      private var currentKey: Option[K] = None

      // InHandler
      override def onPush(): Unit = {
        val nextItem = grab(in)
        val nextItemKey = keyForItem(nextItem)
        if (currentKey.forall(_ == nextItemKey)) {
          if (currentKey.isEmpty)
            currentKey = Some(nextItemKey)
          if (buffer.size == maxBufferSize)
            failStage(new RuntimeException(s"Maximum buffer size is exceeded on key $nextItemKey"))
          else {
            buffer += nextItem
            pull(in)
          }
        } else {
          val result = buffer.result()
          buffer.clear()
          buffer += nextItem
          currentKey = Some(nextItemKey)
          push(out, result)
        }
      }

      // OutHandler
      override def onPull(): Unit =
        if (isClosed(in))
          failStage(new RuntimeException("Upstream finished but there was a truncated final frame in the buffer"))
        else
          pull(in)

      // InHandler
      override def onUpstreamFinish(): Unit = {
        val result = buffer.result()
        if (result.nonEmpty)
          emit(out, result) // flush the final group before completing
        completeStage()
      }

      override def postStop(): Unit =
        buffer.clear()

      setHandlers(in, out, this)
    }
}
If you don't want to copy-paste it, I've added it to a helper library that I maintain. To use it, add
Resolver.bintrayRepo("cppexpert", "maven")
to your resolvers, and add the following to your dependencies:
"com.walkmind" %% "scala-tricks" % "2.15"
It's implemented in com.walkmind.akkastream.FlowExt as the flow
def groupSortedByKey[K, T](keyForItem: T ⇒ K, maxBufferSize: Int): Flow[T, List[T], NotUsed]
For my example it would be
source
  .via(FlowExt.groupSortedByKey(_._1, 128))

Scala Summing Values of a list of Tuples in style (String,Int) based on _._1

I create a list of tuples by taking a String name and pairing it with an accompanying Int value.
I want to sum those Int values whenever multiple tuples share the same name. My current approach uses groupBy, which, if I understand it right, returns a Map keyed by _._1 with a list of values:
def mostPopular(data: List[List[String]]): (String, Int) = {
  // take the data and create a List[(String, Int)]
  val nameSums = data.map(x => x(1) -> x(2).toInt)
  // sum the values in _._2 based on same elements in _._1
  val grouped = nameSums.groupBy(_._1).foldLeft(0)(_ + _._2)
}
I've seen other solutions that deal with averaging different values of tuples, but they haven't explained how to sum values that fall under the same name.
In your case, value (see the code snippet below) is a list of (String, Int), so you can do value.map(_._2).sum or value.foldLeft(0)((r, c) => r + c._2):
nameSums.groupBy(_._1).map { case (key, value) => key -> value.map(_._2).sum }
Scala REPL
scala> val nameSums = List(("apple", 10), ("ball", 20), ("apple", 20), ("cat", 100))
nameSums: List[(String, Int)] = List((apple,10), (ball,20), (apple,20), (cat,100))
scala> nameSums.groupBy(_._1).map { case (key, value) => key -> (value.map(_._2)).sum}
res15: scala.collection.immutable.Map[String,Int] = Map(cat -> 100, apple -> 30, ball -> 20)
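Building on that, the original mostPopular can be completed by summing per name and then taking the entry with the largest total via maxBy. A minimal sketch, assuming (as in the question) the name sits at index 1 and the count at index 2 of each inner list; the sample `data` rows are made up:

```scala
def mostPopular(data: List[List[String]]): (String, Int) =
  data
    .map(x => x(1) -> x(2).toInt)       // List[(String, Int)]
    .groupBy(_._1)                      // Map[String, List[(String, Int)]]
    .map { case (name, pairs) => name -> pairs.map(_._2).sum }
    .maxBy(_._2)                        // entry with the largest total

val data = List(
  List("id1", "apple", "10"),
  List("id2", "ball", "20"),
  List("id3", "apple", "20")
)
// mostPopular(data) == ("apple", 30)
```

Note that maxBy throws on empty input, so guard the call if data can be empty.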

How to get the 1st element of List which is itself a member of Map of type Map[int, Any]

Example:
var a = Map(1 -> List(7,8,9), 2 -> 15)
The type of variable a is scala.collection.immutable.Map[Int,Any].
I want to get the 1st element of the List(7, 8, 9).
a(1)(0) gives me an error: Any does not take parameters.
How can I cast the Any back to a List?
Please help.
Similar to @EndeNeu's answer, yet covering the case of empty lists, where to keep the problem well defined we assume the value 0:
a.collect {
  case (i, Nil)    => (i, 0)
  case (i, x :: _) => (i, x)
  case p @ (_, _)  => p
}
Note that @ binds the tuple to p so that in the partial function we need not replicate the entire tuple.
Using collect should work:
scala> var a = Map(1 -> List(7,8,9), 2 -> 15)
a: scala.collection.immutable.Map[Int,Any] = Map(1 -> List(7, 8, 9), 2 -> 15)
scala>
| a collect {
| case (i: Int, l: List[_]) if l.nonEmpty => (i, l.head)
| case (i: Int, j: Int) => (i, j)
| }
res1: scala.collection.immutable.Map[Int,Any] = Map(1 -> 7, 2 -> 15)
But I'd warn you against keeping a collection with Any in it: you lose all type safety, and that collect could silently miss elements you want because the match is not exhaustive. I would review my approach if I were you, maybe using two collections depending on your application logic.
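If you control the map's construction, one way to keep type safety is to model the two shapes explicitly with Either instead of collapsing them to Any; a minimal sketch:

```scala
// Either keeps both shapes visible in the type instead of erasing them to Any.
val a: Map[Int, Either[List[Int], Int]] =
  Map(1 -> Left(List(7, 8, 9)), 2 -> Right(15))

val firsts: Map[Int, Int] = a.collect {
  case (i, Left(x :: _)) => i -> x // first element of a non-empty list
  case (i, Right(j))     => i -> j // plain value passes through
  // a Left(Nil) entry is simply dropped here
}
// firsts == Map(1 -> 7, 2 -> 15)
```

The compiler can now check the cases for you, and no runtime type tests on Any are needed.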

Scalding convert one row into multiple

So, I have a scalding pipe that contains entries of the form
(String, Map[String, Int]). I need to convert each instance of such a row into multiple rows. That is, if I had
("Type A", Map("a1" -> 2, "a2" -> 2, "a3" -> 3))
I need 3 rows as output:
("Type A", "a1", 2)
("Type A", "a2", 2)
("Type A", "a3", 3)
It's essentially the inverse of the groupBy operation, I guess. Does anyone know of a way to do this?
You can use flatMap, like so:
class TestJob(args: Args) extends Job(args) {
  val inputPipe: TypedPipe[Input] = ??? // wire up your actual source here
  val out: TypedPipe[(String, String, Int)] = inputPipe.flatMap { rec =>
    rec.map.map { pair => (rec.kind, pair._1, pair._2) }
  }
}

case class Input(kind: String, map: Map[String, Int])
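Scalding specifics aside, the flatMap shape can be exercised on plain Scala collections; a minimal sketch with made-up sample rows:

```scala
// One (kind, map) row becomes one output row per map entry.
val rows = List(
  "Type A" -> Map("a1" -> 2, "a2" -> 2, "a3" -> 3),
  "Type B" -> Map("b1" -> 1)
)

val expanded: List[(String, String, Int)] = rows.flatMap { case (kind, m) =>
  m.map { case (k, v) => (kind, k, v) }
}
```

Each inner Map contributes as many output tuples as it has entries, which is exactly the inverse of a groupBy.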

How to convert a List(List[String]) into a Map[String, Int]?

I have a List(List("aba, 4"), List("baa, 2")) and I want to convert it into a map:
val map: Map[String, Int] = Map("aba" -> 4, "baa" -> 2)
What's the best way to achieve this?
UPDATE:
I do a database query to retrieve the data:
val (_, myData) = DB.runQuery(...)
This returns a Pair, but I'm only interested in the second part, which gives me:
myData: List[List[String]] = List(List(Hello, 19), List(World, 14), List(Foo, 13), List(Bar, 13), List(Bar, 12), List(Baz, 12), List(Baz, 11), ...)
scala> val pat = """(.*),\s*(.*)""".r
pat: scala.util.matching.Regex = (.*),\s*(.*)
scala> list.flatten.map { case pat(k, v) => k -> v.toInt }.toMap
res1: scala.collection.immutable.Map[String,Int] = Map(aba -> 4, baa -> 2)
Yet another take:
List(List("aba, 4"), List("baa, 2")).
  flatten.par.map(_.split(",").toList).collect {
    case k :: v :: Nil => (k, v.trim.toInt)
  }.seq.toMap
Differences from the other answers:
uses .par to parallelize the creation of the pairs, which lets us profit from multiple cores.
uses collect with a PartialFunction to ignore strings that are not of the form "key, value"
Edit: .par does not destroy the order, as this answer previously stated. There is just no guarantee for the execution order of the list processing, so the functions should be side-effect free (or the side effects shouldn't care about ordering).
My take:
List(List("aba, 4"), List("baa, 2")).map(_.head).map(_.split(",\\s*")).map(arr => (arr(0), arr(1).toInt)).toMap
In steps:
List(List("aba, 4"), List("baa, 2")).
map(_.head). //List("aba, 4", "baa, 2")
map(itemList => itemList split ",\\s*"). //List(Array("aba", "4"), Array("baa", "2"))
map(itemArr => (itemArr(0), itemArr(1).toInt)). //List(("aba", 4), ("baa", 2))
toMap //Map("aba" -> 4, "baa" -> 2)
Your input data structure is a bit awkward so I don't think you can optimize it/shorten it any further.
List(List("aba, 4"), List("baa, 2")).
  flatten. // get rid of those weird inner Lists
  map { s =>
    // split into key and value;
    // the Array extractor fails unless there are exactly 2 parts
    val Array(k, v) = s.split(",")
    // make a tuple out of the splits
    (k, v.trim.toInt)
  }.
  toMap // turns a collection of tuples into a map