Scalding convert one row into multiple - mapreduce

So, I have a scalding pipe that contains entries of the form
(String, Map[String, Int]). I need to convert each instance of this row into multiple rows. That is, if I had
( "Type A", ["a1" -> 2, "a2" ->2, "a3" -> 3] )
I need as output 3 rows
("Type A", "a1", 2)
("Type A", "a2", 2)
("Type A", "a3", 3)
It's essentially the inverse of the groupBy operation, I guess. Does anyone know of a way to do this?

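You can use flatMap, like so:
import com.twitter.scalding._

case class Input(kind: String, map: Map[String, Int])

class TestJob(args: Args) extends Job(args)
{
  // Placeholder: supply your actual source here
  val inputPipe: TypedPipe[Input] = ???
  // Emit one (kind, key, value) row per entry of each record's map
  val out: TypedPipe[(String, String, Int)] = inputPipe.flatMap { rec =>
    rec.map.map { pair => (rec.kind, pair._1, pair._2) }
  }
}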
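The same idea can be tried without a Scalding job. Here is a minimal sketch on plain Scala collections, assuming the pipe's records look like the Input case class above; the names ExplodeRows and explode are made up for illustration:

```scala
object ExplodeRows {
  // Stand-in for the pipe's record type (same shape as in the answer above)
  final case class Input(kind: String, map: Map[String, Int])

  // One output row per (key, value) entry of the nested map
  def explode(records: List[Input]): List[(String, String, Int)] =
    records.flatMap { rec =>
      rec.map.toList.map { case (key, count) => (rec.kind, key, count) }
    }

  // The example row from the question
  val rows: List[(String, String, Int)] =
    explode(List(Input("Type A", Map("a1" -> 2, "a2" -> 2, "a3" -> 3))))
}
```

The flatMap body is the same shape as the one in the TypedPipe version, so it is a cheap way to check the logic locally first.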

Related

Cleanest way to map 2 Keys to the same value in Scala

I have a map where there are two kinds of strings that have to map to the same value. For example, the key either has to be exactly "Test 1" or exactly "Test 1 Extra" and they are both mapped to the value 1
val result = Map(
"Test 1" -> 1,
"Test 1 Extra" -> 1,
"Test 2" -> 2,
"Test 2 Extra" -> 2,
"Test 3" -> 3,
"Test 3 Extra" -> 3
)
With the above, it's a bit unwieldy especially if there were more similar key value pairs. I can imagine there's an easier way to do this, possibly with Regular Expressions to take into account the string Extra that may be in the input?
Just as an inspiration:
case class DoubleKey[X](x1: X, x2: X) {
def ->[A](a: A): List[(X, A)] = List(x1, x2).map((_, a))
}
def twice(extra: String)(s: String) = DoubleKey(s, s + extra)
val withExtra = twice(" Extra") _
val res = List(
withExtra("Test 1") -> 1,
withExtra("Test 2") -> 2,
withExtra("Test 3") -> 3
).flatten.toMap
println(res)
No regular expressions needed.
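If the helper class feels like too much machinery, a plain flatMap over the base pairs does the same job. This is only a sketch, assuming every key has exactly one " Extra" variant as in the question; SharedValueKeys and base are names invented here:

```scala
object SharedValueKeys {
  // Base keys and their values, as in the question
  val base: List[(String, Int)] = List("Test 1" -> 1, "Test 2" -> 2, "Test 3" -> 3)

  // Each base pair expands to two entries: the key itself and the key plus " Extra"
  val result: Map[String, Int] =
    base.flatMap { case (key, value) =>
      List(key -> value, (key + " Extra") -> value)
    }.toMap
}
```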

How to retain some key value pairs of a List[T] in scala?

I have a List[T] whose elements look like this:
[0]- id : "abc", name: "xyz", others: "value"
[1]- id : "bcd", name: "dsf", others: "value"
Now I want to return the same List[T] but with only the id and name, i.e. the returned List will be:
[0]- id : "abc", name: "xyz"
[1]- id : "bcd", name: "dsf"
I tried with below code:
var temps = templates.map(x => List("id" -> x.id, "name" -> x.name))
But it produces a List inside a List, i.e.:
[0]-
[0] id : "abc"
[1] name: "xyz"
[1]-
[0] id : "bcd"
[1] name: "dsf"
I tried with tuples as well, but in vain. How can I map my list so that everything is removed except the id and name pairs?
Unless you want to define a new class with just the id and name fields, I think tuples would be your best bet:
scala> case class obj(id: String, name: String, others: String)
defined class obj
scala> val l = List(new obj("abc", "xyz", "value"), new obj("bcd", "dsf", "value"))
l: List[obj] = List(obj(abc,xyz,value), obj(bcd,dsf,value))
scala> l.map(x => (x.id, x.name))
res0: List[(String, String)] = List((abc,xyz), (bcd,dsf))
Also, you are actually already using tuples in your example: the -> syntax creates tuples in Scala:
scala> "a" -> "b"
res1: (String, String) = (a,b)
Here is the "define another class" option:
scala> case class obj2(id: String, name: String){
| def this(other: obj) = this(other.id, other.name)
| }
defined class obj2
scala> l.map(new obj2(_))
res2: List[obj2] = List(obj2(abc,xyz), obj2(bcd,dsf))
If the List[T] is actually a List[Map], then you may be able to do the following:
//Remove the "others" key.
val temps = templates.map( map => {map - "others"})
I think it's pretty clear that your list contains objects with three members: id, name, others. Hence you access them by using
{x => x.name}
What I am not so sure about is how you imagine your end result. You obviously need some data structure that holds the members.
You realized yourself that it's not very nice to store each object's members in a list inside the new list, but you seem not to be OK with tuples?
I can imagine that tuples are just what you want.
val newList = oldList.map{e => e.id -> e.name}
results in a list like this:
List(("id_a", "name_a"), ("id_b", "name_b"), ("id_c","name_c"))
and can be accessed like this (for example):
newList.head._1
for the first tuple's id, and
newList.head._2
for the first tuple's name.
Another option could be mapping into a Map, since this looks pretty much like what you wanted in the first place:
val newMap = oldList.map{e => e.id -> e.name}.toMap
This way you can access members with newMap("key"), or more safely with newMap.get("key"), which returns an Option and won't throw an exception if the key doesn't exist.
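To make the difference between the two lookups concrete, here is a small sketch. The Template case class is hypothetical, standing in for whatever T is in the question:

```scala
object IdNameLookup {
  // Hypothetical record type with the three fields named in the question
  final case class Template(id: String, name: String, others: String)

  val templates: List[Template] =
    List(Template("abc", "xyz", "value"), Template("bcd", "dsf", "value"))

  // Keep only id and name, keyed by id
  val nameById: Map[String, String] = templates.map(t => t.id -> t.name).toMap

  // get returns an Option instead of throwing on a missing key
  val found: Option[String]   = nameById.get("abc")
  val missing: Option[String] = nameById.get("nope")

  // getOrElse supplies a fallback in one step
  val fallback: String = nameById.getOrElse("nope", "unknown")
}
```

Note that going through toMap drops duplicate ids (the last one wins), so this only fits if ids are unique.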

How can I get a string of an element of a list of classes in Scala

I have a List of a class where the class is defined like:
case class Role (role_id, elem2, elem3)
well sort of...
So if I have a list of those, roles: List[Role],
how can I get a string of the role_ids, so that if my list had 4 Roles in it my string might look like
"3 6 8 9"?
Or better still, how can I add a separator so that I get "3, 6, 8, 9"?
I'm having to craft some SQL and want set-based operations instead of looping. I feel I should flatten or something, but I can't think how.
Thank you
Martin
Try something like this:
scala> case class Role(role_id: Int, elem2: String, elem3: String)
defined class Role
scala> val l = List(Role(1, "", ""), Role(2, "", ""), Role(3, "", ""))
l: List[Role] = List(Role(1,,), Role(2,,), Role(3,,))
scala> l.map({ case Role(id, _, _) => id }).mkString(", ")
res2: String = 1, 2, 3
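Since the goal is an SQL fragment, the three-argument form of mkString (prefix, separator, suffix) may be handy too. A sketch, assuming role_id is an Int as in the answer above; the sample ids are taken from the question:

```scala
object RoleIds {
  final case class Role(role_id: Int, elem2: String, elem3: String)

  val roles: List[Role] =
    List(Role(3, "", ""), Role(6, "", ""), Role(8, "", ""), Role(9, "", ""))

  // Space-separated, as first asked
  val spaced: String = roles.map(_.role_id).mkString(" ")

  // Prefix, separator and suffix in one call, e.g. for an IN (...) list
  val inClause: String = roles.map(_.role_id).mkString("(", ", ", ")")
}
```

This is reasonable for numeric ids; for string values you would normally want bound parameters rather than building the SQL text by hand.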

How to convert integer list to map removing duplicates?

Let's say that I have a Map of strings -> List of Integers. I would like to create a function which takes in as a parameter a List of strings and returns all the integers correlating to all the string in that list. I.e. if the Map X contains the following mappings:
database = [("Bob",[1,2,3]),("John",[1,5,6]),("Trevor",[4,5,7])]
If this function takes in ["Bob","John"] as the list of names, it should return,
[1,2,3,5,6]
Since Bob correlates to 1,2,3 and John correlates to 1,5,6 (same entries for both names aren't duplicated). I also would like to not introduce a mutable variable if I don't have to, thus leading me to believe a for comprehension that yields this list of number values would be the best way to achieve this, but I'm not sure how.
If you want to use a for-comprehension you can do this:
val result = for {
key <- keys
nums <- map.get(key).toSeq
num <- nums
} yield num
result.distinct
Explanation:
For each key in the list, try to get its entry and convert it to a Seq (necessary because flatMap expects a Seq in this case), then add every number in that list to the result. If the key is not present in the map, the Seq is empty and therefore yields nothing. Finally, call distinct to remove the duplicates.
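For reference, the for-comprehension above is just sugar for nested flatMap/map calls. The sketch below spells that out by hand; the object name CorrelatedIds and the method idsFor are invented here, and the data is the example from the question:

```scala
object CorrelatedIds {
  val database: Map[String, List[Int]] =
    Map("Bob" -> List(1, 2, 3), "John" -> List(1, 5, 6), "Trevor" -> List(4, 5, 7))

  // Hand-written equivalent of:
  //   for { key <- keys; nums <- database.get(key).toSeq; num <- nums } yield num
  def idsFor(keys: List[String]): List[Int] =
    keys
      .flatMap(key => database.get(key).toSeq.flatMap(nums => nums.map(num => num)))
      .distinct
}
```

Calling CorrelatedIds.idsFor(List("Bob", "John")) gives List(1, 2, 3, 5, 6), and unknown names simply contribute nothing.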
val myMap = Map("Bob" -> List(1,2,3), "John" -> List(1,5,6), "Trevor" -> List(4,5,7))
val names = List("Bob", "John")
You can add a default value to a Map using the withDefaultValue method:
val mapWithDefault = myMap withDefaultValue Nil
Then you can use the Map as a function in flatMap:
names.flatMap(mapWithDefault).distinct
// List(1, 2, 3, 5, 6)
Let
val db = Map("Bob" -> List(1,2,3), "John" -> List(1,5,6), "Trevor" -> List(4,5,7))
val names = List("Bob", "John")
Then a similar approach to @senia's using flatMap,
implicit class mapCorr[A,B](val db: Map[A,List[B]]) extends AnyVal {
def corr(keys: List[A]): List[B] = {
keys.flatMap{ k => db get k }.flatten.distinct
}
}
and
scala> db.corr(names)
res0: List[Int] = List(1, 2, 3, 5, 6)
Here we allow for key lists of type A and maps from type A to type List[B] .
val myset = Set("Bob","John")
val database = Map(("Bob"->List(1,2,3)),("John"->List(1,5,6)),("Trevor"->List(4,5,7)))
val ids = database.filter(m => myset.contains(m._1)).map(_._2).flatten.toList.distinct
outputs:
ids: List[Int] = List(1, 2, 3, 5, 6)
Something like:
val result = database.filter(elem => list.contains(elem._1)).foldLeft(List.empty[Int])((res, elem) => res ++ elem._2).distinct
where list is the input list of names.

How to convert a List(List[String]) into a Map[String, Int]?

I have a List(List("aba, 4"), List("baa, 2")) and I want to convert it into a map:
val map : Map[String, Int] = Map("aba" -> 4, "baa" -> 2)
What's the best way to achieve this?
UPDATE:
I do a database query to retrieve the data:
val (_, myData) = DB.runQuery(...)
This returns a Pair but I'm only interested in the second part, which gives me:
myData: List[List[String]] = List(List(Hello, 19), List(World, 14), List(Foo, 13), List(Bar, 13), List(Bar, 12), List(Baz, 12), List(Baz, 11), ...)
scala> val pat = """(.*),\s*(.*)""".r
pat: scala.util.matching.Regex = (.*),\s*(.*)
scala> list.flatten.map{case pat(k, v) => k -> v.toInt }.toMap
res1: scala.collection.immutable.Map[String,Int] = Map(aba -> 4, baa -> 2)
Yet another take:
List(List("aba, 4"), List("baa, 2")).
flatten.par.collect(
_.split(",").toList match {
case k :: v :: Nil => (k, v.trim.toInt)
}).toMap
Differences from the other answers:
uses .par to parallelize the creation of the pairs, which lets us profit from multiple cores.
uses collect with a PartialFunction to ignore strings that are not of the form "key, value"
Edit: .par does not destroy the order, as this answer stated previously. There is just no guarantee about the execution order of the list processing, so the functions should be side-effect free (or their side effects should not depend on the ordering).
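If skipping malformed strings is the main concern, a sequential variant with an Option-returning parser is another way to get there. Note this swaps collect for flatMap and drops .par; it is a sketch that assumes Scala 2.13+ (for toIntOption), and the helper name parse is made up:

```scala
object ParsePairs {
  // Third entry is deliberately malformed to show it being skipped
  val raw: List[List[String]] =
    List(List("aba, 4"), List("baa, 2"), List("malformed"))

  // Hypothetical helper: None unless the string is "key, value" with an integer value
  def parse(s: String): Option[(String, Int)] =
    s.split(",").map(_.trim) match {
      case Array(k, v) => v.toIntOption.map(k -> _)
      case _           => None
    }

  // flatMap keeps only the entries that parsed
  val parsed: Map[String, Int] = raw.flatten.flatMap(parse).toMap
}
```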
My take:
List(List("aba, 4"), List("baa, 2")) map {_.head} map {itemList => itemList split ",\\s*"} map {itemArr => (itemArr(0), itemArr(1).toInt)} toMap
In steps:
List(List("aba, 4"), List("baa, 2")).
map(_.head). //List("aba, 4", "baa, 2")
map(itemList => itemList split ",\\s*"). //List(Array("aba", "4"), Array("baa", "2"))
map(itemArr => (itemArr(0), itemArr(1).toInt)). //List(("aba", 4), ("baa", 2))
toMap //Map("aba" -> 4, "baa" -> 2)
Your input data structure is a bit awkward so I don't think you can optimize it/shorten it any further.
List(List("aba, 4"), List("baa, 2")).
flatten. //get rid of those weird inner Lists
map {s=>
//split into key and value
//Array extractor requires exactly 2 parts (a MatchError is thrown otherwise)
val Array(k,v) = s.split(",");
//make a tuple out of the splits
(k, v.trim.toInt)}.
toMap // turns a collection of tuples into a map
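The data in the question's UPDATE has a different shape: each inner list holds the key and the value as two separate elements, and some keys repeat (Bar, Baz). A sketch for that shape, using a few of the rows shown there as sample data; which value should win for a repeated key is not stated in the question, so both variants below are assumptions:

```scala
object TwoElementRows {
  // Sample rows in the shape shown in the question's update
  val myData: List[List[String]] =
    List(List("Hello", "19"), List("World", "14"), List("Bar", "13"), List("Bar", "12"))

  // Rows that are not exactly two elements long are skipped by collect;
  // with toMap, the last value seen for a repeated key wins
  val lastWins: Map[String, Int] =
    myData.collect { case List(k, v) => k -> v.trim.toInt }.toMap

  // Alternative: keep the largest value per key instead
  val maxPerKey: Map[String, Int] =
    myData
      .collect { case List(k, v) => k -> v.trim.toInt }
      .groupBy(_._1)
      .map { case (k, pairs) => k -> pairs.map(_._2).max }
}
```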