I'm trying to get the elements from a list:
data = List(("2001",13.1),("2009",3.1),("2004",24.0),("2011",1.11))
Any help? The task is to print the Strings and the numbers separately, like:
print(x._1+" "+x._2)
but this is not working.
One good practice with functional programming is to do as much as possible with side-effect-free transformations of immutable objects.
What that means (in this case) is that you can convert the list of tuples to a list of strings, then limit your side effect (the println) to a single step at the very end.
val data = List(("2001",13.1),("2009",3.1),("2004",24.0),("2011",1.11))
val lines = data map { case(a,b) => a + " " + b.toString }
println(lines mkString "\n")
scala> val data =List(("2001",13.1),("2009",3.1),("2004",24.0),("2011",1.11))
data: List[(java.lang.String, Double)] = List((2001,13.1), (2009,3.1), (2004,24.0), (2011,1.11))
scala> data.foreach(x => println(x._1+" "+x._2))
2001 13.1
2009 3.1
2004 24.0
2011 1.11
val list = List(("2001",13.1),("2009",3.1),("2004",24.0),("2011",1.11))
println(list map (_.productIterator mkString " ") mkString "\n")
2001 13.1
2009 3.1
2004 24.0
2011 1.11
I would use pattern matching, which scales better to larger tuples and more complex elements:
data.foreach { case (b,c) => println(b + " " + c) }
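A minimal sketch of how the pattern-matching style extends to a wider tuple. The three-field records and the names TupleFormatting, records and lines are hypothetical, chosen only for illustration; the point is that each field gets a name in the pattern, so adding a field only changes the case clause.

```scala
object TupleFormatting {
  // Hypothetical 3-field records (year, value, label), used only to show
  // that the same style works beyond pairs.
  val records: List[(String, Double, String)] =
    List(("2001", 13.1, "a"), ("2009", 3.1, "b"))

  // Side-effect-free step: name each field in the pattern and build a string.
  val lines: List[String] =
    records.map { case (year, value, label) => s"$year $value $label" }

  // The only side effect: print the prepared lines.
  def main(args: Array[String]): Unit = lines.foreach(println)
}
```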
For the Strings, use List(("aoeu",1)).foreach(((_:Tuple2[String,_])._1) andThen print)
For the numbers, use List(("aoeu",13.0)).foreach(((_:Tuple2[_,Double])._2) andThen print)
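If the goal is simply to get the Strings and the numbers as two separate lists, the standard unzip method on a list of pairs may be simpler than composing functions. A sketch (the object name SplitColumns is illustrative only):

```scala
object SplitColumns {
  val data = List(("2001", 13.1), ("2009", 3.1), ("2004", 24.0), ("2011", 1.11))

  // unzip turns a List[(A, B)] into a pair (List[A], List[B])
  val (years, values) = data.unzip

  def main(args: Array[String]): Unit = {
    println(years.mkString(" "))  // the String parts
    println(values.mkString(" ")) // the Double parts
  }
}
```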
I want to create a simple program that calculates someone's age after x years. First you assign someone's current age to a variable, and then I want to use map to display the future ages.
What I have so far is:
val age = 18
val myList = (1 to 2000).toList
Basically, I want each number from the list to be a map key, with the value being the sum of the age variable and the key, so the map would look like this:
1 -> 19, 2 -> 20, 3 -> 21, ...
How can I accomplish this?
Consider mapping each number to a tuple (key -> value) and then converting the result with toMap:
val age = 18
val ageBy: Map[Int, Int] = (1 to 2000).map(i => i -> (age + i)).toMap
ageBy(24) // res1: Int = 42
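One caveat when displaying the result: an immutable Map with more than four entries is hash-based, so printing or iterating it is not guaranteed to list the keys in order. A sketch using SortedMap to keep the keys ordered; the range 1 to 5 is used only to keep the output short, and FutureAges is a hypothetical name.

```scala
import scala.collection.immutable.SortedMap

object FutureAges {
  val age = 18

  // SortedMap orders entries by key, so iteration goes 1, 2, 3, ...
  val ageBy: SortedMap[Int, Int] =
    SortedMap((1 to 5).map(i => i -> (age + i)): _*)

  def main(args: Array[String]): Unit =
    ageBy.foreach { case (years, future) => println(s"$years -> $future") }
}
```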
I am new to Spark and Scala, coming from an R background. After a few transformations of an RDD, I get an RDD of type
Description: RDD[(String, Int)]
Now I want to apply a regular expression to the String part of each element, extract substrings from it, and add each substring as a new column.
Input Data :
BMW 1er Model,278
MINI Cooper Model,248
Output I am looking for :
Input | Brand | Series
BMW 1er Model,278 | BMW | 1er
MINI Cooper Model,248 | MINI | Cooper
where Brand and Series are newly calculated substrings from String RDD
What I have done so far:
I could achieve this for a single String using a regular expression, but I can't apply it to all lines.
val brandRegEx = """^.*[Bb][Mm][Ww]+|.[Mm][Ii][Nn][Ii]+.*$""".r //to look for BMW or MINI
Then I can use
brandRegEx.findFirstIn("hello this mini is bmW testing")
But how can I use it for all the lines of the RDD, and apply different regular expressions to achieve the output above?
I read about this code snippet, but I'm not sure how to put it all together.
val brandRegEx = """(?i).*\b(BMW|MINI)\b.*""".r // one capture group holds the brand
def getBrand(col4: String): String = col4 match {
  case brandRegEx(brand) => brand // the substring matched by the group
  case _ => ""                    // no brand found
}
Any help would be appreciated!
Thanks
To apply your regex to each item in the RDD, you should use the RDD map function, which transforms each row in the RDD using some function (in this case, a partial function, in order to extract the two parts of the tuple that makes up each row):
import org.apache.spark.{SparkContext, SparkConf}

object Example {
  // Spark recommends defining main() rather than extending scala.App
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("Example"))
    val data = Seq(
      ("BMW 1er Model", 278),
      ("MINI Cooper Model", 248))
    val dataRDD = sc.parallelize(data)
    val processedRDD = dataRDD.map {
      case (inString, inInt) =>
        val brandRegEx = """^.*[Bb][Mm][Ww]+|.[Mm][Ii][Nn][Ii]+.*$""".r
        // findFirstIn returns an Option; fall back to a marker when nothing matches
        val brand = brandRegEx.findFirstIn(inString).getOrElse("NOT FOUND")
        //val seriesRegEx = ...
        //val series = seriesRegEx.findFirstIn(inString)
        val series = "foo"
        (inString, inInt, brand, series)
    }
    processedRDD.collect().foreach(println)
    sc.stop()
  }
}
Note that I think you have some problems in your regular expression, and you also need a regular expression for finding the series. This code outputs:
(BMW 1er Model,278,BMW,foo)
(MINI Cooper Model,248,NOT FOUND,foo)
But if you correct your regexes for your needs, this is how you can apply them to each row.
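As one possible corrected regex, here is a sketch that captures both the brand and the series with two groups in a single pattern. It assumes every string has the shape "<brand> <series> ..." with the brand being BMW or MINI (case-insensitive); the names BrandSeries, parse and process are hypothetical. It is plain Scala so it can be tested without Spark; with Spark, the body of process would go inside dataRDD.map { case (inString, inInt) => ... }.

```scala
object BrandSeries {
  // Assumed input shape: "<brand> <series> ...", brand = BMW or MINI (any case).
  // Group 1 captures the brand, group 2 the next whitespace-free word (series).
  private val BrandSeriesRegex = """(?i)^\s*(BMW|MINI)\s+(\S+).*$""".r

  // Returns (brand, series), or ("", "") when the string does not match.
  def parse(s: String): (String, String) = s match {
    case BrandSeriesRegex(brand, series) => (brand, series)
    case _                               => ("", "")
  }

  // Same row shape as the RDD: (input string, count) -> 4 columns.
  def process(rows: Seq[(String, Int)]): Seq[(String, Int, String, String)] =
    rows.map { case (inString, inInt) =>
      val (brand, series) = parse(inString)
      (inString, inInt, brand, series)
    }

  def main(args: Array[String]): Unit =
    process(Seq(("BMW 1er Model", 278), ("MINI Cooper Model", 248))).foreach(println)
}
```

A pattern used in a match must cover the whole string, which is why the regex ends with .*$ to absorb the rest of the line.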
Hi, I came across this question while looking for another one. The above problem can also be solved with ordinary transformations, by splitting each string on spaces:
val a = sc.parallelize(collection)
a.map { case (x, y) => val parts = x.split(" "); (x, y, parts(0), parts(1)) }.collect
I have a small film "database" which is just a list and each element in the list is a tuple.
I would like to display the list as an easy-to-read string; for example, each film in the list should be displayed as follows:
Casino Royale
Daniel Craig, Eva Green
2006
Garry, Dave
Titanic
Leonardo DiCaprio, Kate Winslet
1997
Zoe, Amy
Here is the code I am using:
type Title = String
type Actors = [String]
type Year = Int
type Fans = [String]
type Film = (Title, Actors, Year, Fans)
type Database = [Film]
testDatabase :: Database
testDatabase = [("Casino Royale", ["Daniel Craig", "Eva Green"], 2006,["Garry", "Dave"]),
("Titanic", ["Leonardo DiCaprio", "Kate Winslet"], 1997, ["Zoe", "Amy"]),
....
]
One way would be to write your own Show instance for Film. But you need to enable some extensions:
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE OverlappingInstances #-}
import Data.List (intercalate) -- used by the Show instance
Then you create an instance of Show:
instance Show Film where
  show (t, a, y, f) = t ++ "\n" ++ actors ++ "\n" ++ show y ++ "\n" ++ fans
    where actors = intercalate ", " a
          fans = intercalate ", " f
Demo in ghci:
λ> mapM_ (\x -> (putStrLn $ show x) >> putStrLn "") testDatabase
Casino Royale
Daniel Craig, Eva Green
2006
Garry, Dave
Titanic
Leonardo DiCaprio, Kate Winslet
1997
Zoe, Amy
I would suggest replacing your (a,b,c,d) tuple type with a record data type; that is usually preferable.
I was able to produce the outcome that I wanted by using @karakfa's approach from the comments on the original post:
pp :: Film -> String; pp (t,a,y,f) = i "\n" [t, i ", " a, show y, i ", " f] where i x = intercalate x
I am counting the values in each window and finding the most frequent ones, and I want to save only the top 10 values of each window to HDFS rather than all of them.
eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1),StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
val counts = eegStreams(a).map(x => (math.round(x.toDouble), 1)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(4), Seconds(4))
val sortedCounts = counts.map(_.swap).transform(rdd => rdd.sortByKey(false)).map(_.swap)
ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))}
//sortedCounts.foreachRDD(rdd =>println("\nTop 10 amplitudes:\n" + rdd.take(10).mkString("\n")))
sortedCounts.map(tuple => "%s,%s".format(tuple._1, tuple._2)).saveAsTextFiles("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))
I can print the top 10 as above (commented out).
I have also tried
sortedCounts.foreachRDD{ rdd => ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))}
but I get the following error, saying that the StreamingContext is not serializable:
15/01/05 17:12:23 ERROR actor.OneForOneStrategy: org.apache.spark.streaming.StreamingContext
java.io.NotSerializableException: org.apache.spark.streaming.StreamingContext
Can you try this?
sortedCounts.foreachRDD(rdd => rdd.filterWith(ind => ind)((v, ind) => ind <= 10).saveAsTextFile(...))
Note: I didn't test the snippet...
Your first version should work. Just declare @transient ssc = ... where the StreamingContext is first created.
The second version won't work because a StreamingContext cannot be serialized in a closure.
I am trying to write code that takes the user input, compares it to a list of tuples (in shares.py), and then prints the matching values from the list. For example, if the user input was aia, this code would return:
Please list portfolio: aia
Code Name Price
AIA Auckair 1.50
This works fine for one input, but I want to make it work for multiple inputs.
For example, if the user input was aia, air, amp, it should return:
Please list portfolio: aia, air, amp
Code Name Price
AIA Auckair 1.50
AIR Airnz 5.60
AMP Amp 3.22
This is what I have so far. Any help would be appreciated!
import shares
a=input("Please input")
s1 = a.replace(' ' , "")
print ('Please list portfolio: ' + a)
print (" ")
n=["Code", "Name", "Price"]
print ('{0: <6}'.format(n[0]) + '{0:<20}'.format(n[1]) + '{0:>8}'.format(n[2]))
z = shares.EXCHANGE_DATA[0:][0]
b=s1.upper()
c=b.split()
f=shares.EXCHANGE_DATA
def find(f, a):
    return [s for s in f if a.upper() in s]
x= (find(f, str(a)))
print ('{0: <6}'.format(x[0][0]) + '{0:<20}'.format(x[0][1]) + ("{0:>8.2f}".format(x[0][2])))
shares.py contains this
EXCHANGE_DATA = [('AIA', 'Auckair', 1.5),
('AIR', 'Airnz', 5.60),
('AMP', 'Amp',3.22),
('ANZ', 'Anzbankgrp', 26.25),
('ARG', 'Argosy', 12.22),
('CEN', 'Contact', 11.22)]
I am assuming a contains codes separated by spaces and/or commas, e.g. 'aia air amp' or 'aia, air, amp'
raw = a # just in case you want the original string at a later point
toDisplay = []
a = a.replace(',', ' ').split() # a now looks like ['aia', 'air', 'amp']
for i in a:
    temp = find(f, i)
    if temp:
        toDisplay.append(temp)
for i in toDisplay:
    print('{0: <6}'.format(i[0][0]) + '{0:<20}'.format(i[0][1]) + '{0:>8.2f}'.format(i[0][2]))
Essentially, what I'm doing is:
Split the input into a list
Do exactly what you were doing for a single input for each item in that list
Hope this helps!