spark 1.0.0 DStream.reduce is not behaving properly - mapreduce

Has anyone got DStream.reduce working using spark 1.0.0?
I have some code which seems perfectly reasonable.
val word1 = messages.map {
word =>
val key = word
(key, 1)
}
val wordcount = word1.reduce(reduceCount)
private def reduceCount(count1: Int, count2: Int) : Int = {
count1 + count2
}
The reduce statement gets a compilation error: type mismatch; found : Integer required: (String, Int)
Why would it have this complaint? reduceCount should just operate on the Int count, and reduce should return the same type as word1, which is (String, Int). I tried many variations to get around this error, but it just seems to behave incorrectly.
If you call reduceByKeyAndWindow instead, then there is no compilation error.
val wordcount = word1.reduceByKeyAndWindow(reduceCount, batchDuration)

The operation DStream.reduce has the following signature:
def reduce(reduceFunc: (T, T) => T): DStream[T]
Semantically, it takes an associative function that combines two elements of the stream into one, and returns a DStream in which each RDD holds the single reduced element.
Given messagesDstream is a stream of Strings, after mapping it like this:
val word1 = messagesDstream.map {word => (word,1)}
word1 is a DStream[(String, Int)], so its element type T is Tuple2[String, Int]. This means that reduce expects a function with the signature f(x: (String, Int), y: (String, Int)): (String, Int). In the code provided in the question, the reduce function is f(x: Int, y: Int): Int, hence the type mismatch.
The operation you want in this case is DStream.reduceByKey(_ + _), which groups the values by key and then applies the reduce function to the values of each key, which is what a word count is about.
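The type constraint can be reproduced without a streaming context. The following is a minimal sketch on plain Scala collections, which stand in for a DStream here (an assumption made so the example runs without Spark). `ReduceSketch`, `messages` and `pairs` are illustrative names. `List.reduce` has the same `(T, T) => T` shape as `DStream.reduce`, and `groupBy` followed by a per-key sum only approximates what `reduceByKey(_ + _)` does.

```scala
object ReduceSketch {
  // Stand-in for one batch of the stream; a real DStream needs a StreamingContext.
  val messages: List[String] = List("a", "b", "a", "c", "a")

  // The element type is (String, Int), so T = (String, Int) for reduce.
  val pairs: List[(String, Int)] = messages.map(word => (word, 1))

  // reduce needs ((String, Int), (String, Int)) => (String, Int).
  // This compiles, but collapses everything into one pair: a total, not a word count.
  val total: (String, Int) = pairs.reduce((x, y) => ("total", x._2 + y._2))

  // Per-key reduction: group by the word, then sum the counts within each group.
  val wordCount: Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => (word, ps.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    println(total)     // (total,5)
    println(wordCount) // one entry per distinct word; iteration order is not guaranteed
  }
}
```

Passing an `(Int, Int) => Int` function to `pairs.reduce` fails to compile for the same reason as in the question: the function must work on whole elements, not on the second component of each pair.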

SMLNJ function that returns as a pair a string at the beginning of a list

I'm really confused, as I am new to SML and am having trouble with the syntax for the function I want to write.
The instructions are as follows:
numberPrefix: char list → string * char list
Write a function named numberPrefix that returns (as a pair) a string representing the digit characters at the
beginning of the input list and the remaining characters after this prefix. You may use the Char.isDigit and
String.implode functions in your implementation.
For example,
numberPrefix [#"a", #"2", #"c", #" ", #"a"];
val it = ("", [#"a", #"2", #"c", #" ", #"a"]) : string * char list
numberPrefix [#"2", #"3", #" ", #"a"];
val it = ("23", [#" ", #"a"]) : string * char list
Here is my code so far...
fun numberPrefix(c:char list):string*char list =
case c of
[] => []
|(first::rest) => if isDigit first
then first::numberPrefix(rest)
else
;
I guess what I am trying to do is append first to a separate list if it is indeed a digit. Once I reach a non-digit member of the char list, I would like to return that list using String.implode, but I am banging my head on the idea of passing in a helper function or even just using the "let" expression. How can I create a separate list while also keeping track of where I am in the original list, so that I can return the result in the proper format?
First of all, the function should produce a pair, not a list.
The base case should be ("", []), not [], and you can't pass the recursive result around "untouched".
(You can pretty much tell this from the types alone. Pay attention to types; they want to help you.)
If you bind the result of recursing in a let, you can access its parts separately and rearrange them.
A directly recursive take might look like this:
fun numberPrefix [] = ("", [])
  | numberPrefix (cs as (x::xs)) =
      if Char.isDigit x
      then let val (number, rest) = numberPrefix xs
           in
             ((str x) ^ number, rest)
           end
      else ("", cs);
However, splitting a list in two based on a predicate – let's call it "splitOn", with the type ('a -> bool) -> 'a list -> 'a list * 'a list – is a reasonably useful operation, and if you had that function you would only need something like this:
fun numberPrefix xs = let val (nums, notnums) = splitOn Char.isDigit xs
in
(String.implode nums, notnums)
end;
(Splitting left as an exercise. I suspect that you have already implemented this splitting function, or its close relatives "takeWhile" and "dropWhile".)
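For comparison, here is the same shape sketched in Scala rather than SML (a language switch made only for illustration). Scala's standard `span` plays the role of the hypothetical `splitOn`: it returns the longest prefix whose elements satisfy the predicate, together with the remainder. `NumberPrefixSketch` is an illustrative name.

```scala
object NumberPrefixSketch {
  // span splits the list at the first element that fails the predicate.
  def numberPrefix(cs: List[Char]): (String, List[Char]) = {
    val (digits, rest) = cs.span(_.isDigit)
    (digits.mkString, rest)
  }

  def main(args: Array[String]): Unit = {
    println(numberPrefix(List('2', '3', ' ', 'a'))) // (23,List( , a))
    println(numberPrefix(List('a', '2', 'c')))      // (,List(a, 2, c))
  }
}
```

Note that a digit appearing after the first non-digit stays in the remainder, which matches the first example in the assignment.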

How can I get the elements of list using ListBuffer

I am facing a very weird problem while getting the elements of a list.
Below is the piece of code, to which I am passing "bc" and "mn" as arguments:
val list1 = List("abc", "def", "mnp")
val list2 = List(args(0), args(1))
val header1=list1.filter(x => list2.exists(y => x.contains(y)))
println(header1)
Output: List(abc, mnp)
I am trying to do it in a different way (passing the same arguments), but I am getting an empty List:
val list1 = List("abc", "def", "mnp")
//val list2 = List(args(0), args(1))
val ipList1= new ListBuffer[Any]
for(i <- 0 to 1){
ipList1 +=args(i)
}
val list2=ipList1.toList
println(list2)
val header1=list1.filter(x => list2.exists(y => x.contains(y)))
println(header1)
Output:
List(bc, mn)
List() <-- this is the empty List I am getting
Can someone please tell me what I am doing wrong and how to make it right?
The problem is that x.contains(y) does not mean what you think it means. String has a contains method that checks whether another String is a substring of this String. But in your code y doesn't have type String, but type Any. So the contains method of String isn't called. It's the contains method of WrappedString which treats the String x as though it's a Seq[Char]. That method doesn't check whether any substring is equal to y but whether any character is equal to y.
The solution, obviously, is to use a ListBuffer[String].
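The difference between the two `contains` methods can be seen in isolation. This is a minimal sketch with illustrative names (`ContainsSketch`, `asString`, `asAny`). The ascription `(x: Seq[Char])` is an assumption used to select the sequence view explicitly, rather than relying on which implicit conversion a given compiler version picks for an `Any` argument.

```scala
object ContainsSketch {
  val x: String = "abc"
  val asString: String = "bc"
  val asAny: Any = "bc"

  // String.contains(CharSequence): substring test.
  val substringTest: Boolean = x.contains(asString)

  // Viewed as a sequence of characters, contains is an element test:
  // no single Char in "abc" equals the String "bc".
  val elementTest: Boolean = (x: Seq[Char]).contains(asAny)

  // One way to recover the static type from a List[Any].
  val strings: List[String] = List[Any]("bc", "mn").collect { case s: String => s }
  val header: List[String] =
    List("abc", "def", "mnp").filter(s => strings.exists(y => s.contains(y)))

  def main(args: Array[String]): Unit = {
    println(substringTest) // true
    println(elementTest)   // false
    println(header)        // List(abc, mnp)
  }
}
```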
The problem is that you are using a ListBuffer[Any], so the elements are statically typed as Any rather than String, and that changes which contains method the compiler selects.
You may either do this:
val ipList1 = new ListBuffer[String]
for (i <- 0 to 1) {
ipList1 += args(i) // args(i) is already a String
}
val list2 = ipList1.toList
Or even better just:
val list2 = args.slice(0, 2).toList

Fetch elements from Option[Any] of List

scala> val a = jsonMap.get("L2_ID")
a: Option[Any] = Some(List(24493, 22774, 23609, 20517, 22829, 23646, 22779, 23578, 22765, 23657))
I want to fetch the first element of the list, i.e. 24493. So I tried the code below:
scala> var b = a.map(_.toString)
b: Option[String] = Some(List(24493, 22774, 23609, 20517, 22829, 23646, 22779, 23578, 22765, 23657))
scala>
scala> var c = b.map(_.split(",")).toList.flatten
c: List[String] = List(List(24493, " 22774", " 23609", " 20517", " 22829", " 23646", " 22779", " 23578", " 22765", " 23657)")
scala> c(0)
res34: String = List(24493
This is not returning as expected.
I suggest you use pattern matching.
To be defensive, I also added a Try to protect against the case of your JSON not being a List of numbers.
The code below returns an Option[Int], and you can call .getOrElse(0) on it, or use some other default value if you like.
import scala.util.Try
val first = a match {
case Some(h :: _) => Try(h.toString.toInt).toOption
case _ => None
}
So, you have an Option with a List inside of it. Then var b = a.map(_.toString) converts the contents of the Option (a List) into a String. That's not what you want.
Look at the types of the results of your transformations; they provide pretty good hints. b: Option[String], for example, tells you that you have lost the list. Assuming a is typed as Option[List[Int]] (with the Option[Any] from the question you would first have to narrow the type, for example with a pattern match),
a.map(_.map(_.toString))
has the type Option[List[String]] on the other hand: you have converted every element of the list to a string.
If you are just looking for the first element, there is no need to convert all of them though. Something like this will do:
a
.flatMap(_.headOption) // Option[Int], containing the first element, or None if the list was empty or if a was None
.map(_.toString) // convert Int inside of Option (if any) to String
.getOrElse("") // get the contents of the Option, or empty string if it was None
If you are certain that it's a Some, and that the list is non-empty, then you can unwrap the option and get the List[Int] using .get. Then you can access the first element of the list using .head:
val x: Option[List[Int]] = ???
x.get.head
If you are not in the REPL, and if you aren't sure whether it's a Some or None, and whether the List has any elements, then use
x.flatMap(_.headOption).getOrElse(yourDefaultValueEg0)
"Stringly-typed" programming is certainly not necessary in a language with such a powerful type system, so converting everything to string and splitting by commas was a seriously flawed approach.
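Because the value in the question is an Option[Any], the type has to be narrowed before list methods such as headOption become available. A minimal sketch of one way to do that with a typed pattern follows. The sample value stands in for jsonMap.get("L2_ID"), and `FirstElementSketch` and `firstInt` are illustrative names. It assumes the JSON library yields Int elements; if it produces Long, Double or BigInt instead, the pattern would need to match that type.

```scala
object FirstElementSketch {
  // Stand-in for jsonMap.get("L2_ID"); the real map is not shown in the question.
  val a: Option[Any] = Some(List(24493, 22774, 23609))

  // The pattern checks that the value is a non-empty list whose head is an Int.
  // Anything else (None, empty list, non-list, non-Int head) yields None.
  def firstInt(value: Option[Any]): Option[Int] =
    value.collect { case (h: Int) :: _ => h }

  def main(args: Array[String]): Unit = {
    println(firstInt(a))              // Some(24493)
    println(firstInt(a).getOrElse(0)) // 24493
  }
}
```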

How to split a string in two, recursively

I am writing a recursive ML function, that takes a string, and an index value, and splits the string at the given index. The function should return a list containing two strings.
I understand that I need two base cases: one to check if the index has been reached, and one to check if the string is out of characters. I am stuck on how to assign the characters to different strings. Note that I used a helper function to clean up the initial call, so that explode does not need to be typed on every function call.
fun spliatHelp(S, num) =
if null S then nil
else if num = 0 then hd(S) :: (*string2 and call with tl(S)*)
else hd(S) :: (*string1 and call with tl(S)*)
fun spliat(S, num) =
spliatHelp(explode(S), num);
From an input of spliat("theString", 3);
My ideal output would be ["the", "String"];
For the num = 0 case, you just need to return [nil, S] or (equivalently) nil :: S :: nil.
For the other case, you need to make the recursive call spliatHelp (tl S, num - 1) and then examine the result. You can use either a let expression or a case expression for that, as you prefer. The case expression version would look like this:
case spliatHelp (tl S, num - 1)
 of nil => nil (* or however you want to handle this *)
  | [first, second] => [hd S :: first, second]
  | _ => raise (Fail "unexpected result")
Incidentally, rather than returning a string list with either zero or two elements, I think it would be better and clearer to return a (string * string) option. (Or even just a string * string, raising an exception if the index is out of bounds.)
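The option-returning variant can be sketched as follows, written in Scala rather than ML purely for illustration. `SplitSketch`, `splitHelp` and `split` are hypothetical names. The structure mirrors the answer: the index-reached base case, the out-of-characters base case, and a recursive case that inspects the result before prepending the current character.

```scala
object SplitSketch {
  private def splitHelp(cs: List[Char], n: Int): Option[(List[Char], List[Char])] =
    if (n == 0) Some((Nil, cs)) // index reached: everything left is the second part
    else cs match {
      case Nil => None          // ran out of characters before reaching the index
      case h :: t =>
        // Recurse first, then put the current character at the front of the first part.
        splitHelp(t, n - 1).map { case (first, second) => (h :: first, second) }
    }

  def split(s: String, n: Int): Option[(String, String)] =
    splitHelp(s.toList, n).map { case (first, second) => (first.mkString, second.mkString) }

  def main(args: Array[String]): Unit = {
    println(split("theString", 3)) // Some((the,String))
    println(split("ab", 5))        // None
  }
}
```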

Filter list of lists

I'm very new to Haskell and only starting to learn it.
I'm using the "Learn You a Haskell for Great Good!" tutorial to start, and saw this example of solving the "3n+1" problem:
chain :: (Integral a) => a -> [a]
chain 1 = [1]
chain n
| even n = n:chain (n `div` 2)
| odd n = n:chain (n*3 + 1)
numLongChains :: Int
numLongChains = length (filter isLong (map chain [1..100]))
where isLong xs = length xs > 15
So numLongChains counts all chains longer than 15 steps, for all numbers from 1 to 100.
Now I want to write my own:
numLongChains' :: [Int]
numLongChains' = filter isLong (map chain [1..100])
where isLong xs = length xs > 15
So now I don't want to count these chains, but to return the filtered list of these chains.
But now I get an error when compiling:
Couldn't match expected type `Int' with actual type `[a0]'
Expected type: Int -> Bool
Actual type: [a0] -> Bool
In the first argument of `filter', namely `isLong'
In the expression: filter isLong (map chain [1 .. 100])
What can be the problem?
The type signature of numLongChains' is probably not correct. Depending on what you want to do, one of the following is needed:
If you simply want to count those chains, the function must return a number: change the body to length $ filter isLong (map chain [1..100]) and the type to Int.
If you want to return a list of the lengths of the long chains, the type signature is fine, but you need to compute the lengths. I'd suggest calculating the length before filtering and then filtering on it. The function's body becomes filter (>15) (map (length . chain) [1..100]).
If you want to return all chains that are longer than 15 elements, just change the signature to [[Int]] (a list of chains, each a list of Ints) and you're fine.
FUZxxl is right. You are going to want to change the type signature of your function to [[Int]]. As you are filtering a list of lists and only selecting the ones that are sufficiently long, you get back a list of lists.
One note about reading Haskell compiler errors. This error may seem strange. It says you had [a0] -> Bool but it was expecting Int -> Bool. This is because the type checker assumes, from the signature of your numLongChains' function, that you need a filter function that checks Ints and returns a list of the acceptable ones. The only way to filter over a list and get [Int] back is to have a function that takes Ints and returns Bools (Int -> Bool). Instead, it sees a function that checks length. length takes a list, so it infers that you wrote a function that checks lists ([a0] -> Bool). Sometimes the checker is not as friendly as you would like it to be, but if you look hard enough, you will see that 9 times out of 10 a hard-to-decipher error is the result of such assumptions.
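The three possible results (a count, a list of lengths, a list of chains) differ only in their types. The sketch below renders them in Scala instead of Haskell, purely for comparison. `ChainSketch` and its member names are illustrative, and Long is used for the elements as an assumption to keep intermediate values well inside the numeric range.

```scala
object ChainSketch {
  // Collatz chain starting at n, ending at 1.
  def chain(n: Long): List[Long] =
    if (n == 1) List(1L)
    else if (n % 2 == 0) n :: chain(n / 2)
    else n :: chain(n * 3 + 1)

  val chains: List[List[Long]] = (1L to 100L).toList.map(chain)

  // Option 1: a number (Int in the Haskell version).
  val count: Int = chains.count(_.length > 15)

  // Option 2: the lengths of the long chains ([Int]).
  val lengths: List[Int] = chains.map(_.length).filter(_ > 15)

  // Option 3: the long chains themselves ([[Int]] in the Haskell version).
  val longChains: List[List[Long]] = chains.filter(_.length > 15)

  def main(args: Array[String]): Unit = {
    println(chain(10))       // List(10, 5, 16, 8, 4, 2, 1)
    println(count)
    println(longChains.head)
  }
}
```

Declaring `longChains` as a flat list of numbers would be rejected by the Scala compiler as well, for the same underlying reason as in the Haskell error: filtering a list of lists yields a list of lists.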