Task not serializable - Regex

I have a movie with a title, and the title contains the year of the movie, like "Movie (Year)". I want to extract the year, and I'm using a regex for this.
case class MovieRaw(movieid: Long, genres: String, title: String)
case class Movie(movieid: Long, genres: Set[String], title: String, year: Int)
val regexYear = ".*\\((\\d*)\\)".r
moviesRaw.map { case MovieRaw(i, g, t) =>
  Movie(i, g, t, t.trim() match { case regexYear(y) => Integer.parseInt(y) })
}
When executing the last command, I get the following error:
java.io.NotSerializableException: org.apache.spark.SparkConf
Running in the Spark/Scala REPL, with this SparkContext:
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
val sc = new SparkContext(conf)

As Dean explained, the reason for the problem is that the REPL creates a class out of the code added to it, and in this case the other variables in the same context get "pulled" into the closure by the regex declaration.
Given the way you're creating the context, a simple way to avoid this serialization issue is to declare the SparkConf and SparkContext as @transient:
@transient val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
@transient val sc = new SparkContext(conf)
You don't even need to recreate the Spark context in the REPL if its only purpose is connecting to Cassandra:
spark-shell --conf spark.cassandra.connection.host=localhost

You probably have this code in a larger Scala class or object (a type), right? If so, in order to serialize regexYear, the whole enclosing type gets serialized, and that type probably also holds the SparkConf.
This is a very common and confusing problem and efforts are underway to prevent it, given the constraints of the JVM and languages on top of it, like Java.
The solution (for now) is to put regexYear inside a method or another object:
object MyJob {
  def main(...) = {
    case class MovieRaw(movieid: Long, genres: String, title: String)
    case class Movie(movieid: Long, genres: Set[String], title: String, year: Int)
    val regexYear = ".*\\((\\d*)\\)".r
    moviesRaw.map { case MovieRaw(i, g, t) =>
      Movie(i, g, t, t.trim() match { case regexYear(y) => Integer.parseInt(y) })
    }
    ...
  }
}
or
...
object small {
  case class MovieRaw(movieid: Long, genres: String, title: String)
  case class Movie(movieid: Long, genres: Set[String], title: String, year: Int)
  val regexYear = ".*\\((\\d*)\\)".r
  moviesRaw.map { case MovieRaw(i, g, t) =>
    Movie(i, g, t, t.trim() match { case regexYear(y) => Integer.parseInt(y) })
  }
}
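As a side note, the match in the question will throw a MatchError for any title that has no "(Year)" suffix. Here is a hedged sketch of a safer extraction (the four-digit pattern and the yearOf helper are my own assumptions):
val regexYear = ".*\\((\\d{4})\\)\\s*$".r

def yearOf(title: String): Option[Int] = title.trim match {
  case regexYear(y) => Some(y.toInt) // "Movie (1995)" -> Some(1995)
  case _            => None          // no "(Year)" suffix found
}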
Hope this helps.

Try passing the Cassandra option on the command line to spark-shell, like this:
spark-shell [other options] --conf spark.cassandra.connection.host=localhost
And that way you won't have to recreate the SparkContext -- you can use the SparkContext (sc) that gets instantiated automatically with spark-shell.
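From there, the Spark Cassandra Connector's implicit enrichments on SparkContext let you load the table directly. A minimal sketch, where the keyspace and table names are hypothetical:
import com.datastax.spark.connector._

// sc is the SparkContext that spark-shell already created
val moviesRaw = sc.cassandraTable[MovieRaw]("movie_keyspace", "movies_raw")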

Related

In Akka TestKit, why use an event rather than a normal method to get the actor state?

In chapter three of the Akka book, a message event is used to test a silent actor's state.
The actor looks like this:
object SilentActorProtocol {
  case class SilentMessage(data: String)
  case class GetState(receiver: ActorRef)
}

class SilentActor extends Actor {
  import SilentActorProtocol._

  var internalState = Vector[String]()

  def receive = {
    case SilentMessage(data) =>
      internalState = internalState :+ data
    case GetState(receiver) => receiver ! internalState
  }
}
The test code looks like this:
"change internal state when it receives a message, multi" in {
import SilentActorProtocol._
val silentActor = system.actorOf(Props[SilentActor], "s3")
silentActor ! SilentMessage("whisper1")
silentActor ! SilentMessage("whisper2")
silentActor ! GetState(testActor)
expectMsg(Vector("whisper1", "whisper2"))
}
In the test code, why use GetState to read the result of the SilentMessage events above?
Why not read silentActor.internalState directly?
Update
Some answers seem to have misread my question. In detail, the book says:
using the internalState variable directly will run into concurrency problems, so a GetState event should be used to ask the actor for its inner state rather than reading internalState directly.
I don't understand why it would run into concurrency problems, or why using GetState fixes them.
Explanation
silentActor.internalState can't read the inner variable directly; silentActor.underlyingActor.internalState can. Sorry for the badly worded question.
If I understand your question correctly, the answer is that the silentActor in the test code is not the actor itself, it is an ActorRef instance, and therefore it does not have an internalState var to be referenced.
If you want to write unit tests against specific variables and methods of the actor, you need to use the underlyingActor technique as described (with its caveats) in the documentation (see the section on TestActorRef).
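For reference, a minimal sketch of that technique, assuming the types from the question and an implicit ActorSystem in scope (e.g. from TestKit):
import akka.actor.Props
import akka.testkit.TestActorRef
import SilentActorProtocol._

// TestActorRef uses the CallingThreadDispatcher, so the message is processed
// synchronously and the internal state can be read immediately afterwards
val silentActor = TestActorRef[SilentActor](Props[SilentActor])
silentActor ! SilentMessage("whisper1")
assert(silentActor.underlyingActor.internalState == Vector("whisper1"))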

Akka persistence receiveRecover receives snapshots that are from other actor instances

I am experiencing unexpected behaviour when using Akka persistence. I am fairly new to Akka so apologies in advance if I have missed something obvious.
I have an actor called PCNProcessor. I create an actor instance for every PCN id I have. The problem I experience is that when I create the first actor instance, all works fine and I receive the Processed response. However, when I create further PCNProcessor instances using different PCN ids, I get the Already processed PCN response.
Essentially, for some reason the snapshot stored as part of the first PCN id processor is reapplied to the subsequent PCN id instances even though it does not relate to that PCN and the PCN id is different. To confirm this behaviour, I printed out a log in the receiveRecover, and every subsequent PCNProcessor instance receives snapshots that do not belong to it.
My question is:
Should I be storing the snapshots in a specific way so that they are keyed against the PCN Id? And then should I be filtering away snapshots that are not related to the PCN in context?
Or should the Akka framework be taking care of this behind the scenes, so that I should not have to worry about storing snapshots against the PCN id?
Source code for the actor is below. I do use sharding.
package com.abc.pcn.core.actors

import java.util.UUID

import akka.actor._
import akka.persistence.{AtLeastOnceDelivery, PersistentActor, SnapshotOffer}
import com.abc.common.AutoPassivation
import com.abc.pcn.core.events.{PCNNotProcessedEvt, PCNProcessedEvt}

object PCNProcessor {
  import akka.contrib.pattern.ShardRegion
  import com.abc.pcn.core.PCN

  val shardName = "pcn"

  val idExtractor: ShardRegion.IdExtractor = {
    case ProcessPCN(pcn) => (pcn.id.toString, ProcessPCN(pcn))
  }

  val shardResolver: ShardRegion.ShardResolver = {
    case ProcessPCN(pcn) => pcn.id.toString
  }

  // shard settings
  def props = Props(classOf[PCNProcessor])

  // command and response
  case class ProcessPCN(pcn: PCN)
  case class NotProcessed(reason: String)
  case object Processed
}

class PCNProcessor
    extends PersistentActor
    with AtLeastOnceDelivery
    with AutoPassivation
    with ActorLogging {

  import com.abc.pcn.core.actors.PCNProcessor._
  import scala.concurrent.duration._

  context.setReceiveTimeout(10.seconds)

  private val pcnId = UUID.fromString(self.path.name)
  private var state: String = "not started"

  override def persistenceId: String = "pcn-processor-${pcnId.toString}"

  override def receiveRecover: Receive = {
    case SnapshotOffer(_, s: String) =>
      log.info("Recovering. PCN ID: " + pcnId + ", State to restore: " + s)
      state = s
  }

  def receiveCommand: Receive = withPassivation {
    case ProcessPCN(pcn) if state == "processed" =>
      sender ! Left(NotProcessed("Already processed PCN"))
    case ProcessPCN(pcn) if pcn.name.isEmpty =>
      val error: String = "Name is invalid"
      persist(PCNNotProcessedEvt(pcn.id, error)) { evt =>
        state = "invalid"
        saveSnapshot(state)
        sender ! Left(NotProcessed(error))
      }
    case ProcessPCN(pcn) =>
      persist(PCNProcessedEvt(pcn.id)) { evt =>
        state = "processed"
        saveSnapshot(state)
        sender ! Right(Processed)
      }
  }
}
Update:
After logging the metadata of the received snapshot, I can see the problem: the snapshotterId is not resolving properly and is always set to the literal text pcn-processor-${pcnId.toString}, with the placeholder left unresolved.
[INFO] [06/06/2015 09:10:00.329] [ECP-akka.actor.default-dispatcher-16] [akka.tcp://ECP#127.0.0.1:2551/user/sharding/pcn/16b3d4dd-9e0b-45de-8e32-de799d21e7c5] Recovering. PCN ID: 16b3d4dd-9e0b-45de-8e32-de799d21e7c5, Metadata of snapshot SnapshotMetadata(pcn-processor-${pcnId.toString},1,1433577553585)
I think you are misusing the Scala string interpolation feature.
Try it the following way:
override def persistenceId: String = s"pcn-processor-${pcnId.toString}"
Please note the use of s before the string literal.
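A quick illustration of the difference (the UUID value is just an example):
val pcnId = java.util.UUID.randomUUID()

val plain  = "pcn-processor-${pcnId.toString}"  // literal text: pcn-processor-${pcnId.toString}
val interp = s"pcn-processor-${pcnId.toString}" // e.g. pcn-processor-16b3d4dd-9e0b-45de-8e32-de799d21e7c5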
OK, fixed this by changing the persistence id to the following line:
override def persistenceId: String = "pcn-processor-" + pcnId.toString
The original plain-string version:
override def persistenceId: String = "pcn-processor-${pcnId.toString}"
only appeared to work for persisting to the journal, but not for snapshots. (In a plain string literal the placeholder is never substituted, so every processor instance shares the same literal persistenceId, which is presumably why snapshots from the first instance were offered to the others.)

Issue when mocking Logger

I am having a strange issue when mocking the log field of a class. Running the same test twice shows an error the second time. This is an example of the code:
class AccountConfigJSON {

    static Logger log = Logger.getLogger(AccountConfigJSON.class)

    def json

    def AccountConfigJSON(String jsonString) {
        if (jsonString) {
            json = new JSONObject(jsonString)
        } else {
            log.debug("No JSON string for account config. Will not parse")
        }
    }
}
and this is the specification
class AccountConfigJSONUnitSpec extends UnitSpec {

    def loggerMock

    def setup() {
        loggerMock = Mock(org.apache.log4j.Logger)
        org.apache.log4j.Logger.metaClass.static.getLogger = { Class clazz -> loggerMock }
    }

    def 'If jsonString is null, a log is written'() {
        when:
        new AccountConfigJSON("")

        then:
        1 * loggerMock.debug("No JSON string for account config. Will not parse")
    }

    def 'If jsonString is empty, a log is written'() {
        when:
        new AccountConfigJSON("")

        then:
        1 * loggerMock.debug("No JSON string for account config. Will not parse")
    }
}
The second test fails showing
| Too few invocations for:
1 * loggerMock.debug("No JSON string for account config. Will not parse") (0 invocations)
but debugging the app in IDEA clearly shows that this statement runs. Any idea?
It looks odd that the actual call is executed but the interaction is not recorded. You can work around it by explicitly assigning the mocked logger to the class, as below:
def setup() {
    loggerMock = Mock(org.apache.log4j.Logger)
    AccountConfigJSON.log = loggerMock
}
From the definition of "interaction", I think the above setup is the best way to go.
Is an Interaction Just a Regular Method Invocation?
Not quite. While an interaction looks similar to a regular method invocation, it is simply a way to express which method invocations are expected to occur. A good way to think of an interaction is as a regular expression that all incoming invocations on mock objects are matched against. Depending on the circumstances, the interaction may match zero, one, or multiple invocations.
This only happens when dealing with static properties of a class; presumably the static log field is initialized only once, when the class is first loaded, so later tests still see the mock created for the first test. The moment the logger is defined non-static in the class under test, everything works as expected without the workaround.

Scala testing mocking implicit parameters?

I'm having a bit of a tough time trying to understand how to write tests in Scala when implicit parameters are involved.
I have the following (short version) of my code and test:
Implementation (Scala 2.10, Spray and Akka):
import spray.httpx.SprayJsonSupport._
import com.acme.ResultJsonFormat._

case class PerRequestIndexingActor(ctx: RequestContext) extends Actor with ActorLogging {
  def receive = LoggingReceive {
    case AddToIndexRequestCompleted(result) =>
      ctx.complete(result)
      context.stop(self)
  }
}

object ResultJsonFormat extends DefaultJsonProtocol {
  implicit val resultFormat = jsonFormat2(Result)
}

case class Result(code: Int, message: String)
Test (Using ScalaTest and Mockito):
"Per Request Indexing Actor" should {
"send the HTTP Response when AddToIndexRequestCompleted message is received" in {
val request = mock[RequestContext]
val result = mock[Result]
val perRequestIndexingActor = TestActorRef(Props(new PerRequestIndexingActor(request)))
perRequestIndexingActor ! AddToIndexRequestCompleted(result)
verify(request).complete(result)
}
}
This line, verify(request).complete(result), uses an implicit Marshaller to turn Result into JSON.
I can bring a marshaller into scope by adding implicit val marshaller: Marshaller[Result] = mock[Marshaller[Result]], but when I run the test a different instance of Marshaller is used, so the verification fails.
Even explicitly passing the mock Marshaller to complete fails.
So, can anyone advise how to create a mock object for an implicit parameter and make sure that instance is the one used?
This is a perfect situation for using a Mockito matcher for the marshaller argument. You should not need to mock out the implicit marshaller. All you really want to do is verify that complete was called with the result you expect, along with some instance of the marshaller. First, if you haven't already done so, bring the Mockito matchers into scope with an import like so:
import org.mockito.Matchers._
Then, if you want reference matching on the result, you can verify like so:
verify(request).complete(same(result))(any(classOf[Marshaller[Result]]))
Or, if you want equals matching on the result, you can do:
verify(request).complete(eq(result))(any(classOf[Marshaller[Result]]))
The trick with matchers is that once you use one for one argument, you have to use them for all arguments, which is why we use one for the result too.
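One caveat worth noting: with import org.mockito.Matchers._, a bare eq resolves to Scala's built-in AnyRef.eq rather than Mockito's matcher, so a common workaround is to rename it on import. A sketch:
import org.mockito.Matchers.{any, same, eq => mockEq}

verify(request).complete(mockEq(result))(any(classOf[Marshaller[Result]]))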

How to call function from hashmap in Scala

I'm pretty new to Scala, and basically I want to map strings to functions in a hashmap.
However, I get an error at subscribers.get(e.key)(e.EventArgs); stating Option[EventArgs => Unit] does not take parameters...
Example code:
object Monitor {
  val subscribers = HashMap.empty[String, (EventArgs) => Unit]

  def trigger(e: Event) {
    subscribers.get(e.key)(e.EventArgs);
  }

  def subscribe(key: String, e: (EventArgs) => Unit) {
    subscribers += key -> e;
  }
}
The get method of a Map gives you an Option of the value, not the value itself. Thus, if the key is found in the map, you get Some(value); if not, you get None. So you first need to "unroll" that Option to make sure there is actually a function value you can invoke (call apply on):
def trigger(e: Event): Unit =
  subscribers.get(e.key).foreach(_.apply(e.EventArgs))
or
def trigger(e: Event): Unit =
  subscribers.get(e.key) match {
    case Some(value) => value(e.EventArgs)
    case None        =>
  }
There are many posts around explaining Scala's Option type.
Also note Luigi's remark about using an immutable map (the default Map) with a var instead.
Since the get method returns an Option, you can use map on it:
subscribers.get(e.key).map(f => f(e.EventArgs))
or even shorter:
subscribers.get(e.key) map (_(e.EventArgs))
get only takes one argument. So subscribers.get(e.key) returns an Option, and you're trying to feed (e.EventArgs) to that Option's apply method (which doesn't exist).
Also, try making the subscribers a var (or choosing a mutable collection type). At the moment you have an immutable collection and an immutable variable, so your map cannot change. A more idiomatic way to declare it would be
var subscribers = Map[String, EventArgs => Unit]()
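For completeness, a hedged sketch of how that declaration is used (the EventArgs type is assumed from the question):
var subscribers = Map[String, EventArgs => Unit]()

def subscribe(key: String, handler: EventArgs => Unit): Unit =
  subscribers += key -> handler // += rebinds the var to a new immutable Map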
HashMap.get() in Scala works a bit differently than in Java. Instead of returning the value itself, get() returns an Option. Option is a special type that can hold two kinds of values: Some(x) and None. In the first case it says "there's some value for this key in the map". In the second case it says "nope, there's nothing (none) for this key in the map". This is done to force programmers to check whether the map actually contains a value for the key, and so avoid the NullPointerExceptions that appear so frequently in Java code.
So you need something like this:
def trigger(e: Event) {
  val value = subscribers.get(e.key)
  value match {
    case None => throw new Exception("Oops, no such subscriber...")
    case Some(f) => f(e.EventArgs)
  }
}
You can find more info about the Option type and pattern matching in Scala in the official documentation.