I'm pretty new to scala, and I'm trying to use regex to solve a problem, but I'm getting an error.
I copied an example directly from the docs (2.10.4), and I'm getting the same error.
What am I missing here:
println(scala.tools.nsc.Properties.versionString)
val p1 = "ab*c".r
val p2 = "a(b*)c".r
val p1Matches = "abbbc" match {
case p1() => true
case _ => false
}
produces:
version 2.10.4
Exception in thread "main" java.lang.NoSuchMethodError: scala.util.matching.Regex.unapplySeq(Ljava/lang/CharSequence;)Lscala/Option;
at SparkGrep$.argsParser(SparkGrep.scala:67)
at SparkGrep$.main(SparkGrep.scala:23)
at SparkGrep.main(SparkGrep.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
...
line 67 is the line that says:
case p1() => true
for reference this example is copied directly from the unappplyseq example at http://www.scala-lang.org/api/2.10.4/index.html#scala.util.matching.Regex
Update, I have created an example project where this does, in fact work. I've figured out that it has something to do withmy build process, but I have no idea why.
Here's the build.sbt for the broken project
name := "logdata-demo-07-22-2015"
version := "1.0"
scalaVersion := "2.10.4"
resolvers ++= Seq(
"Typesafe Releases" at "http://repo.typesafe.com/ typesafe/releases",
"spray repo" at "http://repo.spray.io"
)
resolvers += "spray" at "http://repo.spray.io/"
val sparkVersion = "1.3.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-mllib" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"com.databricks" %% "spark-csv" % "1.1.0",
"com.github.scopt" %% "scopt" % "3.3.0"
)
assemblyMergeStrategy in assembly := {
case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
case "log4j.properties" => MergeStrategy.discard
case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
case "reference.conf" => MergeStrategy.concat
case _ => MergeStrategy.first
}
assemblyOutputPath in assembly := new File("output/logdata-demo.jar")
I tried it on Scala 2.11.6 everything works fine.
Could be connected to the Scala version.
Best action would be to update to 2.11.x. If you are stuck with 2.10 you could try 2.10.5.
scala> :paste
// Entering paste mode (ctrl-D to finish)
println(scala.tools.nsc.Properties.versionString)
val p1 = "ab*c".r
val p2 = "a(b*)c".r
val p1Matches = "abbbc" match {
case p1() => true
case _ => false
}
// Exiting paste mode, now interpreting.
version 2.11.6
p1: scala.util.matching.Regex = ab*c
p2: scala.util.matching.Regex = a(b*)c
p1Matches: Boolean = true
I tried this code in Scala 2.10.4 and it worked fine.
You should try getting the Scala version in the following way to make sure you're using 2.10.4:
scala.util.Properties.scalaPropOrElse("version.number", "unknown")
val p1 = "ab*c".r
val p2 = "a(b*)c".r
val p1Matches = "abbbc" match {
case p1() => true
case _ => false
}
This was caused by inappropriate placement of the class files. They were in a scala 2.11 folder at build time. moving them to a scala-2.10 folder fixed the issue.
SBT is weird
Related
This works pretty well for a single file
val inputSteamFromFile = getClass.getResourceAsStream("/file1")
val inputStreamAsString : Array[String] = scala.io.Source
.fromInputStream(inputSteamFromFile)
.mkString.split(",")
.filter(queryStr => queryStr.trim.length != 0)
But if i want to the same for multiple files say
filePathList = ["/File1", "/File2"] the following is not wrorking!
val inputStreamAsString : Array[String] = filePathList
.foreach(
files => scala.io.Source.fromInputStream(getClass.getResourceAsStream(files))
.mkString.split(",")
.filter(queryStr => queryStr.trim.length != 0)
)
I'm trying to convert the following ES6 script to bucklescript and I cannot for the life of me figure out how to create a "closure" in bucklescript
import {Socket, Presence} from "phoenix"
let socket = new Socket("/socket", {
params: {user_id: window.location.search.split("=")[1]}
})
let channel = socket.channel("room:lobby", {})
let presence = new Presence(channel)
function renderOnlineUsers(presence) {
let response = ""
presence.list((id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
response += `<br>${id} (count: ${count})</br>`
})
document.querySelector("main[role=main]").innerHTML = response
}
socket.connect()
presence.onSync(() => renderOnlineUsers(presence))
channel.join()
the part I cant figure out specifically is let response = "" (or var in this case as bucklescript always uses vars):
function renderOnlineUsers(presence) {
let response = ""
presence.list((id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
response += `<br>${id} (count: ${count})</br>`
})
document.querySelector("main[role=main]").innerHTML = response
}
the closest I've gotten so far excludes the result declaration
...
...
let onPresenceSync ev =
let result = "" in
let listFunc = [%raw begin
{|
(id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
result += `${id} (count: ${count})\n`
}
|}
end
] in
let _ =
presence |. listPresence (listFunc) in
[%raw {| console.log(result) |} ]
...
...
compiles to:
function onPresenceSync(ev) {
var listFunc = (
(id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
result += `${id} (count: ${count})\n`
}
);
presence.list(listFunc);
return ( console.log(result) );
}
result is removed as an optimization beacuse it is considered unused. It is generally not a good idea to use raw code that depends on code generated by BuckleScript, as there's quite a few surprises you can encounter in the generated code.
It is also not a great idea to mutate variables considered immutable by the compiler, as it will perform optimizations based on the assumption that the value will never change.
The simplest fix here is to just replace [%raw {| console.log(result) |} ] with Js.log result, but it might be enlightening to see how listFunc could be written in OCaml:
let onPresenceSync ev =
let result = ref "" in
let listFunc = fun [#bs] id item ->
let count = Js.Array.length item##meta in
result := {j|$id (count: $count)\n|j}
in
let _ = presence |. (listPresence listFunc) in
Js.log !result
Note that result is now a ref cell, which is how you specify a mutable variable in OCaml. ref cells are updated using := and the value it contains is retrieved using !. Note also the [#bs] annotation used to specify an uncurried function needed on functions passed to external higher-order functions. And the string interpolation syntax used: {j| ... |j}
My version of RegEx is being greedy and now working as it suppose to. I need extract each message with timestamp and user who created it. Also if user has two or more consecutive messages it should go inside one match / block / group. How to solve it?
https://regex101.com/r/zD5bR6/1
val pattern = "((a\.b|c\.d)\n(.+\n)+)+?".r
for(m <- pattern.findAllIn(str).matchData; e <- m.subgroups) println(e)
UPDATE
ndn solution throws StackOverflowError when executed:
Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4708)
.......
Code:
val pattern = "(?:.+(?:\\Z|\\n))+?(?=\\Z|\\w\\.\\w)".r
val array = (pattern findAllIn str).toArray.reverse foreach{println _}
for(m <- pattern.findAllIn(str).matchData; e <- m.subgroups) println(e)
I don't think a regular expression is the right tool for this job. My solution below uses a (tail) recursive function to loop over the lines, keep the current username and create a Message for every timestamp / message pair.
import java.time.LocalTime
case class Message(user: String, timestamp: LocalTime, message: String)
val Timestamp = """\[(\d{2})\:(\d{2})\:(\d{2})\]""".r
def parseMessages(lines: List[String], usernames: Set[String]) = {
#scala.annotation.tailrec
def go(
lines: List[String], currentUser: Option[String], messages: List[Message]
): List[Message] = lines match {
// no more lines -> return parsed messages
case Nil => messages.reverse
// found a user -> keep as currentUser
case user :: tail if usernames.contains(user) =>
go(tail, Some(user), messages)
// timestamp and message on next line -> create a Message
case Timestamp(h, m, s) :: msg :: tail if currentUser.isDefined =>
val time = LocalTime.of(h.toInt, m.toInt, s.toInt)
val newMsg = Message(currentUser.get, time, msg)
go(tail, currentUser, newMsg :: messages)
// invalid line -> ignore
case _ =>
go(lines.tail, currentUser, messages)
}
go(lines, None, Nil)
}
Which we can use as :
val input = """
a.b
[10:12:03]
you can also get commands
[10:11:26]
from the console
[10:11:21]
can you check if has been resolved
[10:10:47]
ah, okay
c.d
[10:10:39]
anyways startsLevel is still 4
a.b
[10:09:25]
might be a dead end
[10:08:56]
that need to be started early as well
"""
val lines = input.split('\n').toList
val users = Set("a.b", "c.d")
parseMessages(lines, users).foreach(println)
// Message(a.b,10:12:03,you can also get commands)
// Message(a.b,10:11:26,from the console)
// Message(a.b,10:11:21,can you check if has been resolved)
// Message(a.b,10:10:47,ah, okay)
// Message(c.d,10:10:39,anyways startsLevel is still 4)
// Message(a.b,10:09:25,might be a dead end)
// Message(a.b,10:08:56,that need to be started early as well)
The idea is to take as little characters as possible that will be followed by a username or the end of the string:
(?:.+(?:\Z|\n))+?(?=\Z|\w\.\w)
See it in action
I am counting values in each window and find the top values and want to save only the top 10 frequent values of each window to hdfs rather than all the values.
eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1),StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
val counts = eegStreams(a).map(x => (math.round(x.toDouble), 1)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(4), Seconds(4))
val sortedCounts = counts.map(_.swap).transform(rdd => rdd.sortByKey(false)).map(_.swap)
ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))}
//sortedCounts.foreachRDD(rdd =>println("\nTop 10 amplitudes:\n" + rdd.take(10).mkString("\n")))
sortedCounts.map(tuple => "%s,%s".format(tuple._1, tuple._2)).saveAsTextFiles("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))
I can print top 10 as above (commented).
I have also tried
sortedCounts.foreachRDD{ rdd => ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))}
but I get the following error. My Array is not serializable
15/01/05 17:12:23 ERROR actor.OneForOneStrategy:
org.apache.spark.streaming.StreamingContext
java.io.NotSerializableException:
org.apache.spark.streaming.StreamingContext
Can you try this?
sortedCounts.foreachRDD(rdd => rdd.filterWith(ind => ind)((v, ind) => ind <= 10).saveAsTextFile(...))
Note: I didn't test the snippet...
Your first version should work. Just declare #transient ssc = ... where the Streaming Context is first created.
The second version won't work b/c StreamingContext cannot be serialized in a closure.
I am facing a major issue with my design at this juncture. My method is trying to accomplish the follows:
Insert the passed in object into the database.
Get the autoincremented id from the insert and use it to call webservice1 along with the object.
Get the result from webservice1 and call webservice2 with the original object and some response from webservice1.
Combine the results from webservice1 and 2 and write it into the database.
Get the resulting autoincremented id from the last insert and call webservice3 with the original object that would eventually result into the success or failure of the operation.
I want to design this in a flexible manner since the requirements are in a flux and I do not want to keep on modifying my logic based on any changing. I do realize some amount of change is inevitable but I would like to minimize the damage and respect the open-closed principle.
My initial take was as follows:
def complexOperation(someObject:T) =
dbService.insertIntoDb(someObject) match {
case Left(e:Exception) => Left(e)
case Right(id:Int) => webService.callWebService1(id,someObject) match {
case Left(e:Exception) => Left(e)
case Right(r:SomeResponse1) => webService.callWebservice2(r,someObject) match {
case Left(e:Exception) => webService.rollbackService1();Left(e)
case Right(context:ResponseContext) => dbService.insertContextIntoDb(context) match {
case Left(e:Exception) => Left(e)
case Right(id:Int) => webService.callWebservice3(id,someObject) match {
case Left(e:Exception) => webService.rollbackService3();Left(e)
case Right(r:Response) => Right(r)
}
}
}
}
As you can see, this is a tangled mess. I can neither unit test it, nor extend it nor very easily debug it if things spiral out of control. This code serves its purpose but it will be great to get some ideas on how I should refactor it to make the lives of the people who inherit my code a little more easier.
Thanks
Have a look at scala.util.Try. It's available in Scala 2.10, which may or may not be available to you as an option, but the idea of it is perfect for your scenario.
What you have in your code example is what I like calling the "pyramid" of nesting. The best solution to this is to use flat-mapping wherever you can. But obviously that's an issue when you have stuff like Either[Exception, Result] at every step. That's where Try comes in. Try[T] is essentially a replacement for Either[Exception, T], and it comes with all of the flatMap-ing goodness that you need.
Assuming you can either change the return type of those webService calls, or provide some implicit conversion from Either[Exception, Result] to Try[Result], your code block would become something more like...
for {
id <- dbService.insertIntoDb(someObject)
r <- webService.callWebService1(id,someObject)
context <- webService.callWebservice2(r,someObject)
id2 <- dbService.insertContextIntoDb(context)
response <- webService.callWebservice3(id,someObject).recoverWith {
case e: Exception => webService.rollbackService3(); Failure(e)
}
} yield response
Lift has a similar mechanism in net.liftweb.common.Box. It's like Option, but with a container for Exceptions too.
edit: It looks like you can use the left or right method of an Either, and it will let you use flatMap-ing almost exactly the way I described with Try. The only difference is that the end result is an Either[Exception, Result] instead of a Try[Result]. Check out LeftProjection for details/examples.
You can use for comprehension to reduce the noise in the code.
#Dylan had the right idea above. Let me see if I can help translate what you want to do into idiomatic Scala 2.9.1 code.
This version doesn't attempt any rollbacks:
// 1: No rollbacks, just returns the first exception in Left
def complexOperation1(someObject:T): Either[Exception, Response] = {
for {
id <- dbService.insertIntoDb(someObject).right
r <- webService.callWebService1(id, someObject).right
context <- webService.callWebservice2(idResp, someObject).right
id2 <- dbService.insertContextIntoDb(context).right
response <- webService.callWebservice3(id,someObject).right
} yield response
}
Now, let's try to do the rollbacks exactly as you had them above:
// 2: Rolls back all web services and returns first exception in Left
def complexOperation1(someObject:T): Either[Exception, Response] = {
for {
id <- dbService.insertIntoDb(someObject).right
r <- webService.callWebService1(id, someObject).right
context <- webService.callWebservice2(idResp, someObject).left.map { e =>
webService.rollbackService1()
e
}.right
id2 <- dbService.insertContextIntoDb(context).right
response <- webService.callWebservice3(id,someObject).left.map { e =>
webService.rollbackService3()
e
}.right
} yield response
}
If you define a function which does the effect (the rollback) on the left, it get's a little cleaner and easier to test, for example:
// 3: Factor out the side-effect of doing the follbacks on Left
def rollbackIfLeft[T](f: => Either[Exception, T], r: => Unit): Either[Exception, T] = {
val result = f
result.left.foreach(_ => r) // do the rollback if any exception occured
result
}
def complexOperation1(someObject:T): Either[Exception, Response] = {
for {
id <- dbService.insertIntoDb(someObject).right
r <- webService.callWebService1(id, someObject).right
context <- rollbackIfLeft(webService.callWebservice2(idResp, someObject),
webService.rollbackService1()).right
id2 <- dbService.insertContextIntoDb(context).right
response <- rollbackIfLeft(webService.callWebservice3(id,someObject),
webService.rollbackService3()).right
} yield response
}
You can try out rollbackIfLeft in the scala REPL to get a sense of it:
scala> rollbackIfLeft(Right(42), println("hey"))
res28: Either[Exception,Int] = Right(42)
scala> rollbackIfLeft(Left(new RuntimeException), println("ERROR!"))
ERROR!
res29: Either[Exception,Nothing] = Left(java.lang.RuntimeException)
Hope this helps!