How to extract messages using regex in Scala? - regex

My version of RegEx is being greedy and now working as it suppose to. I need extract each message with timestamp and user who created it. Also if user has two or more consecutive messages it should go inside one match / block / group. How to solve it?
https://regex101.com/r/zD5bR6/1
val pattern = "((a\.b|c\.d)\n(.+\n)+)+?".r
for(m <- pattern.findAllIn(str).matchData; e <- m.subgroups) println(e)
UPDATE
ndn solution throws StackOverflowError when executed:
Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4708)
.......
Code:
val pattern = "(?:.+(?:\\Z|\\n))+?(?=\\Z|\\w\\.\\w)".r
val array = (pattern findAllIn str).toArray.reverse foreach{println _}
for(m <- pattern.findAllIn(str).matchData; e <- m.subgroups) println(e)

I don't think a regular expression is the right tool for this job. My solution below uses a (tail) recursive function to loop over the lines, keep the current username and create a Message for every timestamp / message pair.
import java.time.LocalTime
case class Message(user: String, timestamp: LocalTime, message: String)
val Timestamp = """\[(\d{2})\:(\d{2})\:(\d{2})\]""".r
def parseMessages(lines: List[String], usernames: Set[String]) = {
#scala.annotation.tailrec
def go(
lines: List[String], currentUser: Option[String], messages: List[Message]
): List[Message] = lines match {
// no more lines -> return parsed messages
case Nil => messages.reverse
// found a user -> keep as currentUser
case user :: tail if usernames.contains(user) =>
go(tail, Some(user), messages)
// timestamp and message on next line -> create a Message
case Timestamp(h, m, s) :: msg :: tail if currentUser.isDefined =>
val time = LocalTime.of(h.toInt, m.toInt, s.toInt)
val newMsg = Message(currentUser.get, time, msg)
go(tail, currentUser, newMsg :: messages)
// invalid line -> ignore
case _ =>
go(lines.tail, currentUser, messages)
}
go(lines, None, Nil)
}
Which we can use as :
val input = """
a.b
[10:12:03]
you can also get commands
[10:11:26]
from the console
[10:11:21]
can you check if has been resolved
[10:10:47]
ah, okay
c.d
[10:10:39]
anyways startsLevel is still 4
a.b
[10:09:25]
might be a dead end
[10:08:56]
that need to be started early as well
"""
val lines = input.split('\n').toList
val users = Set("a.b", "c.d")
parseMessages(lines, users).foreach(println)
// Message(a.b,10:12:03,you can also get commands)
// Message(a.b,10:11:26,from the console)
// Message(a.b,10:11:21,can you check if has been resolved)
// Message(a.b,10:10:47,ah, okay)
// Message(c.d,10:10:39,anyways startsLevel is still 4)
// Message(a.b,10:09:25,might be a dead end)
// Message(a.b,10:08:56,that need to be started early as well)

The idea is to take as little characters as possible that will be followed by a username or the end of the string:
(?:.+(?:\Z|\n))+?(?=\Z|\w\.\w)
See it in action

Related

Yesod Persistent using Aeson to parse UTCTime into record

I have my model from models.persistentmodels
...
Thing
title Text
price Int
kosher Bool
optionalstuff [Text] Maybe
createdat UTCTime
updatedat UTCTime
deriving Show
...
It contains two time fields, which are UTCTime.
I am receiving via AJAX what is almost a Thing, in JSON. But the user JSON should not have createdat and updatedat or kosher. So we need to fill them in.
postNewEventR = do
inputjson <- requireCheckJsonBody :: Handler Value
...
-- get rawstringofthings from inputjson
...
let objectsMissingSomeFields = case (decode (BL.fromStrict $ TE.encodeUtf8 rawstringofthings) :: Maybe [Object]) of
Nothing -> error "Failed to get a list of raw objects."
Just x -> x
now <- liftIO getCurrentTime
-- Solution needs to go here:
let objectsWithAllFields = objectsMissingSomeFields
-- We hope to be done
let things = case (eitherDecode $ encode objectsWithAllFields) :: Either String [Thing] of
Left err -> error $ "Failed to get things because: " <> err
Right xs -> xs
The error "Failed to get things" comes here because the JSON objects we parsed are missing fields that are needed in the model.
Solution
let objectsWithAllFields = Import.map (tackOnNeccessaryThingFields now True) objectsMissingSomeFields
So we take the current object and tack on the missing fields e.g. kosher and createdat.
But there is some strange difference in the way UTCTime is read vs aeson's way to parse UTCTime. So when I print UTCTime in to a Aeson String, I needed to print out the UTCTime into the format that it is expecting later:
tackOnNeccessaryThingFields :: UTCTime -> Bool -> Object -> Object
tackOnNeccessaryThingFields t b hm = G.fromList $ (G.toList hm) <> [
("createdat", String (pack $ formatTime defaultTimeLocale "%FT%T%QZ" t)),
("updatedat", String (pack $ formatTime defaultTimeLocale "%FT%T%QZ" t)),
("kosher", Bool b)
]
tackOnNeccessaryThingFields _ _ _ = error "This isn't an object."
After this fix, the object has all the fields needed to make the record, so the code gives [Thing].
And also the code runs without runtime error, instead of failing to parse the tshow t as UTCTime.
Note:
This aeson github issue about this problem seems to be closed but it seems to be not any more permissive: https://github.com/bos/aeson/issues/197
Thanks to Artyom:
https://artyom.me/aeson#records-and-json-generics
Thanks to Pbrisbin:
https://pbrisbin.com/posts/writing_json_apis_with_yesod/
Thanks to Snoyman:
For everything

Define a closure variable in buckle script

I'm trying to convert the following ES6 script to bucklescript and I cannot for the life of me figure out how to create a "closure" in bucklescript
import {Socket, Presence} from "phoenix"
let socket = new Socket("/socket", {
params: {user_id: window.location.search.split("=")[1]}
})
let channel = socket.channel("room:lobby", {})
let presence = new Presence(channel)
function renderOnlineUsers(presence) {
let response = ""
presence.list((id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
response += `<br>${id} (count: ${count})</br>`
})
document.querySelector("main[role=main]").innerHTML = response
}
socket.connect()
presence.onSync(() => renderOnlineUsers(presence))
channel.join()
the part I cant figure out specifically is let response = "" (or var in this case as bucklescript always uses vars):
function renderOnlineUsers(presence) {
let response = ""
presence.list((id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
response += `<br>${id} (count: ${count})</br>`
})
document.querySelector("main[role=main]").innerHTML = response
}
the closest I've gotten so far excludes the result declaration
...
...
let onPresenceSync ev =
let result = "" in
let listFunc = [%raw begin
{|
(id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
result += `${id} (count: ${count})\n`
}
|}
end
] in
let _ =
presence |. listPresence (listFunc) in
[%raw {| console.log(result) |} ]
...
...
compiles to:
function onPresenceSync(ev) {
var listFunc = (
(id, {metas: [first, ...rest]}) => {
let count = rest.length + 1
result += `${id} (count: ${count})\n`
}
);
presence.list(listFunc);
return ( console.log(result) );
}
result is removed as an optimization beacuse it is considered unused. It is generally not a good idea to use raw code that depends on code generated by BuckleScript, as there's quite a few surprises you can encounter in the generated code.
It is also not a great idea to mutate variables considered immutable by the compiler, as it will perform optimizations based on the assumption that the value will never change.
The simplest fix here is to just replace [%raw {| console.log(result) |} ] with Js.log result, but it might be enlightening to see how listFunc could be written in OCaml:
let onPresenceSync ev =
let result = ref "" in
let listFunc = fun [#bs] id item ->
let count = Js.Array.length item##meta in
result := {j|$id (count: $count)\n|j}
in
let _ = presence |. (listPresence listFunc) in
Js.log !result
Note that result is now a ref cell, which is how you specify a mutable variable in OCaml. ref cells are updated using := and the value it contains is retrieved using !. Note also the [#bs] annotation used to specify an uncurried function needed on functions passed to external higher-order functions. And the string interpolation syntax used: {j| ... |j}

Saving partial spark DStream window to HDFS

I am counting values in each window and find the top values and want to save only the top 10 frequent values of each window to hdfs rather than all the values.
eegStreams(a) = KafkaUtils.createStream(ssc, zkQuorum, group, Map(args(a) -> 1),StorageLevel.MEMORY_AND_DISK_SER).map(_._2)
val counts = eegStreams(a).map(x => (math.round(x.toDouble), 1)).reduceByKeyAndWindow(_ + _, _ - _, Seconds(4), Seconds(4))
val sortedCounts = counts.map(_.swap).transform(rdd => rdd.sortByKey(false)).map(_.swap)
ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))}
//sortedCounts.foreachRDD(rdd =>println("\nTop 10 amplitudes:\n" + rdd.take(10).mkString("\n")))
sortedCounts.map(tuple => "%s,%s".format(tuple._1, tuple._2)).saveAsTextFiles("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))
I can print top 10 as above (commented).
I have also tried
sortedCounts.foreachRDD{ rdd => ssc.sparkContext.parallelize(rdd.take(10)).saveAsTextFile("hdfs://ec2-23-21-113-136.compute-1.amazonaws.com:9000/user/hduser/output/" + (a+1))}
but I get the following error. My Array is not serializable
15/01/05 17:12:23 ERROR actor.OneForOneStrategy:
org.apache.spark.streaming.StreamingContext
java.io.NotSerializableException:
org.apache.spark.streaming.StreamingContext
Can you try this?
sortedCounts.foreachRDD(rdd => rdd.filterWith(ind => ind)((v, ind) => ind <= 10).saveAsTextFile(...))
Note: I didn't test the snippet...
Your first version should work. Just declare #transient ssc = ... where the Streaming Context is first created.
The second version won't work b/c StreamingContext cannot be serialized in a closure.

Calling a web service that depends on a result from another web service in play framework

I want to call a second web service with the result from the first one.
Below is some code that highlights my intent.
By the way it compiles fine in IntelliJ(Probable a bug in the IDE).
def get = {
for {
respA <- WS.url(url1).get
id <- respA.body.split(",").take(2)
respB <- WS.url(url2 + id).get // Here is a compile error
} yield {
getMyObjects(respB.xml)
}
}
respA = is a comma separated list with ids used in the next call.
respB = is an XML response that I parse in the yield method
The compile error Play Framework gives me:
type mismatch;
found : scala.concurrent.Future[Seq[MyObject]]
required: scala.collection.GenTraversableOnce[?]
I find the compile error strange.
How can a Future of [Seq[MyObject]] exist at that line?
It shouldn't be any different from the line two lines up that compiles?
WS.url(url1).get returns Future[Response] so all your generators in the for comprehension should be futures. In your code snippet, you are mixing Array[String] and Future[Response]. See Type Mismatch on Scala For Comprehension for some background on for comprehension.
So I would try something like this:
for {
respA <- WS.url(url1).get
ids = respA.body.split(",").take(2)
responsesB <- Future.sequence(ids.map(id => WS.url(url2 + id).get))
} yield {
responsesB.map(respB => getMyObjects(respB.xml))
}
So the types are:
respA: Response
ids: Array[String]
ids.map(id => WS.url(url2 + id).get): Array[Future[Response]]
Future.sequence(ids.map(id => WS.url(url2 + id).get)): Future[Array[Response]]
responsesB: Array[Response]
And the return type of the for comprehension is a Future of an array of whatever getMyObjects returns.
Note that if sequence does not work on Future[Array[_]] try to do ids = respA.body.split(",").toList.take(2).

Calling external services in scala code with dependencies

I am facing a major issue with my design at this juncture. My method is trying to accomplish the follows:
Insert the passed in object into the database.
Get the autoincremented id from the insert and use it to call webservice1 along with the object.
Get the result from webservice1 and call webservice2 with the original object and some response from webservice1.
Combine the results from webservice1 and 2 and write it into the database.
Get the resulting autoincremented id from the last insert and call webservice3 with the original object that would eventually result into the success or failure of the operation.
I want to design this in a flexible manner since the requirements are in a flux and I do not want to keep on modifying my logic based on any changing. I do realize some amount of change is inevitable but I would like to minimize the damage and respect the open-closed principle.
My initial take was as follows:
def complexOperation(someObject:T) =
dbService.insertIntoDb(someObject) match {
case Left(e:Exception) => Left(e)
case Right(id:Int) => webService.callWebService1(id,someObject) match {
case Left(e:Exception) => Left(e)
case Right(r:SomeResponse1) => webService.callWebservice2(r,someObject) match {
case Left(e:Exception) => webService.rollbackService1();Left(e)
case Right(context:ResponseContext) => dbService.insertContextIntoDb(context) match {
case Left(e:Exception) => Left(e)
case Right(id:Int) => webService.callWebservice3(id,someObject) match {
case Left(e:Exception) => webService.rollbackService3();Left(e)
case Right(r:Response) => Right(r)
}
}
}
}
As you can see, this is a tangled mess. I can neither unit test it, nor extend it nor very easily debug it if things spiral out of control. This code serves its purpose but it will be great to get some ideas on how I should refactor it to make the lives of the people who inherit my code a little more easier.
Thanks
Have a look at scala.util.Try. It's available in Scala 2.10, which may or may not be available to you as an option, but the idea of it is perfect for your scenario.
What you have in your code example is what I like calling the "pyramid" of nesting. The best solution to this is to use flat-mapping wherever you can. But obviously that's an issue when you have stuff like Either[Exception, Result] at every step. That's where Try comes in. Try[T] is essentially a replacement for Either[Exception, T], and it comes with all of the flatMap-ing goodness that you need.
Assuming you can either change the return type of those webService calls, or provide some implicit conversion from Either[Exception, Result] to Try[Result], your code block would become something more like...
for {
id <- dbService.insertIntoDb(someObject)
r <- webService.callWebService1(id,someObject)
context <- webService.callWebservice2(r,someObject)
id2 <- dbService.insertContextIntoDb(context)
response <- webService.callWebservice3(id,someObject).recoverWith {
case e: Exception => webService.rollbackService3(); Failure(e)
}
} yield response
Lift has a similar mechanism in net.liftweb.common.Box. It's like Option, but with a container for Exceptions too.
edit: It looks like you can use the left or right method of an Either, and it will let you use flatMap-ing almost exactly the way I described with Try. The only difference is that the end result is an Either[Exception, Result] instead of a Try[Result]. Check out LeftProjection for details/examples.
You can use for comprehension to reduce the noise in the code.
#Dylan had the right idea above. Let me see if I can help translate what you want to do into idiomatic Scala 2.9.1 code.
This version doesn't attempt any rollbacks:
// 1: No rollbacks, just returns the first exception in Left
def complexOperation1(someObject:T): Either[Exception, Response] = {
for {
id <- dbService.insertIntoDb(someObject).right
r <- webService.callWebService1(id, someObject).right
context <- webService.callWebservice2(idResp, someObject).right
id2 <- dbService.insertContextIntoDb(context).right
response <- webService.callWebservice3(id,someObject).right
} yield response
}
Now, let's try to do the rollbacks exactly as you had them above:
// 2: Rolls back all web services and returns first exception in Left
def complexOperation1(someObject:T): Either[Exception, Response] = {
for {
id <- dbService.insertIntoDb(someObject).right
r <- webService.callWebService1(id, someObject).right
context <- webService.callWebservice2(idResp, someObject).left.map { e =>
webService.rollbackService1()
e
}.right
id2 <- dbService.insertContextIntoDb(context).right
response <- webService.callWebservice3(id,someObject).left.map { e =>
webService.rollbackService3()
e
}.right
} yield response
}
If you define a function which does the effect (the rollback) on the left, it get's a little cleaner and easier to test, for example:
// 3: Factor out the side-effect of doing the follbacks on Left
def rollbackIfLeft[T](f: => Either[Exception, T], r: => Unit): Either[Exception, T] = {
val result = f
result.left.foreach(_ => r) // do the rollback if any exception occured
result
}
def complexOperation1(someObject:T): Either[Exception, Response] = {
for {
id <- dbService.insertIntoDb(someObject).right
r <- webService.callWebService1(id, someObject).right
context <- rollbackIfLeft(webService.callWebservice2(idResp, someObject),
webService.rollbackService1()).right
id2 <- dbService.insertContextIntoDb(context).right
response <- rollbackIfLeft(webService.callWebservice3(id,someObject),
webService.rollbackService3()).right
} yield response
}
You can try out rollbackIfLeft in the scala REPL to get a sense of it:
scala> rollbackIfLeft(Right(42), println("hey"))
res28: Either[Exception,Int] = Right(42)
scala> rollbackIfLeft(Left(new RuntimeException), println("ERROR!"))
ERROR!
res29: Either[Exception,Nothing] = Left(java.lang.RuntimeException)
Hope this helps!