Scala Regex pattern matching issue when using |

This is my example code.
object Patterns {
val workingPattern = """^thisworks[\w]+""".r
val problemPattern = """^(fail|error|bs|meh)[\w]+""".r
}
object TestMaker {
var works = scala.collection.mutable.Set[String]()
var needsWork = scala.collection.mutable.Set[String]()
var junk = scala.collection.mutable.Set[String]()
def add(someInput: String) = someInput match {
case Patterns.workingPattern() => works.update(someInput, true)
case Patterns.problemPattern() => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
}
}
When I call TestMaker.add("thisworks1234"), the string "thisworks1234" gets inserted into TestMaker's works set. It works as expected.
When I call TestMaker.add("this_is_just_junk"), the string "this_is_just_junk" gets inserted into the junk set - also as expected.
Here's the problem: when I call TestMaker.add("fail1234"), that string is also inserted into the junk set. However, it should be inserted into the needsWork set.
Where's my mistake?

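You should use a non-capturing group (?:...) in the second regex:
val problemPattern = """^(?:fail|error|bs|meh)[\w]+""".r
This is required because a Regex used as an extractor yields one value per capturing group, while the pattern Patterns.problemPattern() expects zero values. With the capturing group (fail|error|bs|meh), the extractor yields one value, so that case never matches and "fail1234" falls through to case _.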
Alternatively, you can keep the capturing groups in your patterns and ignore them when matching, by using _*:
case Patterns.workingPattern(_*) => works.update(someInput, true)
case Patterns.problemPattern(_*) => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
See the full demo (originally posted on IDEONE):
object Main extends App {
TestMaker.add("this_is_just_junk")
TestMaker.add("fail1234")
println(TestMaker.needsWork) // => Set(fail1234)
println(TestMaker.junk) // => Set(this_is_just_junk)
}
object Patterns {
val workingPattern = """^thisworks[\w]+""".r
val problemPattern = """^(fail|error|bs|meh)[\w]+""".r
}
object TestMaker {
var works = scala.collection.mutable.Set[String]()
var needsWork = scala.collection.mutable.Set[String]()
var junk = scala.collection.mutable.Set[String]()
def add(someInput: String) = someInput match {
case Patterns.workingPattern(_*) => works.update(someInput, true)
case Patterns.problemPattern(_*) => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
}
}
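To see why the zero-argument pattern fails, it can help to look at what the extractor actually returns. The following sketch is an illustration, not part of the original answer; RegexExtractorDemo, classify and zeroArgMatches are hypothetical names. It relies on the fact that a Scala Regex used as an extractor must match the entire input string, which is also why the leading ^ is not strictly needed there.

```scala
object RegexExtractorDemo {
  // Same alternation as the question, with one capturing group.
  // No ^ needed: as an extractor, the regex must match the whole string anyway.
  val problemPattern = """(fail|error|bs|meh)\w+""".r

  // Binds the single capturing group to `kind`.
  def classify(s: String): String = s match {
    case problemPattern(kind) => s"needsWork ($kind)"
    case _                    => "junk"
  }

  // A zero-argument pattern only matches a regex with zero capturing groups,
  // so this case never matches for problemPattern.
  def zeroArgMatches(s: String): Boolean = s match {
    case problemPattern() => true
    case _                => false
  }

  def main(args: Array[String]): Unit = {
    println(problemPattern.unapplySeq("fail1234")) // Some(List(fail))
    println(classify("fail1234"))
    println(zeroArgMatches("fail1234"))
  }
}
```

Binding the group, as in case problemPattern(kind), is a third option when you actually want to know which prefix matched.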

Related

Swift: Finding an Object Property via regex

Goal: The following function should iterate over an array of objects and check a specific property of each object. This property is a string and should be matched against the user input via a regex. If there's a match, the object should be added to an array, which will then be passed to another function.
Problem: I don't know how to set up a regex in Swift 3. I'm rather new to Swift in general, so an easily understandable solution would be very helpful :)
What it currently looks like:
func searchItems() -> [Item] {
var matches: [Item] = []
if let input = readLine() {
for item in Storage.storage.items { //items is a list of objects
if let query = //regex with query and item.name goes here {
matches.append(item)
}
}
return matches
} else {
print("Please type in what you're looking for.")
return searchItems()
}
}
This is what Item looks like (snippet):
class Item: CustomStringConvertible {
var name: String = ""
var amount: Int = 0
var price: Float = 0.00
var tags: [String] = []
var description: String {
if self.amount > 0 {
return "\(self.name) (\(self.amount) pcs. in storage) - \(price) €"
} else {
return "\(self.name) (SOLD OUT!!!) - \(price) €"
}
}
init(name: String, price: Float, amount: Int = 0) {
self.name = name
self.price = price
self.amount = amount
}
}
extension Item: Equatable {
static func ==(lhs: Item, rhs: Item) -> Bool {
return lhs.name == rhs.name
}
}
To keep the answer generic and clear, I will assume that the Item model is:
struct Item {
var email = ""
}
Suppose the output should be a filtered array that contains only the items with a valid email.
For such functionality, you should use NSRegularExpression:
The NSRegularExpression class is used to represent and apply regular
expressions to Unicode strings. An instance of this class is an
immutable representation of a compiled regular expression pattern and
various option flags.
Consider the following function:
import Foundation
func isMatches(_ regex: String, _ string: String) -> Bool {
do {
let expression = try NSRegularExpression(pattern: regex)
// NSRange counts UTF-16 code units, so use utf16.count rather than the Character count
let matches = expression.matches(in: string, range: NSRange(location: 0, length: string.utf16.count))
return matches.count != 0
} catch {
print("Something went wrong! Error: \(error.localizedDescription)")
}
return false
}
It tells you whether the given string matches the given regex.
Back to the example: suppose you have the following array of Item values:
let items = [Item(email: "invalid email"),
Item(email: "email@email.com"),
Item(email: "Hello!"),
Item(email: "example@example.net")]
You can get the filtered array by using the filter(_:) method:
Returns an array containing, in order, the elements of the sequence
that satisfy the given predicate.
as follows:
let emailRegex = "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
let emailItems = items.filter {
isMatches(emailRegex, $0.email)
}
print(emailItems) // [Item(email: "email@email.com"), Item(email: "example@example.net")]
Hope this helped.
If an exact (non-regex) comparison is enough, you can do the same with the filter function:
let matches = Storage.storage.items.filter({ $0.yourStringPropertyHere == input })

Get object of case class from regex match

I'm working on scraping data from a webpage with Scala regexes, but I ran into a problem converting the results into instances of some case classes.
In the following snippet I managed to scrape all the data, but I have no clue how to extract 3 elements from an iterator. I thought about something like:
val a :: b :: c :: _ = result.group(0).iDontKnowWha
Any ideas what I can do?
import model.FuneralSchedule
import play.api.libs.json.Json
import scala.io.Source
var date = "2015-05-05"
val source = Source.fromURL("http://zck.krakow.pl/?pageId=16&date=" + date).mkString
val regex = "(?s)<table>.+?(Cmentarz.+?)<.+?</table>".r
var thing: List[FuneralSchedule] = List()
var jsonFeed: List[Funeral] = List()
val regMatcher = "("
case class Funeral(hour: String, who: String, age: String) {
override def toString: String = {
"Cos"
}
}
//implicit val format = Json.format[Funeral]
val out = regex.findAllIn(source).matchData foreach { table =>
thing ::= FuneralSchedule(table.group(1), clearStrings(table.group(0)))
"""<tr\s?>.+?</\s?tr>""".r.findAllIn(clearStrings(table.group(0))).matchData foreach { tr =>
//TODO: Fix this, performance is terrible
val temp = """<td\s?>.+?</\s?td>""".r.findAllIn(tr.group(0)).matchData.foreach {
elem => println(elem)
}
//println(Json.toJson(thingy))
}
println("End of table")
}
thing
//Json.toJson(jsonFeed)
println(removeMarkers("<td > <td> Marian Debil </ td>"))
def removeMarkers(s: String) = {
s.replaceAll( """(</?\s?td\s?>)""", "")
}
def clearStrings(s: String) = {
val regex = "((class=\".+?\")|(id=\".+?\")|(style=\".+?\")|(\\n))"
s.replaceAll(regex, "")
}
One way of doing it is to convert the iterator to a Stream and destructure it with the Stream cons pattern #::, like this:
val a #:: b #:: c #:: _ = """([a-z]){1}""".r.findAllIn("a b c").toStream
Then a, b and c hold the first three matches. Note that this pattern throws a MatchError if there are fewer than three matches.
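Applied to the question's scenario, the same idea can be expressed with a List pattern instead of a Stream. This is a sketch under assumptions: FuneralParsing and parseRow are hypothetical names, Funeral is trimmed down (no toString override), and the sample row is made up, since the real page layout isn't shown. It takes the first three <td> contents of a row and returns None when there are fewer.

```scala
object FuneralParsing {
  // Trimmed-down version of the question's case class.
  case class Funeral(hour: String, who: String, age: String)

  // Captures the text between <td> and </td>, allowing the optional spaces the question's patterns allow.
  val cell = """<td\s?>(.+?)</\s?td>""".r

  // Takes the first three cell texts of a row; any further cells are ignored.
  def parseRow(tr: String): Option[Funeral] =
    cell.findAllMatchIn(tr).map(_.group(1).trim).toList match {
      case hour :: who :: age :: _ => Some(Funeral(hour, who, age))
      case _                       => None
    }

  def main(args: Array[String]): Unit =
    // Hypothetical row; the real page may be structured differently.
    println(parseRow("<tr><td>10:00</td><td>Jan Kowalski</td><td>80</td></tr>"))
}
```

Matching on a List also avoids the MatchError: rows with fewer than three cells simply produce None.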

Calling arbitrary number of WS.url().get() in sequence

I have a List[String] of URLs that I want to load and process (parse, store to database) in sequence.
I found only fixed-length examples, like:
def readUrls = Action {
implicit request => {
implicit val context = scala.concurrent.ExecutionContext.Implicits.global
val url1 = "http://some-website.com"
val url2 = "http://other-website.com"
Async {
for {
result1 <- WS.url(url1).get()
result2 <- WS.url(url2).get()
} yield {
Ok(result1.body + result2.body)
}
}
}
}
But instead of url1 and url2, I need to process this puppy:
val urls = List("http://some-website.com", "http://other-website.com")
Thanks a bunch for any tips and advice!
If you want to chain Futures together arbitrarily in sequence, foldLeft ought to do the job:
urls.foldLeft(Future.successful[String]("")){ case (left, nextUrl) =>
left.flatMap{ aggregatedResult =>
WS.url(nextUrl).get().map( newResult =>
aggregatedResult + newResult.body
)
}
}
Since you're just concatenating the request bodies, I gave foldLeft an initial value of a Future holding an empty String; each step of the fold then appends the next response body.
def executeUrls(urls: List[String]): Future[String] = {
// flatMap starts the next request only after the accumulated Future has completed
urls.foldLeft(Future.successful(""))((accumulator, url) => {
accumulator.flatMap(acc => {
WS.url(url).get().map(response => {
acc + response.body
})
})
})
}
This should be what you're looking for; note that it returns a Future[String] rather than a String.
Edit: apparently LimbSoup was faster.
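Both answers rely on the same foldLeft/flatMap chain, and it can be tried without Play. In this sketch, fetch is a hypothetical stand-in for WS.url(url).get().map(_.body), and it records the order in which URLs are requested so you can check that the requests really happen one after another.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}

object SequentialFetch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical stand-in for WS.url(url).get().map(_.body); records the call order.
  val calls = ListBuffer.empty[String]
  def fetch(url: String): Future[String] = Future {
    calls.synchronized { calls += url }
    s"<body of $url>"
  }

  // fetch(url) is only invoked inside flatMap, i.e. after the previous Future completed.
  def fetchAll(urls: List[String]): Future[String] =
    urls.foldLeft(Future.successful("")) { (acc, url) =>
      acc.flatMap(soFar => fetch(url).map(body => soFar + body))
    }

  def main(args: Array[String]): Unit =
    println(Await.result(fetchAll(List("http://some-website.com", "http://other-website.com")), 5.seconds))
}
```

Await is only used in main for the demo; inside a Play action you would return the Future instead of blocking on it.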

Scala - Loop + format

Hey guys, I'm completely new to Scala and need some help. My goal is to write a program which takes a list and a command as input. It should then return either the list, the average length of the list entries, or the "longest" entry. Furthermore, it should ask for input over and over again, and this is what I don't know how to write. I also have some problems with the formatting ("%.1f"). Does somebody know how to solve these problems? Thank you very much. This is my code:
import scala.io.Source
var input = readLine("Enter a List")
val cmd = readLine("Entera command")
input=input.replace(" ","")
var input2=input.split(",").toList
def exercise() {
cmd match {
case "maxLength" => println(getMaxLength(input2))
case "list" => getList(input2)
case "averageLength" => println("%.1f".format(getAverageLeng(input2)))
case "exit" => sys.exit()
case _ => println("unknown command")
}
}
def getMaxLength(list:List[String]): String = {
list match {
case Nil => return ""
case _ => return list.fold("")((l, v) => if (l.length > v.length) l else v)
}
}
def getAverageLeng(list:List[String]): Number = {
list match {
case Nil => return 0.0
case _ => return list.map(_.length()).sum.asInstanceOf[Int] / list.length
}
}
def getList(list:List[String]):Unit = {
list match {
case Nil => return
case _ => list foreach println
}
}
exercise()
You need to move the
var input = readLine("Enter a List")
val cmd = readLine("Entera command")
input=input.replace(" ","")
var input2=input.split(",").toList
part into the exercise() function and call exercise() recursively.
That way it keeps asking until you type exit.
The second problem is the signature of getAverageLeng: it should return Double, not Number. With Number, an integer result is boxed as java.lang.Integer, which "%.1f" cannot format.
Also change sum.asInstanceOf[Int] to sum.toDouble in this function, so the division is done in floating point instead of truncating Int division.
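Putting both fixes together, a sketch might look like the following. ListCommands, parse, run and loop are hypothetical names; loop takes the line-reading function as a parameter (an assumption made so it can be driven from a test), and main wires it to scala.io.StdIn.readLine. It also stops when input ends, because readLine returns null at end of input.

```scala
import scala.annotation.tailrec
import scala.io.StdIn

object ListCommands {
  def parse(raw: String): List[String] = raw.replace(" ", "").split(",").toList

  def maxLength(list: List[String]): String =
    list.foldLeft("")((longest, s) => if (longest.length >= s.length) longest else s)

  // Double return type plus toDouble: floating-point division, and "%.1f" receives a Double.
  def averageLength(list: List[String]): Double =
    if (list.isEmpty) 0.0 else list.map(_.length).sum.toDouble / list.length

  // None means "stop"; Some(text) is what should be printed.
  def run(list: List[String], cmd: String): Option[String] = cmd match {
    case "maxLength"     => Some(maxLength(list))
    case "list"          => Some(list.mkString("\n"))
    case "averageLength" => Some("%.1f".format(averageLength(list)))
    case "exit"          => None
    case _               => Some("unknown command")
  }

  // Reads a list and a command, prints the result and calls itself again,
  // until the command is "exit" or the input ends (null).
  @tailrec
  def loop(read: String => String): Unit = {
    val rawList = read("Enter a list: ")
    val cmd = if (rawList == null) null else read("Enter a command: ")
    if (cmd == null) ()
    else run(parse(rawList), cmd) match {
      case Some(out) =>
        println(out)
        loop(read)
      case None => ()
    }
  }

  def main(args: Array[String]): Unit = loop(prompt => StdIn.readLine(prompt))
}
```

Because the recursive call is the last thing loop does, @tailrec lets the compiler turn it into a plain loop, so asking "over and over again" does not grow the stack.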

Scala - replaceAllIn

First off, I'm new to Scala.
I'm trying to make a template parser in Scala (similar to Smarty (PHP)). It needs to search through the document and replace anything inside "{{ }}" tags with the corresponding value from the HashMap.
I'm currently stuck here:
import scala.collection.mutable.HashMap
import scala.io.Source
class Template(filename: String, vars: HashMap[Symbol, Any]) {
def parse() = {
var contents = Source.fromFile(filename, "ASCII").mkString
var rule = """\{\{(.*)\}\}""".r
//for(rule(v) <- rule findAllIn contents) {
// yield v
//}
//rule.replaceAllIn(contents, )
}
}
var t = new Template("FILENAME", new HashMap[Symbol, Any])
println(t.parse)
The parts that I've commented out are things I've thought about doing.
Thanks
I've come a little further...
import scala.collection.mutable.HashMap
import scala.io.Source
import java.util.regex.Pattern
import java.util.regex.Matcher
class Template(filename: String, vars: HashMap[Symbol, Any]) {
def findAndReplace(m: Matcher)(callback: String => String):String = {
val sb = new StringBuffer
while (m.find) {
m.appendReplacement(sb, callback(m.group(1)))
}
m.appendTail(sb)
sb.toString
}
def parse() = {
var contents = Source.fromFile(filename, "ASCII").mkString
val m = Pattern.compile("""\{\{(.*)\}\}""").matcher(contents)
findAndReplace(m){ x => x }
}
}
var t = new Template("FILENAME.html", new HashMap[Symbol, Any])
println(t.parse)
At the moment it just puts whatever was inside the tag back into the document. Is there an easier way to do a find-and-replace style regexp in Scala?
I'd do it like this (String as key instead of Symbol):
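var s: String = input // line, whatever
val regexp = """\{\{([^{}]+)\}\}""".r
var m = regexp findFirstMatchIn s
while (m.isDefined) {
// group(1) is the key between the braces; quoteReplacement keeps any $ in the value literal
s = regexp.replaceFirstIn(s, java.util.regex.Matcher.quoteReplacement(vars(m.get.group(1)).toString))
m = regexp findFirstMatchIn s
}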
If you prefer not to use var, use recursion instead of the while loop. And, of course, a StringBuilder would be more efficient. In that case, I might do the following:
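val sb = new StringBuilder
// (?s) lets . match newlines; each match is the literal text up to the next {{key}} tag, or up to the end of input
val regexp = """(?s)(.*?)(?:\{\{([^{}]+)\}\}|\z)""".r
for (subs <- regexp findAllIn s)
subs match {
case regexp(prefix, key) =>
sb.append(prefix)
if (key != null) sb.append(vars(key)) // key is null when the piece ran to the end of input
case _ => sys.error("Shouldn't happen")
}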
That way you keep appending the non-changing part, followed by the next part to be replaced.
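The recursive, var-free variant mentioned above could look like this sketch. RecursiveTemplate is a hypothetical name, the tag pattern is the {{KEY}} syntax from the question, and replacing missing keys with an empty string is an assumption. Instead of rescanning the whole string after each replacement, it carries a StringBuilder accumulator and only recurses on the text after the last tag.

```scala
import scala.annotation.tailrec
import scala.util.matching.Regex

object RecursiveTemplate {
  // Tag syntax from the question: {{KEY}}, where KEY contains no braces.
  val tag: Regex = """\{\{([^{}]+)\}\}""".r

  // Copies the text before the next tag, appends the tag's value, and recurses on the rest.
  // Inserted values are never rescanned, so a value containing "{{...}}" cannot cause a loop.
  @tailrec
  def render(rest: String, vars: Map[String, String], acc: StringBuilder = new StringBuilder): String =
    tag.findFirstMatchIn(rest) match {
      case Some(m) =>
        acc.append(rest.substring(0, m.start)).append(vars.getOrElse(m.group(1), ""))
        render(rest.substring(m.end), vars, acc)
      case None =>
        acc.append(rest).toString
    }
}
```

Since the value is appended to the StringBuilder directly rather than passed as a regex replacement string, $ characters in values need no escaping here.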
There is a flavor of replaceAllIn in util.matching.Regex that accepts a replacer callback. A short example:
import util.matching.Regex
def replaceVars(r: Regex)(getVar: String => String) = {
def replacement(m: Regex.Match) = {
import java.util.regex.Matcher
require(m.groupCount == 1)
Matcher.quoteReplacement( getVar(m group 1) )
}
(s: String) => r.replaceAllIn(s, replacement _)
}
This is how we would use it:
val r = """\{\{([^{}]+)\}\}""".r
val m = Map("FILENAME" -> "aaa.txt",
"ENCODING" -> "UTF-8")
val template = replaceVars(r)( m.withDefaultValue("UNKNOWN") )
println( template("""whatever input contains {{FILENAME}} and
unknown key {{NOVAL}} and {{FILENAME}} again,
and {{ENCODING}}""") )
Note that Matcher.quoteReplacement escapes $ and \ characters in the replacement string. Otherwise you may get java.lang.IllegalArgumentException: Illegal group reference (see the blog post "replaceAll and dollar signs" on why this may happen).
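To make the $ problem concrete, here is a small sketch (QuoteReplacementDemo and the PRICE variable are hypothetical) comparing a quoted and an unquoted replacement value. A value ending in $ makes the unquoted version fail with an IllegalArgumentException, while Matcher.quoteReplacement copies it literally.

```scala
import java.util.regex.Matcher
import scala.util.Try
import scala.util.matching.Regex

object QuoteReplacementDemo {
  val tag: Regex = """\{\{([^{}]+)\}\}""".r
  // Hypothetical variable whose value ends in a dollar sign.
  val vars = Map("PRICE" -> "10 US$")

  def lookup(m: Regex.Match): String = vars.getOrElse(m.group(1), "UNKNOWN")

  // Quoted: "$" is copied literally into the output.
  def render(s: String): String =
    tag.replaceAllIn(s, (m: Regex.Match) => Matcher.quoteReplacement(lookup(m)))

  // Unquoted: the trailing "$" is read as the start of a group reference and fails.
  def renderUnquoted(s: String): Try[String] =
    Try(tag.replaceAllIn(s, (m: Regex.Match) => lookup(m)))

  def main(args: Array[String]): Unit = {
    println(render("It costs {{PRICE}}"))
    println(renderUnquoted("It costs {{PRICE}}"))
  }
}
```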
Here is also an interesting way to do the same using function composition:
val Regexp = """\{\{([^{}]+)\}\}""".r
val map = Map("VARIABLE1" -> "VALUE1", "VARIABLE2" -> "VALUE2", "VARIABLE3" -> "VALUE3")
val incomingData = "I'm {{VARIABLE1}}. I'm {{VARIABLE2}}. And I'm {{VARIABLE3}}. And also {{VARIABLE1}}"
def replace(incoming: String) = {
def replace(what: String, `with`: String)(where: String) = where.replace(what, `with`)
val composedReplace = Regexp.findAllMatchIn(incoming).map { m => replace(m.matched, map(m.group(1)))(_) }.reduceLeftOption((lf, rf) => lf compose rf).getOrElse(identity[String](_))
composedReplace(incoming)
}
println(replace(incomingData))
//OUTPUT: I'm VALUE1. I'm VALUE2. And I'm VALUE3. And also VALUE1