Scala - replaceAllIn - regex

First off, I'm new to Scala.
I'm trying to make a template parser in Scala (similar to Smarty (PHP)). It needs to search through the document, replacing anything inside "{{ }}" tags, with anything provided in the HashMap.
I'm currently stuck here:
import scala.collection.mutable.HashMap
import scala.io.Source
class Template(filename: String, vars: HashMap[Symbol, Any]) {
def parse() = {
var contents = Source.fromFile(filename, "ASCII").mkString
var rule = """\{\{(.*)\}\}""".r
//for(rule(v) <- rule findAllIn contents) {
// yield v
//}
//rule.replaceAllIn(contents, )
}
}
var t = new Template("FILENAME", new HashMap[Symbol, Any])
println(t.parse)
The part's that I've commented are things that I've thought about doing.
Thanks
I've come a little further...
import scala.collection.mutable.HashMap
import scala.io.Source
import java.util.regex.Pattern
import java.util.regex.Matcher
class Template(filename: String, vars: HashMap[Symbol, Any]) {
def findAndReplace(m: Matcher)(callback: String => String):String = {
val sb = new StringBuffer
while (m.find) {
m.appendReplacement(sb, callback(m.group(1)))
}
m.appendTail(sb)
sb.toString
}
def parse() = {
var contents = Source.fromFile(filename, "ASCII").mkString
val m = Pattern.compile("""\{\{(.*)\}\}""").matcher(contents)
findAndReplace(m){ x => x }
}
}
var t = new Template("FILENAME.html", new HashMap[Symbol, Any])
println(t.parse)
At the moment it just currently adds whatever was inside of the tag, back into the document. I'm wondering if there is an easier way of doing a find-and-replace style regexp in Scala?

I'd do it like this (String as key instead of Symbol):
var s : String = input // line, whatever
val regexp = """pattern""".r
while(regexp findFirstIn s != None) {
s = regexp replaceFirstIn (s, vars(regexp.findFirstIn(s).get))
}
If you prefer not using var, go recursive instead of using while. And, of course, a stringbuilder would be more efficient. In that case, I might do the following:
val regexp = """^(.*?)(?:{{(pattern)}})?""".r
for(subs <- regexp findAllIn s)
subs match {
case regexp(prefix, var) => sb.append(prefix); if (var != null) sb.append("{{"+vars(var)+"}}")
case _ => error("Shouldn't happen")
}
That way you keep appending the non-changing part, followed by the next part to be replaced.

There is a flavor of replaceAllIn in util.matching.Regex that accepts a replacer callback. A short example:
import util.matching.Regex
def replaceVars(r: Regex)(getVar: String => String) = {
def replacement(m: Regex.Match) = {
import java.util.regex.Matcher
require(m.groupCount == 1)
Matcher.quoteReplacement( getVar(m group 1) )
}
(s: String) => r.replaceAllIn(s, replacement _)
}
This is how we would use it:
val r = """\{\{([^{}]+)\}\}""".r
val m = Map("FILENAME" -> "aaa.txt",
"ENCODING" -> "UTF-8")
val template = replaceVars(r)( m.withDefaultValue("UNKNOWN") )
println( template("""whatever input contains {{FILENAME}} and
unknown key {{NOVAL}} and {{FILENAME}} again,
and {{ENCODING}}""") )
Note Matcher.quoteReplacement escapes $ characters in the replacement string. Otherwise you may get java.lang.IllegalArgumentException: Illegal group reference, replaceAll and dollar signs. See the blog post on why this may happen.

Here is also interesting way how to do the same using functions compose:
val Regexp = """\{\{([^{}]+)\}\}""".r
val map = Map("VARIABLE1" -> "VALUE1", "VARIABLE2" -> "VALUE2", "VARIABLE3" -> "VALUE3")
val incomingData = "I'm {{VARIABLE1}}. I'm {{VARIABLE2}}. And I'm {{VARIABLE3}}. And also {{VARIABLE1}}"
def replace(incoming: String) = {
def replace(what: String, `with`: String)(where: String) = where.replace(what, `with`)
val composedReplace = Regexp.findAllMatchIn(incoming).map { m => replace(m.matched, map(m.group(1)))(_) }.reduceLeftOption((lf, rf) => lf compose rf).getOrElse(identity[String](_))
composedReplace(incomingData)
}
println(replace(incomingData))
//OUTPUT: I'm VALUE1. I'm VALUE2. And I'm VALUE3. And also VALUE1

Related

scala code for regex pattern matching

I am new to scala. I am trying something on the regular expression pattern matching. I am following the example from here: https://alvinalexander.com/scala/how-to-extract-parts-strings-match-regular-expression-regex-scala
Given below is the code that I have written which works but is obviously not the best way.
Scenario: I have a regex pattern with me.
"([a-z0-9]+)_([0-9]+)_([v|V][0-9]+)_(\\d{4})(\\d{2})(\\d{2}).(xls|xlsx)".r
I have a string that defines what I am expecting for a given scenario. val param = "manufacturer/order/version"
Question: I don't want to pass hardcoded values in case pattern(manufacturer, order, version) but get the output in the variables manufacturer, order and version? One way is defining all the variables initially, but that would mean changing the code every time i need to change a string. Is there a way to do it dynamically or a better way of using regex in scala.
package com.testing
class DynamicFolder() {
def dynamicPath(fileName: String): Map[String, String] = {
println("File Name: " + fileName)
val param = "manufacturer/order/version"
var patternString = param.replaceAll("/", ", ")
println(patternString)
val pattern = "([a-z0-9]+)_([0-9]+)_([v|V][0-9]+)_(\\d{4})(\\d{2})(\\d{2}).(xls|xlsx)".r
val paramMap: Map[String, String] = fileName match {
case pattern(manufacturer, order, version) => {
println(s"Manufacturer: $manufacturer, Order: $order, version: $version")
Map("manufacturer" -> manufacturer, "order" -> order, "version" -> version)
}
case pattern(manufacturer, order, version, yyyy, mm, dd, format) => {
println(s"Manufacturer: $manufacturer, Order: $order, version: $version")
Map("manufacturer" -> manufacturer, "order" -> order, "version" -> version)
}
case _ => throw new IllegalArgumentException
}
paramMap
}
}
object hello {
def main(args: Array[String]): Unit = {
var dynamicFolder = new DynamicFolder
val fileName = "man1_18356_v1_20180202.xls"
val tgtParams = dynamicFolder.dynamicPath(fileName)
var tgtPath = ""
for ((k, v) <- tgtParams) {
printf("key: %s, value: %s\n", k, v)
tgtPath = tgtPath + "/" + tgtParams(k)
}
println ("Target path: "+tgtPath)
}
}
Output of the code:
File Name: man1_18356_v1_20180202.xls
manufacturer, version, order
Manufacturer: man1, Order: 18356, version: v1
key: manufacturer, value: man1
key: order, value: 18356
key: version, value: v1
Target path: /man1/18356/v1
Thanks!
This is how you can collect all groups and process them yourself:
val paramMap: Map[String, String] = fileName match {
case pattern(groups#_*) if groups.nonEmpty => {
// Access group with groups(0), groups(1) etc.
}
case _ => throw new IllegalArgumentException
}

How to replace a string with its uppercase with NSRegularExpression in Swift?

I was trying to change hello_world to helloWorld by this snippet of code (Swift 3.0):
import Foundation
let oldLine = "hello_world"
let fullRange = NSRange(location: 0, length: oldLine.characters.count)
let newLine = NSMutableString(string: oldLine)
let regex = try! NSRegularExpression(pattern: "(_)(\\w)", options: [])
regex.replaceMatches(in: newLine, options: [], range: fullRange,
withTemplate: "\\L$2")
The result was newLine = "helloLworld"
I used "\\L$2" as template because I saw this answer: https://stackoverflow.com/a/20742304/5282792 saying \L$2 is the pattern for the second group's uppercase in replacement template. But it didn't work in NSRegularExpression.
So can I replace a string with its uppercase with a replacement template pattern in NSRegularExpression.
One way to work with your case is subclassing NSRegularExpression and override replacementString(for:in:offset:template:) method.
class ToUpperRegex: NSRegularExpression {
override func replacementString(for result: NSTextCheckingResult, in string: String, offset: Int, template templ: String) -> String {
guard result.numberOfRanges > 2 else {
return ""
}
let matchingString = (string as NSString).substring(with: result.rangeAt(2)) as String
return matchingString.uppercased()
}
}
let oldLine = "hello_world"
let fullRange = NSRange(0..<oldLine.utf16.count) //<-
let tuRegex = try! ToUpperRegex(pattern: "(_)(\\w)")
let newLine = tuRegex.stringByReplacingMatches(in: oldLine, range: fullRange, withTemplate: "")
print(newLine) //->helloWorld
This doesn't answer the question pertaining regex, but might be of interest for readers not necessarily needing to use regex to perform this task (rather, using native Swift)
extension String {
func camelCased(givenSeparators separators: [Character]) -> String {
let charChunks = characters.split { separators.contains($0) }
guard let firstChunk = charChunks.first else { return self }
return String(firstChunk).lowercased() + charChunks.dropFirst()
.map { String($0).onlyFirstCharacterUppercased }.joined()
}
// helper (uppercase first char, lowercase rest)
var onlyFirstCharacterUppercased: String {
let chars = characters
guard let firstChar = chars.first else { return self }
return String(firstChar).uppercased() + String(chars.dropFirst()).lowercased()
}
}
/* Example usage */
let oldLine1 = "hello_world"
let oldLine2 = "fOo_baR BAX BaZ_fOX"
print(oldLine1.camelCased(givenSeparators: ["_"])) // helloWorld
print(oldLine2.camelCased(givenSeparators: ["_", " "])) // fooBarBazBazFox

Scala Regex pattern matching issue when using |

This is my example code.
object Patterns {
val workingPattern = """^thisworks[\w]+""".r
val problemPattern = """^(fail|error|bs|meh)[\w]+""".r
}
object TestMaker {
var works = scala.collection.mutable.Set[String]()
var needsWork = scala.collection.mutable.Set[String]()
var junk = scala.collection.mutable.Set[String]()
def add(someInput: String) = someInput match {
case Patterns.workingPattern() => works.update(someInput, true)
case Patterns.problemPattern() => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
}
}
When I call TestMaker.add("thisworks1234"), the string "thisworks1234" gets inserted into TestMaker's works set. It works as expected.
When I call TestMaker.add("this_is_just_junk"), the string "this_is_just_junk" gets inserted into the junk set - also as expected.
Here's the problem. When I call TestMaker.add("fail1234"), that string will also be inserted into the junk set. It should however be inserted into the needsWork set.
Where's my mistake?
You should use a non-capturing group with the second regex:
val problemPattern = """^(?:fail|error|bs|meh)[\w]+""".r
^^^
This is required because you are not referencing the captured value in your case.
Note that you can still use capturing groups within your patterns to ignore them later while matching with _*:
case Patterns.workingPattern(_*) => works.update(someInput, true)
case Patterns.problemPattern(_*) => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
See the IDEONE demo:
object Main extends App {
TestMaker.add("this_is_just_junk")
TestMaker.add("fail1234")
println(TestMaker.needsWork) // => Set(fail1234)
println(TestMaker.junk) // => Set(this_is_just_junk)
}
object Patterns {
val workingPattern = """^thisworks[\w]+""".r
val problemPattern = """^(fail|error|bs|meh)[\w]+""".r
}
object TestMaker {
var works = scala.collection.mutable.Set[String]()
var needsWork = scala.collection.mutable.Set[String]()
var junk = scala.collection.mutable.Set[String]()
def add(someInput: String) = someInput match {
case Patterns.workingPattern(_*) => works.update(someInput, true)
case Patterns.problemPattern(_*) => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
}
}

Get object of case class from regex match

i'm working on scraping data from a webpage with scala regex-es, but i encountered problem with parsing result to object of some case class-es.
In following snippet i managed to scrape all the data, but i have no clue how to parse 3 elements from an iterator. I thought about something like:
val a :: b :: c :: _ = result.group(0).iDontKnowWha
Any ideas what can i do?
import model.FuneralSchedule
import play.api.libs.json.Json
import scala.io.Source
var date = "2015-05-05"
val source = Source.fromURL("http://zck.krakow.pl/?pageId=16&date=" + date).mkString
val regex = "(?s)<table>.+?(Cmentarz.+?)<.+?</table>".r
var thing: List[FuneralSchedule] = List()
var jsonFeed: List[Funeral] = List()
val regMatcher = "("
case class Funeral(hour: String, who: String, age: String) {
override def toString: String = {
"Cos"
}
}
//implicit val format = Json.format[Funeral]
val out = regex.findAllIn(source).matchData foreach { table =>
thing ::= FuneralSchedule(table.group(1), clearStrings(table.group(0)))
"""<tr\s?>.+?</\s?tr>""".r.findAllIn(clearStrings(table.group(0))).matchData foreach { tr =>
//TODO: Naprawic bo szlak trafia wydajnosc
val temp = """<td\s?>.+?</\s?td>""".r.findAllIn(tr.group(0)).matchData.foreach {
elem => println(elem)
}
//println(Json.toJson(thingy))
}
println("Koniec tabeli")
}
thing
//Json.toJson(jsonFeed)
println(removeMarkers("<td > <td> Marian Debil </ td>"))
def removeMarkers(s: String) = {
s.replaceAll( """(</?\s?td\s?>)""", "")
}
def clearStrings(s: String) = {
val regex = "((class=\".+?\")|(id=\".+?\")|(style=\".+?\")|(\\n))"
s.replaceAll(regex, "")
}
One way of doing it would be converting it to a Stream and matching it using stream's operators like this:
val a #:: b #:: c #:: _ = """([a-z]){1}""".r.findAllIn("a b c").toStream
then a, b and c is what you're looking for

Calling arbitrary number of WS.url().get() in sequence

I have a List[String] of URLs that I want to load and process (parse, store to database) in sequence.
I found only fixed-length examples, like:
def readUrls = Action {
implicit request => {
implicit val context = scala.concurrent.ExecutionContext.Implicits.global
val url1 = "http://some-website.com"
val url2 = "http://other-website.com"
Async {
for {
result1 <- WS.url(url1).get()
result2 <- WS.url(url2).get()
} yield {
Ok(result1.body + result2.body)
}
}
}
But instead of url1 and url2, I need to process this puppy:
val urls = List("http://some-website.com", "http://other-website.com")
Thanks a bunch for any tips and advice!
If you want to chain Futures together arbitrarily in sequence, foldLeft ought to do the job:
urls.foldLeft(Future.successful[String]("")){ case (left, nextUrl) =>
left.flatMap{ aggregatedResult =>
WS.url(nextUrl).get().map( newResult =>
aggregatedResult + newResult.body
)
}
}
Since you're just combining the request bodies together, I gave the foldLeft an initial value of a Future empty String, which each step in the fold will then add on the next response body.
def executeUrls(urls: List[String]): Future[String] = {
urls.foldLeft(Future(""))((accumulator, url) => {
accumulator.flatMap(acc => {
WS.url(url).get().map(response => {
acc + response.body
})
}
})
}
This should be what you're looking for, note that it returns a new Future.
Edit: apparently LimbSoup was faster.