scala code for regex pattern matching - regex

I am new to Scala and am experimenting with regular expression pattern matching. I am following the example from here: https://alvinalexander.com/scala/how-to-extract-parts-strings-match-regular-expression-regex-scala
Given below is the code I have written, which works but is obviously not the best way.
Scenario: I have a regex pattern with me.
"([a-z0-9]+)_([0-9]+)_([v|V][0-9]+)_(\\d{4})(\\d{2})(\\d{2}).(xls|xlsx)".r
I have a string that defines what I am expecting for a given scenario. val param = "manufacturer/order/version"
Question: I don't want to hardcode the names in case pattern(manufacturer, order, version); I want the variables manufacturer, order and version to be driven by param instead. One way is to define all the variables up front, but that would mean changing the code every time I need to change the string. Is there a way to do this dynamically, or a better way of using regex in Scala?
package com.testing
class DynamicFolder() {
def dynamicPath(fileName: String): Map[String, String] = {
println("File Name: " + fileName)
val param = "manufacturer/order/version"
var patternString = param.replaceAll("/", ", ")
println(patternString)
val pattern = "([a-z0-9]+)_([0-9]+)_([v|V][0-9]+)_(\\d{4})(\\d{2})(\\d{2}).(xls|xlsx)".r
val paramMap: Map[String, String] = fileName match {
case pattern(manufacturer, order, version) => {
println(s"Manufacturer: $manufacturer, Order: $order, version: $version")
Map("manufacturer" -> manufacturer, "order" -> order, "version" -> version)
}
case pattern(manufacturer, order, version, yyyy, mm, dd, format) => {
println(s"Manufacturer: $manufacturer, Order: $order, version: $version")
Map("manufacturer" -> manufacturer, "order" -> order, "version" -> version)
}
case _ => throw new IllegalArgumentException
}
paramMap
}
}
object hello {
def main(args: Array[String]): Unit = {
var dynamicFolder = new DynamicFolder
val fileName = "man1_18356_v1_20180202.xls"
val tgtParams = dynamicFolder.dynamicPath(fileName)
var tgtPath = ""
for ((k, v) <- tgtParams) {
printf("key: %s, value: %s\n", k, v)
tgtPath = tgtPath + "/" + tgtParams(k)
}
println ("Target path: "+tgtPath)
}
}
Output of the code:
File Name: man1_18356_v1_20180202.xls
manufacturer, order, version
Manufacturer: man1, Order: 18356, version: v1
key: manufacturer, value: man1
key: order, value: 18356
key: version, value: v1
Target path: /man1/18356/v1
Thanks!

This is how you can collect all groups and process them yourself:
val paramMap: Map[String, String] = fileName match {
case pattern(groups @ _*) if groups.nonEmpty => {
// Access group with groups(0), groups(1) etc.
}
case _ => throw new IllegalArgumentException
}
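Building on the groups @ _* idea, here is a minimal sketch of how the names in param could be paired with the captured groups so that nothing is hardcoded in the case clause. The object name DynamicGroups, the extra param argument and the targetPath helper are assumptions made for illustration; the regex is the one from the question, unchanged.

```scala
import scala.util.matching.Regex

object DynamicGroups {
  // Regex copied verbatim from the question (7 capturing groups).
  val pattern: Regex =
    "([a-z0-9]+)_([0-9]+)_([v|V][0-9]+)_(\\d{4})(\\d{2})(\\d{2}).(xls|xlsx)".r

  // Pairs the i-th name in `param` with the i-th captured group.
  // Extra groups (date parts, extension) are simply ignored.
  def dynamicPath(fileName: String, param: String): Map[String, String] = {
    val names = param.split("/").toList
    fileName match {
      case pattern(groups @ _*) if groups.size >= names.size =>
        names.zip(groups).toMap
      case _ =>
        throw new IllegalArgumentException(s"Unexpected file name: $fileName")
    }
  }

  // Builds the path in the order given by `param`, not in Map iteration order.
  def targetPath(fileName: String, param: String): String = {
    val params = dynamicPath(fileName, param)
    param.split("/").map(params).mkString("/", "/", "")
  }
}
```

Changing param to, say, "manufacturer/order" then changes the resulting map and path without touching the match expression.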

Related

Swift: Finding an Object Property via regex

Target: The following function shall iterate over an array of objects and check a specific property of all objects. This property is a string and shall be matched with a user input via regex. If there's a match the object shall be added to an array which will further be passed to another function.
Problem: I don't know how to set up regex in Swift 3. I'm rather new to Swift in general, so an easily understandable solution would be very helpful :)
How it currently looks:
func searchItems() -> [Item] {
var matches: [Item] = []
if let input = readLine() {
for item in Storage.storage.items { //items is a list of objects
if let query = //regex with query and item.name goes here {
matches.append(item)
}
}
return matches
} else {
print("Please type in what you're looking for.")
return searchItems()
}
}
This is what Item looks like (snippet):
class Item: CustomStringConvertible {
var name: String = ""
var amount: Int = 0
var price: Float = 0.00
var tags: [String] = []
var description: String {
if self.amount > 0 {
return "\(self.name) (\(self.amount) pcs. in storage) - \(price) €"
} else {
return "\(self.name) (SOLD OUT!!!) - \(price) €"
}
}
init(name: String, price: Float, amount: Int = 0) {
self.name = name
self.price = price
self.amount = amount
}
}
extension Item: Equatable {
static func ==(lhs: Item, rhs: Item) -> Bool {
return lhs.name == rhs.name
}
}
Solved. I just edited this post to get a badge :D
To keep the answer generic and clear, I will assume that the Item model is:
struct Item {
var email = ""
}
Suppose the output should be a filtered array that contains only the items with a valid email.
For such a functionality, you should use NSRegularExpression:
The NSRegularExpression class is used to represent and apply regular
expressions to Unicode strings. An instance of this class is an
immutable representation of a compiled regular expression pattern and
various option flags.
With the following function:
func isMatches(_ regex: String, _ string: String) -> Bool {
do {
let regex = try NSRegularExpression(pattern: regex)
// NSRange counts UTF-16 code units, so use utf16.count rather than the character count
let matches = regex.matches(in: string, range: NSRange(location: 0, length: string.utf16.count))
return matches.count != 0
} catch {
print("Something went wrong! Error: \(error.localizedDescription)")
}
return false
}
You can decide whether the given string matches the given regex.
Back to the example, consider that you have the following array of Item Model:
let items = [Item(email: "invalid email"),
Item(email: "email@email.com"),
Item(email: "Hello!"),
Item(email: "example@example.net")]
You can get the filtered array by using the filter(_:) method:
Returns an array containing, in order, the elements of the sequence
that satisfy the given predicate.
as follows:
let emailRegex = "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
let emailItems = items.filter {
isMatches(emailRegex, $0.email)
}
print(emailItems) // [Item(email: "email@email.com"), Item(email: "example@example.net")]
Hope this helped.
You can do the same with filter function
let matches = Storage.storage.items.filter({ $0.yourStringPropertyHere == input })

Trying to parse a Localizable.strings file for a small project in Swift on macOS

I'm trying to parse a Localizable.strings file for a small project in Swift on macOS.
I just want to retrieve all the keys and values inside a file and collect them into a dictionary.
To do so I used regex with the NSRegularExpression Cocoa class.
Here is what those files look like:
"key 1" = "Value 1";
"key 2" = "Value 2";
"key 3" = "Value 3";
Here is my code that is supposed to get the keys and values from the file loaded into a String :
static func getDictionaryFormText(text: String) -> [String: String] {
var dict: [String : String] = [:]
let exp = "\"(.*)\"[ ]*=[ ]*\"(.*)\";"
for line in text.components(separatedBy: "\n") {
let match = self.matches(for: exp, in: line)
// Following line can be uncommented when working
//dict[match[0]] = match[1]
print("(\(match.count)) matches = \(match)")
}
return dict
}
static func matches(for regex: String, in text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex)
let nsString = text as NSString
let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
return results.map { nsString.substring(with: $0.range) }
} catch let error as NSError {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
When running this code with the provided Localizable example here is the output :
(1) matches = ["\"key 1\" = \"Value 1\";"]
(1) matches = ["\"key 2\" = \"Value 2\";"]
(1) matches = ["\"key 3\" = \"Value 3\";"]
It looks like the match doesn't stop after the first " occurrence. When I try the same expression \"(.*)\"[ ]*=[ ]*\"(.*)\"; on regex101.com, the output is correct though. What am I doing wrong?
Your function (from "Swift extract regex matches") returns only the match of the entire pattern. If you are interested in particular capture groups, you have to access them with rangeAt(), as for example in "Convert a JavaScript Regex to a Swift Regex" (not yet updated for Swift 3).
However, there is a much simpler solution, because .strings files actually use one possible format of property lists and can be read directly into a dictionary. Example:
if let url = Bundle.main.url(forResource: "Localizable", withExtension: "strings"),
let stringsDict = NSDictionary(contentsOf: url) as? [String: String] {
print(stringsDict)
}
Output:
["key 1": "Value 1", "key 2": "Value 2", "key 3": "Value 3"]
For anyone interested I got the original function working. I needed it for a small command-line script where the NSDictionary(contentsOf: URL) wasn't working.
func matches(for regex: String, in text: String) -> [String] {
do {
let regex = try NSRegularExpression(pattern: regex)
let nsString = text as NSString
guard let result = regex.firstMatch(in: text, options: [], range: NSRange(location: 0, length: nsString.length)) else {
return [] // pattern does not match the string
}
return (1 ..< result.numberOfRanges).map {
nsString.substring(with: result.range(at: $0))
}
} catch let error as NSError {
print("invalid regex: \(error.localizedDescription)")
return []
}
}
func getParsedText(text: String) -> [(key: String, text: String)] {
var dict: [(key: String, text: String)] = []
let exp = "\"(.*)\"[ ]*=[ ]*\"(.*)\";"
for line in text.components(separatedBy: "\n") {
let match = matches(for: exp, in: line)
if match.count == 2 {
dict.append((key: match[0], text: match[1]))
}
}
return dict
}
Call it using something like this.
let text = try! String(contentsOf: url, encoding: .utf8)
let stringDict = getParsedText(text: text)
Parsing directly to a dictionary is a really nice solution, but if someone also wants to parse the comments, you can use a small library I made for this, csv2strings.
import libcsv2strings
let contents: StringsFile = StringsFileParser(stringsFilePath: "path/to/Localizable.strings")?.parse()
It parses the file to a StringsFile model
/// Top level model of a Apple's strings file
public struct StringsFile {
let entries: [Translation]
/// Model of a strings file translation item
public struct Translation {
let translationKey: String
let translation: String
let comment: String?
}
}

Scala Regex pattern matching issue when using |

This is my example code.
object Patterns {
val workingPattern = """^thisworks[\w]+""".r
val problemPattern = """^(fail|error|bs|meh)[\w]+""".r
}
object TestMaker {
var works = scala.collection.mutable.Set[String]()
var needsWork = scala.collection.mutable.Set[String]()
var junk = scala.collection.mutable.Set[String]()
def add(someInput: String) = someInput match {
case Patterns.workingPattern() => works.update(someInput, true)
case Patterns.problemPattern() => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
}
}
When I call TestMaker.add("thisworks1234"), the string "thisworks1234" gets inserted into TestMaker's works set. It works as expected.
When I call TestMaker.add("this_is_just_junk"), the string "this_is_just_junk" gets inserted into the junk set - also as expected.
Here's the problem. When I call TestMaker.add("fail1234"), that string will also be inserted into the junk set. It should however be inserted into the needsWork set.
Where's my mistake?
You should use a non-capturing group with the second regex:
val problemPattern = """^(?:fail|error|bs|meh)[\w]+""".r
                         ^^^
This is required because you are not referencing the captured value in your case.
Note that you can still keep capturing groups in your patterns and ignore them when matching by using _*:
case Patterns.workingPattern(_*) => works.update(someInput, true)
case Patterns.problemPattern(_*) => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
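To make the arity rule concrete, here is a small sketch: a Regex extractor only matches when the number of binders in the case equals the number of capturing groups (or when _* absorbs them). The object and method names below are hypothetical; the two patterns are the ones discussed above.

```scala
object GroupArity {
  val capturing    = """^(fail|error|bs|meh)[\w]+""".r   // 1 capturing group
  val nonCapturing = """^(?:fail|error|bs|meh)[\w]+""".r // 0 capturing groups

  // Zero binders against a pattern with one group: the extractor never matches.
  def noBinders(s: String): Boolean = s match {
    case capturing() => true
    case _           => false
  }

  // One binder for the one group: matches.
  def oneBinder(s: String): Boolean = s match {
    case capturing(_) => true
    case _            => false
  }

  // _* accepts any number of groups.
  def varargs(s: String): Boolean = s match {
    case capturing(_*) => true
    case _             => false
  }

  // Zero binders against zero groups: matches.
  def nonCapturingNoBinders(s: String): Boolean = s match {
    case nonCapturing() => true
    case _              => false
  }
}
```

For the input "fail1234", only noBinders returns false, which is why the string fell through to the junk case in the question.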
See the IDEONE demo:
object Main extends App {
TestMaker.add("this_is_just_junk")
TestMaker.add("fail1234")
println(TestMaker.needsWork) // => Set(fail1234)
println(TestMaker.junk) // => Set(this_is_just_junk)
}
object Patterns {
val workingPattern = """^thisworks[\w]+""".r
val problemPattern = """^(fail|error|bs|meh)[\w]+""".r
}
object TestMaker {
var works = scala.collection.mutable.Set[String]()
var needsWork = scala.collection.mutable.Set[String]()
var junk = scala.collection.mutable.Set[String]()
def add(someInput: String) = someInput match {
case Patterns.workingPattern(_*) => works.update(someInput, true)
case Patterns.problemPattern(_*) => needsWork.update(someInput, true)
case _ => junk.update(someInput, true)
}
}

Get object of case class from regex match

I'm working on scraping data from a web page with Scala regexes, but I've run into a problem parsing the results into objects of a case class.
In the following snippet I managed to scrape all the data, but I have no clue how to extract 3 elements from an iterator. I thought about something like:
val a :: b :: c :: _ = result.group(0).iDontKnowWha
Any ideas what I can do?
import model.FuneralSchedule
import play.api.libs.json.Json
import scala.io.Source
var date = "2015-05-05"
val source = Source.fromURL("http://zck.krakow.pl/?pageId=16&date=" + date).mkString
val regex = "(?s)<table>.+?(Cmentarz.+?)<.+?</table>".r
var thing: List[FuneralSchedule] = List()
var jsonFeed: List[Funeral] = List()
val regMatcher = "("
case class Funeral(hour: String, who: String, age: String) {
override def toString: String = {
"Cos"
}
}
//implicit val format = Json.format[Funeral]
val out = regex.findAllIn(source).matchData foreach { table =>
thing ::= FuneralSchedule(table.group(1), clearStrings(table.group(0)))
"""<tr\s?>.+?</\s?tr>""".r.findAllIn(clearStrings(table.group(0))).matchData foreach { tr =>
//TODO: fix this, because performance goes to hell
val temp = """<td\s?>.+?</\s?td>""".r.findAllIn(tr.group(0)).matchData.foreach {
elem => println(elem)
}
//println(Json.toJson(thingy))
}
println("Koniec tabeli")
}
thing
//Json.toJson(jsonFeed)
println(removeMarkers("<td > <td> Marian Debil </ td>"))
def removeMarkers(s: String) = {
s.replaceAll( """(</?\s?td\s?>)""", "")
}
def clearStrings(s: String) = {
val regex = "((class=\".+?\")|(id=\".+?\")|(style=\".+?\")|(\\n))"
s.replaceAll(regex, "")
}
One way of doing it would be to convert the iterator to a Stream and destructure it using Stream's #:: operator, like this:
val a #:: b #:: c #:: _ = """([a-z]){1}""".r.findAllIn("a b c").toStream
Then a, b and c are what you're looking for.
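The same destructuring idea can produce the case class directly. The sketch below assumes, purely for illustration, that each table row holds its cells in the order hour, who, age; the FuneralParsing object and parseRow helper are hypothetical names, and a List pattern is used instead of Stream so that a row with too few cells yields None rather than a MatchError.

```scala
object FuneralParsing {
  final case class Funeral(hour: String, who: String, age: String)

  // Captures the text between <td> and </td>; (?s) lets . span line breaks.
  private val cell = """(?s)<td\s?>(.+?)</\s?td>""".r

  // Assumes (hypothetically) that the first three cells are hour, who, age.
  def parseRow(tr: String): Option[Funeral] =
    cell.findAllMatchIn(tr).map(_.group(1).trim).toList match {
      case hour :: who :: age :: _ => Some(Funeral(hour, who, age))
      case _                       => None
    }
}
```

A call such as FuneralParsing.parseRow(tr.group(0)) inside the existing loop over rows would then give an Option[Funeral] per row.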

Scala - replaceAllIn

First off, I'm new to Scala.
I'm trying to make a template parser in Scala (similar to Smarty (PHP)). It needs to search through the document, replacing anything inside "{{ }}" tags, with anything provided in the HashMap.
I'm currently stuck here:
import scala.collection.mutable.HashMap
import scala.io.Source
class Template(filename: String, vars: HashMap[Symbol, Any]) {
def parse() = {
var contents = Source.fromFile(filename, "ASCII").mkString
var rule = """\{\{(.*)\}\}""".r
//for(rule(v) <- rule findAllIn contents) {
// yield v
//}
//rule.replaceAllIn(contents, )
}
}
var t = new Template("FILENAME", new HashMap[Symbol, Any])
println(t.parse)
The parts that I've commented out are things that I've thought about doing.
Thanks
I've come a little further...
import scala.collection.mutable.HashMap
import scala.io.Source
import java.util.regex.Pattern
import java.util.regex.Matcher
class Template(filename: String, vars: HashMap[Symbol, Any]) {
def findAndReplace(m: Matcher)(callback: String => String):String = {
val sb = new StringBuffer
while (m.find) {
m.appendReplacement(sb, callback(m.group(1)))
}
m.appendTail(sb)
sb.toString
}
def parse() = {
var contents = Source.fromFile(filename, "ASCII").mkString
val m = Pattern.compile("""\{\{(.*)\}\}""").matcher(contents)
findAndReplace(m){ x => x }
}
}
var t = new Template("FILENAME.html", new HashMap[Symbol, Any])
println(t.parse)
At the moment it just adds whatever was inside the tag back into the document. I'm wondering if there is an easier way of doing a find-and-replace style regexp in Scala?
I'd do it like this (String as key instead of Symbol):
var s : String = input // line, whatever
val regexp = """pattern""".r
while(regexp findFirstIn s != None) {
s = regexp replaceFirstIn (s, vars(regexp.findFirstIn(s).get))
}
If you prefer not using var, go recursive instead of using while. And, of course, a StringBuilder would be more efficient. In that case, I might do the following:
val sb = new StringBuilder
// braces must be escaped in the regex; `var` is a reserved word, so the binder is called name
val regexp = """^(.*?)(?:\{\{(pattern)\}\})?""".r
for(subs <- regexp findAllIn s)
subs match {
case regexp(prefix, name) => sb.append(prefix); if (name != null) sb.append("{{"+vars(name)+"}}")
case _ => sys.error("Shouldn't happen")
}
That way you keep appending the non-changing part, followed by the next part to be replaced.
There is a flavor of replaceAllIn in util.matching.Regex that accepts a replacer callback. A short example:
import util.matching.Regex
def replaceVars(r: Regex)(getVar: String => String) = {
def replacement(m: Regex.Match) = {
import java.util.regex.Matcher
require(m.groupCount == 1)
Matcher.quoteReplacement( getVar(m group 1) )
}
(s: String) => r.replaceAllIn(s, replacement _)
}
This is how we would use it:
val r = """\{\{([^{}]+)\}\}""".r
val m = Map("FILENAME" -> "aaa.txt",
"ENCODING" -> "UTF-8")
val template = replaceVars(r)( m.withDefaultValue("UNKNOWN") )
println( template("""whatever input contains {{FILENAME}} and
unknown key {{NOVAL}} and {{FILENAME}} again,
and {{ENCODING}}""") )
Note that Matcher.quoteReplacement escapes $ and \ characters in the replacement string. Otherwise you may get java.lang.IllegalArgumentException: Illegal group reference when the replacement contains dollar signs. See the blog post on why this may happen.
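A compact sketch of why the quoting matters: a value such as "$5" would otherwise be read as a reference to group 5. The TemplateDemo object and render function are hypothetical names; leaving unknown tags untouched and trimming whitespace inside the braces are choices made for this example, not something the answer above prescribes.

```scala
import java.util.regex.Matcher
import scala.util.matching.Regex

object TemplateDemo {
  private val tag: Regex = """\{\{([^{}]+)\}\}""".r

  // Replaces each {{KEY}} with vars(KEY); unknown keys are left as they were.
  // quoteReplacement keeps "$" and "\" in the value from being interpreted.
  def render(template: String, vars: Map[String, String]): String =
    tag.replaceAllIn(template, m =>
      Matcher.quoteReplacement(vars.getOrElse(m.group(1).trim, m.matched)))
}
```

For example, TemplateDemo.render("Price: {{PRICE}}", Map("PRICE" -> "$5")) yields "Price: $5" instead of failing on the dollar sign.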
Here is another interesting way to do the same thing, using function composition:
val Regexp = """\{\{([^{}]+)\}\}""".r
val map = Map("VARIABLE1" -> "VALUE1", "VARIABLE2" -> "VALUE2", "VARIABLE3" -> "VALUE3")
val incomingData = "I'm {{VARIABLE1}}. I'm {{VARIABLE2}}. And I'm {{VARIABLE3}}. And also {{VARIABLE1}}"
def replace(incoming: String) = {
def replace(what: String, `with`: String)(where: String) = where.replace(what, `with`)
val composedReplace = Regexp.findAllMatchIn(incoming).map { m => replace(m.matched, map(m.group(1)))(_) }.reduceLeftOption((lf, rf) => lf compose rf).getOrElse(identity[String](_))
composedReplace(incoming)
}
println(replace(incomingData))
//OUTPUT: I'm VALUE1. I'm VALUE2. And I'm VALUE3. And also VALUE1