Replace a few substrings in Scala - regex

Suppose I need to replace few patterns in a string :
val rules = Map("abc" -> "123", "d" -> "4", "efg" -> "5"} // string to string
def replace(input:String, rules: Map[String, String] = {...}
replace("xyz", rules) // returns xyz
replace("abc123", rules) // returns 123123
replace("dddxyzefg", rules) // returns 444xyz5
How would you implement replace in Scala ? How would you generalize the solution for rules : Map[Regex, String] ?

It's probably easier just to go straight to the general case:
val replacements = Map("abc".r -> "123", "d".r -> "4", "efg".r -> "5")
val original = "I know my abc's AND my d's AND my efg's!"
val replaced = replacements.foldLeft(original) { (s, r) => r._1.replaceAllIn(s, r._2) }
replaced: String = I know my 123's AND my 4's AND my 5's!

Related

Compare of values from two lists by use of regular expressions in Kotlin

I have two lists. The first contains original product data as following:
data class InputProductData (val optionFamilyInput: String?, val optionCodeInput: String?, val optionDescriptionInput: String?)
val inputProductData = mutableListOf(
InputProductData("AAA", "111","Chimney with red bricks"),
InputProductData(null,"222","Two wide windows in the main floor"),
InputProductData("CCCC",null,"Beautiful door in green color"),
InputProductData("DDDD",null,"House with area 120 square meters"),
InputProductData(null,"555","Old wood windows")
)
Second list consists of customizing data. The list can have many identical option ids (first column).
data class CustomizingProductOption(val id: Int, val optionName: String, val optionCategory: String, val optionFamily: String?, val optionCode: String?, val searchPattern: String?, val outputValue: String)
val customizingProductOptions = mutableListOf(
CustomizingProductOption(10001, "Chimney", "Additional options", "^AAA$", "", "^Chimney with", "Available"),
CustomizingProductOption(10002, "Windows", "Basic options", "", "^222$", "^Two wide windows", "Available"),
CustomizingProductOption(10002, "Windows", "Basic options", "", "^555$", "wood windows$", "Available"),
CustomizingProductOption(10003, "Door color", "Basic options", "^CCCC$", "", "door in green color$", "green"),
CustomizingProductOption(10004, "House area", "Basic options", "^DDD", "", "120 square meters", "120")
)
The target is to check the product input data and to identify different product options. Whitin the following loop it is done by use of a business logic. There are 2 different constelations which can occure:
Option family + regex within option description
Option code + regex within option description
data class IndicatedOptions(val id: Int, val output: String)
val indicatedOptions: MutableList<IndicatedOptions> = mutableListOf()
for (i in 0 until inputProductData.size) {
for (k in 0 until customizingProductOptions.size) {
if(inputProductData[i].optionFamilyInput.toString().contains(Regex(customizingProductOptions[k].optionFamily.toString())) == true &&
inputProductData[i].optionDescriptionInput.toString().contains(Regex(customizingProductOptions[k].searchPattern.toString())) == true ||
inputProductData[i].optionCodeInput.toString().contains(Regex(customizingProductOptions[k].optionCode.toString())) == true &&
inputProductData[i].optionDescriptionInput.toString().contains(Regex(customizingProductOptions[k].searchPattern.toString())) == true) {
indicatedOptions.add(IndicatedOptions(customizingProductOptions[k].id, customizingProductOptions[k].outputValue))
}
}
}
println("\n--- ALL INDICATED OPTIONS ---")
indicatedOptions.forEach { println(it) }
val indicatedOptionsUnique = indicatedOptions.distinct().sortedBy { it.id }
println("\n--- UNIQUE INDICATED OPTIONS ---")
indicatedOptionsUnique.forEach {println(it)}
QUESTION: Do you see any ways to optimize this codein order to get it more faster?
First, the "regex" code looks broken. Why do you test if a String contains a Regex? This is the wrong way around you would normally test a Regex to see if the target string is matched by the Regex.
Ideas for performance
Precompile your Regex in the constructor of CustomizingProductOption
Your if logic is 4 logic ANDs. The code executes first to last in a logical expressions, so arrange the first test to be the one that is most selective (i.e. have the least number of matches).
Ideas for readability
use proper streams, e.g. inputProductData.map { customizingProductOptions.filter { LOGIC } }...
Stop using unnecessary toString() on something that is already a String
Stop testing if a boolean expression ==true
Now with sample code:
# Use Regex class here
data class CustomizingProductOption(
val id: Int, val optionName: String, val optionCategory: String,
val optionFamily: Regex?, val optionCode: Regex?, val searchPattern: String?,
val outputValue: String,
)
# Instantiate like this:
CustomizingProductOption(
10001, "Chimney", "Additional options", Regex("^AAA$"),
null, "^Chimney with", "Available",
),
# main code
val indicatedOptions: List<IndicatedOptions> = inputProductData.map { productData ->
customizingProductOptions.filter { option -> // this filter will only return matching options to product data
productData.optionFamilyInput != null && option.optionFamily?.containsMatchIn(productData.optionFamilyInput) ?: false
//&& other conditions
}
.map {option -> // transform to your desired output
IndicatedOptions(
option.id,
option.outputValue,
)
}
}.flatten() // you need this to flatten List<List<IndicatedOptions>>

Scala: string pattern matching and splitting

I am new to Scala and want to create a function to split Hello123 or Hello 123 into two strings as follows:
val string1 = 123
val string2 = Hello
What is the best way to do it, I have attempted to use regex matching \\d and \\D but I am not sure how to write the function fully.
Regards
You may replace with 0+ whitespaces (\s*+) that are preceded with letters and followed with digits:
var str = "Hello123"
val res = str.split("(?<=[a-zA-Z])\\s*+(?=\\d)")
println(res.deep.mkString(", ")) // => Hello, 123
See the online Scala demo
Pattern details:
(?<=[a-zA-Z]) - a positive lookbehind that only checks (but does not consume the matched text) if there is an ASCII letter before the current position in the string
\\s*+ - matches (consumes) zero or more spaces possessively, i.e.
(?=\\d) - this check is performed only once after the whitespaces - if any - were matched, and it requires a digit to appear right after the current position in the string.
Based on the given string I assume you have to match a string and a number with any number of spaces in between
here is the regex for that
([a-zA-Z]+)\\s*(\\d+)
Now create a regex object using .r
"([a-zA-Z]+)\\s*(\\d+)".r
Scala REPL
scala> val regex = "([a-zA-Z]+)\\s*(\\d+)".r
scala> val regex(a, b) = "hello 123"
a: String = "hello"
b: String = "123"
scala> val regex(a, b) = "hello123"
a: String = "hello"
b: String = "123"
Function to handle pattern matching safely
pattern match with extractors
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
Here is the function which does Regex with Pattern matching
def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
Scala REPL
scala> def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
defined function matchStr
scala> matchStr("Hello123")
res41: Option[(String, Int)] = Some(("Hello", 123))
scala> matchStr("Hello 123")
res42: Option[(String, Int)] = Some(("Hello", 123))

Multi capturing groups in scala regex

I'm trying to go from
val s: String = "sometextHere[a][b][c]"
to
val x = "sometextHere"
val y = List("a", "b", "c")
The number of "[...]" is 1+.
I've got something pretty hacky but I feel like there must be a better solution
val bracketMatcher = "\\[(\\w+)\\]".r
val listMatcher = s"^(\\w+)((?:$bracketMatcher)+)".r
listMatcher.findAllIn(chunk) match {
case matchIterator if matchIterator.hasNext =>
val matchData = matchIterator.matchData.next()
val indexesMatch = bracketMatcher.findAllIn(matchData.group(2)).matchData.flatMap(_.subgroups).toList
val a = matchData.group(1) // This is "sometextHere"
val b = indexesMatch // This is List("a", "b", "c")
case _ => ...
Regexes are easier to write in triple quotes. Also, you don't have to match the entire thing at once:
def allMatches(s: String): (String, List[String]) = {
val bracketMatcher = """\[(\w+)\]""".r
val startMatcher = """^(\w+)\[""".r
val first = startMatcher.findFirstMatchIn(s).get.group(1)
val matches = bracketMatcher.findAllMatchIn(s)
val indexes = matches.map(_.group(1)).toList
(first, indexes)
}
allMatches("sometextHere[a][b][c]")
Robert gave a good warning, though. Make sure your input data has no nesting, or you won't be able to handle it with regular expressions. If you have nesting, you'll have to use a proper parser.

How to split string by delimiter in scala?

I have a string like this:
val str = "3.2.1"
And I want to do some manipulations based on it.
I will share also what I want to do and it will be nice if you can share your suggestions:
im doing automation for some website, and based on this string I need to do some actions.
So:
the first digit - I will need to choose by value: value="str[0]"
the second digit - I will need to choose by value: value="str[0]+"."+str[1]"
the third digit - I will need to choose by value: value="str[0]+"."+str[1]+"."+str[2]"
as you can see the second field i need to choose is the name firstdigit.seconddigit and the third field is firstdigit.seconddigit.thirddigit
You can use pattern matching for this.
First create regex:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: util.matching.Regex = (\d+)\.(\d+)\.(\d+)
then you can use it to pattern match:
# "3.4.342" match { case pattern(a, b, c) => println(a, b, c) }
(3,4,342)
if you don't need all numbers you can for example do this
"1.2.0" match { case pattern(a, _, _) => println(a) }
1
if you want to for example to take just first two numbers you can do
# val twoNumbers = "1.2.0" match { case pattern(a, b, _) => s"$a.$b" }
twoNumbers: String = "1.2"
Can only add to #Lukasz's answer one more variant with the values extration:
# val pattern = """(\d+)\.(\d+)\.(\d+)""".r
pattern: scala.util.matching.Regex = (\d+)\.(\d+)\.(\d+)
# val pattern(firstdigit, seconddigit, thirddigit) = "3.2.1"
firstdigit: String = "3"
seconddigit: String = "2"
thirddigit: String = "1"
This way all the values can be treated as regular vals further in the code.
val str="vaquar.khan"
val strArray=str.split("\\.")
strArray.foreach(println)
Try the following:
scala> "3.2.1".split(".")
res0: Array[java.lang.String] = Array(string1, string2, string3)
This one:
object Splitter {
def splitAndAccumulate(string: String) = {
val s = string.split("\\.")
s.tail.scanLeft(s.head){ case (acc, elem) =>
acc + "." + elem
}
}
}
passes this test:
test("Simple"){
val t = Splitter.splitAndAccumulate("1.2.3")
val answers = Seq("1", "1.2", "1.2.3")
t.zip(answers).foreach{ case (l, r) =>
assert(l == r)
}
}

Scala Regex Pattern Matching

I need to use a regex to match a pattern in Scala and I currently have a Regex that is
InputPattern: scala.util.matching.Regex = put (.*) in (.*)
When I run the follwing this happens:
scala> val InputPattern(verb, item, prep, obj) = "put a in b";
scala.MatchError: put a in b (of class java.lang.String)
... 33 elided
I would like it to end up with verb("put"), item("a"), prep("in"), and obj("b") for input "put a in b" and also verb("put"), item(""), prep("in"), and obj("") for input "put in".
Thanks
This works for your special cases :
scala> val InputPattern = "(put) (.*?) ?(in) ?(.*?)".r
InputPattern: scala.util.matching.Regex = (put) (.*) ?(in) ?(.*)
scala> val InputPattern(verb, item, prep, obj) = "put a in b"
verb: String = put
item: String = a
prep: String = in
obj: String = b
scala> val InputPattern(verb, item, prep, obj) = "put in"
verb: String = put
item: String = ""
prep: String = in
obj: String = ""
put and in here are also captured in groups to participate in pattern matching. I also used lazy regexps (.*?) to capture as less as possible, you may replace it with (\S*). ? gives you optional space to match
"put in" (with one space between put and in and no space at the end).
But be aware of this:
scala> val InputPattern(verb, item, prep, obj) = "put ainb"
verb: String = put
item: String = a
prep: String = in
obj: String = b
scala> val InputPattern(verb, item, prep, obj) = "put aininb"
verb: String = put
item: String = a
prep: String = in
obj: String = inb
scala> val InputPattern(verb, item, prep, obj) = "put ain"
verb: String = put
item: String = a
prep: String = in
obj: String = ""
If you have simple command interpreter it may be even good, otherwise you should match your special cases separately.
To process a simple (not natural) language, you may also consider StandardTokenParsers, as they are context-free (Chomsky type 2):
import scala.util.parsing.combinator.syntactical._
val p = new StandardTokenParsers {
lexical.reserved ++= List("put", "in")
def p = "put" ~ opt(ident) ~ "in" ~ opt(ident)
}
scala> p.p(new p.lexical.Scanner("put a in b"))
warning: there was one feature warning; re-run with -feature for details
res13 = [1.11] parsed: (((put~Some(a))~in)~Some(b))
scala> p.p(new p.lexical.Scanner("put in"))
warning: there was one feature warning; re-run with -feature for details
res14 = [1.7] parsed: (((put~None)~in)~None)
You can write one regexp for all cases, but I'm not sure it would be readable and maintainable. I prefer simple approach:
val pattern1 = "(put) (.*) (in) (.*)".r
val pattern2 = "(put) (in)".r
def parse(text: String) = text match {
case pattern1(verb, item, prep, obj) => (verb, item, prep, obj);
case pattern2(verb, prep) => (verb, "", prep, "")
}
scala> parse("put a in b")
res6: (String, String, String, String) = (put,a,in,b)
scala> parse("put in")
res7: (String, String, String, String) = (put,"",in,"")
And one extra notion: I hope you know what you are doing! RegEx is a Chomsky Type 3 grammar and natural language is much more complex. If you need natural language parser, you can use already available solution such as Stanford NLP parser.