Scala Replacement by Regex - regex

I've a string like
val bar = "M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo".
Now I split this string and define the pattern
val split = bar.split("-")
val pattern = ".*(A|K)\\d.*".r
and now I want to replace A9K9foo in the last entry of 'split'
val last = split.last
val suffix = last match {
case pattern(_) => last replaceFirst ("""(A\d)?(K\d)?.*""", "")
case _ => last
}
What I know is that replaceFirst is executed but it won't replace A9K9foo in my 'last' val
(replaceFirst should only executed if 'last' matches 'pattern'), the wanted result is M2.
Edit: It could happen that last is not M9A9K9foo but M9A9 or M9K9foo or maybe M9A9K9. All i want is to replace all content except the text before A\d or K\d but if there is no A\d or K\d nothing should happen.
Do you know why this replacement won't work?

You're using String.replaceFirst, and your pattern has a wildcard that consumes everything.
Maybe you want:
last replaceFirst ("""A\dK\d""", "")
where the A9K9 is not optional, and that's all you want to replace.
There are other formulations:
scala> val r = """(A\dK\d)""".r
r: scala.util.matching.UnanchoredRegex = (A\dK\d)
scala> val m = (r findAllMatchIn bar).toList.last
m: scala.util.matching.Regex.Match = A9K9
scala> s"${m.before}${m.after}"
res15: String = M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9foo
That's not the most clever.
More:
scala> val r = """(A|K)\d""".r
r: scala.util.matching.Regex = (A|K)\d
scala> val bar = "M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo"
bar: String = M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo
scala> val last = (bar split "-").last
last: String = M9A9K9foo
scala> r findFirstMatchIn last map (_.before) getOrElse last
res0: CharSequence = M9
scala> val r = """(.*?)((A|K)\d.*)?""".r
r: scala.util.matching.Regex = (.*?)((A|K)\d.*)?
scala> last match { case r(prefix, _*) => prefix }
res1: String = M9
scala> "M9" match { case r(prefix, _*) => prefix }
res2: String = M9
scala> "M9K9foo" match { case r(prefix, _*) => prefix }
res3: String = M9
scala> val r = """(.*?)(?:(?:A|K)\d.*)?""".r
r: scala.util.matching.Regex = (.*?)(?:(?:A|K)\d.*)?
scala> last match { case r(prefix) => prefix }
res4: String = M9
The diagnosis is the same; there are different ways to pull the string apart.

Related

Scala, regex matching ignore unnecessary words

My program is:
val pattern = "[*]prefix_([a-zA-Z]*)_[*]".r
val outputFieldMod = "TRASHprefix_target_TRASH"
var tar =
outputFieldMod match {
case pattern(target) => target
}
println(tar)
Basically, I try to get the "target" and ignore "TRASH" (I used *). But it has some error and I am not sure why..
Simple and straight forward standard library function (unanchored)
Use Unanchored
Solution one
Use unanchored on the pattern to match inside the string ignoring the trash
val pattern = "prefix_([a-zA-Z]*)_".r.unanchored
unanchored will only match the pattern ignoring all the trash (all the other words)
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """foo\((.*)\)""".r.unanchored
pattern: scala.util.matching.UnanchoredRegex = foo\((.*)\)
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res3: String = bar
Solution two
Pad your pattern from both sides with .*. .* matches any char other than a linebreak character.
val pattern = ".*prefix_([a-zA-Z]*)_.*".r
val result = str match {
case pattern(value) => value
case _ => ""
}
Example
Scala REPL
scala> val pattern = """.*foo\((.*)\).*""".r
pattern: scala.util.matching.Regex = .*foo\((.*)\).*
scala> val str = "blahblahfoo(bar)blahblah"
str: String = blahblahfoo(bar)blahblah
scala> str match { case pattern(value) => value ; case _ => "no match" }
res4: String = bar
This will work, val pattern = ".*prefix_([a-z]+).*".r, but it distinguishes between target and trash via lower/upper-case letters. Whatever determines real target data from trash data will determine the real regex pattern.

Matching or regular expression in scala

I have a regular expression like ".*(A20).*|.*(B30).*|C".
I would like to write a function its returns A20 or B30 based on the match found.
val regx=".*(A20).*|.*(B30).*".r
"helloA20" match { case regx(b,_) => b; case _ => "" } // A20
"helloB30" match { case regx(b,_) => b; case _ => "" } // null
"C" match { case regx(b,_) => b case _ => "" }
It's returning null because I am not considering the second group. In my actual code, I have a lot of group like that. I would like to return the matched string. Please help me to find a solution.
Easy! It should be like this:
val regx="^(.*(B30|A20).*|(C))$".r
Demo: https://regex101.com/r/nA6dQ9/1
Then you get the second value in the array of every group.
That way you only have one group regardless of the many possibilities.
You're close:
def extract(s: String) = s match {
case regx(b, _) if b != null => b
case regx(_, b) if b != null => b
case _ => ""
}
extract("helloA20")
res3: String = A20
extract("helloB30")
res4: String = B30
extract("A30&B30")
res6: String = B30
If you have a lot of groups, it's reasonable to use for comprehension insted of pattern matching. This code will return first match or None:
val letters = ('A' to 'Z').toSeq
val regex = letters.map(_.toString).mkString("(", "|", ")").r
def extract(s: String) = {
for {
m <- regex.findFirstMatchIn(s)
} yield m.group(1)
}
extract("A=B=")
extract("dsfdsBA")
extract("C====b")
extract("a====E")
res0: Option[String] = Some(A)
res1: Option[String] = Some(B)
res2: Option[String] = Some(C)
res3: Option[String] = Some(E)

Scala: how to construct a regex including a variable?

I want to test if a regex including a variable that I defined before matches a string.
For instance I would like to do :
val value = "abc"
regex = "[^a-z]".r + value //something like that
if(regex matches ".abc") print("ok, it works")
So my question is : how can I add construct a regex including a variable in scala?
("[^a-z]" + value).r
is all you need
You must quote the value string to protect against special regex syntax.
scala> val value = "*"
value: String = *
scala> val oops = """[^a-z]""" + value r
oops: scala.util.matching.Regex = [^a-z]*
scala> ".*" match { case oops() => }
scala> ".....*" match { case oops() => }
They added quoting to the Scala API:
scala> import util.matching._
import util.matching._
scala> val ok = """[^a-z]""" + Regex.quote(value) r
ok: scala.util.matching.Regex = [^a-z]\Q*\E
scala> ".*" match { case ok() => }
scala> ".....*" match { case ok() => }
scala.MatchError: .....* (of class java.lang.String)
... 33 elided
You could also generalize the pattern and do additional checks:
scala> val r = """\W(\w+)""".r
r: scala.util.matching.Regex = \W(\w+)
scala> ".abc" match { case r(s) if s == "abc" => }
Parsing and building the regex itself is relatively expensive, so usually it's desirable to do it once with a general pattern.

Scala regex pattern match groups different from that using findAllIn

I find that the groups extracted by Pattern-matching on regex's in Scala are different from those extracted using findAllIn function.
1) Here is an example of extraction using pattern match -
scala> val fullRegex = """(.+?)=(.+?)""".r
fullRegex: scala.util.matching.Regex = (.+?)=(.+?)
scala> val x = """a='b'"""
x: String = a='b'
scala> x match { case fullRegex(l,r) => println( l ); println(r) }
a
'b'
2) And here is an example of extraction using the findAllIn function -
scala> fullRegex.findAllIn(x).toArray
res4: Array[String] = Array(a=')
I was expecting the returned Array using findAllIn to be Array(a, 'b'). Why is it not so?
This is because you have not specified to what extent the second lazy match should go. So after = it consumes just one character and stops as it is in lazy mode.
See here.
https://regex101.com/r/dU7oN5/10
Change it to .+?=.+ to get full array
In particular, the pattern match's use of unapplySeq uses Matcher.matches, while findAllIn uses Matcher.find. matches tries to match entire input.
scala> import java.util.regex._
import java.util.regex._
scala> val p = Pattern compile ".+?"
p: java.util.regex.Pattern = .+?
scala> val m = p matcher "hello"
m: java.util.regex.Matcher = java.util.regex.Matcher[pattern=.+? region=0,5 lastmatch=]
scala> m.matches
res0: Boolean = true
scala> m.group
res1: String = hello
scala> m.reset
res2: java.util.regex.Matcher = java.util.regex.Matcher[pattern=.+? region=0,5 lastmatch=]
scala> m.find
res3: Boolean = true
scala> m.group
res4: String = h
scala>

Scala: extracting part of a Strings using Regular Expressions

I have a very simple string like this one:
"Some(1234)"
I'd like to extract "1234" out from it. How can I do it?
val s = "Some(1234)"
//s: java.lang.String = Some(1234)
val Pattern = """Some\((\d+)\)""".r
//Pattern: scala.util.matching.Regex = Some\((\d+)\)
val Pattern(number) = s
//number: String = 1234
Switch out your regex for whatever you need. \d+ limits it to digits only.
scala> val s = "Some(1234)"
s: String = Some(1234)
scala> val nums = "[0-9]".r
nums: scala.util.matching.Regex = [0-9]
scala> nums.findAllIn(s).mkString
res0: String = 1234
Starting Scala 2.13, it's possible to pattern match a Strings by unapplying a string interpolator:
val s"Some($number)" = "Some(1234)"
// number: String = 1234
Also note that if the idea is to extract an Option[Int] from its toString version, you can use the interpolation extraction with a match statement:
x match { case s"Some($number)" => number.toIntOption case _ => None }
// x = "Some(1234)" => Option[Int] = Some(1234)
// x = "Some(1S3R)" => Option[Int] = None
// x = "None" => Option[Int] = None
just another way, playing with the regex. Limit to 4 digits.
def getnumber (args: Array[String]) {
val str = "Some(1234)"
val nums = "\\d{4}".r
println (nums.findAllIn(str).mkString)
}