Regex pattern to detect payload.* pattern

I want to validate all strings that match 'payload.*', i.e. the string must start with 'payload', followed by a period (.), followed by at least one character.
Examples:
Input1: payload.Hello, Output1: Valid
Input2: Hipayload.Hello, Output2: InValid
Input3: payload.H, Output3: Valid
Input4: payload., Output4: InValid

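You can use x.matches(y) (where x and y are both Strings) to check x against the regex y in Scala. String.matches requires the entire string to match, so the ^ anchor is optional. To avoid doubling the backslashes, you can also write the pattern as a "raw" or triple-quoted String: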
scala> val regex: String = "^payload\\..+"
regex: String = ^payload\..+
scala> val regexAltRaw: String = raw"^payload\..+"
regexAltRaw: String = ^payload\..+
scala> val regexAltTripleQuotes: String = """^payload\..+"""
regexAltTripleQuotes: String = ^payload\..+
scala> "".matches(regex)
res0: Boolean = false
scala> "payload.Hello".matches(regex)
res1: Boolean = true
scala> "Hipayload.Hello".matches(regex)
res2: Boolean = false
scala> "payload.H".matches(regex)
res3: Boolean = true
scala> "payload.".matches(regex)
res4: Boolean = false
To explain the pattern:
^payload - starts with "payload"
\\. - a literal "." (in an ordinary Scala string literal the backslash itself must be escaped, so you write two backslashes; in a raw or triple-quoted String a single backslash is enough, as in normal regex syntax)
.+ - any character (except a line terminator), one or more times
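A minimal sketch of the same check wrapped in a helper; PayloadKey and isValid are hypothetical names, not from the answer. Because String.matches anchors at both ends, the ^ is dropped here, and the last input shows that . does not match a line terminator unless the (?s) flag is set.

```scala
// Hypothetical helper: valid keys are "payload." followed by at least one character.
// String.matches needs the WHOLE string to match, so no ^ or $ anchors are required.
val PayloadKey: String = """payload\..+"""

def isValid(s: String): Boolean = s.matches(PayloadKey)

// The question's inputs plus two edge cases: no period at all, and an embedded newline.
val results: List[(String, Boolean)] =
  List("payload.Hello", "Hipayload.Hello", "payload.H", "payload.", "payload", "payload.a\nb")
    .map(s => s -> isValid(s))
// "payload.a\nb" is rejected: without (?s), . stops at '\n' and the full match fails.
```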

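To also reject input such as payload.. (a second period right after the first), you can use the following regex:
^payload\.[^.]+$
Because [^.]+ excludes every period, this also rejects dotted names such as payload.a.b.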
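To see how the patterns differ, here is a small comparison sketch (loose, strict and dotted are illustrative names). The dotted pattern payload(\.[^.]+)+ is an assumption on my part, for the case where nested names such as payload.a.b should stay valid while empty segments are rejected.

```scala
// Original pattern: any characters after "payload.", including more periods.
val loose = """payload\..+"""
// Stricter pattern from the answer: no further periods allowed at all.
val strict = """payload\.[^.]+"""
// Assumed alternative: one or more ".segment" parts, each segment non-empty.
val dotted = """payload(\.[^.]+)+"""

val inputs = List("payload.Hello", "payload..", "payload.a.b", "payload.a.")

// Map each input to its (loose, strict, dotted) result; String.matches is a full match.
val table: Map[String, (Boolean, Boolean, Boolean)] =
  inputs.map(s => s -> ((s.matches(loose), s.matches(strict), s.matches(dotted)))).toMap
```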

Input Array:
val ar = Array("payload.Hello","Hipayload.Hello","payload.H","payload.")
Regex String:
val p = """^payload\..{1,}"""
In Scala REPL:
scala> val ar = Array("payload.Hello","Hipayload.Hello","payload.H","payload.")
ar: Array[String] = Array(payload.Hello, Hipayload.Hello, payload.H, payload.)
scala> val p = """^payload\..{1,}"""
p: String = ^payload\..{1,}
Test:
scala> ar.map(x=>if(x.matches(p))"Valid" else "InValid")
res3: Array[String] = Array(Valid, InValid, Valid, InValid)
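In the JDK, String.matches delegates to Pattern.matches, which compiles the regex again on every call. For a large array, a sketch that compiles the pattern once and reuses it could look like this (PayloadRegex and classify are illustrative names):

```scala
import scala.util.matching.Regex

// Compile once; a Regex wraps a java.util.regex.Pattern.
val PayloadRegex: Regex = """payload\..+""".r

// matcher(...).matches() requires a full-string match, just like String.matches.
def classify(x: String): String =
  if (PayloadRegex.pattern.matcher(x).matches()) "Valid" else "InValid"

val ar = Array("payload.Hello", "Hipayload.Hello", "payload.H", "payload.")
val labels: Array[String] = ar.map(classify)
```

On Scala 2.13 and later, PayloadRegex.matches(x) is a shorter way to write the same full-match check.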


Scala: string pattern matching and splitting

I am new to Scala and want to create a function to split Hello123 or Hello 123 into two strings as follows:
val string1 = 123
val string2 = Hello
What is the best way to do it? I have tried regex matching with \\d and \\D, but I am not sure how to write the full function.
You may split on zero or more whitespace characters (\s*+) that are preceded by letters and followed by digits:
val str = "Hello123"
val res = str.split("(?<=[a-zA-Z])\\s*+(?=\\d)")
println(res.mkString(", ")) // => Hello, 123 (.deep is not needed here and was removed in Scala 2.13)
Pattern details:
(?<=[a-zA-Z]) - a positive lookbehind that only checks (but does not consume) whether there is an ASCII letter right before the current position in the string
\\s*+ - matches (consumes) zero or more whitespace characters possessively, i.e. without ever backtracking into them
(?=\\d) - a positive lookahead, checked only once after the whitespace (if any) has been matched; it requires a digit right after the current position in the string.
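A hedged sketch that wraps this split in a helper returning an Option, so that input without a letter/digit boundary is reported as None instead of silently yielding a one-element array; Boundary and splitWordNumber are hypothetical names.

```scala
// Split at the (possibly empty) whitespace run between a letter and a digit.
val Boundary = "(?<=[a-zA-Z])\\s*+(?=\\d)"

def splitWordNumber(s: String): Option[(String, String)] =
  s.split(Boundary) match {
    case Array(word, number) => Some((word, number))
    case _                   => None // no boundary found, or more than one
  }
```

Java's String.split drops no interior empty strings here because the zero-width match in "Hello123" falls between "o" and "1", not at the start of the input.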
Based on the given string, I assume you need to match a word and a number with any number of spaces in between.
Here is the regex for that (written as a Scala string literal, hence the double backslashes):
([a-zA-Z]+)\\s*(\\d+)
Now create a regex object using .r
"([a-zA-Z]+)\\s*(\\d+)".r
Scala REPL
scala> val regex = "([a-zA-Z]+)\\s*(\\d+)".r
scala> val regex(a, b) = "hello 123"
a: String = "hello"
b: String = "123"
scala> val regex(a, b) = "hello123"
a: String = "hello"
b: String = "123"
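The val regex(a, b) = ... form is convenient in the REPL, but it is a refutable pattern: when the input does not match, it throws a scala.MatchError. A small sketch demonstrating this (the : @unchecked ascription only silences the refutable-pattern warning that Scala 3 emits):

```scala
import scala.util.Try

val regex = "([a-zA-Z]+)\\s*(\\d+)".r

// Matching input: a and b are bound as in the REPL session above.
val ok = Try { val regex(a, b) = ("hello 123": @unchecked); (a, b) }

// Non-matching input (digits first): the val pattern throws scala.MatchError.
val bad = Try { val regex(a, b) = ("123hello": @unchecked); (a, b) }
```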
To handle non-matching input safely, pattern match with the regex as an extractor:
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
Here is a function that combines the regex with pattern matching:
def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
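One caveat: b.toInt throws a NumberFormatException when the digits do not fit in an Int (for example "Hello 99999999999"). A variant sketch, assuming Scala 2.13+ for toIntOption, that returns None in that case and compiles the regex only once (WordNumber and matchStrSafe are hypothetical names):

```scala
// Compiled once, outside the function.
val WordNumber = "([a-zA-Z]+)\\s*(\\d+)".r

def matchStrSafe(str: String): Option[(String, Int)] = str match {
  // toIntOption (Scala 2.13+) yields None instead of throwing on overflow.
  case WordNumber(a, b) => b.toIntOption.map(n => a -> n)
  case _                => None
}
```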
Scala REPL
scala> def matchStr(str: String): Option[(String, Int)] = {
val regex = "([a-zA-Z]+)\\s*(\\d+)".r
str match {
case regex(a, b) => Some(a -> b.toInt)
case _ => None
}
}
defined function matchStr
scala> matchStr("Hello123")
res41: Option[(String, Int)] = Some(("Hello", 123))
scala> matchStr("Hello 123")
res42: Option[(String, Int)] = Some(("Hello", 123))

strange behaviour with filter?

I want to extract MIME-like headers (starting with [Cc]ontent- ) from a multiline string:
scala> val regex = "[Cc]ontent-".r
regex: scala.util.matching.Regex = [Cc]ontent-
scala> headerAndBody
res2: String =
"Content-Type:application/smil
Content-ID:0.smil
content-transfer-encoding:binary
<smil><head>
"
This fails
scala> headerAndBody.lines.filter(x => regex.pattern.matcher(x).matches).toList
res4: List[String] = List()
but the "related" cases work as expected:
scala> headerAndBody.lines.filter(x => regex.pattern.matcher("Content-").matches).toList
res5: List[String] = List(Content-Type:application/smil, Content-ID:0.smil, content-transfer-encoding:binary, <smil><head>)
and:
scala> headerAndBody.lines.filter(x => x.startsWith("Content-")).toList
res8: List[String] = List(Content-Type:application/smil, Content-ID:0.smil)
What am I doing wrong in
x => regex.pattern.matcher(x).matches
that makes it return an empty List?
The first line fails because java.util.regex.Matcher.matches() requires the regex to match the entire string, while [Cc]ontent- only covers the beginning of each line.
To fix that, use the Matcher.find() method that searches for the match anywhere inside the input string and use the "^[Cc]ontent-" regex (note that the ^ symbol will force the match to appear at the start of the string).
Note that this line of code does not do what you expect:
headerAndBody.lines.filter(x => regex.pattern.matcher("Content-").matches).toList
It runs the regex against the constant string "Content-" instead of x, so the check is always true, which is why you get all the lines in the result.
The following IDEONE demo shows both:
val headerAndBody = "Content-Type:application/smil\nContent-ID:0.smil\ncontent-transfer-encoding:binary\n<smil><head>"
val regex = "^[Cc]ontent-".r
// linesIterator avoids the clash with java.lang.String.lines on JDK 11+
val s1 = headerAndBody.linesIterator.filter(x => regex.pattern.matcher(x).find()).toList
println(s1)
val s2 = headerAndBody.linesIterator.filter(x => regex.pattern.matcher("Content-").matches).toList
println(s2)
Results (the first is the fix, and the second shows that your second line of code fails):
List(Content-Type:application/smil, Content-ID:0.smil, content-transfer-encoding:binary)
List(Content-Type:application/smil, Content-ID:0.smil, content-transfer-encoding:binary, <smil><head>)
Your regex has to match the whole line, not just its first substring, because matches() requires a full match:
val regex = "[Cc]ontent-.*".r
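To compare the fixes side by side, here is a sketch that applies three line filters to the same sample text: find() with a ^ anchor, matches() with a trailing .*, and Regex.findPrefixOf, which only reports a match at the start of the input. It uses linesIterator rather than lines to avoid the java.lang.String.lines clash on JDK 11+; the val names are illustrative.

```scala
val headerAndBody =
  "Content-Type:application/smil\nContent-ID:0.smil\ncontent-transfer-encoding:binary\n<smil><head>"

val startAnchored = "^[Cc]ontent-".r  // used with Matcher.find()
val wholeLine     = "[Cc]ontent-.*".r // used with Matcher.matches()
val prefixOnly    = "[Cc]ontent-".r   // used with Regex.findPrefixOf

val viaFind    = headerAndBody.linesIterator.filter(x => startAnchored.pattern.matcher(x).find()).toList
val viaMatches = headerAndBody.linesIterator.filter(x => wholeLine.pattern.matcher(x).matches()).toList
val viaPrefix  = headerAndBody.linesIterator.filter(x => prefixOnly.findPrefixOf(x).isDefined).toList
```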

Scala regex pattern match groups different from that using findAllIn

I find that the groups extracted by Pattern-matching on regex's in Scala are different from those extracted using findAllIn function.
1) Here is an example of extraction using pattern match -
scala> val fullRegex = """(.+?)=(.+?)""".r
fullRegex: scala.util.matching.Regex = (.+?)=(.+?)
scala> val x = """a='b'"""
x: String = a='b'
scala> x match { case fullRegex(l,r) => println( l ); println(r) }
a
'b'
2) And here is an example of extraction using the findAllIn function -
scala> fullRegex.findAllIn(x).toArray
res4: Array[String] = Array(a=')
I was expecting the returned Array using findAllIn to be Array(a, 'b'). Why is it not so?
This is because nothing after the second lazy group forces it to expand: after = it consumes just one character and stops, as it is in lazy mode.
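See https://regex101.com/r/dU7oN5/10.
Change it to .+?=.+ (greedy after the =) to make findAllIn return the full match a='b'. Note that findAllIn returns whole matches, not the individual groups.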
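In particular, pattern matching (via Regex.unapplySeq) uses Matcher.matches, while findAllIn uses Matcher.find. matches must match the entire input, so the lazy groups are forced to expand; find stops at the first match, which the lazy quantifiers keep as short as possible.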
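A sketch contrasting the two behaviours on the question's input (the names are illustrative). An UnanchoredRegex, obtained with .unanchored, makes pattern matching use find() as well, and findAllMatchIn with subgroups gives access to the groups rather than the whole match:

```scala
val lazyRegex = """(.+?)=(.+?)""".r
val x = "a='b'"

// Anchored extractor (Matcher.matches): the lazy groups must cover the whole input.
val anchored: List[String] = x match {
  case lazyRegex(l, r) => List(l, r)
  case _               => Nil
}

// Unanchored extractor (Matcher.find): the second lazy group stops after one character.
val unanchoredRegex = lazyRegex.unanchored
val unanchored: List[String] = x match {
  case unanchoredRegex(l, r) => List(l, r)
  case _                     => Nil
}

// findAllMatchIn also uses find(); subgroups exposes the captured groups of each match.
val lazyGroups: List[String] = lazyRegex.findAllMatchIn(x).flatMap(_.subgroups).toList

// A greedy second group makes find() consume the rest of the input as well.
val greedyGroups: List[String] = """(.+?)=(.+)""".r.findAllMatchIn(x).flatMap(_.subgroups).toList
```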
scala> import java.util.regex._
import java.util.regex._
scala> val p = Pattern compile ".+?"
p: java.util.regex.Pattern = .+?
scala> val m = p matcher "hello"
m: java.util.regex.Matcher = java.util.regex.Matcher[pattern=.+? region=0,5 lastmatch=]
scala> m.matches
res0: Boolean = true
scala> m.group
res1: String = hello
scala> m.reset
res2: java.util.regex.Matcher = java.util.regex.Matcher[pattern=.+? region=0,5 lastmatch=]
scala> m.find
res3: Boolean = true
scala> m.group
res4: String = h

Scala Replacement by Regex

I have a string like
val bar = "M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo"
Now I split this string and define the pattern
val split = bar.split("-")
val pattern = ".*(A|K)\\d.*".r
and now I want to replace A9K9foo in the last entry of 'split'
val last = split.last
val suffix = last match {
case pattern(_) => last replaceFirst ("""(A\d)?(K\d)?.*""", "")
case _ => last
}
I know that replaceFirst is executed, but it won't replace A9K9foo in my 'last' val (replaceFirst should only be executed if 'last' matches 'pattern'). The wanted result is M9.
Edit: last might not be M9A9K9foo but M9A9, M9K9foo or M9A9K9. All I want is to remove everything from the first A\d or K\d onward; if there is no A\d or K\d, nothing should happen.
Do you know why this replacement won't work?
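You're using String.replaceFirst, and your pattern has a wildcard that consumes everything: both groups are optional, so the regex already matches at index 0 with empty groups, .* swallows the rest, and the whole string is replaced with "".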
Maybe you want:
last replaceFirst ("""A\dK\d""", "")
where the A9K9 is not optional, and that's all you want to replace.
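To make the diagnosis concrete, here is a sketch. The pattern [AK]\d.* and the name removeFromMarker are my own suggestion for the Edit's requirement (drop everything from the first A\d or K\d, leave the string alone otherwise), not something taken from the answer.

```scala
val last = "M9A9K9foo"

// Question's pattern: matches at index 0 with empty optional groups, then .* eats everything.
val original: String = last.replaceFirst("""(A\d)?(K\d)?.*""", "")

// Hypothetical fix: A\d or K\d must be present; remove from there to the end.
// If neither occurs, replaceFirst finds no match and returns the input unchanged,
// so no separate `case pattern(_)` guard is needed.
def removeFromMarker(s: String): String = s.replaceFirst("""[AK]\d.*""", "")

val fixed: String = removeFromMarker(last)
```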
There are other formulations:
scala> val r = """(A\dK\d)""".r
r: scala.util.matching.UnanchoredRegex = (A\dK\d)
scala> val m = (r findAllMatchIn bar).toList.last
m: scala.util.matching.Regex.Match = A9K9
scala> s"${m.before}${m.after}"
res15: String = M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9foo
That's not the cleverest approach.
More:
scala> val r = """(A|K)\d""".r
r: scala.util.matching.Regex = (A|K)\d
scala> val bar = "M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo"
bar: String = M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo-M9A9K9foo
scala> val last = (bar split "-").last
last: String = M9A9K9foo
scala> r findFirstMatchIn last map (_.before) getOrElse last
res0: CharSequence = M9
scala> val r = """(.*?)((A|K)\d.*)?""".r
r: scala.util.matching.Regex = (.*?)((A|K)\d.*)?
scala> last match { case r(prefix, _*) => prefix }
res1: String = M9
scala> "M9" match { case r(prefix, _*) => prefix }
res2: String = M9
scala> "M9K9foo" match { case r(prefix, _*) => prefix }
res3: String = M9
scala> val r = """(.*?)(?:(?:A|K)\d.*)?""".r
r: scala.util.matching.Regex = (.*?)(?:(?:A|K)\d.*)?
scala> last match { case r(prefix) => prefix }
res4: String = M9
The diagnosis is the same; there are different ways to pull the string apart.

Scala: extracting part of a String using Regular Expressions

I have a very simple string like this one:
"Some(1234)"
I'd like to extract "1234" out from it. How can I do it?
val s = "Some(1234)"
//s: java.lang.String = Some(1234)
val Pattern = """Some\((\d+)\)""".r
//Pattern: scala.util.matching.Regex = Some\((\d+)\)
val Pattern(number) = s
//number: String = 1234
Swap in whatever regex you need; \d+ restricts the captured group to digits only. Note that val Pattern(number) = s throws a scala.MatchError if s does not match.
scala> val s = "Some(1234)"
s: String = Some(1234)
scala> val nums = "[0-9]".r
nums: scala.util.matching.Regex = [0-9]
scala> nums.findAllIn(s).mkString
res0: String = 1234
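The two approaches above differ on less tidy input: [0-9] with findAllIn collects every digit anywhere in the string, while the anchored extractor only accepts the exact Some(...) shape. A comparison sketch (viaExtractor and viaFindAll are illustrative names):

```scala
val SomePattern = """Some\((\d+)\)""".r
val nums = "[0-9]".r

// Anchored extractor: the whole string must look like Some(<digits>).
def viaExtractor(s: String): Option[String] = s match {
  case SomePattern(number) => Some(number)
  case _                   => None
}

// findAllIn: concatenates every digit found anywhere in the string.
def viaFindAll(s: String): String = nums.findAllIn(s).mkString
```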
Starting with Scala 2.13, it's possible to pattern match a String by unapplying a string interpolator:
val s"Some($number)" = "Some(1234)"
// number: String = 1234
Also note that if the idea is to extract an Option[Int] from its toString version, you can use the interpolation extraction with a match statement:
x match { case s"Some($number)" => number.toIntOption case _ => None }
// x = "Some(1234)" => Option[Int] = Some(1234)
// x = "Some(1S3R)" => Option[Int] = None
// x = "None" => Option[Int] = None
Just another way, playing with the regex: limit the match to 4 digits.
// explicit ": Unit =" instead of the deprecated procedure syntax
def getnumber(args: Array[String]): Unit = {
  val str = "Some(1234)"
  val nums = "\\d{4}".r
  println(nums.findAllIn(str).mkString)
}
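One thing to watch with \d{4} and findAllIn: it finds non-overlapping 4-digit chunks anywhere in the string, so longer numbers are silently truncated or concatenated. A sketch comparing it with an anchored variant (loose, strict and exactlyFour are illustrative names):

```scala
val fourDigits  = "\\d{4}".r
val exactlyFour = """Some\((\d{4})\)""".r

// Concatenates all non-overlapping 4-digit chunks found anywhere in s.
def loose(s: String): String = fourDigits.findAllIn(s).mkString

// Accepts only the literal form Some(dddd).
def strict(s: String): Option[String] = s match {
  case exactlyFour(n) => Some(n)
  case _              => None
}
```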