How to manage partial matching with regex? - regex

I have a regex like this:
val myregex = "This is a (.*) text for (.*) and other thing like .*".r
If I run :
> val myregex(a,b) = "This is a test text for something and other thing like blah blah"
a: String = test
b: String = something
it is ok, but it fails if b is missing:
> val myregex(a,b) = "This is a test text for and other thing like blah blah"
scala.MatchError: This is a test text for and other thing like blah blah (of class java.lang.String)
... 33 elided
Is there a way to keep, for example, the value a and replace b with a fallback value (and vice versa)? Or is the only solution to split the regex into two distinct regexes?

Your original regex requires 2 consecutive spaces between for and and.
You can make the second value optional by wrapping the space and the subsequent (.*) pattern in a non-capturing group and applying the ? quantifier to that group:
val myregex = "This is a (.*) text for(?: (.*))? and other thing like .*".r
val x = "This is a test text for and other thing like blah blah"
x match {
case myregex(a, b) => print(s"${a} -- ${b}");
case _ => print("none")
}
// => test -- null
See the online Scala demo. Here, there is a match, but b is just null since the second capturing group did not participate in the match (and did not get initialized).
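To turn that null into a fallback value, one option is to wrap the group in an Option. This is only a minimal sketch that reuses the regex above; the names OptionalGroup and parse and the default string "fallback" are assumptions made for illustration, not part of the original answer.

```scala
object OptionalGroup {
  // Same pattern as above: the second capture sits inside an optional non-capturing group.
  private val myregex = "This is a (.*) text for(?: (.*))? and other thing like .*".r

  // Returns (a, b), substituting `fallback` when the optional group did not participate.
  // Returns None when the sentence does not match the pattern at all.
  def parse(s: String, fallback: String = "fallback"): Option[(String, String)] =
    s match {
      case myregex(a, b) => Some((a, Option(b).getOrElse(fallback))) // Option(null) is None
      case _             => None
    }

  def main(args: Array[String]): Unit = {
    println(parse("This is a test text for something and other thing like blah blah"))
    println(parse("This is a test text for and other thing like blah blah"))
  }
}
```

The first call yields Some((test,something)) and the second Some((test,fallback)), so the value of a survives even when b is absent.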

Or is the only solution to split the regex into two distinct regexes?
Splitting is one way to do it. Your best bet is then probably to use pattern matching on the two parts:
// r1 and r2 stand for the two partial regexes, each with one capturing group
// (the answer does not define them)
("This is a test text for something", "and other thing like blah blah") match {
  case (r1(a), r2(b)) => (a, b)
  case (r1(a), _) => (a, "fallback")
}
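The snippet above leaves r1 and r2 undefined. One possible reading of the "two regexes" idea is to extract each value with its own pattern and fall back independently. The patterns and the names aRe, bRe and extract below are assumptions for illustration only, not the answerer's definitions.

```scala
import scala.util.matching.Regex

object TwoRegexes {
  // Hypothetical patterns: each captures a single value, so one can fail without the other.
  private val aRe: Regex = "This is a (.+?) text for".r
  private val bRe: Regex = "text for (.+?) and other thing like".r

  // findFirstMatchIn searches anywhere in the input, so no .unanchored is needed here.
  private def first(re: Regex, s: String): Option[String] =
    re.findFirstMatchIn(s).map(_.group(1))

  // Each component falls back on its own when its pattern finds nothing.
  def extract(s: String, fallback: String = "fallback"): (String, String) =
    (first(aRe, s).getOrElse(fallback), first(bRe, s).getOrElse(fallback))

  def main(args: Array[String]): Unit = {
    println(extract("This is a test text for something and other thing like blah blah"))
    println(extract("This is a test text for and other thing like blah blah"))
    println(extract("This is a text for something and other thing like blah blah"))
  }
}
```

With these assumed patterns the three calls give (test,something), (test,fallback) and (fallback,something), which covers the "and vice versa" part of the question.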

Related

Javascript regex to match type annotations

I'm trying to match type annotations from a string of parameters:
foo: string, bar:number, baz: Array<string>
my initial pattern was working fine for primitives:
:\s*\w+
but it's not capturing arrays, so I tried an alternation, but it's not working:
:\s*\w+|:\s*\w+<\w+>
end result should be:
foo, bar, baz
You can make the part with the brackets optional and replace the matches with an empty string leaving the desired result:
:\s*\w+(?:<\w+>)?
Regex demo
let s = "foo: string, bar:number, baz: Array<string>";
console.log(s.replace(/:\s*\w+(?:<\w+>)?/g, ''));
Or match the parts using a capturing group
(\w+):\s*\w
Regex demo
let s = "foo: string, bar:number, baz: Array<string>";
let matches = Array.from(s.matchAll(/(\w+):\s*\w/g), m => m[1]);
console.log(matches.join(", "));

Scala regex find matches in middle of string [duplicate]

This question already has an answer here:
Working regex fails when using Scala pattern matching
(1 answer)
Closed 5 years ago.
I have written the following code in scala:
val regex_str = "([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
which returns "other", but if I take off the leading underscore:
val regex_str = "([a-z]+)(\\d+)".r
"abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
I get "found". How can I find any ([a-z]+)(\\d+) instead of just at the beginning? I am used to other regex languages where you use a ^ to specify beginning of the string, and the absence of that just gets all matches.
Scala regex patterns are "anchored" by default when used as extractors, i.e. bound to the beginning and end of the target string.
You'll get the expected match with this.
val regex_str = "([a-z]+)(\\d+)".r.unanchored
Maybe you need something like this:
val regex_str = "[^>]([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => println(s"found $a $n")
case _ => println("other")
}
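The leading [^>] consumes exactly one character (anything except >), so this skips the first character of your string. It only works when a single character precedes the letters.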
Hope this helps!
The unapplySeq of the Regex tries to capture the whole input by default (treats the pattern as if it was between ^ and $).
There are two ways to capture inside the input:
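use .*? before and .* after the captures: val regex_str = ".*?([a-z]+)(\\d+).*".r (the leading quantifier should be reluctant; a greedy .* would swallow all but the last letter, leaving a = "c" for "_abc123")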
do the same with .unanchored: val regex_str = "([a-z]+)(\\d+)".r.unanchored
Otherwise scala treats regular expression anchors the same way as in other languages; this one is an exception made for semantic reasons.
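The difference between the variants is easiest to see side by side. A small sketch, assuming the input "_abc123" from the question; the object name AnchoringDemo and the helper groups are made up for this illustration.

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  val greedy: Regex     = ".*([a-z]+)(\\d+).*".r          // greedy prefix
  val reluctant: Regex  = ".*?([a-z]+)(\\d+).*".r         // reluctant prefix
  val unanchored: Regex = "([a-z]+)(\\d+)".r.unanchored   // no padding needed

  // Runs the extractor and returns both captures, or None if the pattern does not match.
  def groups(re: Regex, s: String): Option[(String, String)] = s match {
    case re(a, n) => Some((a, n))
    case _        => None
  }

  def main(args: Array[String]): Unit = {
    println(groups(greedy, "_abc123"))     // Some((c,123)): the greedy prefix ate "_ab"
    println(groups(reluctant, "_abc123"))  // Some((abc,123))
    println(groups(unanchored, "_abc123")) // Some((abc,123))
  }
}
```

All three match, but only the reluctant and unanchored versions return the full run of letters in the first group.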
The regex extractor in Scala pattern matching attempts to match the entire string. If you want to skip some junk characters at the beginning and at the end, prepend .*? (a dot with a reluctant quantifier) to the regex and append .*:
val regex_str = ".*?([a-z]+)(\\d+).*".r
val result = "_!+<>__abc123_%$" match {
case regex_str(a, n) => s"found a = '$a', n = '$n'"
case _ => "no match"
}
println(result)
This outputs:
found a = 'abc', n = '123'
Otherwise, don't use the pattern match with the extractor, use "...".r.findAllIn to find all matches.
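For the "find all matches" route, findAllMatchIn gives access to the groups of every match. A minimal sketch; the object name FindAllDemo, the helper pairs and the sample input are assumptions for illustration.

```scala
object FindAllDemo {
  private val re = "([a-z]+)(\\d+)".r

  // Collects every (letters, digits) pair found anywhere in the input.
  def pairs(s: String): List[(String, String)] =
    re.findAllMatchIn(s).map(m => (m.group(1), m.group(2))).toList

  def main(args: Array[String]): Unit =
    println(pairs("_abc123 --x9, foo42!")) // List((abc,123), (x,9), (foo,42))
}
```

No anchoring workaround is needed here because findAllMatchIn searches inside the string rather than matching it as a whole.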

Scala regex get parameters in path

regex noob here.
example path:
home://Joseph/age=20/race=human/height=170/etc
Using regex, how do I grab everything after the "=" between the /Joseph/ path and /etc? I'm trying to create a list like
[20, human, 170]
So far I have
val pattern = ("""(?<=Joseph/)[^/]*""").r
val matches = pattern.findAllIn(path)
The pattern lets me just get "age=20" but I thought findAllIn would let me find all of the "parameter=" matches. And after that, I'm not sure how I would use regex to just obtain the "20" in "age=20", etc.
Code
See regex in use here
(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)
Usage
See code in use here
object Main extends App {
val path = "home://Joseph/age=20/race=human/height=170/etc"
val pattern = ("""(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)""").r
pattern.findAllIn(path).matchData foreach {
m => println(m.group(1))
}
}
Results
Input
home://Joseph/age=20/race=human/height=170/etc
Output
20
human
170
Explanation
(?:(?<=/Joseph/)|\G(?!\A)/) Match the following
(?<=/Joseph/) Positive lookbehind ensuring what precedes matches /Joseph/ literally
\G(?!\A)/ Assert position at the end of the previous match (but not at the start of the string) and match / literally
[^=]+ Match one or more of any character except =
= Match this literally
([^=/]+) Capture one or more of any character except = and / into capture group 1
Your pattern only looks for text directly after Joseph/, which is why only age=20 matched. Maybe just look for whatever follows each = instead?
val s = "home://Joseph/age=20/race=human/height=170/etc"
// s: String = home://Joseph/age=20/race=human/height=170/etc
val pattern = "(?<==)[^/]*".r
// pattern: scala.util.matching.Regex = (?<==)[^/]*
pattern.findAllIn(s).toList
// res3: List[String] = List(20, human, 170)
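If the parameter names are useful too, both sides of each = can be captured. This is a sketch under the assumption that neither keys nor values contain / or =; the names PathParams, values and asMap are hypothetical. Like the lookbehind version above, it does not check that the segments follow /Joseph/.

```scala
object PathParams {
  // One match per "key=value" segment; neither part may contain '/' or '='.
  private val param = "([^/=]+)=([^/=]+)".r

  // Just the values, in order of appearance.
  def values(path: String): List[String] =
    param.findAllMatchIn(path).map(_.group(2)).toList

  // Keys mapped to values (a later duplicate key would overwrite an earlier one).
  def asMap(path: String): Map[String, String] =
    param.findAllMatchIn(path).map(m => m.group(1) -> m.group(2)).toMap

  def main(args: Array[String]): Unit = {
    val path = "home://Joseph/age=20/race=human/height=170/etc"
    println(values(path)) // List(20, human, 170)
    println(asMap(path))
  }
}
```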

Pattern matching extract String Scala

I want to extract the part of a String that matches one of the two regex patterns I defined:
//should match R0010, R0100,R0300 etc
val rPat="[R]{1}[0-9]{4}".r
// should match P.25.01.21 , P.27.03.25 etc
val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r
When I now define my method to extract the elements as:
val matcher= (s:String) => s match {case pPat(el)=> println(el) // print the P.25.01.25
case rPat(el)=>println(el) // print R0100
case _ => println("no match")}
And test it eg with:
val pSt=" P.25.01.21 - Hello whats going on?"
matcher(pSt)//prints "no match" but should print P.25.01.21
val rSt= "R0010 test test 3,870"
matcher(rSt) //prints also "no match" but should print R0010
//check if regex is wrong
val pHead="P.25.01.21"
pHead.matches(pPat.toString)//returns true
val rHead="R0010"
rHead.matches(rPat.toString)//return true
I'm not sure whether the regex expressions are wrong, but the matches method works on the elements. So what is wrong with the approach?
When you use pattern matching with strings, you need to bear in mind that:
The .r pattern you pass will need to match the whole string, else, no match will be returned (the solution is to make the pattern .r.unanchored)
Once you make it unanchored, watch out for unwanted matches: R[0-9]{4} will match R1234 in CSR123456 (solutions are different depending on what your real requirements are, usually word boundaries \b are enough, or negative lookarounds can be used)
Inside a match block, the regex matching function requires a capturing group to be present if you want to get some value back (you defined it as el in pPat(el) and rPat(el)).
So, I suggest the following solution:
val rPat="""\b(R\d{4})\b""".r.unanchored
val pPat="""\b(P\.\d{2}\.\d{2}\.\d{2})\b""".r.unanchored
val matcher = (s: String) => s match {
  case pPat(el) => println(el) // prints e.g. P.25.01.21
  case rPat(el) => println(el) // prints e.g. R0010
  case _ => println("no match")
}
Then,
val pSt=" P.25.01.21 - Hello whats going on?"
matcher(pSt) // => P.25.01.21
val pSt2_bad=" CP.2334565.01124.212 - Hello whats going on?"
matcher(pSt2_bad) // => no match
val rSt= "R0010 test test 3,870"
matcher(rSt) // => R0010
val rSt2_bad = "CSR00105 test test 3,870"
matcher(rSt2_bad) // => no match
Some notes on the patterns:
\b - a leading word boundary
(R\d{4}) - a capturing group matching R followed by exactly 4 digits
\b - a trailing word boundary
Due to the triple quotes used to define the string literal, there is no need to escape the backslashes.
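When the caller needs the value rather than a printed line, the same two patterns can feed a function that returns an Option. A sketch only; the object name CodeExtractor and the method extract are assumptions, the patterns are the ones suggested above.

```scala
object CodeExtractor {
  // Unanchored patterns with word boundaries and one capturing group each.
  private val rPat = """\b(R\d{4})\b""".r.unanchored
  private val pPat = """\b(P\.\d{2}\.\d{2}\.\d{2})\b""".r.unanchored

  // Returns the first P-code or R-code found in the input, if any.
  def extract(s: String): Option[String] = s match {
    case pPat(el) => Some(el)
    case rPat(el) => Some(el)
    case _        => None
  }

  def main(args: Array[String]): Unit = {
    println(extract(" P.25.01.21 - Hello whats going on?")) // Some(P.25.01.21)
    println(extract("R0010 test test 3,870"))               // Some(R0010)
    println(extract("CSR00105 test test 3,870"))            // None
  }
}
```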
Introduce groups in your patterns:
val rPat=".*([R]{1}[0-9]{4}).*".r
val pPat=".*([P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}).*".r
...
scala> matcher(pSt)
P.25.01.21
scala> matcher(rSt)
R0010
If the code is written as follows, using findAllIn instead of pattern matching, it produces the desired output. The API documentation I followed is http://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html
//should match R0010, R0100,R0300 etc
val rPat="[R]{1}[0-9]{4}".r
// should match P.25.01.21 , P.27.03.25 etc
val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r
def main(args: Array[String]): Unit = {
  val pSt = " P.25.01.21 - Hello whats going on?"
  val pPatMatches = pPat.findAllIn(pSt)
  pPatMatches.foreach(println) // P.25.01.21
  val rSt = "R0010 test test 3,870"
  val rPatMatches = rPat.findAllIn(rSt)
  rPatMatches.foreach(println) // R0010
}
Please, let me know if that works for you.

Extracting inner group with Scala regex

My Scala app is being given a string that may or may not contain the token "flimFlam(*)" inside of it, where the asterisk represents any kind of text, chars, punctuation, etc. There will always only be 0 or 1 instances of "flimFlam(*)" in this string, never more.
I need to detect if the given input string contains a "flimFlam(*)" instance, and if it does, extract out whatever is inside the two parentheses. Hence, if my string contains "flimFlam(Joe)", then the result would be a string with a value of "Joe", etc.
My best attempt so far:
val inputStr : String = "blah blah flimFlam(Joe) blah blah"
// Regex must be case-sensitive for "flimFlam" (not "FLIMFLAM", "flimflam", etc.)
val flimFlamRegex = ".*flimFlam\\(.*?\\)".r
val insideTheParens = flimFlamRegex.findFirstIn(inputStr)
Can anyone spot where I'm going awry?
Use pattern matching and regex extractor
val regex = ".*flimFlam\\((.*)\\).*".r
inputStr match {
case regex(x) => println(x)
case _ => println("no match")
}
Scala REPL
scala> val inputStr : String = "blah blah flimFlam(Joe) blah blah"
inputStr: String = blah blah flimFlam(Joe) blah blah
scala> val regex = ".*flimFlam\\((.*)\\).*"
regex: String = .*flimFlam\((.*)\).*
scala> val regex = ".*flimFlam\\((.*)\\).*".r
regex: scala.util.matching.Regex = .*flimFlam\((.*)\).*
scala> inputStr match { case regex(x) => println(x); case _ => println("no match")}
Joe
You may use a capturing group around .*? and just use an unanchored regex within match block so that the pattern could stay short and "pretty" (no need for .* around the value you are looking for):
val str = "blah blah flimFlam(Joe) blah blah"
val pattern = """flimFlam\((.*?)\)""".r.unanchored
val res = str match {
  case pattern(inner) => inner
  case _ => "No match"
}
println(res) // Joe
See the online demo
Also, note that you do not need to double the backslashes inside """-quoted string literals, which helps avoid excessive backslashes.
And a hint: if the flimFlam is a whole word, add \b in front - """\bflimFlam\((.*?)\)""".
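Since the token may be absent, an Option is a natural return type. The sketch below also contrasts the reluctant group with the greedy (.*) of the first answer on an input that has a second pair of parentheses; the object name FlimFlam, the method names and that sample input are assumptions for illustration.

```scala
object FlimFlam {
  private val reluctant = """flimFlam\((.*?)\)""".r
  private val greedy    = """flimFlam\((.*)\)""".r

  // Text inside the parentheses of the first flimFlam(...) token, if present.
  def inside(s: String): Option[String] =
    reluctant.findFirstMatchIn(s).map(_.group(1))

  // Same search with a greedy group, shown only for comparison.
  def insideGreedy(s: String): Option[String] =
    greedy.findFirstMatchIn(s).map(_.group(1))

  def main(args: Array[String]): Unit = {
    println(inside("blah blah flimFlam(Joe) blah blah")) // Some(Joe)
    println(inside("blah flimFlam(Joe) and (more)"))     // Some(Joe)
    println(insideGreedy("blah flimFlam(Joe) and (more)")) // Some(Joe) and (more)
    println(inside("blah FLIMFLAM(Joe)"))                // None
  }
}
```

The greedy group runs to the last closing parenthesis in the string, so the reluctant form is the safer choice when other parentheses may follow the token.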