Scala regex : capture between group - regex

In below regex I need "test" as output but it gives complete string which matches the regex. How can I capture string between two groups?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}

You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
case pattern(i) => i
case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
See the Scala demo online.
Or, you may change the pattern so that the first part is no longer as consuming pattern (it matches a string of fixed length, so it is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
See this Scala demo.
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.

Related

match everything but a given string and do not match single characters from that string

Let's start with the following input.
Input = 'blue, blueblue, b l u e'
I want to match everything that is not the string 'blue'. Note that blueblue should not match, but single characters should (even if present in match string).
From this, If I replace the matches with an empty string, it should return:
Result = 'blueblueblue'
I have tried with [^\bblue\b]+
but this matches the last four single characters 'b', 'l','u','e'
Another solution:
(?<=blue)(?:(?!blue).)+(?=blue|$)|^(?:(?!blue).)+(?=blue|$)
Regex demo
If you regex engine support the \K flag, then we can try:
/blue\K|.*?(?=blue|$)/gm
Demo
This pattern says to match:
blue match "blue"
\K but then forget that match
| OR
.*? match anything else until reaching
(?=blue|$) the next "blue" or the end of the string
Edit:
On JavaScript, we can try the following replacement:
var input = "blue, blueblue, b l u e";
var output = input.replace(/blue|.*?(?=blue|$)/g, (x) => x != "blue" ? "" : "blue");
console.log(output);

regex to extract substring for special cases

I have a scenario where i want to extract some substring based on following condition.
search for any pattern myvalue=123& , extract myvalue=123
If the "myvalue" present at end of the line without "&", extract myvalue=123
for ex:
The string is abcdmyvalue=123&xyz => the it should return myvalue=123
The string is abcdmyvalue=123 => the it should return myvalue=123
for first scenario it is working for me with following regex - myvalue=(.?(?=[&,""]))
I am looking for how to modify this regex to include my second scenario as well. I am using https://regex101.com/ to test this.
Thanks in Advace!
Some notes about the pattern that you tried
if you want to only match, you can omit the capture group
e* matches 0+ times an e char
the part .*?(?=[&,""]) matches as least chars until it can assert eiter & , or " to the right, so the positive lookahead expects a single char to the right to be present
You could shorten the pattern to a match only, using a negated character class that matches 0+ times any character except a whitespace char or &
myvalue=[^&\s]*
Regex demo
function regex(data) {
var test = data.match(/=(.*)&/);
if (test === null) {
return data.split('=')[1]
} else {
return test[1]
}
}
console.log(regex('abcdmyvalue=123&3e')); //123
console.log(regex('abcdmyvalue=123')); //123
here is your working code if there is no & at end of string it will have null and will go else block there we can simply split the string and get the value, If & is present at the end of string then regex will simply extract the value between = and &
if you want to use existing regex then you can do it like that
var test = data1.match(/=(.*)&|=(.*)/)
const result = test[1] ? test[1] : test[2];
console.log(result);

Extract the repetitive parts of a String by Regex pattern matching in Scala

I have this code for extracting the repetitive : separated sections of a regex, which does not give me the right output.
val pattern = """([a-zA-Z]+)(:([a-zA-Z]+))*""".r
for (p <- pattern findAllIn "it:is:very:great just:because:it is") p match {
case pattern("it", pattern(is, pattern(very, great))) => println("it: "+ is + very+ great)
case pattern(it, _,rest) => println( it+" : "+ rest)
case pattern(it, is, very, great) => println(it +" : "+ is +" : "+ very +" : " + great)
case _ => println("match failure")
}
What am I doing wrong?
How can I write a case expression which allows me to extract each : separated part of the pattern regex?
What is the right syntax with which to solve this?
How can the match against unknown number of arguments to be extracted from a regex be done?
In this case print:
it : is : very : great
just : because : it
is
You can't use repeated capturing group like that, it only saves the last captured value as the current group value.
You can still get the matches you need with a \b[a-zA-Z]+(?::[a-zA-Z]+)*\b regex and then split each match with ::
val text = "it:is:very:great just:because:it is"
val regex = """\b[a-zA-Z]+(?::[a-zA-Z]+)*\b""".r
val results = regex.findAllIn(text).map(_ split ':').toList
results.foreach { x => println(x.mkString(", ")) }
// => it, is, very, great
// just, because, it
// is
See the Scala demo. Regex details:
\b - word boundary
[a-zA-Z]+ - one or more ASCII letters
(?::[a-zA-Z]+)* - zero or more repetitions of
: - a colon
[a-zA-Z]+ - one or more ASCII letters
\b - word boundary

Scala regex find matches in middle of string [duplicate]

This question already has an answer here:
Working regex fails when using Scala pattern matching
(1 answer)
Closed 5 years ago.
I have written the following code in scala:
val regex_str = "([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
which returns "other", but if I take off the leading underscore:
val regex_str = "([a-z]+)(\\d+)".r
"abc123" match {
case regex_str(a, n) => "found"
case _ => "other"
}
I get "found". How can I find any ([a-z]+)(\\d+) instead of just at the beginning? I am used to other regex languages where you use a ^ to specify beginning of the string, and the absence of that just gets all matches.
Scala regex patterns default as "anchored", i.e. bound to beginning and end of target string.
You'll get the expected match with this.
val regex_str = "([a-z]+)(\\d+)".r.unanchored
Hi May be you need something like this,
val regex_str = "[^>]([a-z]+)(\\d+)".r
"_abc123" match {
case regex_str(a, n) => println(s"found $a $n")
case _ => println("other")
}
This will avoid the first character from your string.
Hope this helps!
The unapplySeq of the Regex tries to capture the whole input by default (treats the pattern as if it was between ^ and $).
There are two ways to capture inside the input:
use .* before and after the captures: val regex_str = ".*([a-z]+)(\\d+).*".r
do the same with .unanchored: val regex_str = "([a-z]+)(\\d+)".r.unanchored
Otherwise scala treats regular expression anchors the same way as in other languages; this one is an exception made for semantic reasons.
The regex extractor in scala pattern-matching attempts to match the entire string. If you want to skip some junk-characters in the beginning and in the end, prepend a . with a reluctant quantifier to the regex:
val regex_str = ".*?([a-z]+)(\\d+).*".r
val result = "_!+<>__abc123_%$" match {
case regex_str(a, n) => s"found a = '$a', n = '$n'"
case _ => "no match"
}
println(result)
This outputs:
found a = 'abc', n = '123'
Otherwise, don't use the pattern match with the extractor, use "...".r.findAllIn to find all matches.

Scala regex get parameters in path

regex noob here.
example path:
home://Joseph/age=20/race=human/height=170/etc
Using regex, how do I grab everything after the "=" between the /Joseph/ path and /etc? I'm trying to create a list like
[20, human, 170]
So far I have
val pattern = ("""(?<=Joseph/)[^/]*""").r
val matches = pattern.findAllIn(path)
The pattern lets me just get "age=20" but I thought findAllIn would let me find all of the "parameter=" matches. And after that, I'm not sure how I would use regex to just obtain the "20" in "age=20", etc.
Code
See regex in use here
(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)
Usage
See code in use here
object Main extends App {
val path = "home://Joseph/age=20/race=human/height=170/etc"
val pattern = ("""(?:(?<=/Joseph/)|\G(?!\A)/)[^=]+=([^=/]+)""").r
pattern.findAllIn(path).matchData foreach {
m => println(m.group(1))
}
}
Results
Input
home://Joseph/age=20/race=human/height=170/etc
Output
20
human
170
Explanation
(?:(?<=/Joseph/)|\G(?!\A)/) Match the following
(?<=/Joseph/) Positive lookbehind ensuring what precedes matches /Joseph/ literally
\G(?!\A)/ Assert position at the end of the previous match and match / literally
[^=]+ Match one or more of any character except =
= Match this literally
([^=/]+) Capture one or more of any character except = and / into capture group 1
Your pattern looks for the pattern directly after Joseph/, which is why only age=20 matched, maybe just look after =?
val s = "home://Joseph/age=20/race=human/height=170/etc"
// s: String = home://Joseph/age=20/race=human/height=170/etc
val pattern = "(?<==)[^/]*".r
// pattern: scala.util.matching.Regex = (?<==)[^/]*
pattern.findAllIn(s).toList
// res3: List[String] = List(20, human, 170)