Regex pattern selects alternate matches

Regex pattern
/("[^:=,]+":")(.*?)("}*\]*}*,")/
String:
"foo":""fooooooooooooooooooo"foooo","bar":"barrrrrrrrr""barrrrrr","fooo":"foooooo","bar":"barrrrrr","
It matches only the first and third pairs:
http://rubular.com/r/S5fbsSfCjy
String:
"bar":"barrrrrrrrr""barrrrrr","fooo":"foooooo","bar":"barrrrrr","foo":""fooooooooooooooooooo"foooo","
It matches only the first and third pairs:
http://rubular.com/r/hDfcBCkB2o
How do I make it match all four pairs in either of the strings above?

That's because the ," at the end of your regex consumes the opening quote of the following key, so that key can no longer be matched by "[^:=,]+":" and the engine skips ahead to the next pair. As a result, the regex matches only every other pair.
Use a lookahead instead, which checks for ," without consuming it:
/("[^:=,]+":")(.*?)("}*\]*}*(?=,"))/
http://rubular.com/r/6v2OjPtmVM
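To see the difference, here is a minimal JavaScript sketch. It assumes JavaScript behaves like the Ruby engine behind the Rubular demos for these two patterns; the } characters are escaped only for readability. The sketch runs both patterns over the first sample string and lists the key captured in group 1 of each match:

```javascript
// First sample string from the question.
const input =
  '"foo":""fooooooooooooooooooo"foooo","bar":"barrrrrrrrr""barrrrrr","fooo":"foooooo","bar":"barrrrrr","';

// Original pattern: the trailing ," is consumed, so the opening quote of the
// next key is gone and that pair cannot be matched.
const consuming = /("[^:=,]+":")(.*?)("\}*\]*\}*,")/g;

// Fixed pattern: (?=,") only checks that ," follows, without consuming it.
const lookahead = /("[^:=,]+":")(.*?)("\}*\]*\}*(?=,"))/g;

// Collect the key part (group 1) of every match.
function keys(re, s) {
  return [...s.matchAll(re)].map((m) => m[1]);
}

console.log(keys(consuming, input)); // [ '"foo":"', '"fooo":"' ]
console.log(keys(lookahead, input)); // [ '"foo":"', '"bar":"', '"fooo":"', '"bar":"' ]
```

The same two patterns applied to the second sample string give two and four matches respectively.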

Related

How to match the closest pattern on a capture group excluding overlap? [duplicate]

Given the input string fooxxxxxxfooxxxboo, I am trying to write a regex that matches fooxxxboo, i.e. everything from the second foo to the last boo.
I tried the following:
foo.*?boo matches the complete string fooxxxxxxfooxxxboo
foo.*boo also matches the complete string fooxxxxxxfooxxxboo
I have read Greedy vs. Reluctant vs. Possessive Quantifiers and understand the difference, but what I want is the shortest string, measured from the end, that matches the regex, i.e. something like evaluating the regex from the back.
Is there any way I can match only the last portion?
Use a negative lookahead assertion:
foo(?:(?!foo).)*?boo
(?:(?!foo).)*? - a non-greedy match of zero or more characters, none of which may begin a foo. Before consuming each character, the engine checks that it is not an f followed by two o's; only if that check passes is the character matched.
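A minimal JavaScript sketch of the tempered pattern above, assuming JavaScript's regex engine as a stand-in for whichever engine the question uses:

```javascript
const input = "fooxxxxxxfooxxxboo";

// Tempered token: every character consumed between foo and boo is first
// checked with the negative lookahead (?!foo), so the match cannot span a
// second foo. The attempt at index 0 therefore fails and the match starts at 9.
const tempered = /foo(?:(?!foo).)*?boo/;

const m = input.match(tempered);
console.log(m[0], m.index); // fooxxxboo 9
```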
Why does foo.*?boo match the complete string fooxxxxxxfooxxxboo?
Because the regex engine scans from left to right and reports the leftmost match. The foo in your regex already matches at position 0, and the non-greedy .*? then expands only until it reaches the first boo, which happens to be at the end of the string. Laziness keeps a match short at its end, not at its start: the shorter candidate fooxxxboo starts later and lies inside the first match, and since matches cannot overlap, it is never reported.
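The leftmost-match behaviour described above can be checked with a short JavaScript sketch (JavaScript is an assumed stand-in for the engine in the question):

```javascript
const input = "fooxxxxxxfooxxxboo";

// The lazy .*? still starts at the leftmost foo (index 0); it only keeps the
// match as short as possible on the right-hand side.
const lazy = input.match(/foo.*?boo/);
console.log(lazy[0], lazy.index); // fooxxxxxxfooxxxboo 0

// With the g flag, matches cannot overlap, so the inner fooxxxboo is never
// reported as a separate match.
const all = input.match(/foo.*?boo/g);
console.log(all); // [ 'fooxxxxxxfooxxxboo' ]
```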
.*(foo.*?boo)
Try this, and take the capture group, i.e. $1 or \1.
See demo.
https://regex101.com/r/nL5yL3/9
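A minimal JavaScript sketch of the capture-group approach; in JavaScript the group is read from index 1 of the match array (the equivalent of $1):

```javascript
const input = "fooxxxxxxfooxxxboo";

// The greedy .* first runs to the end of the string and then backtracks, so
// the captured foo is the last one that still has a boo after it.
const m = input.match(/.*(foo.*?boo)/);
console.log(m[1]); // fooxxxboo
```

Note that . does not match line breaks by default, so this sketch assumes a single-line input.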

Regex - Nested matches

When the regex \d\[\w*] is given the input string asd3[bc]de, it matches 3[bc].
When given input with nested brackets such as 3[bc4[de]], it matches the inner pattern 4[de] rather than the outer one. Why is this so? Is there a way to force the regex to match the outer pattern?
\w won't match a [.
The \d\[ matches the 3[, then \w* matches bc4, but neither \w* nor the closing ] can match the inner [. Backtracking \w* does not help, so the attempt that starts at 3 fails and the regex engine moves on to find another match for \d\[. That matches the 4[, \w* matches de, and then ] matches the first closing bracket.
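I believe some regex engines support recursive patterns that can match nested items (PCRE and Perl, for example, offer (?R)). JavaScript does not, but if the nesting depth is fixed, you can spell it out: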
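// Handles exactly one level of nesting: group 1 captures the inner d[...].
const re = /\d\[\w*\]?(\d\[\w*\])\]/;
const str = "3[bc4[de]]";
console.log([...str.match(re)]); // [ '3[bc4[de]]', '4[de]' ]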
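The backtracking explanation above can be verified with the original single-level pattern. This is a small JavaScript sketch (the ] is escaped here, which does not change its meaning) that prints the match and its start index for both inputs:

```javascript
// \w covers only [A-Za-z0-9_], so \w* cannot step over the inner "[".
const single = /\d\[\w*\]/;

const flat = "asd3[bc]de".match(single);
console.log(flat[0], flat.index); // 3[bc] 3

// The attempt starting at "3" fails at the inner "[", so the engine moves on
// and the first successful match starts at the "4".
const nested = "3[bc4[de]]".match(single);
console.log(nested[0], nested.index); // 4[de] 4
```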

What is the difference between an anchored regex and an un-anchored regex?

I came across the term here:
... These should be specified as a list of pairs where the first element is an un-anchored regex (in java.util.regex.Pattern syntax) against which the platform name is matched...
An unanchored regex is a pattern that has no anchors (^ for the start of the string, $ for the end of the string) and thus allows partial matches. For example, in Java the Matcher#find() method searches for partial matches inside the input string, so "a.c" finds a match in "1.0 abc.". The anchored pattern "^a.c$" matches the string "abc", but finds no match in "1.0 abc.".
"Unanchored" may also mean that the code applying the pattern does not require the match to cover the full input string. By contrast, in Java, Matcher#matches() requires the pattern to match the entire input, so s.matches("a.c") succeeds for "abc" but fails for "1.0 abc.".
An anchored regex, then, matches a string only when the whole string matches the pattern.
See Start of String and End of String Anchors for more information about anchors in regex.
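A small JavaScript sketch of the same distinction. JavaScript's RegExp.prototype.test() searches anywhere in the string, much like Java's find(). The fullMatch helper is hypothetical; it only emulates a full-string check such as Java's matches() by wrapping the pattern in ^(?:...)$:

```javascript
const input = "1.0 abc.";

// Unanchored: a partial match anywhere in the string is enough.
console.log(/a.c/.test(input)); // true
// Anchored: the whole string must match.
console.log(/^a.c$/.test(input)); // false
console.log(/^a.c$/.test("abc")); // true

// Hypothetical helper: anchor an unanchored pattern source to the full string.
function fullMatch(source, s) {
  return new RegExp(`^(?:${source})$`).test(s);
}

console.log(fullMatch("a.c", input)); // false
console.log(fullMatch("a.c", "abc")); // true
```

The non-capturing group keeps alternations such as a|b anchored as a whole.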

Keep string after first number

This seems like it should be easy, but I can't figure out which regex will extract the whole string after the first number in the string. I can extract the part before the first number like so:
gsub( "\\d.*$", "", "DitchMe5KeepMe" )
Any idea how to write the regex pattern such that the string after the first number is kept?
Instead of lazy dot matching, I'd rely on a \D non-digit character class and use sub to make just one replacement:
sub( "^\\D*\\d", "", "DitchMe5KeepMe" )
Here,
^ - matches the start of a string
\D* - matches zero or more non-digits
\d - matches a digit
NOTE: to remove the whole first number rather than only its first digit, add a + after the last \d so that it matches one or more digits.
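As a rough cross-check, here is the same pattern in JavaScript rather than R. This is an assumed stand-in: a non-global String.prototype.replace() rewrites only the first match, like R's sub(). The function names are hypothetical:

```javascript
// Stand-in for sub("^\\D*\\d", "", x): drop everything up to the first digit.
function afterFirstDigit(s) {
  return s.replace(/^\D*\d/, "");
}

// Variant with \d+ that drops the whole first number, not just its first digit.
function afterFirstNumber(s) {
  return s.replace(/^\D*\d+/, "");
}

console.log(afterFirstDigit("DitchMe5KeepMe")); // KeepMe
console.log(afterFirstDigit("DitchMe42KeepMe")); // 2KeepMe
console.log(afterFirstNumber("DitchMe42KeepMe")); // KeepMe
```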
It looks like you want to remove everything up to and including the first digit, so you can use this regex and replace the match with an empty string:
^.*?\d
I used .*? to make the pattern non-greedy, so for DitchMe5Keep8Me it matches DitchMe5; a greedy pattern like .*\d would match DitchMe5Keep8.
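The lazy/greedy difference is easy to see in a short JavaScript sketch (JavaScript is used here as an assumed stand-in for R's regex engine):

```javascript
const input = "DitchMe5Keep8Me";

// Lazy: .*? stops at the first digit.
const lazyPart = input.match(/^.*?\d/)[0];
console.log(lazyPart); // DitchMe5

// Greedy: .* runs to the end and backtracks only as far as the last digit.
const greedyPart = input.match(/^.*\d/)[0];
console.log(greedyPart); // DitchMe5Keep8

// Removing each match shows what would be kept.
console.log(input.replace(/^.*?\d/, "")); // Keep8Me
console.log(input.replace(/^.*\d/, "")); // Me
```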
You can also use str_extract from stringr:
library(stringr)
str_extract("DitchMe5KeepMe", "(?<=\\d).*$")
[1] "KeepMe"
which will extract everything after the first digit.
str_extract("DitchMe5KeepMe6keepme", "(?<=\\d).*$")
[1] "KeepMe6keepme"
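A JavaScript sketch of the same lookbehind extraction, assuming an engine with ES2018 lookbehind support (current Node.js has it). The helper name is hypothetical, and it returns null where str_extract would return NA:

```javascript
// (?<=\d) requires a digit immediately before the match start, so the first
// position that qualifies is right after the first digit.
function extractAfterFirstDigit(s) {
  const m = s.match(/(?<=\d).*$/);
  return m ? m[0] : null;
}

console.log(extractAfterFirstDigit("DitchMe5KeepMe")); // KeepMe
console.log(extractAfterFirstDigit("DitchMe5KeepMe6keepme")); // KeepMe6keepme
console.log(extractAfterFirstDigit("NoDigits")); // null
```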
