How to use Regex to get nested matches including their capturing groups? - regex

I have a string with a nested pattern, func(func(doSomething)), and a regex: /func\(([^ ]*)\)/gm. I want to get two separate matches:
Match 1:
match: func(func(doSomething))
group captured:func(doSomething)
Match 2:
match: func(doSomething)
group captured:doSomething
However, I'm only getting a single match with the entire inner 'func' being a capturing group.
Here is the regex link: https://regex101.com/r/dUa4SC/1
Is it possible to achieve this using regex? If so, please help me with it. Thanks.

You can build a recursive function that re-applies the regex to each captured group. In JavaScript it would be something like this:
function RecursiveMatch(pattern, text) {
    // Without the g flag, match() returns the full match plus its capture groups
    let matches = text.match(pattern);
    if (matches != null && matches.length > 1) {
        console.log(matches[1] + " found in " + matches[0]);
        // Descend into the captured group and look for a nested match
        RecursiveMatch(pattern, matches[1]);
    }
}
RecursiveMatch("func\\(([^ ]*)\\)", "func(func(doSomething))");
And this is the output:
func(doSomething) found in func(func(doSomething))
doSomething found in func(doSomething)
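If you need the matches as data rather than log output, the same idea can be written as a loop. This is a minimal sketch, not the answerer's code: the helper name nestedMatches and the shape of the returned objects are assumptions made for illustration.

```javascript
// Hypothetical helper: collects one entry per nesting level instead of logging.
function nestedMatches(pattern, text) {
  const results = [];
  let current = text;
  let m;
  // Each captured group is strictly shorter than its match, so the loop ends.
  while ((m = current.match(pattern)) !== null && m.length > 1) {
    results.push({ match: m[0], group: m[1] });
    current = m[1]; // descend into the captured group
  }
  return results;
}

const levels = nestedMatches(/func\(([^ ]*)\)/, "func(func(doSomething))");
console.log(levels);
```

The pattern is passed without the g flag on purpose, because match() only returns capture groups for non-global patterns.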

Related

How to get method/function in a string by using regex

I am trying to get the arguments of a function in a string.
An argument may contain:
Example
Placeholder:
{{request.expires_in}}
//can match regex: \{\{[a-zA-Z0-9._-]+\}\}
function
#func_compare('string1','string2',1234)
Others:
dERzUxOfnj49g/kCGLR3vhzBOTLwEMgrpa1/MCBpXQR2NIFV1yjraGVZLkujG63J0joj+TvNocjpJSQq2TpPRzLfCSZADcjmbkBkphIpsT8=
//Any string except brackets
Case
Below is the sample case I am working with.
Content:
#func_compare('string1',#func_test(2),1234),'Y-m-d H:i:s',#func_strtotime({{request.expires_in}} - 300)
Regex using:
(?<=#func_compare\().*[^\(](?=\))
I expect to get
'string1',#func_test(2),1234
But what matched from the regex now is
'string1',#func_test(2),1234),'Y-m-d H:i:s',#func_strtotime({{request.expires_in}} - 300
Does anyone know how to get the arguments between the #func_strtotime brackets? I will appreciate any response.
Would you please try:
(?<=#func_compare\().*?(?:\(.*?\).*?)?(?=\))
which will work for both cases.
[Explanation of the regex]
.*?(?:\(.*?\).*?)?(?=\))
.*? the shortest match not to overrun the next pattern
(?:\(.*?\).*?)? a group of substring which includes a pair of parens followed by another substring of length 0 or longer
(?=\)) positive lookahead for the right paren
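To check the pattern against the sample content, here is a small JavaScript sketch. It assumes an engine with lookbehind support (for example a recent Node.js); the helper name argsOf is hypothetical, and it assumes the function name contains no regex metacharacters.

```javascript
// Hypothetical helper: returns the argument list of the first call to `name`.
// Like the pattern above, it copes with at most one nested call in the arguments.
function argsOf(name, content) {
  const re = new RegExp("(?<=" + name + "\\().*?(?:\\(.*?\\).*?)?(?=\\))");
  const m = content.match(re);
  return m === null ? null : m[0];
}

const content =
  "#func_compare('string1',#func_test(2),1234),'Y-m-d H:i:s',#func_strtotime({{request.expires_in}} - 300)";

const compareArgs = argsOf("#func_compare", content);
const strtotimeArgs = argsOf("#func_strtotime", content);
console.log(compareArgs);   // 'string1',#func_test(2),1234
console.log(strtotimeArgs); // {{request.expires_in}} - 300
```

Arguments containing two or more nested calls would be cut short by this pattern, so deeper nesting needs a recursive pattern or a small parser.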
You'll get the result using recursive regex:
(?<=#func_compare\()([^()]*\((?:.*?\)|(?1))*)[^()]*(?=\))

Regex multiple match

Suppose I have the following string: [P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp[P2]qqrr<em>ssttuuww</em>xxyyzz.
How will I extract the <em>...</em> tag along with the info inside square brackets, i.e, I wanted to extract the following:
[P6] and <em>ddeeff</em>
[P6] and <em>kkllmmnn</em>
[P2] and <em>ssttuuww</em>
I have tried a lot using many patterns but I am not able to find all the above matches (https://regex101.com/r/b64Wuv/1).
Does any one know how to do this with regex?
@San, you are quite close. Your pattern needs a little more, as below (sample in C#):
Regex regex = new Regex(@"(?<Ps>\[.*?]).+?<em>(?<ems>.*?)<\/em>");
var input = "[P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp[P2]qqrr<em>ssttuuww</em>xxyyzz";
var matches = regex.Matches(input);
foreach (Match match in matches)
{
    if (match.Success)
    {
        Console.WriteLine($"{match.Groups["Ps"].Value} {match.Groups["ems"].Value}");
    }
}
I think you have to use two regexes:
1st regex - to match each bracketed section:
Match 1: [P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp
Match 2: [P2]qqrr<em>ssttuuww</em>xxyyzz
To achieve this, use \[[^[]+
2nd regex - to match the <em> elements inside a section:
Match 1: <em>ddeeff</em>
Match 2: <em>kkllmmnn</em>
To achieve this, use <em>([^<]+?)<\/em>
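The two-pass idea can be sketched in JavaScript as follows. The third pattern, which pulls the [Pn] tag off the front of each section, is an assumption added for illustration and is not part of the answer above.

```javascript
const input =
  "[P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp[P2]qqrr<em>ssttuuww</em>xxyyzz";

const pairs = [];
// Pass 1: split the input into sections that each start with a bracketed tag.
for (const section of input.match(/\[[^[]+/g) || []) {
  // Assumed extra step: read the leading [..] tag of this section.
  const tag = section.match(/^\[[^\]]*\]/)[0];
  // Pass 2: find every <em>...</em> inside the section and pair it with the tag.
  for (const em of section.matchAll(/<em>([^<]+?)<\/em>/g)) {
    pairs.push([tag, em[0]]);
  }
}

console.log(pairs);
```

This yields the three pairs asked for in the question: [P6] with each of its two <em> elements, and [P2] with its one.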

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code that does what you want:
var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only search lines that are enclosed by string1 ... string2
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then perform a "Replace All" operation repeatedly until Notepad++ doesn't find any more matches. The limitation is that this regex matches only one bit per line per pass, so it has to be applied repeatedly.
Btw. even if a regexp matches the whole line, you can still only replace parts of it using capturing groups.
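To see why the replacement has to be repeated, the passes can be simulated in code. This is only an illustrative JavaScript sketch of the iteration, not something Notepad++ runs; the function name replaceBetween is hypothetical, and it assumes an engine with lookbehind support.

```javascript
// Hypothetical helper that repeats the single-match replacement until nothing changes.
function replaceBetween(line, replacement) {
  // A replacement that itself contains "bit" would never terminate.
  if (replacement.includes("bit")) {
    throw new Error("replacement must not contain 'bit'");
  }
  const re = /(?<=string1_)(.*)bit(.*)(?=_string2)/;
  let previous;
  do {
    previous = line;
    // The greedy first group means each pass replaces the LAST remaining "bit".
    line = line.replace(re, (whole, before, after) => before + replacement + after);
  } while (line !== previous);
  return line;
}

console.log(replaceBetween("string1_dog_bit_johny_bit_string2", "xyz"));
console.log(replaceBetween("string3_crocodile_bit_johny_bit_string4", "xyz"));
```

The first line needs two passes (one per bit) plus a final pass that finds nothing; the second line is left untouched because it is not enclosed by string1 and string2.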
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
I tested it on regex101 and in Notepad++ as well, and it works.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!

Regex to match alphanumerics, URL operators except forward slashes

I've been trying for the past couple of hours to get this regex right but unfortunately, I still can't get it. Tried searching through existing threads too but no dice. :(
I'd like a regex to match the following possible strings:
userprofile?id=123
profile
search?type=player&gender=male
someotherpage.htm
but not
userprofile/
helloworld/123
Basically, I'd like the regex to match alphanumerics, URL operators such as ?, = and & but not forward slashes. (i.e. As long as the string contains a forward slash, the regex should just return 0 matches.)
I've tried the following regexes but none seem to work:
([0-9a-z?=.]+)
(^[^\/]*$[0-9a-z?=.]+)
([0-9a-z?=.][^\/]+)
([0-9a-z?=.][\/$]+)
Any help will be greatly appreciated. Thank you so much!
The reason they all match is that your regexp matches part of the string and you've not told it that it needs to match the entire string. You need to make sure that it doesn't allow any other characters anywhere in the string, e.g.
^[0-9a-z&?=.]+$
Here's a small perl script to test it:
#!/usr/bin/perl
my @testlines = (
    "userprofile?id=123",
    "userprofile",
    "userprofile?type=player&gender=male",
    "userprofile.htm",
    "userprofile/",
    "userprofile/123",
);
foreach my $testline (@testlines) {
    if ($testline =~ /^[0-9a-z&?=.]+$/) {
        print "$testline matches\n";
    } else {
        print "$testline doesn't match - bad regexp, no cookie\n";
    }
}
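The same whole-string check can be sketched in JavaScript against the exact strings from the question. The variable names here are illustrative only.

```javascript
// Anchored character-class pattern from the answer above.
const allowed = /^[0-9a-z&?=.]+$/;

const shouldMatch = [
  "userprofile?id=123",
  "profile",
  "search?type=player&gender=male",
  "someotherpage.htm",
];
const shouldNotMatch = ["userprofile/", "helloworld/123"];

// Map every candidate to a true/false result.
const accepted = shouldMatch.map((s) => allowed.test(s));
const rejected = shouldNotMatch.map((s) => allowed.test(s));
console.log(accepted, rejected);
```

Because the pattern has no g flag, calling test() repeatedly on the same regex object is safe here.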
This should do the trick:
/^\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i
To break this down:
^ // Anchor to start of string (without it, "helloworld/123" would match on "123")
\w+ // Match [a-z0-9_] (1 or more), to specify resource
( // Alternation group (i.e., a OR b)
\.htm // Match ".htm"
| // OR
\? // Match "?"
\w+=\w* // Match first term of query string (e.g., something=foo)
(&\w+=\w*)* // Match remaining terms of query string (zero or more)
)
? // Make alternation group optional
$ // Anchor to end of string
The i flag is for case-insensitivity.
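A quick way to see why a leading ^ matters for a pattern of this shape: with only the trailing $, the part after a slash can still match on its own. A minimal JavaScript sketch (the variable names are illustrative):

```javascript
// Same structure, with and without the start-of-string anchor.
const endAnchoredOnly = /\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i;
const fullyAnchored = /^\w+(\.htm|\?\w+=\w*(&\w+=\w*)*)?$/i;

const slashCase = "helloworld/123";
const looseResult = endAnchoredOnly.test(slashCase);  // matches the trailing "123"
const strictResult = fullyAnchored.test(slashCase);   // whole string must fit

const validCases = ["userprofile?id=123", "profile", "search?type=player&gender=male", "someotherpage.htm"];
const strictValid = validCases.map((s) => fullyAnchored.test(s));

console.log(looseResult, strictResult, strictValid);
```

The end-anchored pattern accepts "helloworld/123", while the fully anchored one rejects it and still accepts the four valid strings from the question.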

Regex AND operator

Based on this answer
Regular Expressions: Is there an AND operator?
I tried the following on http://regexpal.com/ but was unable to get it to work. What am I missing? Does JavaScript not support it?
Regex: (?=foo)(?=baz)
String: foo,bar,baz
It is impossible for both (?=foo) and (?=baz) to match at the same time. It would require the next character to be both f and b simultaneously which is impossible.
Perhaps you want this instead:
(?=.*foo)(?=.*baz)
This says that foo must appear anywhere and baz must appear anywhere, not necessarily in that order and possibly overlapping (although overlapping is not possible in this specific case because the letters themselves don't overlap).
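The difference between the two forms can be checked directly in JavaScript. This is a small sketch using the string from the question; the variable names are illustrative.

```javascript
const subject = "foo,bar,baz";

// Both lookaheads must succeed at the SAME position, which is impossible here.
const samePosition = /(?=foo)(?=baz)/.test(subject);

// With .* each lookahead may scan ahead independently from that position.
const anywhere = /(?=.*foo)(?=.*baz)/.test(subject);

// If one of the words is missing, the AND fails.
const missingOne = /(?=.*foo)(?=.*baz)/.test("foo,bar");

console.log(samePosition, anywhere, missingOne); // false true false
```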
Example of a Boolean (AND) plus Wildcard search, which I'm using inside a javascript Autocomplete plugin:
String to match: "my word"
String to search: "I'm searching for my funny words inside this text"
You need the following regex: /^(?=.*my)(?=.*word).*$/im
Explaining:
^ assert position at start of a line
?= Positive Lookahead
.* matches any character (except newline)
() Groups
$ assert position at end of a line
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
Test the Regex here: https://regex101.com/r/iS5jJ3/1
So, you can create a JavaScript function that:
Escapes regex reserved characters to avoid errors
Splits your string at spaces
Wraps each word in a lookahead group
Creates a regex pattern
Executes the regex match
Example:
function fullTextCompare(myWords, toMatch) {
    // Escape regex reserved characters
    const escaped = myWords.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    // Split the string at spaces
    const words = escaped.split(" ");
    // Wrap each word in a positive lookahead
    const lookaheads = words.map(function (w) {
        return "(?=.*" + w + ")";
    });
    // Create the regex pattern
    const sRegex = new RegExp("^" + lookaheads.join("") + ".*$", "im");
    // Execute the regex match
    return sRegex.test(toMatch);
}
// Using it:
console.log(
    fullTextCompare("my word", "I'm searching for my funny words inside this text")
);
// Wildcards:
console.log(
    fullTextCompare("y wo", "I'm searching for my funny words inside this text")
);
Maybe you are looking for something like this. If you want to select the complete line when it contains both "foo" and "baz", this regex will do that:
.*(foo)+.*(baz)+|.*(baz)+.*(foo)+.*
Maybe just an OR operator | could be enough for your problem:
String: foo,bar,baz
Regex: (foo)|(baz)
Result: ["foo", "baz"]