Regex multiple match - regex

Suppose I have the following string: [P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp[P2]qqrr<em>ssttuuww</em>xxyyzz.
How can I extract each <em>...</em> tag along with the label in square brackets that precedes it? That is, I want to extract the following:
[P6] and <em>ddeeff</em>
[P6] and <em>kkllmmnn</em>
[P2] and <em>ssttuuww</em>
I have tried a lot using many patterns but I am not able to find all the above matches (https://regex101.com/r/b64Wuv/1).
Does anyone know how to do this with regex?

@San, you are quite close. Your pattern needs a little more, as shown below (sample in C#):
using System;
using System.Text.RegularExpressions;

// Named groups: "Ps" captures the bracketed label, "ems" the text inside <em>...</em>
Regex regex = new Regex(@"(?<Ps>\[.*?]).+?<em>(?<ems>.*?)<\/em>");
var input = "[P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp[P2]qqrr<em>ssttuuww</em>xxyyzz";
var matches = regex.Matches(input);
foreach (Match match in matches)
{
    if (match.Success)
    {
        Console.WriteLine($"{match.Groups["Ps"].Value} {match.Groups["ems"].Value}");
    }
}

I think you have to use two regexes.
First regex, to split the string into sections that each start with a bracketed label:
Match 1: [P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp
Match 2: [P2]qqrr<em>ssttuuww</em>xxyyzz
To achieve this, use \[[^[]+
Second regex, to match the <em> tags inside each section (shown here for the first section):
Match 1: <em>ddeeff</em>
Match 2: <em>kkllmmnn</em>
To achieve this, use <em>([^<]+?)<\/em>
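The two-pass idea above can be sketched in JavaScript as follows. This is only an illustration under a few assumptions: the function name extractPairs and the extra pattern ^\[[^\]]*\] used to pull the label out of each section are not from the answer, and the input is assumed to contain no nested square brackets. Running the inner pattern per section is what lets both <em> tags under [P6] be paired with the same label.

```javascript
const input =
  "[P6]aabbcc<em>ddeeff</em>gghhiijj<em>kkllmmnn</em>oopp[P2]qqrr<em>ssttuuww</em>xxyyzz";

// Hypothetical helper: returns [label, emTag] pairs.
function extractPairs(text) {
  const pairs = [];
  // Pass 1: each section starts at "[" and runs up to the next "[".
  const sections = text.match(/\[[^[]+/g) ?? [];
  for (const section of sections) {
    // Assumed helper pattern: the bracketed label at the start of the section.
    const label = section.match(/^\[[^\]]*\]/)?.[0];
    if (!label) continue;
    // Pass 2: every <em>...</em> inside this section belongs to that label.
    for (const em of section.matchAll(/<em>([^<]+?)<\/em>/g)) {
      pairs.push([label, em[0]]);
    }
  }
  return pairs;
}

for (const [label, em] of extractPairs(input)) {
  console.log(`${label} and ${em}`);
}
```

Each pair keeps the full <em>...</em> tag; use em[1] instead of em[0] if only the inner text is wanted.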

Related

How to use Regex to get nested matches including their capturing groups?

I have a string with a nested pattern func(func(doSomething)) and I have a Regex expression: /func\(([^ ]*)\)/gm. I want to get two separate matches where:
Match 1:
match: func(func(doSomething))
group captured:func(doSomething)
Match 2:
match: func(doSomething)
group captured:doSomething
However, I'm only getting a single match with the entire inner 'func' being a capturing group.
Here is the regex link: https://regex101.com/r/dUa4SC/1
Is it possible to achieve this using regex? If so, please help me with it. Thanks.
You can write a recursive function that re-applies the regex to the captured group. In JavaScript it would look something like this:
function recursiveMatch(pattern, text) {
    const matches = text.match(pattern);
    // matches[0] is the full match, matches[1] is the first capture group
    if (matches !== null && matches.length > 1) {
        console.log(matches[1] + " found in " + matches[0]);
        // Recurse into the captured group to find the next nesting level
        recursiveMatch(pattern, matches[1]);
    }
}
recursiveMatch(/func\(([^ ]*)\)/, "func(func(doSomething))");
And this is the output:
func(doSomething) found in func(func(doSomething))
doSomething found in func(doSomething)

C# Regex not matching string spanning multiple line

In C# I have a regex and I can't work out why it is not matching.
The pattern (abc\r\n)* should match the abc\r\nabc\r\n in the string 123\r\nabc\r\nabc\r\n345:
Regex regex = new Regex("(abc\r\n)*", RegexOptions.Compiled);
var mat = regex.Match("123\r\nabc\r\nabc\r\n345");
The funny thing is that mat.Success returns true.
The same pattern matches online.
The Match method works as expected.
Because * allows zero repetitions, the pattern (abc\r\n)* also matches the empty string, so it produces empty matches at positions where abc\r\n does not occur. The Match method returns only the first match, which here is the empty string at index 0.
So if you want to match abc\r\nabc\r\n exactly, use this pattern:
Regex regex = new Regex("(abc\r\nabc\r\n)", RegexOptions.Compiled);
If you want to match every abc\r\n separately, use:
Regex regex = new Regex("(abc\r\n)", RegexOptions.Compiled);
var mat = regex.Matches("123\r\nabc\r\nabc\r\n345");
The bottom line is that the problem is in the pattern itself.
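The empty-match behaviour is not specific to C#. As a minimal sketch, the same thing can be seen in JavaScript, where String.prototype.match without the g flag also returns only the first match. The + variant is an extra illustration and not one of the patterns suggested above; it requires at least one repetition, so the first match becomes the run of abc lines.

```javascript
const input = "123\r\nabc\r\nabc\r\n345";

// With *, zero repetitions are allowed, so the first match is the
// empty string at index 0.
const star = input.match(/(abc\r\n)*/);
console.log(JSON.stringify(star[0]), star.index); // "" 0

// With +, at least one repetition is required, so the engine skips
// ahead to the first place where abc\r\n actually occurs.
const plus = input.match(/(abc\r\n)+/);
console.log(JSON.stringify(plus[0]), plus.index); // "abc\r\nabc\r\n" 5
```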

Regular Expression to match no tags except <br>

We need to match text from a user input, but specifically reject any tags that aren't <br>.
From other Stack Overflow posts I can only find the opposite of what I need (i.e. a regex that matches the offending tags rather than the text and the allowed tag). Due to constraints we can't negate the result of that match for validation. The regex is:
<(?!\/?br(?=>|\s.*>))\/?.*?>
Is it possible to match the whole text if it only contains "normal" text and BR tags?
For example these should match:
bob
bob<br>bob
bob<br />bob
bob</br>
These should fail to match
bob<p>bob
bob<div>bob
bob</div>bob
You could use two nested negative lookaheads:
(?si)^(?!.*<(?!\/?br\b)\w).*
as a Java string:
"(?si)^(?!.*<(?!\\/?br\\b)\\w).*"
This uses the s (dot also matches newlines) and i (case-insensitive) modifiers.
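Here is a small JavaScript sketch of the nested-lookahead idea, checked against the examples from the question. Note one deliberate difference, which is my assumption about the intent: an optional \/? is added before \w. As written above, the pattern requires a word character straight after <, so a closing tag such as </div> slips through. The function name hasOnlyBrTags is made up for the example.

```javascript
// Outer lookahead: fail if anywhere in the text there is a "<" that
// starts a tag (optionally a closing tag) other than br.
// Flags: s = dot matches newlines, i = case-insensitive.
const onlyBr = /^(?!.*<(?!\/?br\b)\/?\w).*/si;

// Hypothetical helper name, used only for this example.
function hasOnlyBrTags(text) {
  return onlyBr.test(text);
}

const samples = [
  "bob", "bob<br>bob", "bob<br />bob", "bob</br>",
  "bob<p>bob", "bob<div>bob", "bob</div>bob",
];
for (const s of samples) {
  console.log(s, hasOnlyBrTags(s));
}
```

The first four samples are accepted and the last three are rejected.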
(?=^[a-zA-Z0-9]+$|[^<>]*<\s*(\/)?\s*br\s*(\/)?\s*>[^<>]*)^.*$
You can try this. It uses a positive lookahead. See the demo:
http://regex101.com/r/kO7lO2/4
The regex below would work:
String s = "bob\n" +
        "bob<br>bob\n" +
        "bob<br />bob\n" +
        "bob</br>\n" +
        "bob<p>bob\n" +
        "bob<div>bob\n" +
        "bob</div>bob";
Pattern regex = Pattern.compile("^\\w+(?:<(?=\\/?br(?=>|\\s.*>))\\/?.*?>(?:\\w+)?)?$", Pattern.MULTILINE);
Matcher matcher = regex.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
Output:
bob
bob<br>bob
bob<br />bob
bob</br>

Regular expression to find specific text within a string enclosed in two strings, but not the entire string

I have this type of text:
string1_dog_bit_johny_bit_string2
string1_cat_bit_johny_bit_string2
string1_crocodile_bit_johny_bit_string2
string3_crocodile_bit_johny_bit_string4
string4_crocodile_bit_johny_bit_string5
I want to find all occurrences of “bit” that occur only between string1 and string2. How do I do this with regex?
I found the question Regex Match all characters between two strings, but the regex there matches the entire string between string1 and string2, whereas I want to match just parts of that string.
I am doing a global replacement in Notepad++. I just need regex, code will not work.
Thank you in advance.
Roman
If I understand correctly, here is some code that does what you want:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var input = new List<string>
{
    "string1_dog_bit_johny_bit_string2",
    "string1_cat_bit_johny_bit_string2",
    "string1_crocodile_bit_johny_bit_string2",
    "string3_crocodile_bit_johny_bit_string4",
    "string4_crocodile_bit_johny_bit_string5"
};
Regex regex = new Regex(@"(?<bitGroup>bit)");
var allMatches = new List<string>();
foreach (var str in input)
{
    // Only look for "bit" in lines that start with string1 and end with string2
    if (str.StartsWith("string1") && str.EndsWith("string2"))
    {
        var matchCollection = regex.Matches(str);
        allMatches.AddRange(matchCollection.Cast<Match>().Select(match => match.Groups["bitGroup"].Value));
    }
}
Console.WriteLine("All matches {0}", allMatches.Count);
This regex will do the job:
^string1_(?:.*(bit))+.*_string2$
^ means the start of the text (or line if you use the m option like so: /<regex>/m )
$ means the end of the text
. means any character
* means the previous character/expression is repeated 0 or more times
(?:<stuff>) means a non-capturing group (<stuff> won't be captured as a result of the matching)
You could use ^string1_(.*(bit).*)*_string2$ if you don't care about performance or don't have large/many strings to check. The outer parentheses allow multiple occurrences of "bit".
If you provide us with the language you want to use, we could give more specific solutions.
edit: As you added that you're trying a replacement in Notepad++ I propose the following:
Use (?<=string1_)(.*)bit(.*)(?=_string2) as the regex and $1xyz$2 as the replacement pattern (replace xyz with your string). Then perform a "replace all" operation repeatedly until Notepad++ doesn't find any more matches. The drawback is that this regex matches only one "bit" per line per pass, which is why it has to be applied repeatedly.
By the way, even if a regex matches the whole line, you can still replace only parts of it by using capturing groups.
You can use the regex:
(?:string1|\G)(?:(?!string2).)*?\Kbit
I tested it on regex101 and in Notepad++ as well, and it works.
There is a description on the demo site, but if you want more explanation, let me know and I'll elaborate!
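For readers who are not limited to Notepad++, here is a rough JavaScript sketch of the same goal. JavaScript regexes support neither \G nor \K, so this swaps in a different technique: first match each string1 ... string2 span, then replace "bit" only inside that span using a replacer callback. The function name replaceBetween and the replacement text xyz are placeholders for the example.

```javascript
// Hypothetical helper: replaces "bit" only inside string1 ... string2 spans.
function replaceBetween(text, replacement) {
  // Outer pattern: the shortest span from string1 to the next string2.
  return text.replace(/string1.*?string2/g, (span) =>
    // Inner replace runs only on the matched span.
    span.replace(/bit/g, replacement)
  );
}

const lines = [
  "string1_dog_bit_johny_bit_string2",
  "string3_crocodile_bit_johny_bit_string4",
];
for (const line of lines) {
  console.log(replaceBetween(line, "xyz"));
}
```

The first line has both occurrences replaced in a single pass; the second is left unchanged because it contains no string1 ... string2 span.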

Regex AND operator

Based on this answer
Regular Expressions: Is there an AND operator?
I tried the following on http://regexpal.com/ but was unable to get it to work. What am I missing? Does JavaScript not support it?
Regex: (?=foo)(?=baz)
String: foo,bar,baz
It is impossible for both (?=foo) and (?=baz) to match at the same position. It would require the next character to be both f and b simultaneously, which is impossible.
Perhaps you want this instead:
(?=.*foo)(?=.*baz)
This says that foo must appear anywhere and baz must appear anywhere, not necessarily in that order and possibly overlapping (although overlapping is not possible in this specific case because the letters themselves don't overlap).
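A quick way to see the difference is to run both patterns in JavaScript, which does support lookaheads. This is just a sketch; the ^ anchor on the second pattern is my addition so the check is made once from the start of the string rather than retried at every position.

```javascript
const input = "foo,bar,baz";

// Both lookaheads start at the same position, so this would need "foo"
// and "baz" to begin at the same index, which cannot happen.
console.log(/(?=foo)(?=baz)/.test(input)); // false

// Prefixing each lookahead with .* lets it scan ahead for its word.
const both = /^(?=.*foo)(?=.*baz)/;
console.log(both.test(input)); // true
console.log(both.test("foo,bar")); // false, "baz" is missing
```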
Here is an example of a Boolean (AND) plus wildcard search, which I'm using inside a JavaScript autocomplete plugin:
String to match: "my word"
String to search: "I'm searching for my funny words inside this text"
You need the following regex: /^(?=.*my)(?=.*word).*$/im
Explaining:
^ assert position at start of a line
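(?=...) positive lookahead: asserts that the enclosed pattern matches from here, without consuming characters
.* matches any character (except newline), zero or more times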
() Groups
$ assert position at end of a line
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
Test the Regex here: https://regex101.com/r/iS5jJ3/1
So, you can create a JavaScript function that:
Escapes regex reserved characters to avoid errors
Splits your string at spaces
Wraps each word in a lookahead group
Creates a regex pattern
Executes the regex match
Example:
function fullTextCompare(myWords, toMatch) {
    // Escape regex reserved characters
    myWords = myWords.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    // Split your string at spaces
    const arrWords = myWords.split(" ");
    // Wrap each word in a lookahead group
    const lookaheads = arrWords.map(function (n) {
        return "(?=.*" + n + ")";
    });
    // Create a regex pattern
    const sRegex = new RegExp("^" + lookaheads.join("") + ".*$", "im");
    // Execute the regex match
    return sRegex.test(toMatch);
}
// Using it:
console.log(
    fullTextCompare("my word", "I'm searching for my funny words inside this text")
);
// Wildcards:
console.log(
    fullTextCompare("y wo", "I'm searching for my funny words inside this text")
);
Maybe you are looking for something like this. If you want to select the complete line when it contains both "foo" and "baz", this regex will do that:
.*(foo)+.*(baz)+.*|.*(baz)+.*(foo)+.*
Maybe just an OR operator | could be enough for your problem:
String: foo,bar,baz
Regex: (foo)|(baz)
Result: ["foo", "baz"]