Regex Match only not keywords - regex

I have a string like <dlskjfldjf>Text creation List<string> checking edit<br/>. I need a regex that matches only <dlskjfldjf> and not List<string>.
The keyword could be any generic type, such as
IList<T>
List<T>
etc
I tried <([a-zA-Z]+)>, which matches <dlskjfldjf>, and the pattern below, which matches List<string>, but I am not sure how to combine the two:
((List|ilist|IList|IEnumerable|IQuerable)(<)([A-Za-z.,\[\]\s]*)(>))|(<T>)

Use a zero-width negative lookbehind assertion, (?<! expression ):
string pattern = @"(?<!(List|ilist|IList|IEnumerable|IQuerable))<([a-zA-Z]+)>";
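As a quick check of the lookbehind idea, here is a minimal sketch in JavaScript rather than C#, assuming an engine with ES2018 lookbehind support (for example a recent Node.js). The sample string and the variable names are made up for illustration, and the inner group is written as non-capturing so that group 1 is the tag name.

```javascript
// Sketch only: the same negative-lookbehind idea in JavaScript syntax (ES2018+).
const lookbehindPattern = /(?<!(?:List|ilist|IList|IEnumerable|IQuerable))<([a-zA-Z]+)>/g;

// Illustrative input, extended from the question's example.
const sample = "<dlskjfldjf>Text creation List<string> checking edit<br/> IEnumerable<T> <abc>";

// Collect the full matches; m[1] would hold the name without the angle brackets.
const plainTags = Array.from(sample.matchAll(lookbehindPattern), (m) => m[0]);
console.log(plainTags); // [ '<dlskjfldjf>', '<abc>' ]
```

Note that the lookbehind only inspects the text immediately before the <, so a generic type that is not in the alternation (say Dictionary<string>) would still be matched.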

In a language that supports Negative Lookbehind a pattern like this could work:
(?<!(List|ilist|IList|IEnumerable|IQuerable))<([a-zA-Z]+)>
In JavaScript engines that lack lookbehind support (it was added to the language in ES2018), you may need two passes to achieve the same result: test once for the angle-bracket pattern, then test again to make sure the match is not preceded by a type name.
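The two-pass idea could look roughly like the sketch below. The helper name findPlainTags and the GENERIC_NAMES list are hypothetical; the list simply repeats the names from the question.

```javascript
// Hypothetical list of type names that must not directly precede a tag.
const GENERIC_NAMES = ["List", "ilist", "IList", "IEnumerable", "IQuerable"];

// Hypothetical helper: pass 1 finds every <word>, pass 2 inspects the text before it.
function findPlainTags(text) {
  const tagPattern = /<([a-zA-Z]+)>/g;
  const results = [];
  let m;
  while ((m = tagPattern.exec(text)) !== null) {
    const before = text.slice(0, m.index);
    const precededByGeneric = GENERIC_NAMES.some((name) => before.endsWith(name));
    if (!precededByGeneric) {
      results.push(m[0]);
    }
  }
  return results;
}

const found = findPlainTags("<dlskjfldjf>Text creation List<string> checking edit<br/>");
console.log(found); // [ '<dlskjfldjf>' ]
```

This avoids lookbehind entirely, at the cost of a string check per candidate match.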

What you could do is match what you don't want to keep, and capture in a group what you do want to keep:
\b(?:List|ilist|IList|IEnumerable|IQuerable)<[^<>]*>|(<[a-zA-Z]+>)
That will match:
\b Word boundary to prevent any of the listed words in the alternation being part of a larger word
(?: Non capturing group
List|ilist|IList|IEnumerable|IQuerable Alternation which will match any of the listed words
) Close non capturing group
<[^<>]*> Match <, then any char except < or > 0+ times, then match >
| Or
( Capture group (What you want to keep)
<[a-zA-Z]+> Match <, then 1+ times a lower or uppercase char, then >
) Close capture group
For example:
const regex = /\b(?:List|ilist|IList|IEnumerable|IQuerable)<[^<>]*>|(<[a-zA-Z]+>)/g;
const str = `<dlskjfldjf>Text creation List<string> checking edit<br/> or IList<string> or <aAbB>`;
let m;
let res = [];
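while ((m = regex.exec(str)) !== null) {
  // Avoid an infinite loop on zero-length matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // Keep only the matches in which capture group 1 took part
  if (m[1] !== undefined) res.push(m[1]);
}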
console.log(res);

Related

How to match in a single/common Regex Group matching or based on a condition

I would like to extract two different test strings /i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
and
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8
with a single RegEx and in Group-1.
Using the RegEx ^.[i,na,fm,d]+\/(.+([,\/])?(\/|.+=.+,\/).+\/[,](live.([^,]).).+_)?.+(640).*$ I get the desired result for the second string, int/2021/11/25/,live_20211125_215206_
but the first string does not match in Group-1. The expected extraction for the first string is int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
Any pointers on this are appreciated.
Thanks!
If you want both values in group 1, you can use:
^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)
The pattern matches:
^ Start of string
/ Match literally
(?:[id]|na|fm) Match one of i d na fm
/ Match literally
( Capture group 1
[^/\s]*/ Match any char except a / or a whitespace char, then match /
\d{4}/\d{2}/\d{2}/ Match a date like pattern
\S*? Match optional non whitespace chars, as few as possible
) Close group 1
(?:/,|[^_]+_) Match either /, or 1+ chars other than _ and then match _
640 Match literally
(?:\D|$) Match either a non-digit or assert the end of the string
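To see the pattern applied to both strings from the question, here is a small sketch in JavaScript. It is only an illustration: the variable names are made up, and each / is escaped as \/ because the pattern is written as a regex literal.

```javascript
// The pattern from above as a JavaScript regex literal.
const pathPattern = /^\/(?:[id]|na|fm)\/([^\/\s]*\/\d{4}\/\d{2}\/\d{2}\/\S*?)(?:\/,|[^_]+_)640(?:\D|$)/;

// The two test strings from the question.
const inputs = [
  "/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive",
  "/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8",
];

// Group 1 of each match, or null when the pattern does not match.
const extracted = inputs.map((s) => {
  const m = pathPattern.exec(s);
  return m ? m[1] : null;
});

console.log(extracted[0]); // int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
console.log(extracted[1]); // int/2021/11/25/,live_20211125_215206_
```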
We can't know all the rules for how the strings you are matching are constructed, but for just the two example strings provided:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var re = regexp.MustCompile(`(?m)(\/i/int/\d{4}/\d{2}/\d{2}/.*)(?:\/,|_[\w_]+)640`)
var str = `
/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8`
	match := re.FindAllStringSubmatch(str, -1)
	for _, val := range match {
		fmt.Println(val[1])
	}
}

Regex negative lookahead is not working properly

I have the following javascript code
var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....
To find the closest string declaration to c, I was using something like this:
var \w+ = '(.*?)';(?:(?!var \w+ = '(.*?)';.*?var c = \w+\.map)).*?var c = \w+\.map
I am using a negative lookahead so that a declaration is not matched if another matching declaration follows it. However, the result I get is still <some string 2>. Could someone please explain what causes this?
There are more ways to write a string in JavaScript, and even more ways to break the pattern, so this approach is very brittle.
For the example data in the question, you can use a negative lookahead to assert that none of the lines in between is the .map declaration or another string declaration:
^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b
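^ Start of string (or start of a line when the m flag is set, as in the example code)
var\s+\w\s*=\s* Match var, a single word char as the variable name, and the equals sign with optional whitespace
(["'`]) Capture group 1, match one of " ' or `
(.*?) Capture group 2, capture the data that you want to find
\1;\n A back reference to match the same quote as in group 1, then ; and a newline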
(?: Non capture group, to repeat as a whole
(?! Negative lookahead, assert what is directly to the right is not
var\s+\w+\s*=\s* Match a var part with =
(?: Non capture group for the alternation |
\w+\.map\b Match the .map part
| Or
(["'`]).*?\3 Capture group 3 to match one of the quotes with a back reference to match it up
) Close non capture group
;\n Match ; and a newline
) Close lookahead
.*\n Match the whole line and a newline
)* Close non capture group and optionally repeat to match all lines
var\s+\w+\s*=\s*\w\.map\b Match the line with .map
const regex = /^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b/m;
const str = `var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....`;
const m = str.match(regex);
if (m) {
  console.log(m[2]);
}
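As for why the pattern in the question returns <some string 2>: its lookahead is evaluated only once, at the position directly after the first declaration's semicolon. The next character there is a newline, so the inner pattern (which starts with var) cannot match at that position, the negative lookahead succeeds, and the following .*? then runs on to var c. The sketch below reproduces this and contrasts it with a variant that re-checks the lookahead before every consumed character. It assumes single-quoted strings only and the s flag for the question's pattern; temperedPattern is an illustrative alternative, not the pattern from the answer above.

```javascript
const source = [
  "var d = '<some string 2>';",
  "var a = ...;",
  "var b = '<some string 1>';",
  ".",
  ".",
  ".",
  "var c = b.map.....",
].join("\n");

// The question's pattern; the s flag (an assumption) lets . match newlines.
const questionPattern = /var \w+ = '(.*?)';(?:(?!var \w+ = '(.*?)';.*?var c = \w+\.map)).*?var c = \w+\.map/s;

// Variant: the lookahead is checked before each character that is skipped,
// so the match cannot run across another single-quoted declaration.
const temperedPattern = /var \w+ = '([^']*)';(?:(?!var \w+ = ')[\s\S])*?var c = \w+\.map/;

const fromQuestion = source.match(questionPattern)[1];
const fromTempered = source.match(temperedPattern)[1];

console.log(fromQuestion); // <some string 2>
console.log(fromTempered); // <some string 1>
```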

Scala regex : capture between group

With the regex below I need "test" as the output, but I get the complete string that matches the regex. How can I capture only the string between {outer. and }?
val pattern = """\{outer.*\}""".r
println(pattern.findAllIn(s"try {outer.test}").matchData.map(step => step.group(0)).toList.mkString)
Input : "try {outer.test}"
expected Output : test
current output : {outer.test}
You may capture that part using:
val pattern = """\{outer\.([^{}]*)\}""".r.unanchored
val s = "try {outer.test}"
val result = s match {
  case pattern(i) => i
  case _ => ""
}
println(result)
The pattern matches
\{outer\. - a literal {outer. substring
([^{}]*) - Capturing group 1: zero or more (*) chars other than { and } (see [^{}] negated character class)
\} - a } char.
NOTE: if your regex must match the whole string, remove the .unanchored I added to also allow partial matches inside a string.
Or, you may change the pattern so that the first part is no longer a consuming pattern (it matches a string of fixed length, so a lookbehind is possible):
val pattern = """(?<=\{outer\.)[^{}]*""".r
val s = "try {outer.test}"
println(pattern.findFirstIn(s).getOrElse(""))
// => test
Here, (?<=\{outer\.), a positive lookbehind, matches {outer. but does not put it into the match value.
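The difference between the whole match (group 0) and a capturing group or lookbehind is not specific to Scala. As a rough illustration, the same two patterns behave as follows in JavaScript (assuming an engine with ES2018 lookbehind support; the variable names are made up):

```javascript
const input = "try {outer.test}";

// Capturing-group variant: index 0 is the whole match, index 1 is the group.
const withGroup = /\{outer\.([^{}]*)\}/.exec(input);
console.log(withGroup[0]); // {outer.test}
console.log(withGroup[1]); // test

// Lookbehind variant: the prefix is asserted but is not part of the match.
const withLookbehind = /(?<=\{outer\.)[^{}]*/.exec(input);
console.log(withLookbehind[0]); // test
```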

Regular expression for match string with new line char

How can I use a regular expression to match the passphrase in the text below, between the Passphrase= string and the \n char (that is, select testpasssword)? The password can contain any characters.
My partial solution: Passphrase.*(?=\\nName) => Passphrase=testpasssword
[wifi_d0b5c2bc1d37_7078706c617967726f756e64_managed_psk]\nPassphrase=testpasssword\nName=pxplayground\nSSID=9079706c697967726f759e69\nFrequency=2462\nFavorite=true\nAutoConnect=true\nModified=2018-06-18T09:06:26.425176Z\nIPv4.method=dhcp\nIPv4.DHCP.LastAddress=0.0.0.0\nIPv6.method=auto\nIPv6.privacy=disabled\n
With QRegularExpression, which supports PCRE regex syntax, you may use
QString str = "your_string";
QRegularExpression rx(R"(Passphrase=\K.+?(?=\\n))");
qDebug() << rx.match(str).captured(0);
The R"(Passphrase=\K.+?(?=\\n))" is a raw string literal defining the regex pattern Passphrase=\K.+?(?=\\n). It matches Passphrase=, drops the text matched so far with the match reset operator \K, and then matches 1 or more chars, as few as possible, up to the first \ char followed by the letter n.
You may use a capturing group approach that looks simpler though:
QRegularExpression rx(R"(Passphrase=(.+?)\\n)");
qDebug() << rx.match(str).captured(1); // Here, grab Group 1 value!
The only things you were missing are the lazy quantifier, which tells your regex to match only as much as necessary, and a positive lookbehind. The first is a simple question mark after the plus; the second is written by prefixing the text you want to match but not include with ?<= inside a group. Check the code example to see it in action.
(?<=Passphrase=).+?(?=\\n)
const regex = /(?<=Passphrase=).+?(?=\\n)/gm;
const str = `[wifi_d0b5c2bc1d37_7078706c617967726f756e64_managed_psk]\\nPassphrase=testpasssword\\nName=pxplayground\\nSSID=9079706c697967726f759e69\\nFrequency=2462\\nFavorite=true\\nAutoConnect=true\\nModified=2018-06-18T09:06:26.425176Z\\nIPv4.method=dhcp\\nIPv4.DHCP.LastAddress=0.0.0.0\\nIPv6.method=auto\\nIPv6.privacy=disabled\\n
`;
let m;
while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  // The result can be accessed through the `m`-variable.
  m.forEach((match, groupIndex) => {
    console.log(`Found match, group ${groupIndex}: ${match}`);
  });
}
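The capturing-group approach mentioned earlier also works in JavaScript without any lookbehind. The sketch below is an illustration only: extractPassphrase is a hypothetical helper, and it additionally accepts a real line break or the end of the input as the terminator, since it is not clear from the question whether the \n in the text is a literal backslash followed by n or an actual newline.

```javascript
// Hypothetical helper: returns the text after "Passphrase=" up to the first
// literal backslash-n, a real line break, or the end of the input.
function extractPassphrase(config) {
  const m = /Passphrase=(.+?)(?:\\n|\r?\n|$)/.exec(config);
  return m ? m[1] : null;
}

// Literal backslash + n, as assumed by the answers above.
const literalForm = "[wifi_managed_psk]\\nPassphrase=testpasssword\\nName=pxplayground\\n";
// Real newline characters.
const newlineForm = "[wifi_managed_psk]\nPassphrase=testpasssword\nName=pxplayground\n";

console.log(extractPassphrase(literalForm)); // testpasssword
console.log(extractPassphrase(newlineForm)); // testpasssword
```

A passphrase that itself contains a backslash followed by n would be cut short in the literal form; that ambiguity is inherent in the input format rather than in the pattern.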

Solution required to create Regex pattern

I am developing a Windows application in C#. I have been searching for a solution to my problem of creating a Regex pattern. I want to create a Regex pattern that matches either of the following strings:
XD=(111111) XT=( 588.466)m3 YT=( .246)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr
XD=(111 ) XT=( 588.466)m3 YT=( .009)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr
The specific requirement is that I need all the values from the strings above, each of which is a collection of key/value pairs. I would also like to know which of the two approaches, Regex pattern matching or substring operations, is the right one for this problem in terms of efficiency and performance.
Thank you all in advance and let me know, if more details are required.
I don't know C#, so there probably is a better way to build a key/value array. I constructed a regex and handed it to RegexBuddy which generated the following code snippet:
StringCollection keyList = new StringCollection();
StringCollection valueList = new StringCollection();
StringCollection unitList = new StringCollection();
try {
    Regex regexObj = new Regex(
        @"(?<key>\b\w+)   # Match an alphanumeric identifier
        \s*=\s*           # Match a = (optionally surrounded by whitespace)
        \(                # Match a (
        \s*               # Match optional whitespace
        (?<value>[^()]+)  # Match the value string (anything except parens)
        \)                # Match a )
        (?<unit>[^\s=]+   # Match an optional unit (anything except = or space)
        \b                # which must end at a word boundary
        (?!\s*=)          # and not be an identifier (i. e. followed by =)
        )?                # and is optional, as mentioned.",
        RegexOptions.IgnorePatternWhitespace);
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        keyList.Add(matchResult.Groups["key"].Value);
        valueList.Add(matchResult.Groups["value"].Value);
        unitList.Add(matchResult.Groups["unit"].Value);
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException) {
    // The Regex constructor throws this when the pattern has a syntax error
}
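For comparison, the same key/value/unit idea can be sketched outside .NET. The JavaScript version below is only an illustration under the assumption that the input looks like the sample rows; parsePairs and pairPattern are made-up names, and the free-spacing comments are dropped because JavaScript has no equivalent of IgnorePatternWhitespace.

```javascript
// Same structure as the pattern above: key, "=", "(", value, ")", optional unit.
const pairPattern = /(?<key>\b\w+)\s*=\s*\(\s*(?<value>[^()]+)\)(?<unit>[^\s=]+\b(?!\s*=))?/g;

// Hypothetical helper: returns one { key, value, unit } object per pair.
function parsePairs(line) {
  return Array.from(line.matchAll(pairPattern), (m) => ({
    key: m.groups.key,
    value: m.groups.value.trim(), // drop padding such as "111 "
    unit: m.groups.unit || "",    // empty string when no unit follows
  }));
}

const row = "XD=(111 ) XT=( 588.466)m3 YT=( .009)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr";
const pairs = parsePairs(row);

for (const p of pairs) {
  console.log(`${p.key} => ${p.value}${p.unit ? " [" + p.unit + "]" : ""}`);
}
```

The negative lookahead in the unit group is what keeps Time from being read as the unit of Date in Date=(06.01.01)Time=(00:54:55).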
Regex re=new Regex(@"(\w+)\=\(([\d\s\.]+)\)");
MatchCollection m=re.Matches(s);
m[0].Groups will have { XD=(111111), XD, 111111 }
m[1].Groups will have { XT=( 588.466), XT, 588.466 }
String[] rows = { "XD=(111111) XT=( 588.466)m3 YT=( .246)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr",
"XD=(111 ) XT=( 588.466)m3 YT=( .009)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr" };
foreach (String s in rows) {
    MatchCollection Pair = Regex.Matches(s, @"
        (\S+)         # Match all non-whitespace before the = and store it in group 1
        =             # Match the =
        (\([^)]+\S+)  # Match the part in brackets and following non-whitespace after the = and store it in group 2
        ", RegexOptions.IgnorePatternWhitespace);
    foreach (Match item in Pair) {
        Console.WriteLine(item.Groups[1] + " => " + item.Groups[2]);
    }
    Console.WriteLine();
}
Console.ReadLine();
If you also want to extract the units, use this regex:
@"(\S+)=(\([^)]+(\S+))"
I added a set of brackets around the trailing \S+; you will then find the unit in item.Groups[3]