Regex negative lookahead is not working properly - regex

I have the following JavaScript code:
var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....
To find the closest string declaration to c, I was using something like this:
var \w+ = '(.*?)';(?:(?!var \w+ = '(.*?)';.*?var c = \w+\.map)).*?var c = \w+\.map
I am using a negative lookahead so that a declaration is not matched if another matching declaration follows it. However, the result I am getting is still <some string 2>. Could someone please explain to me what causes that to happen?

There are more ways to write a string in JavaScript, and even more ways to break the pattern, as this approach is very brittle.
For the example data in the question, you can use a negative lookahead to assert that the next line is neither the .map line nor another string declaration:
^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b
^ Start of string
var\s+\w\s*=\s* Match a possible var format and the equals sign
(["'`]) Capture group 1, match one of " ' or `
(.*?) Capture group 2 capture the data that you want to find
\1;\n A back reference to match up the quote in group 1
(?: Non capture group, to repeat as a whole
(?! Negative lookahead, assert what is directly to the right is not
var\s+\w+\s*=\s* Match a var part with =
(?: Non capture group for the alternation |
\w+\.map\b Match the .map part
| Or
(["'`]).*?\3 Capture group 3 to match one of the quotes with a back reference to match it up
) Close non capture group
;\n Match ; and a newline
) Close lookahead
.*\n Match the whole line and a newline
)* Close non capture group and optionally repeat to match all lines
var\s+\w+\s*=\s*\w\.map\b Match the line with .map
const regex = /^var\s+\w\s*=\s*(["'`])(.*?)\1;\n(?:(?!var\s+\w+\s*=\s*(?:\w+\.map\b|(["'`]).*?\3);\n).*\n)*var\s+\w+\s*=\s*\w\.map\b/m;
const str = `var d = '<some string 2>';
var a = ...;
var b = '<some string 1>';
.
.
.
var c = b.map.....`;
const m = str.match(regex);
if (m) {
console.log(m[2]);
}
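On the "why" part of the question: a lookahead only tests the position where it sits. In the original pattern it sits directly after the first ';, so it only asks whether another declaration begins at that exact spot, not anywhere later in the input. A common fix is to repeat the lookahead before every character that gets skipped. The following is a minimal sketch, assuming the original pattern was run with the s (dotAll) flag so that . can cross newlines; the variable names are illustrative only.

```javascript
// Sample input from the question.
const source = [
  "var d = '<some string 2>';",
  "var a = ...;",
  "var b = '<some string 1>';",
  ".",
  ".",
  ".",
  "var c = b.map.....",
].join("\n");

// Shape of the original pattern; the s flag is an assumption.
// The lookahead is evaluated once, right after the first "';".
const original = /var \w+ = '(.*?)';(?:(?!var \w+ = '(.*?)';.*?var c = \w+\.map)).*?var c = \w+\.map/s;

// Here the lookahead is re-checked before every skipped character,
// so the match cannot run across another string declaration.
const tempered = /var \w+ = '([^']*)';(?:(?!var \w+ = '[^']*';)[\s\S])*?var c = \w+\.map/;

const originalResult = original.exec(source)[1];
const temperedResult = tempered.exec(source)[1];
console.log(originalResult); // <some string 2>
console.log(temperedResult); // <some string 1>
```

This variant is just as brittle as the line-based pattern above with respect to quoting styles; it only illustrates where the lookahead has to be placed.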

Related

How to match in a single/common Regex Group matching or based on a condition

I would like to extract two different test strings /i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
and
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8
with a single RegEx and in Group-1.
By using this RegEx ^.[i,na,fm,d]+\/(.+([,\/])?(\/|.+=.+,\/).+\/[,](live.([^,]).).+_)?.+(640).*$ I can get the second string to match the desired result int/2021/11/25/,live_20211125_215206_
but the first string does not match in Group-1. The expected extraction for test string 1 is int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
Any pointers on this are appreciated.
Thanks!
If you want both values in group 1, you can use:
^/(?:[id]|na|fm)/([^/\s]*/\d{4}/\d{2}/\d{2}/\S*?)(?:/,|[^_]+_)640(?:\D|$)
The pattern matches:
^ Start of string
/ Match literally
(?:[id]|na|fm) Match one of i d na fm
/ Match literally
( Capture group 1
[^/\s]*/ Match any char except a / or a whitespace char, then match /
\d{4}/\d{2}/\d{2}/ Match a date like pattern
\S*? Match optional non whitespace chars, as few as possible
) Close group 1
(?:/,|[^_]+_) Match either /, or 1+ chars other than _ and then match _
640 Match literally
(?:\D|$) Match either a non-digit or assert the end of the string
See a regex demo and a go demo.
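As a quick check of the breakdown above, here is a minimal JavaScript sketch that runs the pattern against both strings from the question. The forward slashes are escaped only because a JavaScript regex literal requires it; the names pattern, paths and groups are illustrative.

```javascript
// Same pattern as above, with / escaped for a JavaScript regex literal.
const pattern = /^\/(?:[id]|na|fm)\/([^\/\s]*\/\d{4}\/\d{2}\/\d{2}\/\S*?)(?:\/,|[^_]+_)640(?:\D|$)/;

// The two test strings from the question.
const paths = [
  "/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive",
  "/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8",
];

// Collect group 1 for each input, or null when there is no match.
const groups = paths.map((p) => {
  const m = p.match(pattern);
  return m ? m[1] : null;
});

console.log(groups[0]); // int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45
console.log(groups[1]); // int/2021/11/25/,live_20211125_215206_
```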
We can't know all the rules for how the strings you are matching are constructed, but for just these two example strings:
package main
import (
"fmt"
"regexp"
)
func main() {
var re = regexp.MustCompile(`(?m)(\/i/int/\d{4}/\d{2}/\d{2}/.*)(?:\/,|_[\w_]+)640`)
var str = `
/i/int/2021/11/18/019e1691-614c-4402-a8c1-d0239ad1ac45/,640-1_999899,480-1_999899,960-1_999899,1280-1_999899,1920-1_999899,.mp4.csmil/master.m3u8?set-segment-duration=responsive
/i/int/2021/11/25/,live_20211125_215206_sendeton_640x360-50p-1200kbit,live_20211125_215206_sendeton_480x270-50p-700kbit,live_20211125_215206_sendeton_960x540-50p-1600kbit,live_20211125_215206_sendeton_1280x720-50p-3200kbit,live_20211125_215206_sendeton_1920x1080-50p-5000kbit,.mp4.csmil/master.m3u8`
match := re.FindAllStringSubmatch(str, -1)
for _, val := range match {
fmt.Println(val[1])
}
}

return text between two characters

My regex matches a string between a tab or comma and a colon.
(?<=\t|,)([a-z].*?)(:)
This returns a string: app_open_icon:
I would like the regex to remove the : appended and return app_open_icon only.
How should I change this regex to exclude the :?
Try (?<=\t|,)[a-z].*?(?=:)
const regex = /(?<=\t|,)[a-z].*?(?=:)/;
const text='Lorem ipsum dolor sit amet,app_open_icon:consectetur adipiscing...';
const result = regex.exec(text);
console.log(result);
You don't have to use lookarounds, you can use a capture group.
[\t,]([a-z][^:]*):
[\t,] Match either a tab or comma
( Capture group 1
[a-z][^:]* Match a char in the range a-z and 0+ times any char except :
) Close group 1
: Match literally
const regex = /[\t,]([a-z][^:]*):/;
const str = `,app_open_icon:`;
const m = str.match(regex);
if (m) {
console.log(m[1]);
}
To get a match only using lookarounds, you can turn the matches into lookarounds, and omit the capture group 1:
(?<=[\t,])[a-z][^:]*(?=:)
const regex = /(?<=[\t,])[a-z][^:]*(?=:)/;
const str = `,app_open_icon:`;
const m = str.match(regex);
if (m) {
console.log(m[0]);
}
You have already achieved that with your regex.
Positive Lookbehind : (?<=\t|,) check for comma or tab before start of match
First capturing group : ([a-z].*?) captures everything before : and after , or \t
Second capturing group : (:) captures :
Just extract the first capturing group of your match.
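To illustrate the point about the capturing groups, here is a small sketch that keeps the regex from the question unchanged and simply reads group 1 instead of the whole match. The sample text is borrowed from the other answer; the variable names are illustrative.

```javascript
// The regex from the question, unchanged.
const regex = /(?<=\t|,)([a-z].*?)(:)/;
const text = "Lorem ipsum dolor sit amet,app_open_icon:consectetur adipiscing...";

const m = regex.exec(text);
const whole = m[0]; // full match, including the colon
const name = m[1];  // first capturing group, without the colon

console.log(whole); // app_open_icon:
console.log(name);  // app_open_icon
```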

Regex Match only not keywords

I have a string like <dlskjfldjf>Text creation List<string> checking edit<br/>. Need help with regex to match only <dlskjfldjf> and not List<string>
Keyword could be any generic type like
IList<T>
List<T>
etc
I tried with <([a-zA-Z]+)> which would match <dlskjfldjf> and below which would match List<string> but not sure how to mix them both
((List|ilist|IList|IEnumerable|IQuerable)(<)([A-Za-z.,\[\]\s]*)(>))|(<T>)
Use a Zero-width negative lookbehind assertion (?<! expression ):
string pattern = @"(?<!(List|ilist|IList|IEnumerable|IQuerable))<([a-zA-Z]+)>";
In a language that supports Negative Lookbehind a pattern like this could work:
(?<!(List|ilist|IList|IEnumerable|IQuerable))<([a-zA-Z]+)>
In JavaScript engines without lookbehind support (lookbehind was only added in ES2018), you may need to use two patterns to achieve the same result: test once for the angle bracket pattern, then test again to ensure that no type name precedes it.
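The two-step check described above could be sketched roughly like this. It avoids lookbehind entirely: the first pattern finds every <name> candidate, the second inspects the text in front of each candidate. The names tagPattern, keywordBefore and kept are made up for this example.

```javascript
// Step 1: find every <name> candidate.
const tagPattern = /<([a-zA-Z]+)>/g;
// Step 2: does the text before the candidate end with a generic type name?
const keywordBefore = /(?:List|ilist|IList|IEnumerable|IQuerable)$/;

const input = "<dlskjfldjf>Text creation List<string> checking edit<br/>";
const kept = [];

for (const m of input.matchAll(tagPattern)) {
  const before = input.slice(0, m.index);
  // Keep the candidate only when no listed type name directly precedes it.
  if (!keywordBefore.test(before)) {
    kept.push(m[0]);
  }
}

console.log(kept); // one entry: <dlskjfldjf>
```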
What you could do is match what you don't want to keep, and capture in a group what you do want to keep:
\b(?:List|ilist|IList|IEnumerable|IQuerable)<[^<>]*>|(<[a-zA-Z]+>)
That will match:
\b Word boundary to prevent any of the listed words in the alternation being part of a larger word
(?: Non capturing group
List|ilist|IList|IEnumerable|IQuerable Alternation which will match any of the listed words
) Close non capturing group
<[^<>]*> Match <, then 0+ times any char except < or >, then match >
| Or
( Capture group (What you want to keep)
<[a-zA-Z]+> Match <, then 1+ times a lower or uppercase char, then >
) Close capture group
For example:
const regex = /\b(?:List|ilist|IList|IEnumerable|IQuerable)<[^<>]*>|(<[a-zA-Z]+>)/g;
const str = `<dlskjfldjf>Text creation List<string> checking edit<br/> or IList<string> or <aAbB>`;
let m;
let res = [];
while ((m = regex.exec(str)) !== null) {
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
if (m[1] !== undefined) res.push(m[1]);
}
console.log(res);

Regex to remove bracket but not its contents

I would like to remove constant text and brackets from a String using regex. e.g.
INPUT                     Expected OUTPUT
var x = CONST_STR(ABC)    var x = ABC
var y = CONST_STR(DEF)    var y = DEF
How to achieve this?
Try with regex:
(?<==\s)\w+\(([^)]+)\)
which means:
(?<==\s) - lookbehind for = and a whitespace character (\s); the match must directly follow these characters,
\w+ - one or more word characters (A-Za-z_0-9); this could be tightened to the identifier rules of the language you are processing,
\( - opening bracket,
([^)]+) - capturing group (\1 or $1) for one or more characters, not closing brackets,
\) - closing bracket,
Just remember that in a Java string literal you need to double the escape character \ (write \\),
like in:
public class Test {
public static void main(String[] args) {
System.out.println("var y = CONST_STR(DEF)".replaceAll("(?<==\\s)\\w+\\(([^)]+)\\)", "$1"));
}
}
with output:
var y = DEF
The $1 in replaceAll() refers to capture group 1, that is, the text captured by this fragment of the regex: ([^)]+), the content inside the brackets.
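For comparison, the same replacement can be sketched in JavaScript, where a regex literal needs only single backslashes and $1 in the replacement string also refers to capture group 1. This assumes an engine with lookbehind support; the names lines and cleaned are illustrative.

```javascript
// The two inputs from the question.
const lines = ["var x = CONST_STR(ABC)", "var y = CONST_STR(DEF)"];

// Replace NAME(content) that follows "= " with just the content.
const cleaned = lines.map((line) =>
  line.replace(/(?<==\s)\w+\(([^)]+)\)/g, "$1")
);

console.log(cleaned[0]); // var x = ABC
console.log(cleaned[1]); // var y = DEF
```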

Solution required to create Regex pattern

I am developing a Windows application in C#. I have been searching for a solution to my problem of creating a Regex pattern. I want to create a Regex pattern matching either of the following strings:
XD=(111111) XT=( 588.466)m3 YT=( .246)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr
XD=(111 ) XT=( 588.466)m3 YT=( .009)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr
The specific requirement is that I need all the values from the above given string which is a collection of key/value pairs. Also, would like to know the right approach (in terms of efficiency and performance) out of the two...Regex pattern matching or substring, for the above problem.
Thank you all in advance and let me know, if more details are required.
I don't know C#, so there probably is a better way to build a key/value array. I constructed a regex and handed it to RegexBuddy which generated the following code snippet:
StringCollection keyList = new StringCollection();
StringCollection valueList = new StringCollection();
StringCollection unitList = new StringCollection();
try {
    Regex regexObj = new Regex(
        @"(?<key>\b\w+)     # Match an alphanumeric identifier
        \s*=\s*             # Match a = (optionally surrounded by whitespace)
        \(                  # Match a (
        \s*                 # Match optional whitespace
        (?<value>[^()]+)    # Match the value string (anything except parens)
        \)                  # Match a )
        (?<unit>[^\s=]+     # Match an optional unit (anything except = or space)
        \b                  # which must end at a word boundary
        (?!\s*=)            # and not be an identifier (i. e. followed by =)
        )?                  # and is optional, as mentioned.",
        RegexOptions.IgnorePatternWhitespace);
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        keyList.Add(matchResult.Groups["key"].Value);
        valueList.Add(matchResult.Groups["value"].Value);
        unitList.Add(matchResult.Groups["unit"].Value);
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
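To see which key/value/unit triples this pattern produces, here is a rough JavaScript sketch of the same expression, written on one line because JavaScript has no equivalent of IgnorePatternWhitespace. It is only an illustration under that assumption, not part of the generated C# code; the names row, pairPattern and pairs are made up.

```javascript
// First sample row from the question.
const row = "XD=(111111) XT=( 588.466)m3 YT=( .246)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr";

// key = identifier, value = text inside the parens,
// unit = optional trailing text that is not itself the next key.
const pairPattern = /(?<key>\b\w+)\s*=\s*\(\s*(?<value>[^()]+)\)(?<unit>[^\s=]+\b(?!\s*=))?/g;

const pairs = [];
for (const m of row.matchAll(pairPattern)) {
  pairs.push({
    key: m.groups.key,
    value: m.groups.value.trim(), // drop padding inside the parens
    unit: m.groups.unit || "",    // group is undefined when no unit follows
  });
}

for (const p of pairs) {
  console.log(p.key + " => " + p.value + " " + p.unit);
}
```

Note how Date=(06.01.01)Time=(00:54:55) is split into two pairs: the lookahead (?!\s*=) stops Time from being taken as the unit of Date.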
Regex re=new Regex(@"(\w+)\=\(([\d\s\.]+)\)");
MatchCollection m=re.Matches(s);
m[0].Groups will have { XD=(111111), XD, 111111 }
m[1].Groups will have { XT=( 588.466), XT, 588.466 }
String[] rows = { "XD=(111111) XT=( 588.466)m3 YT=( .246)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr",
"XD=(111 ) XT=( 588.466)m3 YT=( .009)m3 G=( 3.6)V N=(X0000000000) M=(Y0000000000) O=(Z0000000000) Date=(06.01.01)Time=(00:54:55) Q=( .00)m3/hr" };
foreach (String s in rows) {
MatchCollection Pair = Regex.Matches(s, @"
(\S+) # Match all non-whitespace before the = and store it in group 1
= # Match the =
(\([^)]+\S+) # Match the part in brackets and following non-whitespace after the = and store it in group 2
", RegexOptions.IgnorePatternWhitespace);
foreach (Match item in Pair) {
Console.WriteLine(item.Groups[1] + " => " + item.Groups[2]);
}
Console.WriteLine();
}
Console.ReadLine();
If you also want to extract the units, use this regex:
@"(\S+)=(\([^)]+(\S+))"
I added a set of brackets around the trailing \S+, so you will find the unit in item.Groups[3] (note that, as written, this group also contains the closing bracket).