RegEx: Negated lookahead of a back reference, repeated many times

How do we lookahead until there is no back reference of a character in RegEx?
Given:
We are looking for phrases within quotes and it can be multiline "check we have a return here
but this line is still part of previous one 'a' string".
It breaks once we have another 'testing with single quotes "surrounding" double quotes';
How do we look for double quotes and single quotes once they close themselves?
I tried this pattern, but it's not working:
/(['"])[^$1]+\1/g

If your strings have no escape sequences, it is as easy as using a tempered greedy token like
/(['"])(?:(?!\1)[\s\S])+\1/g
See the regex demo. The (?:(?!\1)[\s\S])+ matches any symbol ([\s\S]) that is not the value captured into Group 1 (either ' or "). To also match "" or '', replace the + (1 or more occurrences) with * quantifier (0 or more occurrences).
If you may have escape sequences, you may use
/(['"])(?:\\[\s\S]|(?!\1)[^\\])*?\1/g
See this demo.
See the pattern details:
(['"]) - Group 1 capturing a ' or "
(?:\\[\s\S]|(?!\1)[^\\])*? - 0+ (but as few as possible) occurrences of
\\[\s\S] - any escape sequence
| - or
(?!\1)[^\\] - any char other than \ and the one captured into Group 1
\1 - the value kept in Group 1.
NOTE: [\s\S] in JS matches any char including line break chars. A JS only construct that matches all chars is [^] and is preferable from the performance point of view, but is not advised as it is not supported in other regex flavors (i.e. it is not portable).
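For readers who want to try both patterns outside a browser, here is a minimal sketch in Python. It assumes Python's re module treats these constructs the same way the JS engine does; the variable names and the escaped sample string are made up for illustration.

```python
import re

# Sample text from the question; the first quoted phrase spans a line break.
text = (
    'We are looking for phrases within quotes and it can be multiline '
    '"check we have a return here\n'
    "but this line is still part of previous one 'a' string\".\n"
    "It breaks once we have another 'testing with single quotes "
    "\"surrounding\" double quotes';"
)

# Tempered greedy token: consume any char that is not the opening quote.
no_escapes = re.compile(r"""(['"])(?:(?!\1)[\s\S])+\1""")
phrases = [m.group(0) for m in no_escapes.finditer(text)]

# Escape-aware variant: a backslash plus any char is consumed as one unit.
with_escapes = re.compile(r"""(['"])(?:\\[\s\S]|(?!\1)[^\\])*?\1""")
escaped_sample = r"""'it\'s' and "say \"hi\"" """  # hypothetical input
escaped_phrases = [m.group(0) for m in with_escapes.finditer(escaped_sample)]

for phrase in phrases + escaped_phrases:
    print(repr(phrase))
```

finditer with group(0) is used instead of findall because findall would return only the Group 1 value (the quote character) when the pattern contains a capturing group.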

Related

Regex for no single quote and newline character in between single quotes

So far I have this: '.[^ \n']*'(?!') with a negative lookahead after the last quote.
Unfortunately, this does allow ''' (three single quotes).
The regex should match these strings
'abc'
'abc##$%^xyz'
The regex shouldn't match these strings
'\n'
'abc#'#$%^xyz'
'''
'
My current regex uses a negative lookahead to reject a single quote right after the closing one. I am trying to make it more general, so that it doesn't match if the string contains an odd number of single quotes.
If your patterns occur always alone in a line, you could use this:
^'[^\n']*'$
If you want to find matching pairs of single quotes in a bigger text, I think regex is not the solution for you.
You could use:
^'[^\n']*(?:'[^\n']*')*[^\n']*'$
Explanation
^ Start of string
' Match a single quote
[^\n']* Match 0+ chars other than a newline or a single quote
(?: Non capture group to repeat as a whole part
'[^\n']*' Match from ' to ' without matching newlines in between
)* Close the non capture group and optionally repeat it
[^\n']* Match 0+ chars other than a newline or a single quote
' Match a single quote
$ End of string
See a regex101 demo.
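Both anchored patterns can be checked against the strings from the question with a short Python sketch. It assumes that '\n' in the question stands for a real line break between the quotes; the variable names are illustrative only.

```python
import re

# Pattern from the first answer: exactly one pair of quotes on the line.
single_pair = re.compile(r"^'[^\n']*'$")
# Pattern from the second answer: additionally allows inner '...' pairs.
balanced = re.compile(r"^'[^\n']*(?:'[^\n']*')*[^\n']*'$")

should_match = ["'abc'", "'abc##$%^xyz'"]
# Assumption: the '\n' case contains an actual newline character.
should_fail = ["'\n'", "'abc#'#$%^xyz'", "'''", "'"]

single_results = {s: bool(single_pair.search(s)) for s in should_match + should_fail}
balanced_results = {s: bool(balanced.search(s)) for s in should_match + should_fail}

for s in should_match + should_fail:
    print(repr(s), single_results[s], balanced_results[s])
```

The two patterns agree on all six strings. They differ on input such as '''' (four quotes), which only the second pattern accepts because it tolerates an even number of quotes.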

Ungreedy with look behind

I have this kind of text:
other text opt1 opt2 opt3 I_want_only_this_text because_of_this
And am using this regex:
(?<=opt1|opt2|opt3).*?(?=because_of_this)
Which returns me:
opt2 opt3 I_want_only_this_text
However, I want to match only "I_want_only_this_text".
What is the best way to achieve this?
I don't know in what order the opt's will appear and they are only examples. Actual words will be different and there will be more of them.
Actual data:
regex
(?<=※|を|備考|町|品は|。).*(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
text
こだわり豚には通常の豚よりビタミンB1が2倍以上あります。私たちの育てた愛情たっぷりのこだわり豚をぜひ召し上がってください。商品説明名称えびの産こだわり豚切落し産地宮崎県えびの市内容量500g×8パック合計4kg賞味期限90日保存方法-15℃以下で保存すること提供者株式会社さつま屋産業備考・本お礼品は冷凍でのお届けとなります
what I want to get:
冷凍で
You can use
(?<=※|を|備考|町|品は|。)(?:(?!※|を|備考|町|品は|。).)*?(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします)
See the regex demo. The scheme is the same as in (?<=opt1|opt2|opt3)(?:(?!opt1|opt2|opt3).)*?(?=because_of_this) (see demo).
The tempered greedy token solution allows you to match multiple occurrences of the same pattern in a longer string.
Details
(?<=※|を|備考|町|品は|。) - a positive lookbehind that matches a location that is immediately preceded with one of the alternatives listed in the lookbehind
(?:(?!※|を|備考|町|品は|。).)*? - any char other than a line break char, zero or more but as few as possible occurrences, that is not a starting point of any of the alternative patterns in the negative lookahead
(?=のお届けとなります|でお届けします|にてお届け致します|にてお届けいたします) - a positive lookahead that requires one of the alternative patterns to appear immediately to the right of the current location.
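The same scheme can be sketched in Python with the opt1/opt2/opt3 example. Python's re module only accepts a lookbehind whose alternatives all have the same length, which holds here; that is an assumption of this sketch and would not hold for the Japanese alternatives above. The second sample string is made up to show multiple occurrences.

```python
import re

# All three alternatives are 4 chars long, so Python's re accepts the lookbehind.
stops = "opt1|opt2|opt3"
pattern = re.compile(rf"(?<={stops})(?:(?!{stops}).)*?(?=because_of_this)")

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"
match = pattern.search(text)
# The raw match still includes the spaces around the wanted text.
result = match.group(0).strip()

# Hypothetical longer input with two occurrences of the same scheme.
longer = "opt1 aaa because_of_this opt2 opt3 bbb because_of_this"
all_results = [m.strip() for m in pattern.findall(longer)]

print(result)
print(all_results)
```

The tempered token refuses to step over another opt word, so the attempts that start after opt1 and opt2 fail and only the one after opt3 reaches because_of_this.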
You could add a negative lookahead (?!\s*opt\d) to assert that there is no opt and a digit to the right. You can use a character class to list the digits 1, 2 and 3 instead of using the alternation with |.
(?<=\bopt[123]\s(?!\s*opt\d)).*?(?=\s*\bbecause_of_this\b)
It might be a bit more efficient to use a match with a capture group:
\bopt[123]\s(?!\s*opt\d)(.*?)\s*\bbecause_of_this\b
What about:
.*\bopt[123]\b\s*(.*?)\s*because_of_this\b
See the online demo.
.* - A greedy match of any characters other than newline, up to the last occurrence of:
\bopt[123]\b - A word boundary followed by literally "opt" with a trailing number 1, 2 or 3 and another word boundary.
\s* - 0+ whitespace characters.
(.*?) - A 1st capture group with a lazy match of 0+ characters up to:
\s* - 0+ whitespace characters.
because_of_this\b - Literally "because_of_this" followed by a word-boundary.
If you need to have this written out in alternations:
.*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b
See that demo.
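A quick way to check the greedy-prefix approach is a small Python sketch; the variable names are illustrative, and the wanted text is read from capture group 1 rather than from the whole match.

```python
import re

# The leading .* is greedy, so the engine settles on the LAST opt word.
pattern = re.compile(r".*\b(?:opt1|opt2|opt3)\b\s*(.*?)\s*because_of_this\b")

text = "other text opt1 opt2 opt3 I_want_only_this_text because_of_this"
match = pattern.search(text)
wanted = match.group(1)

print(wanted)
```

Because .* swallows everything up to the last opt word, this form returns one result per line and is not suited to finding several occurrences in the same line.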

RegEx: don't capture match, but capture after match

There are a thousand regular expression questions on SO, so I apologize if this is already covered. I did look first.
I have string:
Name Subname 11X22 88X620 AB33(20) YA5619 77,66
I need to capture this string: YA5619
What I am doing is just finding AB33(20) and after this I am capturing until first white space. But AB33(20) can be AB-33(20) or AB33(-20) or AB33(-1).
My preg_match regex is: (?<=\bAB\d{2}\(\d{2}\)\s).+?(?=\s)
Why am I getting an error when I change \d{2} to \d+?
For the final result I thought this regex would work, but it doesn't:
(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)
Any ideas what I am doing wrong?
With most regex flavors, lookbehind needs to evaluate to a fixed-length sequence, so you can't use variable quantifiers like * or + or even {1,2}.
Instead of using lookaround, you can simply match your marker pattern and then forget it with \K.
AB-?\d+(?:\(-?\d+\))? \K[^ ]+
demo: https://regex101.com/r/8XXngH/1
It depends on the language. In .NET, for example, the pattern matches because variable-length lookbehind is supported.
Another solution might be to use a character class listing the characters you allow after AB. Then match a whitespace character, reset the match with \K, and match \S+, which matches 1+ non-whitespace characters.
\bAB[()\d-]+\s\K\S+
Explanation
\bAB Match AB literally, preceded by a word boundary to prevent AB being part of a longer word.
[()\d-]+ Match 1+ times any of the listed character in the character class
\s Match a whitespace char (or \s+ to match 1 or more)
\K Reset the starting point of the reported match (forget what was matched so far)
\S+ Match 1+ times a non-whitespace character
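The fixed-length restriction can be reproduced in Python, whose standard re module also rejects variable-length lookbehind and has no \K. The sketch below therefore swaps in a capture group for \K; that substitution is my assumption about an equivalent, not something taken from the answers, and the names are illustrative.

```python
import re

# Python's re refuses a lookbehind whose length can vary (\d+ and -?).
try:
    re.compile(r"(?<=\bAB-?\d+\(-?\d+\)\s).+?(?=\s)")
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)

# Without \K: match the marker, then capture the next non-space run.
marker_then_value = re.compile(r"\bAB-?\d+\(-?\d+\)\s(\S+)")

samples = [
    "Name Subname 11X22 88X620 AB33(20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB-33(20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB33(-20) YA5619 77,66",
    "Name Subname 11X22 88X620 AB33(-1) YA5619 77,66",
]
captured = [marker_then_value.search(s).group(1) for s in samples]

print(lookbehind_error)
print(captured)
```

In PHP's preg_match, which the question uses, the \K patterns above return the value as the whole match, so no group index is needed there.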

Regex expression for 2 identical strings in a row

So I am trying to create a regex expression for the following template.
"[alphaNumeric]String/String.xcl"
So
[a1B2c3]Hello/Hello.xcl would pass
a1B2c3]hello/hello.xcl fails
[a1B2c3]Hello/hello.xcl fails
[a1B2c3]hello/hello.xc fails
I have tried the following so far:
\[[\da-zA-Z]+\][a-z]+\/[a-z]+\.xcl$
How do I check if the middle strings are identical?
Use a backreference:
\[[a-zA-Z0-9]+\]([^/]+)/\1\.xcl
The term in parentheses captures the first part of your path. We may then refer to it later in the regex using \1.
Depending on how you plan to use this regex, you might need optional starting and closing anchors (^ and $).
You may capture the part after brackets and use a backreference after /:
^\[[\da-zA-Z]+]([A-Za-z]+)\/\1\.xcl$
               ^^^^^^^^^^^  ^^
See the regex demo
Details
^ - start of the string
\[ - a [
[\da-zA-Z]+ - 1+ alphanumeric chars
] - a ] char
([A-Za-z]+) - Capturing group 1: one or more letters
\/ - a slash
\1 - a backreference to capturing group 1 value
\.xcl - .xcl substring
$ - end of string.
NOTE: If you do not care about what kind of chars there can be inside brackets, you may replace [\da-zA-Z]+ with [^\]]+.
NOTE2: If you want to match any chars on both ends of /, replace ([A-Za-z]+) with ([^\/]+).
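The anchored backreference pattern can be checked against the four sample paths from the question with a short Python sketch (the slash needs no escaping in Python; the names are illustrative).

```python
import re

# Group 1 captures the folder name; \1 requires the identical file name.
pattern = re.compile(r"^\[[\da-zA-Z]+]([A-Za-z]+)/\1\.xcl$")

samples = [
    "[a1B2c3]Hello/Hello.xcl",  # expected to pass
    "a1B2c3]hello/hello.xcl",   # missing opening bracket
    "[a1B2c3]Hello/hello.xcl",  # case differs between the two names
    "[a1B2c3]hello/hello.xc",   # wrong extension
]
results = {s: bool(pattern.search(s)) for s in samples}

for s, ok in results.items():
    print(s, "passes" if ok else "fails")
```

A backreference compares the captured text literally, so Hello and hello count as different unless the regex is compiled case-insensitively.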

notepad++ remove text between two string using regular expression

I want to remove text between two strings using regular expression in notepad++. Here is my full string
[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
I want final string like this
[insertedOn]) VALUES (N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',
Here I removed 1, from the string. The number (1, 2, 3, ...) increases from row to row.
I tried a lot of expressions but none worked. Here is one of them: (VALUES ()(?s)(.*)(, N')
How can I remove this?
You may use
(VALUES \().*?,\s*(N')
and replace with $1$2. Note that if the part of the string to be removed can contain line breaks, you need to enable the ". matches newline" option. If N and VALUES must be matched only when in ALLCAPS, make sure the Match case option is checked.
Pattern details
(VALUES \() - Group 1 (later referred with $1 from the replacement pattern): a literal substring VALUES (
.*? - any 0+ chars, as few as possible, up to the leftmost occurrence of the subsequent subpatterns
,\s* - a comma and 0+ whitespaces (use \h instead of \s to only match horizontal whitespace chars)
(N') - Group 2 (later referred with $2 from the replacement pattern): a literal substring N'.
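Outside Notepad++, the same search and replace can be sketched in Python. Python writes the replacement as \1\2 instead of $1$2; everything else is assumed to behave the same for this pattern.

```python
import re

line = "[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',"

# Keep Group 1 (VALUES () and Group 2 (N'), drop whatever sits between them.
cleaned = re.sub(r"(VALUES \().*?,\s*(N')", r"\1\2", line)

print(cleaned)
```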
You should first escape the literal ( after VALUES: \(
Even with that fixed, the greedy .* combined with the (?s) (DOTALL) flag makes the engine match up to the end of the input string and then backtrack, stopping at the last occurrence of , N' in the input, which means unexpected matches.
To improve your regex you should 1) make .* ungreedy 2) remove (?s) 3) escape (:
(VALUES \().*?, (N')
To be more precise in matching you'd better search for:
VALUES \(\K\d+, *(?=N')
and replace with nothing.
Breakdown:
VALUES \( Match VALUES ( literally
\K Reset match
\d+, * Match digits preceding a comma and optional spaces
(?=N') Followed by N'
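To see why the digit-based pattern is more precise, here is a small Python comparison. Python's re has no \K, so this sketch keeps VALUES ( in a capture group instead; that substitution and the second sample line are assumptions made for illustration.

```python
import re

numbered = "[insertedOn]) VALUES (1, N'1F9ACCD2-3B60-49CF-830B-42B4C99F6072',"
# Hypothetical line with no leading number, which should be left untouched.
no_number = "VALUES (N'abc', N'def',"

loose = re.compile(r"(VALUES \().*?,\s*(N')")
strict = re.compile(r"(VALUES \()\d+, *(?=N')")

loose_numbered = loose.sub(r"\1\2", numbered)
strict_numbered = strict.sub(r"\1", numbered)

# The loose pattern also eats the first string value; the strict one does not.
loose_no_number = loose.sub(r"\1\2", no_number)
strict_no_number = strict.sub(r"\1", no_number)

print(loose_numbered)
print(strict_numbered)
print(loose_no_number)
print(strict_no_number)
```

Both patterns give the same result on the numbered line. On the line without a number, only the strict pattern leaves the text unchanged.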