How do I match what's between the quotes excluding these? - regex

I want to match what's between the quotes but excluding these. I tried positive and negative lookahead, which works for the end quote but I cannot exclude the first one. What am I doing wrong?
Here is the example I'm using:
A: $("div"),
B: $("img.some_class"),
B: $("img.some_class.another_class"),
C: $("#some_id"),
D: $(".some_class"),
E: $("input#some_id"),
F: $("div#some_id.some_class.some_other"),
G: $("div.some_class#some_id")
Here is my regex so far:
/(?!").*(?=")/g

Try this:
/\("\K[^"]+/g
\K means that the return value will start here.
For example, it will find: A: $("div but return as match just: div.
Here Is Demo

There are not two, but four different lookaround modifiers, because you need to specify two different aspects:
Are you asserting that something is there (positive) or is not there (negative)?
Are you asserting that it's before the specified pattern (lookbehind) or after it (lookahead)?
The four combinations are generally written like this:
?= for positive lookahead
?! for negative lookahead
?<= for positive lookbehind
?<! for negative lookbehind
You've used a negative lookahead when you wanted a positive lookbehind, so the fixed version of what you wrote would be:
/(?<=").*(?=")/g
Beware the "greediness" of .*, which will match as much of the string as possible; you might want to use .*? to make it "non-greedy", or explicitly say "anything other than a quote mark" ([^"]*).
Another approach is to match the quotes normally, rather than with a lookaround, but "capture" the part between them: /"(.*?)"/. How you get to the "captured group" will vary depending on your programming language / tool, which you haven't specified.

The pattern (?!").*(?=") first asserts what is directly on the right is not a double quote (?!") which succeeds because for the example data that is a $.
Then .* is greedy and will match 0+ times any character except a newline and will match until the end of the string. Then it will backtrack to fulfill the assertion (?=") where directly on the right is a double quote.
If a positive lookbehind is supported, you might change the (?!") to (?<=") and the pattern could look like (?<=\$\(")[^"]+(?="\)) to not match empty double quotes.
Taking the dollar sign and the opening and closing parenthesis into account, you could use a capturing group and a negated character class [^"]+ to match any char except a double quote:
\$\("([^"]+)"\)
Regex demo

Using lookahead and lookbehinds as you asked :
/(?<=").*(?=")/g
Test Here : https://regex101.com/r/kCEuow/2
You might also consider using substrings :
/"([^"]+)"/g
Test the regex : https://regex101.com/r/kCEuow/1

Related

Regular expression to search for specific Referer in HTTP Header

I need to create a regular expression to match everything except a specific URL for a given Referer. I currently have it to match but can't reverse it and create the negative for it.
What I currently have:
Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?
In the list below:
Referer:http://www.test.online/
Referer:https://www.test.online/
Referer:https://www.test.tv/
Referer:https://www.blah.com/
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
It will match:
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
However, I would like it to match everything except for those.
This is for our WAF so unfortunately are restricted on the usage which can only be fulfilled searching for the HTTP Header being passed back.
Try this regex:
^(?!.*Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?).*$
A good way to negate your regex is to use negative lookahead.
Explanation:
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead [is any regex pattern].
Working example: https://regex101.com/r/QJfeBB/1
You could use an anchor ^ to assert the start of the string and use a negative lookahead to assert what is on the right is not what you want to match.
Note that you have to escape the dot to match it literally and you could omit the last part (\/.*)?.
If you don't use the capturing groups for later use you might also turn those into non capturing groups (?:) instead.
^(?!Referer:(https?(:\/\/))?(www\.)?test\.com).+$
regex101 demo
About the pattern
^ Start of the string
(?! Negative lookahead to assert what is on the right does not match
Referer:(https?(:\/\/))?(www\.)?test\.com Match your pattern
) Close negative lookahead
.+ Match any char except a newline 1+ times
$ Assert end of the string

Regex Negative Lookbehind Matches Lookbehind text .NET

Say I have the following strings:
PB-GD2185-11652-MTCH
GD2185-11652-MTCH
KD-GD2185-11652-MTCH
KD-GD2185-11652
I want REGEX.IsMatch to return true if the string has MTCH in it and does not start with PB.
I expected the regex to be the following:
^(?<!PB)\S+(?=MTCH)
but that gives me the following matches:
PB-GD2185-11652-
GD2185-11652-
KD-GD2185-11652-
I do not understand why the negative lookbehind not only doesn't exclude the match but includes the PB characters in the match. The positive lookahead works as expected.
EDIT 1
Let me start with a simpler example. The following regex matches all of the strings as I would expect it to:
\S+
The following regex still matches all of the strings even though I would expect it not to:
\S+(?!MTCH)
The following regex matches all but the final H character on the first three strings:
\S+(?<!MTCH)
From the documentation at regex 101, a lookahead looks for text to the right of the pattern and a lookbehind looks for text to the left of the pattern, so having a lookahead at the beginning of a string does not jive with the documentation.
Edit 2
take another example with the following three strings:
grey
greyhound
hound
the regex:
^(?<!grey)hound
only matches the final hound. whereas the regex:
^(?<!grey)\S+
matches all three.
You need a lookahead: ^(?!PB)\S+(?=MTCH). Using the look-behind means the PB has to come before the first character.
The problem was because of the greediness of \S+. When dealing with lookarounds and greedy quantifiers you can easily match more characters than you expect. One way to deal with this is to insert a negative lookaround in a group with the greedy quantifier to exclude it as a match as stated in this question:
How to non-greedy multiple lookbehind matches
and on this helpful website about greediness in regular expressions:
http://www.rexegg.com/regex-quantifiers.html
Note that this second link has a few other ways to deal with the greediness in various situations.
A good regular expression for this situation is as follows:
^(?<!PB)((?!PB)\S+)(MTCH)
In situations like this it is going to be much clearer to do it logically within the code. So first check if the string matches MTCH and then that it doesn't match ^PB

Negative look ahead regex

How can I match strings that aren't preceded by an # sign?
/(?!#)(somestring|someotherstring)/
Doesn't produce expected results. Am testing this in sublime text following Sublime: Regular Expressions.cheatsheet
You need to use a lookbehind:
(?<!#)(somestring|someotherstring)
The (?!#) lookahead will check if the following symbol is not a #.
Some more details:
Lookbehind has the same effect [VS: as lookahead], but works backwards. It tells the regex engine to temporarily step backwards in the string, to check if the text inside the lookbehind can be matched there. (?<!a)b matches a b that is not preceded by an a, using negative lookbehind. It doesn't match cab, but matches the b (and only the b) in bed or debt.
Negative lookbehind is written as (?<!text), using an exclamation point instead of an equals sign.

RegEx lookahead but not immediately following

I am trying to match terms such as the Dutch ge-berg-te. berg is a noun by itself, and ge...te is a circumfix, i.e. geberg does not exist, nor does bergte. gebergte does. What I want is a RegEx that matches berg or gebergte, working with a lookaround. I was thinking this would work
\b(?i)(ge(?=te))?berg(te)?\b
But it doesn't. I am guessing because a lookahead only checks the immediate following characters, and not across characters. Is there any way to match characters with a lookahead withouth the constraint that those characters have to be immediately behind the others?
Valid matches would be:
Berg
berg
Gebergte
gebergte
Invalid matches could be:
Geberg
geberg
Bergte
bergte
ge-/Ge- and -te always have to occur together. Note that I want to try this with a lookahead. I know it can be done simpler, but I want to see if its methodologically possible to do something like this.
Here is one non-lookaround based regex:
\b(berg|gebergte)\b
Use it with i (ignore case) flag. This regex uses alternation and word boundary to search for complete words berg OR gebergte.
RegEx Demo
Lookaround based regex:
(?<=\bge)berg(?=te\b)|\bberg\b
This regex used a lookahead and lookbehind to search for berg preceded by ge and followed by te. Alternatively it matches complete word berg using word boundary asserter \b which is also 0-width asserter like anchors ^ and $.
To generally forbid a sign, you can put the negative lookaround to the beginning of a string and combine it with random number of other signs before the string you want to forbid:
regex: don't match if containing a specific string
^(?!.\*720).*
This will not match, if the string contains 720, but else match everything else.

RegEx - String To Help Match

I read somewhere that it is possible to have a RegEx in which strings preceding and following are not to be matched, but instead help with ambiguities.
For example, I would like a RegEx that matches only "TESTING" from the second line ("defTESTINGghi") and nothing from line one and line two.
abcTESTINGdef
defTESTINGghi
ghiTESTINGjkl
If supported you can use the \K escape sequence. \K resets the starting point of the reported match and any previously consumed characters are no longer included. The Positive Lookahead asserts that the preceded is followed by ghi.
def\KTESTING(?=ghi)
Live Demo
Or depending on what your definition of the preceded and following not being matched are, why not simply use a capturing group to capture only the desired subpattern?
def(TESTING)ghi
Live Demo
You could try the below regexes to match the string TESTING only on the second line,
Through positive lookahead and lookbehind,
(?<=def)TESTING(?=ghi)
Matches the string TESTING only if it's present just after to the def and must be follwed by ghi.
Through positive lookahead,
TESTING(?=ghi)
Matches the string TESTING only if it's followed by ghi.
Through negative lookahead,
TESTING(?!def|jkl)
Matches the string TESTING if it's not followed by def or jkl.
Reference