RegEx - String To Help Match - regex

I read somewhere that it is possible to have a RegEx in which strings preceding and following are not to be matched, but instead help with ambiguities.
For example, I would like a RegEx that matches only "TESTING" from the second line ("defTESTINGghi") and nothing from line one and line two.
abcTESTINGdef
defTESTINGghi
ghiTESTINGjkl

If supported you can use the \K escape sequence. \K resets the starting point of the reported match and any previously consumed characters are no longer included. The Positive Lookahead asserts that the preceded is followed by ghi.
def\KTESTING(?=ghi)
Live Demo
Or depending on what your definition of the preceded and following not being matched are, why not simply use a capturing group to capture only the desired subpattern?
def(TESTING)ghi
Live Demo

You could try the below regexes to match the string TESTING only on the second line,
Through positive lookahead and lookbehind,
(?<=def)TESTING(?=ghi)
Matches the string TESTING only if it's present just after to the def and must be follwed by ghi.
Through positive lookahead,
TESTING(?=ghi)
Matches the string TESTING only if it's followed by ghi.
Through negative lookahead,
TESTING(?!def|jkl)
Matches the string TESTING if it's not followed by def or jkl.
Reference

Related

How to match the closest pattern on a capture group excluding overlap? [duplicate]

Given an input string fooxxxxxxfooxxxboo I am trying to write a regex that matches fooxxxboo i.e. starting from the second foo till the last boo.
I tried the following
foo.*?boo matches the complete string fooxxxxxxfooxxxboo
foo.*boo also matches the complete string fooxxxxxxfooxxxboo
I read this Greedy vs. Reluctant vs. Possessive Quantifiers and I understand their difference, but I am trying to match the shortest string from the end which matches the regex i.e. something like the regex to be evaluated from back.
Is there any way I can match only the last portion?
Use negative lookahead assertion.
foo(?:(?!foo).)*?boo
DEMO
(?:(?!foo).)*? - Non-greedy match of any character but not of foo zero or more times. That is, before matching each character, it would check that the character is not the letter f followed by two o's. If yes, then only the corresponding character will be matched.
Why the regex foo.*?boo matches the complete string fooxxxxxxfooxxxboo?
Because the first foo in your regex matches both the foo strings and the following .*? will do a non-greedy match upto the string boo, so we got two matches fooxxxxxxfooxxxboo and fooxxxboo. Because the second match present within the first match, regex engine displays only the first.
.*(foo.*?boo)
Try this. Grab the capture i.e $1 or \1.
See demo.
https://regex101.com/r/nL5yL3/9

How do I match what's between the quotes excluding these?

I want to match what's between the quotes but excluding these. I tried positive and negative lookahead, which works for the end quote but I cannot exclude the first one. What am I doing wrong?
Here is the example I'm using:
A: $("div"),
B: $("img.some_class"),
B: $("img.some_class.another_class"),
C: $("#some_id"),
D: $(".some_class"),
E: $("input#some_id"),
F: $("div#some_id.some_class.some_other"),
G: $("div.some_class#some_id")
Here is my regex so far:
/(?!").*(?=")/g
Try this:
/\("\K[^"]+/g
\K means that the return value will start here.
For example, it will find: A: $("div but return as match just: div.
Here Is Demo
There are not two, but four different lookaround modifiers, because you need to specify two different aspects:
Are you asserting that something is there (positive) or is not there (negative)?
Are you asserting that it's before the specified pattern (lookbehind) or after it (lookahead)?
The four combinations are generally written like this:
?= for positive lookahead
?! for negative lookahead
?<= for positive lookbehind
?<! for negative lookbehind
You've used a negative lookahead when you wanted a positive lookbehind, so the fixed version of what you wrote would be:
/(?<=").*(?=")/g
Beware the "greediness" of .*, which will match as much of the string as possible; you might want to use .*? to make it "non-greedy", or explicitly say "anything other than a quote mark" ([^"]*).
Another approach is to match the quotes normally, rather than with a lookaround, but "capture" the part between them: /"(.*?)"/. How you get to the "captured group" will vary depending on your programming language / tool, which you haven't specified.
The pattern (?!").*(?=") first asserts what is directly on the right is not a double quote (?!") which succeeds because for the example data that is a $.
Then .* is greedy and will match 0+ times any character except a newline and will match until the end of the string. Then it will backtrack to fulfill the assertion (?=") where directly on the right is a double quote.
If a positive lookbehind is supported, you might change the (?!") to (?<=") and the pattern could look like (?<=\$\(")[^"]+(?="\)) to not match empty double quotes.
Taking the dollar sign and the opening and closing parenthesis into account, you could use a capturing group and a negated character class [^"]+ to match any char except a double quote:
\$\("([^"]+)"\)
Regex demo
Using lookahead and lookbehinds as you asked :
/(?<=").*(?=")/g
Test Here : https://regex101.com/r/kCEuow/2
You might also consider using substrings :
/"([^"]+)"/g
Test the regex : https://regex101.com/r/kCEuow/1

How to create proper regular expression to find last character which I want to?

I need to create regex to find last underscore in string like 012344_2.0224.71_3 or 012354_5.00123.AR_3.335_8
I have wanted find last part with expression [^.]+$ and then find underscore at found element but I can not handle it.
I hope you can help me :)
Just use a negative character class [^_] that will match everything except an underscore (this helps to ensure no other underscores are found afterwards) and end of string $
Pattern would look as such:
(_)[^_]*$
The final underscore _ is in a capturing group, so you are wanting to return the submatch. You would replace the group 1 (your underscore).
See it live: Regex101
Notice the green highlighted portion on Regex101, this is your submatch and is what would be replaced.
The simplest solution I can imagine is using .*\K_, however not all regex flavours support \K.
If not, another idea would be to use _(?=[^_]*$)
You have a demo of the first and second option.
Explanation:
.*\K_: Fetches any character until an underscore. Since the * quantifier is greedy, It will match until the last underscore. Then \K discards the previous match and then we match the underscore.
_(?=[^_]*$): Fetch an underscore preceeded by non-underscore characters until the end of the line
If you want nothing but the "net" (i.e., nothing matched except the last underscore), use positive lookahead to check that no more underscores are in the string:
/_(?=[^_]*$)/gm
Demo
The pattern [^.]+$ matches not a dot 1+ times and then asserts the end of the string. The will give you the matches 71_3 and 335_8
What you want to match is an underscore when there are no more underscores following.
One way to do that is using a negative lookahead (?!.*_) if that is supported which asserts what is at the right does not match any character followed by an underscore
_(?!.*_)
Pattern demo

Regular expression to search for specific Referer in HTTP Header

I need to create a regular expression to match everything except a specific URL for a given Referer. I currently have it to match but can't reverse it and create the negative for it.
What I currently have:
Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?
In the list below:
Referer:http://www.test.online/
Referer:https://www.test.online/
Referer:https://www.test.tv/
Referer:https://www.blah.com/
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
It will match:
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
However, I would like it to match everything except for those.
This is for our WAF so unfortunately are restricted on the usage which can only be fulfilled searching for the HTTP Header being passed back.
Try this regex:
^(?!.*Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?).*$
A good way to negate your regex is to use negative lookahead.
Explanation:
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead [is any regex pattern].
Working example: https://regex101.com/r/QJfeBB/1
You could use an anchor ^ to assert the start of the string and use a negative lookahead to assert what is on the right is not what you want to match.
Note that you have to escape the dot to match it literally and you could omit the last part (\/.*)?.
If you don't use the capturing groups for later use you might also turn those into non capturing groups (?:) instead.
^(?!Referer:(https?(:\/\/))?(www\.)?test\.com).+$
regex101 demo
About the pattern
^ Start of the string
(?! Negative lookahead to assert what is on the right does not match
Referer:(https?(:\/\/))?(www\.)?test\.com Match your pattern
) Close negative lookahead
.+ Match any char except a newline 1+ times
$ Assert end of the string

Full match only if the capturing group encountered once

The pattern:
(test):(thestring)
What I want is full match only if there is just one test: before
test:thestring
But in this case there wouldn't be full match:
test:test:thestring
I've tried qualificator, but it didn't work.
Need help
Try this pattern: ^(?!.*((?(?<=^)|(?<=:))test(?=(:|$))).*(?1)).+$.
The main part is ((?(?<=^)|(?<=:))test(?=(:|$))), which matches test if it's preceeded by colon : or is at the beginning of a line and it's followed by colon : or end of the line.
(?(?<=^)|(?<=:)) this is workaround to (?<=(:|^)), but lookbehinds must have fixed length.
Then we have backreference to first capturing group (?1), to see if there are any other test.
This whole pattern is placed in negative lookahead (?!...), to match everything if it doesn't match pattern explained above (test matched more than one time).
Demo
for this very specific case:
(?<!.)(test:thestring)
Regex101
All it does is search for the string test:thestring and ensures that there are no characters before it. (Use MichaƂ Turczyn's regex for an all purpose search!)
^((?!test:).)*(test:thestring)
See in action
If you want a full match and there should be only one time test: before test:string you might assert the start of the string ^, use a negative lookahead (?:(?!test:).) to match any character if what is on the right side is not test:
Then match test:thestring followed by a negative lookahead (?:(?!test:thestring).)* that matches any character if what is on the right side is not test:thestring and assert the end of the string $
^(?:(?!test:).)*test:thestring(?:(?!test:thestring).)*$
Regex demo