Mixing Lookahead and Lookbehind in One Regexp

I'm trying to match the first occurrence of window.location.replace("http://stackoverflow.com") in an HTML string.
Specifically, I want to capture the URL of the first window.location.replace entry in the whole HTML string.
To capture the URL, I formulated these two rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve this, I think I need a lookbehind (for the first rule) and a lookahead (for the second rule).
I ended up with this regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure it is legal to combine both rules the way I did.
Can you please help me translate my rules into a regex? Other ways of doing this (without lookahead or lookbehind) are also appreciated.

The pattern you wrote is not the one you need, as it matches something very different from what you expect: any text that ends with window.location.redirect("=") (the " before = being optional), e.g. text window.location.redirect("=") in the string text window.location.redirect("=") something. Also, it only compiles in PCRE/Python if you remove the ? before \", because lookbehinds must be fixed-width in those engines. .NET regex accepts it with the ?.
If it is JavaScript, engines that predate ES2018 do not support lookbehinds at all.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
Without the /g modifier, only the first occurrence is matched. Access the value you need in Group 1.
([^"]*) captures zero or more characters other than a double quote (the URLs you need should not contain one). If your URLs may contain a ", use the second pattern instead: (.*?) lazily matches zero or more characters other than a newline, up to the first ").
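A minimal Python sketch of both ideas, run on a made-up HTML snippet (the `html` string is hypothetical). `re.search` stops at the first match, which plays the role of omitting `/g`. Because Python's `re` accepts the fixed-width lookbehind `(?<=window\.location\.redirect\(")`, the lookbehind-plus-lookahead version the question asked for also works there.

```python
import re

# Hypothetical HTML input containing two redirects.
html = """<script>
window.location.redirect("http://stackoverflow.com");
window.location.redirect("http://example.com");
</script>"""

# Capturing-group approach: re.search returns only the first match,
# and the URL is in group 1.
m = re.search(r'window\.location\.redirect\("([^"]*)"\)', html)
first_url = m.group(1) if m else None

# Lookaround approach: a fixed-width lookbehind plus a lookahead,
# so the whole match (group 0) is the URL itself.
m2 = re.search(r'(?<=window\.location\.redirect\(")[^"]*(?="\))', html)
first_url_lookaround = m2.group(0) if m2 else None

print(first_url)             # http://stackoverflow.com
print(first_url_lookaround)  # http://stackoverflow.com
```

The capturing-group form is the more portable of the two, since it needs no lookbehind support at all.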

Related

Regex Email validation with some special cases [duplicate]

I am trying to make a regex match which is discarding the lookahead completely.
\w+([-+.]\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*
This is the match and this is my regex101 test.
But when an email starts with -, _ or ., it should not be matched at all, rather than just having the leading symbols skipped. Any ideas are welcome; I've been searching for the past half hour but can't figure out how to drop the entire email when it starts with those symbols.
You can use a word boundary before # together with a positive lookbehind that checks that we are at the beginning of the string or right after whitespace, and then require the first symbol to match [^\s\-_.], i.e. not be whitespace, -, _ or .:
(?<=^|\s)[^\s\-_.]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
List of matches:
support#github.com
s.miller#mit.edu
j.hopking#york.ac.uk
steve.parker#soft.de
info#company-hotels.org
kiki#hotmail.co.uk
no-reply#github.com
s.peterson#mail.uu.net
info-bg#software-software.software.academy
Additional notes on usage and alternative notation
Note that it is best practice to escape as few characters as possible, so [^\s\-_.] can be written as [^\s_.-]; a hyphen at the end of a character class still denotes a literal hyphen, not a range. Also, if you plan to use the pattern in other regex engines, you may run into problems with the alternation in the lookbehind (many engines require lookbehinds to be fixed-width). In that case, replace (?<=\s|^) with the equivalent (?<!\S):
(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*
Finally, if you need to use it in JavaScript or another language that does not support lookbehinds, replace (?<!\S) / (?<=\s|^) with a non-capturing group (?:\s|^), wrap the whole email part in a set of capturing parentheses, and use the language's means to grab the contents of Group 1:
(?:\s|^)([^\s_.-]\w*(?:[-+.]\w+)*\b#\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*)
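A short Python sketch of the (?<!\S) variant. It assumes the # shown in this thread's patterns stands for the @ of an e-mail address, so the sketch uses @; the sample text is made up. It also shows why the (?<=^|\s) form is not portable: Python's `re` requires fixed-width lookbehinds, and that alternation has width 0 or 1, so compiling it raises `re.error`.

```python
import re

# Assumption: the "#" in the thread's patterns stands for "@".
# (?<!\S) means "at the start of the string or right after whitespace".
EMAIL = re.compile(
    r'(?<!\S)[^\s_.-]\w*(?:[-+.]\w+)*\b@\w+(?:[-.]\w+)*\.\w+(?:[-.]\w+)*'
)

# Hypothetical input: three addresses start with -, _ or . and must be skipped.
text = "support@github.com -bad@x.com _bad@y.org .bad@z.net s.miller@mit.edu"
matches = EMAIL.findall(text)
print(matches)  # ['support@github.com', 's.miller@mit.edu']

# Python's re rejects the variable-width alternation (?<=^|\s).
try:
    re.compile(r'(?<=^|\s)x')
    alternation_lookbehind_ok = True
except re.error:
    alternation_lookbehind_ok = False
print(alternation_lookbehind_ok)  # False
```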
I use this for multiple email addresses separated with ';':
([A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4};)*
For a single address:
[A-Za-z0-9._%-]+#[A-Za-z0-9.-]+\.[A-Za-z]{2,4}

Regex optionally extracting characters between two characters

I have the string thisIs/My-7777-Any-other-text; the input can also be just thisIs/My-7777.
I am looking to extract My-7777 in both scenarios using regex. Essentially, I want to extract everything between the first forward slash and the second hyphen (the second hyphen may not exist). I tried the following regex, which wasn't quite right:
(?<=\/)(.*)(?=-)
You could use a capture group
^[^\/]*\/([^-]*-[^-]*)
^ Start of string
[^\/]*\/ Match optional chars other than /, then match /
( Capture group
[^-]*-[^-]* Match a - between optional chars that are not -
) Close capture group
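A minimal Python check of the anchored capture-group pattern against the two inputs from the question. In Python the / needs no escaping, so \/ is written as a plain /.

```python
import re

# The two example inputs from the question.
inputs = ["thisIs/My-7777-Any-other-text", "thisIs/My-7777"]

# Skip everything up to the first "/", then capture
# "<non-hyphens>-<non-hyphens>" in group 1.
pattern = re.compile(r'^[^/]*/([^-]*-[^-]*)')

results = []
for s in inputs:
    m = pattern.search(s)
    if m:
        results.append(m.group(1))

print(results)  # ['My-7777', 'My-7777']
```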
Without an anchor, and not allowing / before and after -
[^\/]*\/([^-\/]*-[^-\/]*)
If we take into account the structure of your current input strings, you can use
(?<=\/)[^-]+-[^-]+
If your strings are more complex and look like thisIs/My-7777/more-text-here, and you actually want to match from the first /, then you may use
^[^\/]+\/\K[^\/-]+-[^\/-]+ ## PHP, PCRE, Boost (Notepad++), Onigmo (Ruby)
(?<=^[^\/]+\/)[^\/-]+-[^\/-]+ ## JS (except IE & older Safari), .NET, Python PyPI regex
If a newline can occur in your input (e.g. several lines in one string), add \n to each negated character class so that the match stays on one line.
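A Python sketch on the more complex input (hypothetical, taken from the example string above). It shows why the simple (?<=\/)[^-]+-[^-]+ is not enough once a second / appears, since [^-]+ happily matches /. Python's built-in `re` has no \K and rejects variable-width lookbehinds like (?<=^[^\/]+\/), so an anchored capturing group is used instead as a stand-in for both variants.

```python
import re

s = "thisIs/My-7777/more-text-here"

# Simple lookbehind version: [^-]+ may cross the second "/".
simple = re.search(r'(?<=/)[^-]+-[^-]+', s).group()
print(simple)  # My-7777/more

# Capturing-group stand-in for the \K / variable-width lookbehind patterns:
# excluding "/" from the classes keeps the match inside one path segment.
m = re.search(r'^[^/]+/([^/-]+-[^/-]+)', s)
anchored = m.group(1)
print(anchored)  # My-7777
```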
This works for me; try it with case-insensitive matching enabled:
Find what: .*?/|-any.*
Replace with: (leave empty)
Output: My-7777
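The same find-and-replace can be sketched in Python with `re.sub` and `re.IGNORECASE`; the helper name `strip_extra` is hypothetical. Note that `.*?/` removes every shortest run ending in "/", which for these inputs is just "thisIs/".

```python
import re

def strip_extra(s: str) -> str:
    # Delete each shortest run ending in "/" and everything from "-any"
    # onward, ignoring case (so "-Any" also matches).
    return re.sub(r'.*?/|-any.*', '', s, flags=re.IGNORECASE)

print(strip_extra("thisIs/My-7777-Any-other-text"))  # My-7777
print(strip_extra("thisIs/My-7777"))                 # My-7777
```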

Regular expression - skip characters in jMeter testing

I have the regular expression below, which retrieves all characters starting with state%3 up to the first #:
(state%3)((?:(?!#).)*)
I want to exclude state%3 from the result. I have tried all kinds of lookbehinds, but nothing seems to work.
Here is the full text that I need to match against:
"state%3DnGl%252BlPm8CkHfYd2PpBq7W0H2z6xgUeICgB7KFmGmGG8cTSQTf%252B9cYCfFSsT5YSPTITdbaLAlJoQ22%252FCXRAu3ROqTQYzpPfGYxKmRZ7iIqwx3g0GLpVkaXq5FL3Js5FcTGpncQx7TA9w1A6HsSyxxcktfwX8QSzhqJQj5lntOolrPoIqpa4l2C%252BbhCWuAOY18BwVynMv8%252BuSl#login/"
A couple of things I have already tried
^.{5}\Kstate
But it does not seem to work. Any ideas? I need to retrieve this value for JMeter testing.
There is no need for a lookbehind, or for any lookarounds at all. Use a single capturing group and a negated character class:
state%3([^#]+)
and set the template value to $1$.
Details:
state%3 - matches a literal text
([^#]+) - Capturing group #1 (that is why template should be $1$): one or more chars other than #.
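To check what group 1 contains, here is a small Python sketch using the sample text from the question. This only mirrors the regex; it is not JMeter itself, where the $1$ template refers to the same capturing group.

```python
import re

# Sample text from the question (split across lines for readability).
text = (
    '"state%3DnGl%252BlPm8CkHfYd2PpBq7W0H2z6xgUeICgB7KFmGmGG8cTSQTf%252B9cYCf'
    'FSsT5YSPTITdbaLAlJoQ22%252FCXRAu3ROqTQYzpPfGYxKmRZ7iIqwx3g0GLpVkaXq5FL3Js5'
    'FcTGpncQx7TA9w1A6HsSyxxcktfwX8QSzhqJQj5lntOolrPoIqpa4l2C%252BbhCWuAOY18Bw'
    'VynMv8%252BuSl#login/"'
)

# "state%3" is matched but left outside the group; group 1 holds
# everything after it up to (not including) the first "#".
m = re.search(r'state%3([^#]+)', text)
value = m.group(1) if m else None
print(value)
```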

How to end a string with $ directly after .* with a RegEx?

I'm trying to report on a set of URLs, catching all potential URL parameters, and I'm having an issue defining the regex properly.
We use this regex to capture a few variations of our URLs for our reporting, but I need to be able to end the string with a $. When I do, it doesn't show any results.
The RegEx:
/join/$|/join/\?product.*|/join/\.*
For another account, we only use one variation which is outlined below (which works):
^/join/$
I believe the issue is in that after \?product.*, I'm not ending the string (or even starting it).
So far I have tried: ^/join/$|(^[/join/\?product.*]$)|(^[/join/\.*]$) with no luck.
If you want to match the dollar sign literally, you have to escape it as \$; otherwise it is an anchor that asserts the end of the string (or line).
The pattern ^/join/$ therefore only matches /join/.
In your pattern you use an alternation whose last part, /join/\.*, matches /join/ but also /join/....., because an escaped dot matches a literal dot and the * quantifier repeats it zero or more times.
Perhaps you are looking for:
^/join/(?:\?product.*\$)?$
This will match /join/ followed by an optional part (?:\?product.*\$)? that will match ?product, followed by any char 0+ times and will end on $.
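A quick Python check of the suggested pattern on a few hypothetical URLs. Here \$ is the literal dollar sign and the final $ is the end-of-string anchor.

```python
import re

PATTERN = re.compile(r'^/join/(?:\?product.*\$)?$')

# Hypothetical sample paths.
samples = ["/join/", "/join/?product=abc$", "/join/?product=abc", "/join/x"]
results = {s: bool(PATTERN.search(s)) for s in samples}
print(results)
# {'/join/': True, '/join/?product=abc$': True,
#  '/join/?product=abc': False, '/join/x': False}
```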
Regex demo
Make the pattern lazy, and note that $ is a special character in regex, so it needs to be escaped. (Regarding escaping, Google Analytics may follow different rules.) Square brackets [] define a character class, which matches a single character from a set; be careful with them as well, since I think you are trying to match a group.
\?product.*?\$
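The difference between the greedy and lazy forms only shows when the text contains more than one $. A small Python sketch on a hypothetical string:

```python
import re

s = "/join/?product=a$b$"

greedy = re.search(r'\?product.*\$', s).group()   # runs to the last "$"
lazy = re.search(r'\?product.*?\$', s).group()    # stops at the first "$"

print(greedy)  # ?product=a$b$
print(lazy)    # ?product=a$
```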

Trying to figure out how to capture text between slashes regex

I have this regex:
/([/<=][^/]*[/=?])$/g
I'm trying to capture the text between the last two slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You can use lookaround assertions:
(?<=\/)[^\/]*(?=\/[^\/]*$)
Alternatively, use the regex below and grab the string you want from group 1:
\/([^\/]*)\/[^\/]*$
The easy way
Match:
1. every character that is not a "/", and capture what was matched by putting it inside parentheses (a capturing group);
2. followed by "/" and then the end of the string ($).
Code:
([^/]*)/$
Get the text in group 1.
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
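The three variants above can be compared side by side in Python on the example path from the question; all three return the last segment.

```python
import re

path = "/1/2/test/"

# Lookbehind + lookahead: the match itself is the last segment.
look = re.search(r'(?<=/)[^/]*(?=/[^/]*$)', path).group()

# Capturing group: the segment is in group 1.
group = re.search(r'([^/]*)/$', path).group(1)

# Lookahead only: the trailing "/" is checked but not consumed.
ahead = re.search(r'[^/]*(?=/$)', path).group()

print(look, group, ahead)  # test test test
```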
The issue with your code is that the opening and closing slashes are part of your capture group.
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, older JavaScript engines do not support lookbehinds. However, the regex above should work as intended. In PHP, when / is the pattern delimiter, every literal / in the pattern must be escaped (as regex101.com points out), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
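In Python, `re.findall` plays the role of the /g flag and returns every group-1 capture; the anchored version returns only the last segment. The / needs no escaping in Python patterns.

```python
import re

path = "/1/2/test/"

# Equivalent of the /g version: one group-1 capture per "/segment" followed by "/".
segments = re.findall(r'/([^/]*?)(?=/)', path)
print(segments)  # ['1', '2', 'test']

# Last segment only: the trailing "/" is consumed and anchored with $.
last = re.search(r'/([^/]*?)/$', path).group(1)
print(last)  # test
```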