I need to do a non-greedy match and hope someone can help me. I have the following, and I am using JavaScript and ASP:
match(/\href=".*?\/pdf\/.*?\.pdf/)
The above match starts at the first href attribute. I need it to match only the last href, the one that points into the /pdf/ folder.
Any ideas?
You need to use capturing parentheses for sub-expression matches:
match(/href=".*?(\/pdf\/.*?\.pdf)/)[1];
match returns an array with the entire match at index 0; the sub-expression captures follow in the order in which their opening parentheses appear in the pattern. In this case, index 1 contains the section matching \/pdf\/.*?\.pdf.
Try and make your regex more specific than just .*? if it's matching too broadly. For instance:
match(/href="([^"]+?\/pdf\/[^.]+?\.pdf)"/)[1];
[^"]+? lazily matches a run of characters that contains no double quote. Note that +? requires at least one character before /pdf/, so use [^"]*? if the href may start with /pdf/. Keeping the match inside the quotes stops it from being too broad in a string such as the following:
TestSome PDF
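The difference between the two patterns can be sketched as follows. The markup in html is an assumed example (the file names are made up), not the asker's real page:

```javascript
// Hypothetical markup: only the second link points into a pdf/ folder.
const html =
  '<a href="test.html">Test</a> <a href="files/pdf/some.pdf">Some PDF</a>';

// Lazy .*? still starts at the first href and runs across both tags.
const broad = html.match(/href=".*?(\/pdf\/.*?\.pdf)/);

// [^"]+? cannot cross a double quote, so the match stays inside one attribute.
const narrow = html.match(/href="([^"]+?\/pdf\/[^.]+?\.pdf)"/);

console.log(broad[0]);  // whole match, beginning at the first href
console.log(broad[1]);  // only the captured /pdf/... part
console.log(narrow[1]); // the complete value of the second href
```

Here broad[0] spans both links, while narrow[1] holds just the second href value.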
I have a string and would like to match a part of it.
The string is Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08.2.3.1602135087396.0_1237422_3895152To: <sip:4168755400#1.1.1.238>
I want to match PAI: <sip:4168755400#
The whitespace can be a word, so I would like to use .* but if I use that, it matches most of the string.
This is what I'm matching if I use the whitespace instead of .*:
(PAI: <sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
This is what I'm trying to achieve with .* but it should only match PAI: <sip:4168755400#
(PAI:.*<sip:)((?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4})#
I tried lookarounds but failed.
Any idea?
thanks
To match more than the single space, use a character class that matches either a space or a word character, and repeat it 1 or more times so that at least one occurrence is required.
Note that you don't have to escape the spaces, and in both places you can use an optional character class matching either a space or a hyphen: [ -]?
If you only need the whole match, you can omit the 2 capturing groups.
(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#
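As a quick check, the pattern above can be tried in JavaScript (an assumption, since the question does not name a language). The first string is a shortened copy of the header text from the question; the one containing Alice is a made-up variant:

```javascript
// Shortened copy of the header string from the question.
const headers =
  'Accept: multipart/mixedPrivacy: nonePAI: <sip:4168755400#1.1.1.238>' +
  'From: <sip:4168755400#1.1.1.238>;tag=5430960946837208_c1b08';

const paiPattern =
  /(PAI:[ \w]+<sip:)((?:\([2-9]\d{2}\) ?|[2-9]\d{2}[ -]?)[2-9]\d{2}[- ]?\d{4})#/;

const m = headers.match(paiPattern);
console.log(m[0]); // whole match
console.log(m[2]); // group 2: the phone number digits

// Hypothetical variant where a word sits between "PAI:" and "<sip:".
const withWord = 'PAI: Alice <sip:4168755400#1.1.1.238>'.match(paiPattern);
console.log(withWord[0]);
```

Because [ \w] cannot match <, the match cannot run past the first <sip: after PAI:.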
The regex should be like
PAI:.*?(<sip:.*?#)
Explanation:
PAI:.*? finds the literal text PAI: followed by any characters (.*); the ? makes the quantifier lazy, so it matches as few characters as possible before the next part of the pattern.
(<sip:.*?#) is the capturing group that holds the result we want.
<sip:.*?# finds <sip: followed by as few characters as possible, up to the first #.
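A small sketch of the lazy versus greedy behaviour, assuming JavaScript and a shortened copy of the string from the question:

```javascript
// Shortened copy of the header string from the question.
const headers =
  'Privacy: nonePAI: <sip:4168755400#1.1.1.238>From: <sip:4168755400#1.1.1.238>';

// Lazy: stops at the first <sip:...# after PAI:
const lazy = headers.match(/PAI:.*?(<sip:.*?#)/);
console.log(lazy[0]);
console.log(lazy[1]);

// Greedy .* for comparison: runs on to the last <sip:...# in the string.
const greedy = headers.match(/PAI:.*(<sip:.*?#)/);
console.log(greedy[0]);
```

The whole match (index 0) of the lazy version is the text the question asks for; the capturing group holds only the <sip:...# part.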
I am trying to match a URL param whose position in the URI is not fixed. It can show up right after the ? or after an &. I need to match the vr=359821 param in the URIs below. How can I do this?
Example urls:
/br/col/aon/11631?vr=359821&cId=9113
/br/col/aon/11631?cId=9113&vr=359821
/br/col/aon/11631?cId=9113&vr=359821&grid=2&page=something
Some things I tried:
I tried to use backreferencing (not sure if this is right approach) but was not successful.
I was trying to group them and may be backreference to find the string within that group.
(\/br\/col\/aon\/11631)(\?cId=9113&(vr=359821)) # this matches second url above but not others.
(\/br\/col\/aon\/11631)(\?cId=9113&(vr=359821)).+?\3 # this is wrong I know.
(\/br\/col\/aon\/11631)(\?cId=9113&(vr=359821)).*?\2[vr=359821] # this is wrong
The regexes above are wrong, but my idea was to make it a group and match vr=359821 within that group. I don't know if this is even possible in regex.
Why I am doing this:
The final goal is to redirect this URL to a different URL in nginx, keeping all the params from the original request.
In the last 2 patterns that you tried, you are using a backreference like \2 and \3. But a backreference will match the same data that was already captured in the corresponding group.
In this case, that is not the desired behaviour. Instead, you want to match a key value pair in the uri, which does not have to exist in the content before.
Therefore you can match the start of the pattern followed by a non greedy quantifier (as it can also occur right after the question mark) to match the first occurrence of vr= followed by 1 or more digits.
In the comments I suggested this pattern \/br\/col\/aon\/11631\b.*?[?&](vr=\d+), but (depending on the regex delimiters) you don't have to escape the forward slash.
The pattern could be
/br/col/aon/11631\b.*?[?&](vr=\d+)
The pattern matches
/br/col/aon/11631\b Match the start of the pattern followed by a word boundary
.*? Match any character, as few times as possible
[?&] Match either ? or &
(vr=\d+) Capture group 1, match vr= followed by 1+ digits
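The pattern can be checked against the three example URLs. This sketch uses JavaScript rather than nginx (an assumption made purely for illustration), so the forward slashes are escaped inside the regex literal:

```javascript
const urls = [
  '/br/col/aon/11631?vr=359821&cId=9113',
  '/br/col/aon/11631?cId=9113&vr=359821',
  '/br/col/aon/11631?cId=9113&vr=359821&grid=2&page=something',
];

// Same pattern as above; only the slashes are escaped for the JS literal.
const vrParam = /\/br\/col\/aon\/11631\b.*?[?&](vr=\d+)/;

// Collect capture group 1 for every URL (null if a URL does not match).
const found = urls.map((url) => {
  const m = url.match(vrParam);
  return m ? m[1] : null;
});

console.log(found);
```

All three URLs yield vr=359821 in group 1, regardless of where the param sits in the query string.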
From what I read, nginx uses PCRE. To get a more specific pattern, one option could be:
/br/col/aon/11631\?.*?(?<=[?&])(vr=\d+)(?=\&|$)
This pattern matches
/br/col/aon/11631\? Match the start of the pattern followed by the question mark
.*? Match any character, as few times as possible
(?<=[?&]) Positive lookbehind, assert that what is directly to the left is either ? or &
(vr=\d+) Capture group 1, match vr= followed by 1+ digits
(?=\&|$) Positive lookahead, assert what is directly to the right is & or the end of the string to prevent a partial match
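A rough way to see what the lookarounds add is to try the stricter pattern on URLs that should not match. The sketch below assumes a JavaScript engine with lookbehind support (ES2018 or later) instead of nginx, and the last two URLs are made-up negative cases:

```javascript
// Stricter pattern; slashes escaped for the JS literal, & left unescaped.
const strict = /\/br\/col\/aon\/11631\?.*?(?<=[?&])(vr=\d+)(?=&|$)/;

// Helper (hypothetical name): return capture group 1, or null if no match.
const captureVr = (url) => {
  const m = url.match(strict);
  return m ? m[1] : null;
};

console.log(captureVr('/br/col/aon/11631?vr=359821&cId=9113'));
console.log(captureVr('/br/col/aon/11631?cId=9113&vr=359821'));

// Made-up negative cases:
console.log(captureVr('/br/col/aon/11631?vr=359821abc')); // value followed by letters
console.log(captureVr('/br/col/aon/11631?xvr=359821'));   // key is xvr, not vr
```

The first two calls return vr=359821; the last two return null, because the lookahead rejects a value that is not followed by & or the end of the string, and the lookbehind rejects a key that does not start right after ? or &.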
I have a regex expression that is matching URLs in a string which are not between quotes. This is working great but I have a minor issue with it.
The part that is dealing with the quotes is capturing the first character (can also be a white space) before the URL (usually https).
Here is the regex expression:
/(?:^|[^"'])(ftp|http|https|file):\/\/[\S]+(\b|$)/gim
You can test it out and you will see this unwanted match happening in front of the URL (if you type anything in front of the URL of course).
How do I get the proper Full match?
The non-capturing group (?:^|[^"']) matches and consumes a character other than ' and " with the [^'"] negated character class. As that character is consumed, it is added to the whole match value. What a non-capturing group does not do is add the matched substring to a separate memory buffer, so you cannot access it separately after a match is found; it is still part of the whole match.
The usual solutions are:
A capturing group around the part of the regex you need to extract and then getting the corresponding submatch (e.g. with (?:^|[^"'])((?:ftp|https?|file):\/\/\S+)(?:\b|$) pattern)
Using a lookaround, here, a (?<!["']) negative lookbehind that only matches a location that is not immediately preceded with ' or ": (?<!["'])(?:ftp|https?|file):\/\/\S+(?:\b|$).
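Both approaches can be compared on a short sample. The text below is made up for illustration, and the lookbehind variant assumes an engine that supports ES2018 lookbehind:

```javascript
// Made-up sample: one bare URL and one URL inside double quotes.
const text = 'Visit https://example.com or "https://quoted.example" today';

// Option 1: capture the URL and read group 1 instead of the whole match.
const withGroup = /(?:^|[^"'])((?:ftp|https?|file):\/\/\S+)(?:\b|$)/gim;
const groupMatches = [...text.matchAll(withGroup)];
const viaGroup = groupMatches.map((m) => m[1]);

// Option 2: a negative lookbehind consumes nothing, so the whole match is clean.
const withLookbehind = /(?<!["'])(?:ftp|https?|file):\/\/\S+(?:\b|$)/gim;
const viaLookbehind = text.match(withLookbehind);

console.log(JSON.stringify(groupMatches[0][0])); // whole match keeps the leading space
console.log(viaGroup);
console.log(viaLookbehind);
```

With option 1 the whole match still includes the consumed character, so only group 1 is usable; with option 2 the whole match is already the URL.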
I'm trying to match first occurrence of window.location.replace("http://stackoverflow.com") in some HTML string.
Especially I want to capture the URL of the first window.location.replace entry in whole HTML string.
To capture the URL, I formulated these 2 rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve it I think I need to use lookbehind (for 1st rule) and lookahead (for 2nd rule).
I end up with this Regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure that it is legal to mix both rules like I did.
Can you please help me with translating my rules to Regex? Other ways of doing this (without lookahead(behind)) also appreciated.
The pattern you wrote is really not the one you need as it matches something very different from what you expect: text window.location.redirect("=") in text window.location.redirect("=") something. And it will only work in PCRE/Python if you remove the ? from before \" (as lookbehinds should be fixed-width in PCRE). It will work with ? in .NET regex.
If it is JS, note that older engines do not support lookbehind at all (it was only added in ECMAScript 2018), so a pattern without lookbehind is the safer choice.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
Omitting the /g modifier makes the regex match only the first occurrence. Access the value you need inside Group 1.
The ([^"]*) captures 0+ characters other than a double quote (URLs you need should not have it). If these URLs you have contain a ", you should use the second approach as (.*?) will match any 0+ characters other than a newline up to the first ").
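A minimal sketch of both patterns in JavaScript, using a made-up HTML string with two redirect calls:

```javascript
// Made-up HTML containing two redirect calls; only the first URL is wanted.
const page =
  '<script>window.location.redirect("http://stackoverflow.com");</script>' +
  '<script>window.location.redirect("http://example.com");</script>';

// Without the g flag, match() stops at the first occurrence.
const first = page.match(/window\.location\.redirect\("([^"]*)"\)/);
const firstUrl = first ? first[1] : null;

// The lazy variant gives the same result when the URL contains no ".
const lazy = page.match(/window\.location\.redirect\("(.*?)"\)/);
const lazyUrl = lazy ? lazy[1] : null;

console.log(firstUrl);
console.log(lazyUrl);
```

Checking for null before reading index 1 avoids a TypeError when the page contains no redirect call at all.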
I am trying to match with the following regex.
\d{11}(.*)
Which is any 11 digits followed by a string. I want to extract the trailing string, whatever it is.
I used RE2::FullMatch but it gives the first half (the 11 digits). How to get the sub-string matched with (.*) ?
string subStr;
RE2::FullMatch("<sip:+19073381121#216.67.108.201:5060;user=phone>;npi=ISDN", "(<sip:\\+(\\d{11}))(.*)", &subStr);
I am trying to extract everything starting from # in above string. Basically I want what matches to (.*) but the above function returns <sip:+19073381121.
I am not very familiar with regex, but I looked at different APIs to extract substrings and found this one useful.
Remove the extra capturing groups from your regular expression.
<sip:\+\d{11}(.*)
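To get the substring matched by (.*), read the first capturing group (group 1), that is, the first pair of parentheses in the pattern. RE2::FullMatch fills its output arguments from the capturing groups in order, so once (.*) is the only group, &subStr receives its text.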