This text
"dhdhd89(dd)"
Matched against this regex
.+?(?:\()
..returns "dhdhd89(".
Why is the start parenthesis included in the capture?
Two different tools, as well as the .NET Regex class, return the same result, so I gather there is something I don't understand here.
This is how I read my regex:
Match any character, at least one occurrence. As few as possible.
The matched string should be followed by a start parenthesis, but not to be included in the capture.
I can find a workaround, but I still want to know what is going on.
Just turn the non-capturing group into a positive lookahead assertion.
.+?(?=\()
.+? is a non-greedy match of one or more characters, which must be followed by an opening parenthesis. An assertion does not consume any characters; it only checks whether a match is possible at that position. A non-capturing group, by contrast, does consume what it matches, so the parenthesis ends up in the overall match.
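The difference is easy to check. Below is a minimal sketch using Python's re module (an assumption, since the question used .NET; for these two patterns the engines behave the same way). The variable names are illustrative only.

```python
import re

text = "dhdhd89(dd)"

# Non-capturing group: "(" is consumed, so it is part of the overall match.
with_group = re.search(r".+?(?:\()", text).group(0)

# Lookahead: "(" must follow, but it is not consumed.
with_lookahead = re.search(r".+?(?=\()", text).group(0)

print(with_group)      # dhdhd89(
print(with_lookahead)  # dhdhd89
```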
You can just use this negation based regex to capture only text before a literal (:
^([^(]+)
When you use:
.+?(?:\()
The regex engine does match the ( after the initial text; it just doesn't return it to you in a captured group.
You haven't defined any capture groups, so I guess you are displaying the whole match (group 0). You can do:
(.+?)(?:\()
and the string you want is in group 1.
Or use a lookahead, as @AvinashRaj said.
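A short sketch of the group 0 versus group 1 distinction, again assuming Python's re module rather than the .NET class from the question. The names whole_match, first_group and before_paren are made up for illustration.

```python
import re

text = "dhdhd89(dd)"

m = re.search(r"(.+?)(?:\()", text)
whole_match = m.group(0)  # everything the pattern consumed, including "("
first_group = m.group(1)  # only the part inside the capturing parentheses

# Negated character class: one or more characters that are not "(".
before_paren = re.search(r"^([^(]+)", text).group(1)

print(whole_match, first_group, before_paren)
```

In .NET the equivalent of group(1) would be read from the match's Groups collection.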
Related
Suppose I have the following regex that matches a string with a semicolon at the end:
\".+\";
It will match any string except an empty one, like the one below:
"";
I tried using this:
\".+?\";
But that didn't work.
My question is, how can I make the .+ part optional, so the user doesn't have to put any characters in the string?
To make the .+ optional, you could do:
\"(?:.+)?\";
(?:..) is called a non-capturing group. It only does the matching operation and it won't capture anything. Adding ? after the non-capturing group makes the whole non-capturing group optional.
Alternatively, you could do:
\".*?\";
.* would match any character zero or more times greedily. Adding ? after the * forces the regex engine to do a shortest possible match.
As an alternative:
\".*\";
Try it here: https://regex101.com/r/hbA01X/1
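To compare the suggestions side by side, here is a small sketch in Python (an assumption; the question does not name a language). The backslashes before the quotes are dropped because Python raw strings don't need them, and the dictionary keys are just labels invented for this example.

```python
import re

patterns = {
    "plus": r'".+";',
    "optional_group": r'"(?:.+)?";',
    "lazy_star": r'".*?";',
    "greedy_star": r'".*";',
}

def matches_empty(pattern):
    """Return True if the pattern accepts the empty string literal "";"""
    return re.fullmatch(pattern, '"";') is not None

results = {name: matches_empty(p) for name, p in patterns.items()}
print(results)

# Greedy vs lazy only differs when a line holds more than one string.
line = '"a"; "b";'
greedy = re.search(r'".*";', line).group(0)
lazy = re.search(r'".*?";', line).group(0)
print(greedy)  # "a"; "b";
print(lazy)    # "a";
```

All three alternatives accept the empty string; which one fits depends on whether several strings can appear on the same line.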
I am trying to use regex to match anything but "id":digits part
I have come up with "(\b(id":)(\d+)\b)" to find the "id":digits pattern, but I need to negate it and haven't been able to work out how.
[{"age":1,"id":123,"value":"14"},
{"age":1,"id":4214,"value":"4324"},
{"age":3,"id":4244,"value":"545"}]
Any help is appreciated.
The simplest option is to capture the rest of the string into groups and use them in the substitution, as below.
Demo: https://regex101.com/r/cRVA5C/2/
Pattern: ^([\s\S]*?)\s*"id":\d+,?\s*([\s\S]*?)$
Breakdown:
([\s\S]*?): match any number of any characters before and after "id":. Capture it into groups \1 and \2
\s*"id":\d+,?\s*: match "id":\d+, optionally preceded by whitespace and optionally followed by a comma and whitespace.
In the substitution, use \1\2 to get the desired output.
Note: Regex may not be the ideal tool for parsing JSON.
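As a sketch of both routes, the snippet below applies the pattern above line by line with Python's re.sub, then does the same job with the json module. Python and the line-by-line application are assumptions (the linked demo's flags are not shown here), and the variable names are illustrative.

```python
import json
import re

text = '''[{"age":1,"id":123,"value":"14"},
{"age":1,"id":4214,"value":"4324"},
{"age":3,"id":4244,"value":"545"}]'''

pattern = re.compile(r'^([\s\S]*?)\s*"id":\d+,?\s*([\s\S]*?)$')

# Regex route: substitute each line with groups 1 and 2, dropping the id part.
stripped_lines = [pattern.sub(r"\1\2", line) for line in text.splitlines()]
via_regex = "\n".join(stripped_lines)
print(via_regex)

# JSON route: parse the data and drop the "id" key from every record.
records = json.loads(text)
via_json = [{k: v for k, v in rec.items() if k != "id"} for rec in records]
print(json.dumps(via_json))
```

The JSON route does not depend on key order or spacing, which is the main reason to prefer it when the input really is JSON.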
I need to create a regular expression to match everything except a specific URL for a given Referer. I currently have it to match but can't reverse it and create the negative for it.
What I currently have:
Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?
In the list below:
Referer:http://www.test.online/
Referer:https://www.test.online/
Referer:https://www.test.tv/
Referer:https://www.blah.com/
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
It will match:
Referer:https://www.test.com/
Referer:http://www.test.com/
Referer:http://test.com/
Referer:https://test.com/
However, I would like it to match everything except for those.
This is for our WAF, so unfortunately we are restricted in what we can use: the rule can only be expressed as a search on the HTTP header being passed back.
Try this regex:
^(?!.*Referer:(http(s)?(:\/\/))?(www\.)?test.com(\/.*)?).*$
A good way to negate your regex is to use negative lookahead.
Explanation:
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead [is any regex pattern].
Working example: https://regex101.com/r/QJfeBB/1
You could use an anchor ^ to assert the start of the string and use a negative lookahead to assert what is on the right is not what you want to match.
Note that you have to escape the dot to match it literally and you could omit the last part (\/.*)?.
If you don't use the capturing groups for later use you might also turn those into non capturing groups (?:) instead.
^(?!Referer:(https?(:\/\/))?(www\.)?test\.com).+$
About the pattern
^ Start of the string
(?! Negative lookahead to assert what is on the right does not match
Referer:(https?(:\/\/))?(www\.)?test\.com Match your pattern
) Close negative lookahead
.+ Match any char except a newline 1+ times
$ Assert end of the string
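The breakdown above can be checked against the list from the question. This is a sketch in Python, which is an assumption: the WAF's regex engine is not named, so lookahead support there would need to be verified separately.

```python
import re

pattern = re.compile(r"^(?!Referer:(https?(:\/\/))?(www\.)?test\.com).+$")

headers = [
    "Referer:http://www.test.online/",
    "Referer:https://www.test.online/",
    "Referer:https://www.test.tv/",
    "Referer:https://www.blah.com/",
    "Referer:https://www.test.com/",
    "Referer:http://www.test.com/",
    "Referer:http://test.com/",
    "Referer:https://test.com/",
]

# Keep only the headers the negative lookahead lets through.
not_test_com = [h for h in headers if pattern.match(h)]

for h in not_test_com:
    print(h)
```

Only the first four entries are printed; the four test.com referers are rejected by the lookahead.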
I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
or
Use the below regex and then grab the string you want from group index 1.
\/([^\/]*)\/[^\/]*$
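A quick sketch of both variants in Python (an assumption; the question's /.../g syntax suggests JavaScript, where lookbehind support depends on the engine version). The second path with a trailing file name is a made-up input to show that both patterns pick the segment between the last two slashes.

```python
import re

lookaround = re.compile(r"(?<=\/)[^\/]*(?=\/[^\/]*$)")
grouped = re.compile(r"\/([^\/]*)\/[^\/]*$")

path = "/1/2/test/"
via_lookaround = lookaround.search(path).group(0)
via_group = grouped.search(path).group(1)

# Hypothetical path with a file name after the last slash.
path_with_file = "/1/2/test/file.txt"
via_lookaround_file = lookaround.search(path_with_file).group(0)
via_group_file = grouped.search(path_with_file).group(1)

print(via_lookaround, via_group, via_lookaround_file, via_group_file)
```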
The easy way
Match:
every character that is not a "/"
Get what was matched here. This is done by creating a capturing group, i.e. putting it inside parentheses.
followed by "/" and then the end of string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except that the regex engine is now told not to consume the trailing "/" when checking for it. This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
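Both forms can be wrapped in small helpers. This is a sketch in Python (an assumption about the language); the function names are hypothetical, and returning None for a path without a trailing slash is a choice made for this example.

```python
import re

def last_segment_group(path):
    """Capturing-group version: read the text from group 1."""
    m = re.search(r"([^/]*)/$", path)
    return m.group(1) if m else None

def last_segment_lookahead(path):
    """Lookahead version: the whole match is already the wanted text."""
    m = re.search(r"[^/]*(?=/$)", path)
    return m.group(0) if m else None

print(last_segment_group("/1/2/test/"))      # test
print(last_segment_lookahead("/1/2/test/"))  # test
print(last_segment_group("/1/2/test"))       # None, no trailing slash
```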
The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, JavaScript might not have certain lookarounds, or may actually capture something in a non-capture group. However, the above should work as intended. In PHP, all / match characters must be escaped (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
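For comparison, here is roughly how the two patterns behave in Python (an assumption; the answer's /.../g notation is JavaScript or PHP style). re.findall plays the role of the g flag, and the surrounding slash delimiters are dropped.

```python
import re

path = "/1/2/test/"

# Every segment that is followed by another slash (the "g" flag case).
all_segments = re.findall(r"\/([^\/]*?)(?=\/)", path)

# Only the last segment: no lookahead needed, anchor on the final slash.
last_only = re.search(r"\/([^\/]*?)\/$", path).group(1)

print(all_segments)  # ['1', '2', 'test']
print(last_only)     # test
```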
I'm attempting to capture the 6 digit number in the following:
ObjectID: !nrtdms:0:!session:slonwswtest1:!database:TEST:!folder:ordinary,486150:
I tried the following regex:
\d+(?::$)
attempting to use a non-capturing group to strip the colon out of the returned match, but it returns the colon as in:
486150:
Any ideas what I'm doing wrong?
You want a positive lookahead:
\d+(?=:$)
A non-capturing group is simply a group that cannot be accessed via a backreference; what it matches is still part of the overall match.
Alternatively, you can use
(\d+):$
and obtain the 1st match group.
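The three behaviours can be compared directly. This is a sketch using Python's re module, which is an assumption because the question doesn't say which tool was used; the variable names are illustrative.

```python
import re

line = ("ObjectID: !nrtdms:0:!session:slonwswtest1:"
        "!database:TEST:!folder:ordinary,486150:")

# Non-capturing group: the colon is consumed and shows up in the match.
non_capturing = re.search(r"\d+(?::$)", line).group(0)

# Positive lookahead: the colon is required but not consumed.
lookahead = re.search(r"\d+(?=:$)", line).group(0)

# Capturing group: the colon is consumed, but group 1 holds only the digits.
captured = re.search(r"(\d+):$", line).group(1)

print(non_capturing)  # 486150:
print(lookahead)      # 486150
print(captured)       # 486150
```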
You should use a positive lookahead rather than a non-capturing group
\d+(?=:$)
Non-capturing groups are groups that will not create a capture (to be used in backreferences or extracted from the match result). Nonetheless they will match the expression.
What you're looking for is lookahead - to test the expression but exclude it from the match:
\d+(?=:$)
Probably your regex tool is returning the complete match since you don't have any capture group there. Try to enclose the \d+ in a capture group, and find the way to get capture group 1 in your regex tool.
Alternatively, you can also use positive look-ahead:
\d+(?=:$)
And given that you want to capture 6 digits, you can make that explicit:
\d{6}