Trouble with non-capturing groups in regular expression - regex

I'm attempting to capture the 6 digit number in the following:
ObjectID: !nrtdms:0:!session:slonwswtest1:!database:TEST:!folder:ordinary,486150:
I tried the following regex:
\d+(?::$)
attempting to use a non-capturing group to strip the colon out of the returned match, but it returns the colon as in:
486150:
Any ideas what I'm doing wrong?

You want a positive lookahead:
\d+(?=:$)
A non-capturing group is simply a group that cannot be accessed via a backreference; whatever it matches is still part of the overall match.
Alternatively, you can use
(\d+):$
and obtain the 1st match group.
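A minimal sketch of the three variants side by side, assuming Python's `re` module (the question does not say which regex tool is in use, so the engine is an assumption; the variable names are illustrative):

```python
import re

text = "ObjectID: !nrtdms:0:!session:slonwswtest1:!database:TEST:!folder:ordinary,486150:"

# Non-capturing group: the colon is still consumed, so it is part of the match.
with_group = re.search(r"\d+(?::$)", text).group(0)

# Positive lookahead: the colon is checked but not consumed.
with_lookahead = re.search(r"\d+(?=:$)", text).group(0)

# Capture group: the overall match includes the colon, group 1 does not.
with_capture = re.search(r"(\d+):$", text).group(1)

print(with_group)      # 486150:
print(with_lookahead)  # 486150
print(with_capture)    # 486150
```

The lookahead changes what the whole match contains, while the capture group leaves the whole match alone and only gives you a cleaner sub-result to read.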

You should use a positive lookahead rather than a non-capturing group
\d+(?=:$)

Non-capturing groups are groups that will not create a capture (to be used in backreferences or extracted from the match result). Nonetheless they will match the expression.
What you're looking for is lookahead - to test the expression but exclude it from the match:
\d+(?=:$)

Your regex tool is probably returning the complete match because the pattern has no capture group. Try enclosing the \d+ in a capture group, and find out how your regex tool exposes capture group 1.
Alternatively, you can also use positive look-ahead:
\d+(?=:$)
And given that you want to capture 6 digits, you can make that explicit:
\d{6}

Related

How can I remove something from the middle of a string with regex?

I have strings which look like this:
/xxxxx/xxxxx-xxxx-xxxx-338200.html
With my regex:
(?<=-)(\d+)(?=\.html)
It matches just the numbers before .html.
Is it possible to write a regex that matches everything that surrounds the numbers (matches the .html part and the part before the numbers)?
In your current pattern you already use a capturing group. In that case you can also match what comes before and after the digits directly instead of using the lookarounds:
-(\d+)\.html
To get what comes before and after the digits, you could use 2 capturing groups:
^(.*-)\d+(\.html)$
In the replacement use the 2 groups.
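As a rough illustration of that replacement, assuming Python's `re` module (the question does not name a language, and the variable names are made up for the example):

```python
import re

path = "/xxxxx/xxxxx-xxxx-xxxx-338200.html"

# Group 1 is everything up to the last hyphen, group 2 is the extension.
pattern = re.compile(r"^(.*-)\d+(\.html)$")

# Keep both groups and drop the digits between them.
without_digits = pattern.sub(r"\1\2", path)

print(without_digits)  # /xxxxx/xxxxx-xxxx-xxxx-.html
```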
This should do the job:
.*-\d+\.html
Explanation: .* matches any run of characters, -\d+ then requires a - followed by a sequence of digits, and \.html matches the literal text .html (where \. represents the character .).
To capture groups, just do (.*-)(\d+)(\.html). This will put everything before the number in a group, the number in another group and everything after the number in another group.
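A short sketch of the three-group version, again assuming Python's `re` module purely for illustration:

```python
import re

path = "/xxxxx/xxxxx-xxxx-xxxx-338200.html"

# Three groups: text before the number, the number, text after the number.
match = re.fullmatch(r"(.*-)(\d+)(\.html)", path)
before, digits, after = match.groups()

print(before)  # /xxxxx/xxxxx-xxxx-xxxx-
print(digits)  # 338200
print(after)   # .html
```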

How to use lookahead and lookbehind in more than one capturing group

I am trying to use positive lookahead and lookbehind to extract data between parentheses, and I need as many capture groups as there are pairs of parentheses. With a single capture group the regex works fine, but as soon as I use more than one there are no matches. What do I have to change in my regex to make it match the appropriate data? The regex that I am using along with the data is here. I want to use this in AWS Athena to read data from my S3 bucket objects.
I have tried various other ways but settled on this method with lookahead and lookbehind, as it ensures that the parentheses are not captured.
((?<=VERS\=\()[^\)]*(?=\)))((?<=UUID\=\()[^\)]*(?=\)))
The expected result is that the first capture group captures data from first parentheses and the second group captures data from the second parentheses.
If you want to match either of those, you could add a pipe |, which creates an alternation between the 2 parts, and move the lookarounds outside of the capturing groups.
Note that you don't have to escape the =, or the ) inside the character class.
(?<=VERS=\()([^)]*)(?=\))|(?<=UUID=\()([^)]*)(?=\))
Instead of using lookarounds, you might also match the 2 parts:
VERS=\(([^)]+)\);UUID=\(([^)]+)\);
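The sketch below compares the original attempt with both suggestions. It uses Python's `re` module only for illustration (the engine behind Athena may differ in details), and the sample line is hypothetical because the real data from the question is not shown:

```python
import re

# Hypothetical sample line; the real data from the question is not shown.
line = "VERS=(1.2.3);UUID=(abc-123);"

# Original attempt: the second group has to start exactly where the first one
# ends, but that position is preceded by "1.2.3", not "UUID=(", so it fails.
original = re.search(
    r"((?<=VERS=\()[^)]*(?=\)))((?<=UUID=\()[^)]*(?=\)))", line
)

# Alternation: two separate matches, each filling only one of the groups.
alternation = re.findall(
    r"(?<=VERS=\()([^)]*)(?=\))|(?<=UUID=\()([^)]*)(?=\))", line
)

# Matching the delimiters directly fills both groups in a single match.
both = re.search(r"VERS=\(([^)]+)\);UUID=\(([^)]+)\);", line)

print(original)       # None
print(alternation)    # [('1.2.3', ''), ('', 'abc-123')]
print(both.groups())  # ('1.2.3', 'abc-123')
```

If both values are needed from one match, the last form is the one that delivers them as groups 1 and 2; with the alternation, only one group is populated per match.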

Avoid repetition of group members to be captured

I have a regex that allows certain characters and words, but I want each member of the group to be used at most once: \d(?i)(R|k|M|E|next|prev){1,2}
Valid: 8RK, 6ME, 9Rnext
Invalid: 8MM, 0RR, 9nextnext
Please suggest a solution.
As said in the comments, you might want to use lookarounds, namely a negative lookahead here (written in free-spacing mode):
\d(?i)
(?:
    (R|k|M|E|next|prev) # capture group 1
    (?!\1)              # make sure the same submatch does not follow immediately
){1,2}
See a demo on regex101.com.
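A hedged Python sketch of the same idea, checked against the examples from the question. The flags are passed as `re.IGNORECASE | re.VERBOSE` instead of the inline (?i), because recent Python versions reject a global inline flag in the middle of a pattern; `fullmatch` is used here on the assumption that each string should be validated as a whole:

```python
import re

# Verbose, case-insensitive version of the pattern above.
pattern = re.compile(
    r"""
    \d
    (?:
        (R|k|M|E|next|prev)  # capture group 1
        (?!\1)               # the same token must not follow immediately
    ){1,2}
    """,
    re.IGNORECASE | re.VERBOSE,
)

valid = ["8RK", "6ME", "9Rnext"]
invalid = ["8MM", "0RR", "9nextnext"]

valid_results = [bool(pattern.fullmatch(s)) for s in valid]
invalid_results = [bool(pattern.fullmatch(s)) for s in invalid]

print(valid_results)    # [True, True, True]
print(invalid_results)  # [False, False, False]
```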
This regex should work :)
\d\w\w,\s\d\w\w,\s\w.*$

Regex with non-capturing group using stringr in R

I am trying to use non-capturing groups with the str_extract function from the stringr package. Here is an example:
library(stringr)
txt <- "foo"
str_extract(txt,"(?:f)(o+)")
This returns
"foo"
while I expect it to return only
"oo"
like in this post: https://stackoverflow.com/a/14244553/3750030
How do I use non-capturing groups in R to remove the content of the groups from the returned value while still using it for matching?
When you use the regex (?:f)(o+), the f is not captured, but it is still matched.
Capturing means storing the matched text in memory for back-referencing, so that it can be reused for a repeated match in the same string or in a replacement.
like in this post: https://stackoverflow.com/a/14244553/3750030
You misunderstood that answer. Non-capturing does not mean non-matching. The (o+) part is captured in $1 (group 1) because there is no capturing group before it.
If you want to match B only when it is preceded by A, you should use a positive lookbehind like this.
Regex: (?<=f)(o+)
Explanation:
(?<=f) This checks that f is present immediately before the following token, but does not include it in the match.
(o+) This matches and captures as group 1 (here $1) if the previous condition is true.
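The difference between the whole match, a capture group, and a lookbehind can be sketched as follows. Python's `re` module stands in for R here, so treat it as an illustration of the regex behaviour and not of the stringr API:

```python
import re

txt = "foo"

# Non-capturing group: "f" is still part of the whole match.
whole_match = re.search(r"(?:f)(o+)", txt).group(0)

# The same pattern, but reading capture group 1 instead of the whole match.
group_one = re.search(r"(?:f)(o+)", txt).group(1)

# Lookbehind: "f" is required before the match but is not included in it.
with_lookbehind = re.search(r"(?<=f)(o+)", txt).group(0)

print(whole_match)      # foo
print(group_one)        # oo
print(with_lookbehind)  # oo
```

str_extract returns the whole match, which is why the lookbehind helps there; stringr's str_match is the function to look at if you would rather read capture groups.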

Non capturing group included in capture?

This text
"dhdhd89(dd)"
Matched against this regex
.+?(?:\()
..returns "dhdhd89(".
Why is the start parenthesis included in the capture?
Two different tools, as well as the .NET Regex class, return the same result. So I gather there is something I don't understand about this.
The way I read my regex is:
Match any character, at least one occurrence. As few as possible.
The matched string should be followed by a start parenthesis, but not to be included in the capture.
I can find workaround, but I still want to know what is going on.
Just turn the non-capturing group into a positive lookahead assertion.
.+?(?=\()
.+? is a non-greedy match of one or more characters, which must be followed by an opening parenthesis. Assertions do not consume any characters; they only assert whether a match is possible at that position. A non-capturing group, on the other hand, does consume the characters it matches.
You can just use this negation based regex to capture only text before a literal (:
^([^(]+)
When you use:
.+?(?:\()
the regex engine does match the ( after the initial text; it just doesn't return it to you in a captured group.
You haven't defined any capture groups, so I guess you are displaying the whole match (group 0). You can do:
(.+?)(?:\()
and the string you want is in group 1,
or use a lookahead as @AvinashRaj said.
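The options from the answers above, sketched with Python's `re` module (the question used .NET, so this only illustrates the shared regex behaviour; the variable names are made up):

```python
import re

text = "dhdhd89(dd)"

# Non-capturing group: "(" is consumed and shows up in the whole match.
consumed = re.search(r".+?(?:\()", text).group(0)

# Lookahead: "(" is required but not consumed.
asserted = re.search(r".+?(?=\()", text).group(0)

# Explicit capture group: read group 1 instead of the whole match.
captured = re.search(r"(.+?)(?:\()", text).group(1)

# Negated character class: everything up to the first "(".
negated = re.search(r"^([^(]+)", text).group(1)

print(consumed)  # dhdhd89(
print(asserted)  # dhdhd89
print(captured)  # dhdhd89
print(negated)   # dhdhd89
```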