Capturing substrings if followed by certain string - regex

I'm trying to capture the names of the messages that appear between quotes. The input can be formatted like
("AddMessage",function(e){...})
or, for example,
("AddMessage","RemoveMessage",function(e){...})
I'm trying to capture the names of the messages with a regex, but with the regex I tried
\("(?<name>\w+Message)"(,"(\w+Message)")*,function\([\w,]*\){.*?}\)
Regex101 gives me this warning when I try to capture multiple message names:
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data.
I really can't figure it out. Can anyone help me?

You'll have to capture the repeated sections in an enclosing capture group.
Add a named capture group around the repeated sections, and make the inner group "grouping only" (non-capturing, written (?:...)):
\("(?<name>\w+Message)",?(?<messages>(?:"\w+Message",?)*),function\([\w,]*\){.*?}\)
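As a rough sketch of how the enclosing group might be used, the Python snippet below matches once and then splits the captured list in a second pass. It assumes Python's re module, which spells named groups (?P<name>...) rather than (?<name>...), and it escapes the braces; the sample input with three names is hypothetical.

```python
import re

# Same pattern as above, adapted to Python: (?P<...>) for named groups, braces escaped.
pattern = re.compile(
    r'\("(?P<name>\w+Message)",?'
    r'(?P<messages>(?:"\w+Message",?)*)'
    r',function\([\w,]*\)\{.*?\}\)'
)

# Hypothetical input with three message names.
source = '("AddMessage","RemoveMessage","UpdateMessage",function(e){...})'

match = pattern.search(source)

# The enclosing group holds every repeated name as one string;
# a second, simpler regex pulls the individual names out of it.
names = [match.group("name")] + re.findall(r'"(\w+Message)"', match.group("messages"))
print(names)
```

The second pass is needed because the enclosing group returns the repeated names as a single string, not as a list.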

Related

Regex Get number in Predictable String with *One* Capturing group

GCP Logging (log-based metrics) requires that my regex filter use only one capturing group, and as a result I can't use the lookaheads and lookbehinds that I would prefer. How can I get the number after KeyOne:?
Log: KeyOne:32|KeyTwo:0|KeyThree:|Language:english
Desired Capture: 32
If I were allowed multiple capturing groups, this would work: (?<=KeyOne:)([0-9]+)(?=\|)
Is there any way to do this with one Capturing Group?
What you want can be done with a single capturing group and no lookarounds, because the regex is handled by a function that returns only the captured substring when the pattern defines a capturing group.
In your case, you can simply convert the non-consuming parts of the pattern to consuming ones:
KeyOne:([0-9]+)\|
or
KeyOne:([0-9]+)
Note that you only need \| at the end if the match should occur only when a | char follows the number.
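To illustrate the idea outside GCP, here is a minimal Python sketch that runs both patterns against the log line from the question. Python's re.search stands in for whatever function GCP uses internally, which is an assumption made for demonstration only.

```python
import re

log_line = "KeyOne:32|KeyTwo:0|KeyThree:|Language:english"

# Consuming pattern that also requires a literal | after the number.
strict = re.search(r"KeyOne:([0-9]+)\|", log_line)

# Looser pattern that does not care what follows the number.
loose = re.search(r"KeyOne:([0-9]+)", log_line)

# The full match includes "KeyOne:", but group 1 holds only the digits.
key_one = strict.group(1) if strict else None
print(key_one)
```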

How to use lookahead and lookbehind in more than one capturing group

I am trying to use positive lookahead and lookbehind to extract data between parentheses, and I need as many capture groups as there are pairs of parentheses. The problem is that when I use more than one capture group there are no matches, but with only one group it works fine. What changes do I have to make to my regex so that it matches the appropriate data? The regex I am using, along with the data, is here. I want to use this in AWS Athena to read data from my S3 bucket objects.
I have tried various other ways but settled on this method with lookahead and lookbehind, as it ensures that the parentheses are not captured.
((?<=VERS\=\()[^\)]*(?=\)))((?<=UUID\=\()[^\)]*(?=\)))
The expected result is that the first capture group captures the data from the first pair of parentheses and the second group captures the data from the second pair.
If you want to match either of those, you could add a pipe |, which means an alternation between the 2 parts, and move the lookarounds outside of the capturing groups.
Note that you don't have to escape the = or the ) inside the character class.
(?<=VERS=\()([^)]*)(?=\))|(?<=UUID=\()([^)]*)(?=\))
                         ^
Instead of using lookarounds, you could also match both parts directly and capture each value:
VERS=\(([^)]+)\);UUID=\(([^)]+)\);
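A small Python sketch can show how the two patterns behave. The sample record is hypothetical (the question's data is not shown), and Python's re module is used only for illustration; Athena's regex functions may differ in detail.

```python
import re

# Hypothetical record in the shape the patterns above expect.
record = "VERS=(1.2.3);UUID=(abc-123);"

# Consuming pattern: one match, two capture groups, no lookarounds.
consuming = re.search(r"VERS=\(([^)]+)\);UUID=\(([^)]+)\);", record)
vers, uuid = consuming.groups()

# Alternation with lookarounds: two separate matches, one group filled per match.
lookaround = re.compile(r"(?<=VERS=\()([^)]*)(?=\))|(?<=UUID=\()([^)]*)(?=\))")
pairs = [m.groups() for m in lookaround.finditer(record)]

print(vers, uuid)
print(pairs)
```

With the alternation, each match fills only one of the two groups, so the consuming pattern is the one that returns both values from a single match.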

Regular expression to match text after another regex matched text

In general, I would like to match some text with one pattern and then match the text that follows it with another pattern. This probably sounds vague, so look at this example:
https://regex101.com/r/i35XhG/1
In the example I am matching "Chassis ID :", where I do not know the number of spaces between "Chassis ID" and ":", so I added \s+. The second capturing group matches a specially formatted series of hexadecimal numbers.
My goal is to isolate the hexadecimal part in the result, but I only get it together with "Chassis ID :". How can I accomplish this?
This is a general problem for me: matching something of dynamic length, but only caring about and retrieving what comes after it.
Thank you in advance.
All you need to do is wrap the part you want in parentheses so that it becomes its own capturing group, like:
(Chassis ID\s+: )()([0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+)
Now you can access 3c:61:04:65:22:80 as \3.
Look here: https://regex101.com/r/14vr2O/1. You can now see Group 3 with your value.
You can also simplify your regex to this one, where the value ends up in group 2:
(Chassis ID\s+: )((?:[0-9a-f]+:){5}[0-9a-f]+)
( and ) create a capturing group in a regex. Your first group is capturing Chassis ID\s+: and your second group is capturing nothing. Remove the ( and ) around Chassis ID\s+: and move the closing ) of the second capturing group to the end of the regex. Now you can access 3c:61:04:65:22:80 through the first capturing group.
Chassis ID\s+: ([0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+:[0-9a-f]+)
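As a quick check of the single-group approach, the Python sketch below combines it with the {5} shorthand from the earlier answer. The sample line and its spacing are hypothetical; only the hexadecimal value comes from the discussion above.

```python
import re

# Hypothetical device output; the number of spaces before ":" is unknown in practice.
output = "Chassis ID          : 3c:61:04:65:22:80"

# The prefix is matched but not captured; only the hex part sits in group 1.
pattern = re.compile(r"Chassis ID\s+: ((?:[0-9a-f]+:){5}[0-9a-f]+)")

match = pattern.search(output)
chassis_id = match.group(1) if match else None
print(chassis_id)
```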

How to get the characters surround specific pattern in one capturing group?

I have this string:
any123thing
Here is my specific pattern:
\d+ // which matches '123' in the string above
Now I want to get anything (the string without the digits) in a single capturing group. Is it possible?
Here is what I have tried so far:
(\w+(?:\d+)\w+)
But $1 in the regex above is any123thing, while I want to get this: anything.
Note: I don't want to use replace function.
You cannot.
According to regular-expressions.info (emphasis mine):
Besides grouping part of a regular expression together, parentheses also create a numbered capturing group. It stores the part of the string matched by the part of the regular expression inside the parentheses.
Consider your example:
(\w+(?:\d+)\w+)
Everything "inside the parentheses" is captured, including the non-capturing group.
In this case, it is effectively equivalent to using just the outer capturing group:
(\w+\d+\w+)
Whether you have a capturing group, a non-capturing group, or no group at all inside another group, the parent group will capture everything "inside the parentheses".
Non-capturing groups are a tool for optimization when you don't have the need to use a back reference. But don't let the name fool you: if they are inside of another group, that group still captures the match. In other words, they do not exclude themselves from parent groups.
@Tushar had suggested using ([a-zA-Z]+)\d*([a-zA-Z]+) and using the $1$2 captured group back references in tandem. This is the only approach if you're using regular expressions without a replace function.
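The difference between the two patterns can be checked with a short Python sketch. Python is an assumption here, since the question does not name a language; $1$2 corresponds to joining group(1) and group(2).

```python
import re

text = "any123thing"

# One outer group: the nested non-capturing group does not remove the digits.
single = re.search(r"(\w+(?:\d+)\w+)", text)
whole = single.group(1)

# Two groups around the digits, joined afterwards (the $1$2 idea).
split = re.search(r"([a-zA-Z]+)\d*([a-zA-Z]+)", text)
joined = split.group(1) + split.group(2)

print(whole)
print(joined)
```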

Regex Exclude Character From Group

I have a response:
MS1:111980613994
124 MS2:222980613994124
I have the following regex:
MS\d:(\d(?:\r?\n?)){15}
As I understand regex, the "(?:\r?\n?)" part should let the line break match inside the group but exclude it from the capture (so that I get a contiguous value from the group).
The problem is that for "MS1:xxx" it matches the [CR][LF] and includes it in the group. It should be excluded from the capture ...
Help please.
The (?:...) syntax does not mean that the enclosed pattern will be excluded from any capture groups that enclose the (?:...).
It only means that the group formed by (?:...) will be a non-capturing group, as opposed to a new capture group.
Put another way:
(?:...) only groups
(...) has two functions: it both groups and captures.
Capture groups capture all of the text matched by the pattern they enclose, even the parts that are matched by nested groups (whether they are capturing or not).
An example
With the regex...
.*(l.*(o.*o).*l).*
...there are two capture groups. If we match this against hello world we get the following captures:
1: lo worl
2: o wo
Note that the text captured by group 2 is also captured by group 1.
If we change the inner group to be non-capturing...
.*(l.*(?:o.*o).*l).*
...group 1's capture will not be changed (when matched against the same string), but there is no longer a group 2:
1: lo worl
As you can see, if a non-capturing group is enclosed by a capture group, that enclosing capture group will capture the characters matched by the non-capturing group.
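The example above can be reproduced with a few lines of Python. The re module is an assumption, since the answer does not tie the example to a particular engine.

```python
import re

text = "hello world"

# Both groups capturing: group 2's text is also part of group 1.
nested = re.fullmatch(r".*(l.*(o.*o).*l).*", text)

# Inner group non-capturing: group 1 is unchanged and group 2 no longer exists.
grouped = re.fullmatch(r".*(l.*(?:o.*o).*l).*", text)

print(nested.groups())
print(grouped.groups())
```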
What are they for?
The purpose of non-capturing groups is not to exclude content from other capturing groups, but rather to act as a way to group operations without also capturing.
For example, if you want to repeat a substring, you might write (?:substring)*.
How do I solve my real problem?
If you really want to ignore embedded \rs and \ns your best bet is to strip them out in a second step. You don't say what language you're using, but something equivalent to this (Python) should work:
s = re.sub(r'[\r\n]', '', s)
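Put together, the two-step approach might look like the following Python sketch. The outer capture group wrapped around the repeated part is an addition made here so that all 15 digits are captured, not only the last iteration, and the \r\n in the sample stands for the [CR][LF] in the response.

```python
import re

# The response from the question, with the line break written out explicitly.
response = "MS1:111980613994\r\n124 MS2:222980613994124"

# Step 1: capture the 15 digits together with any embedded line breaks.
pattern = re.compile(r"MS\d:((?:\d\r?\n?){15})")
raw_values = pattern.findall(response)

# Step 2: strip the line breaks from each captured value.
values = [re.sub(r"[\r\n]", "", raw) for raw in raw_values]
print(values)
```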
Perhaps what you mean to do here is place the [CR][LF] matching part outside of the captured group, something like: MS\d:(\d){15}(?:\r?\n?)
As far as I know, you'll have to use 2 regexes. One is "MS\d:(\d(?:\r?\n?)){15}"; the other is used to remove the line breaks from the matches.
Please refer to "Regular expression to skip character in capture group".
How about MS\d:(?:(\d)\r?\n?){15}?