How do I write a regex for this...? - regex

Given a string in the format someName_v0001, how do I write a regex that can give me the name (the bit before _v) and version (the bit after _v), where the version suffix is optional.
e.g. for the input
input => (name, version)
abc_v0001 => (abc, 0001)
abc_v10000 => (abc, 10000)
abc_vwx_v0001 => (abc_vwx,1)
abc => (abc, null)
I've tried this...
(.*)_v(\d*)
... but I don't know how to handle the case of the optional version suffix.

You can use
^(.*?)(?:_v0*(\d+))?$
See the regex demo.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars as few as possible
(?:_v0*(\d+))? - an optional sequence of
_v - a _v substring
0* - zero or more 0 chars
(\d+) - Capturing group 2: any one or more digits
$ - end of string.
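As a quick check of the pattern above, here is a minimal Python sketch. The helper name split_name_version is made up for illustration; only the pattern comes from the answer. Note that the 0* in the pattern strips leading zeros from the version, so abc_v0001 yields 1 rather than 0001.

```python
import re

# Pattern from the answer above: lazy name, optional "_v" + digits suffix.
NAME_VERSION = re.compile(r'^(.*?)(?:_v0*(\d+))?$')

def split_name_version(text):
    """Return (name, version); version is None when there is no _v suffix.

    split_name_version is a hypothetical helper name, not part of the answer.
    """
    match = NAME_VERSION.match(text)
    if match is None:
        # Can happen for multi-line input, since "." does not match newlines.
        return text, None
    return match.group(1), match.group(2)

for sample in ('abc_v0001', 'abc_v10000', 'abc_vwx_v0001', 'abc'):
    print(sample, '=>', split_name_version(sample))
```

If the leading zeros should be kept, dropping 0* from the pattern should be enough.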

To say that a character (or group of characters) is optional, use a ? after it.
For example ABC? would match both “AB” and “ABC”, while AB(CD)? would match both “AB” and “ABCD”.
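The two examples can be tried out with a small Python sketch. The helper matches_fully is a made-up name; it uses fullmatch so that the whole input has to be consumed by the pattern.

```python
import re

# "C" is optional: both "AB" and "ABC" match in full.
optional_char = re.compile(r'ABC?')
# The whole group "CD" is optional: "AB" and "ABCD" match, "ABC" does not.
optional_group = re.compile(r'AB(CD)?')

def matches_fully(pattern, text):
    """Hypothetical helper: True when the pattern consumes the entire text."""
    return pattern.fullmatch(text) is not None

print(matches_fully(optional_char, 'AB'), matches_fully(optional_char, 'ABC'))
print(matches_fully(optional_group, 'AB'), matches_fully(optional_group, 'ABCD'))
```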
I assume you want to make the “_v” part of the version optional as well. In that case, you need to enclose it in a non-capturing group, (?: ), so that you can make it optional using ? without also capturing it.
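A regex for your scenario is ^(.*?)(?:_v(\d+))?$. The name part is lazy and the pattern is anchored, because a greedy, unanchored (.*) would consume the whole string and leave the optional group empty. Capture group 1 will be the name and capture group 2 will be the version, if it exists.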

You could try:
(.*)_v([\d]*)
The first capture group is name, the second is version.
() => capture result
.* => any character
_v => the string "_v" this should compensate for the special cases
[\d]* => any number of any digits

Related

How would you retrieve N characters using regex from an existing regex?

I would need to retrieve an email id from a email address.
(i.e. this-is-the-best.email#gmail.com => this-is-the-best.email)
The regex that I used is (.*)#.* .
Now I need to truncate the string to N characters.
(i.e. N=7 => this-is N=30 =>this-is-the-best.email)
How would I add this to an existing regex?
Any other recommendations?
What about: ([^#]{1,7}).+?
this-is-the-best-email#gmail.com
short#hotmail.co.uk
Becomes:
this-is
short
I think that this is what you are looking for:
((.{1,7}).*)#.+
The first capturing group contains the full id and the second group contains up to 7 chars.
In your pattern (.*)#.* you don't need the trailing .* as it is optional, and the dot can match any character, including spaces and the # itself, so the pattern can match much more than just an email-like address.
The thing of interest is the non whitespace chars excluding an # char before actually matching the #, and in that case you can use a capture group matching 7 non whitespace chars.
([^\s#]{7})[^\s#]*#
The pattern matches:
([^\s#]{7}) Capture group 1, match 7 non whitespace chars excluding #
[^\s#]* Optionally match any non whitespace char excluding #
# Match literally
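A minimal Python sketch of the same idea with N as a parameter. The helper name id_prefix is hypothetical, and the quantifier {1,n} is a deliberate variation on the {7} shown above, assumed here so that ids shorter than N (such as short#hotmail.co.uk) still match. The # separator follows the examples in the question.

```python
import re

def id_prefix(address, n):
    """Return up to n leading characters of the part before '#', or None.

    id_prefix is a made-up name; {1,n} replaces the fixed {7} of the answer
    so that shorter ids are accepted as well (an assumption, not the
    original pattern).
    """
    pattern = rf'([^\s#]{{1,{n}}})[^\s#]*#'
    match = re.match(pattern, address)
    return match.group(1) if match else None

print(id_prefix('this-is-the-best.email#gmail.com', 7))
print(id_prefix('this-is-the-best.email#gmail.com', 30))
print(id_prefix('short#hotmail.co.uk', 7))
```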

How to capture nested named groups when referencing outer group by name?

In the list of integer numbers separated by comma, I need to capture (via a PCRE regex) the first occurrence of 12* (if any) and the first occurrence of 45* (if any). How do I do that?
I tried the following but it can only capture inside the first number in the sequence :(
(?P<number>(?P<n12>12\d)|(?P<n45>45\d)|\d+)(?:,(?P>number))*
Here's a sample string to test: 11,222,123,444,456,7. I expect to capture n12=123 and n45=456 here.
UPD
As a workaround, my own solution is to declare the delimiter optional (which it isn't), like this:
(?:,?(?P<number>(?P<n12>12\d)|(?P<n45>45\d)|\d+))*
- this works for me, but not in all cases (e.g. ,1234, 123,4, 1234 and ,123,4 are parsed identically), which I'd like to avoid if possible.
UPD2
N.B. C'mon, this is not the real task I'm faced with - it is just a simplified example. Here's another one so that you can get my idea better:
(?P<animal>(?P<cat>pussy|cat)|(?P<dog>doge|dog)|\w+)(?:,(?P>animal))*
pussy,mouse,dog,bird - has to capture: cat=pussy, dog=dog
Without named groups, you could capture either 12 or 45 in group 1, and for the second capture group recurse the first subpattern using (?1) and before it assert that it is not the same as what is already captured in group 1 using a negative lookahead with a backreference (?!\1)
^(?:\d+,)*?(12|45)(?:\d*(?:,\d+)*?,(?!\1)((?1)))?
Explanation
^ Start of string
(?:\d+,)*? Match as few as possible optional repetitions of 1+ digits and ,
(12|45) Capture either 12 or 45 in group 1
(?: Non capture group
\d* Match the optional remaining digits of the first number
(?:,\d+)*?, Match as few as possible optional repetitions of , and 1+ digits, then match ,
(?!\1) Negative lookahead, assert that what follows is not what was captured in group 1
((?1)) Capture group 2, recurse the first subpattern
)? Close the non capture group and make it optional, so a single captured value is also allowed
If you want named capture groups for a single or 2 group values, you can use an alternation with the J flag to allow duplicate subpattern names.
The pattern matches either first occurrence of 12 and then 45, or only 12 or only 45.
^(?:(?:\d+,)*?(?P<n12>12)\d*(?:,\d+)*?,(?P<n45>45)|(?:\d+,)*?(?P<n45>45)\d*(?:,\d+)*?,(?P<n12>12)|(?:\d+,)*?(?P<n12>12)|(?:\d+,)*?(?P<n45>45))
It looks like PCRE doesn't allow capturing named subpatterns nested inside a named pattern that is called by reference. So the exact answer to the question as asked is "There's no way. Sorry".
But there's a workaround for this specific case: instead of referencing the subpattern:
(?P<animal>...)(?:,(?P>animal))*
- you may avoid referencing it:
(?:,(?P<animal>...))*
- but this would require the subject to have a leading delimiter in the beginning, which it doesn't have.
A bad workaround for this is to mark the delimiter as optional:
(?:,?(?P<animal>...))*
- but it allows strange sequences to match.
A better solution is to mark the delimiter conditionally required: if the subpattern has already matched at least once, then the delimiter is required, otherwise it must be omitted:
(?:(?(animal),)(?P<animal>...))*
i.e.
(?:(?(animal),)(?P<animal>(?P<cat>pussy|cat)|(?P<dog>doge|dog)|\w+))*
N.B. This will only capture the last match for each subpattern (if any).
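If the first match per category is what is needed and the work does not have to happen inside a single PCRE pattern, a plain split-and-scan is an alternative. The sketch below is in Python, whose re module has no (?P>name) subroutine calls, so it deliberately swaps in a different technique rather than porting the patterns above. The function name first_matches and the category dictionaries are made up for illustration.

```python
import re

def first_matches(subject, categories):
    """Map each category name to the first comma-separated item matching it.

    Hypothetical helper: `categories` maps a name to a regex that must match
    a whole item. Items matching no category are simply skipped.
    """
    found = {}
    for item in subject.split(','):
        for name, pattern in categories.items():
            if name not in found and re.fullmatch(pattern, item):
                found[name] = item
    return found

print(first_matches('11,222,123,444,456,7', {'n12': r'12\d', 'n45': r'45\d'}))
print(first_matches('pussy,mouse,dog,bird',
                    {'cat': r'pussy|cat', 'dog': r'doge|dog'}))
```

Unlike the conditional-delimiter pattern, this keeps the first occurrence of each category, not the last.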

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (nor the many others I tried) seems to work.
To turn N_BBP_c_46137_n into BBP (according to the comment, you want the entire long name starting with N_BBP_ to be replaced by only BBP), you might use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N_ preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
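Outside Sublime, the same find-and-replace can be tried in Python as a rough check. This is only a sketch: keep_code is a made-up name, and Python writes the group reference as \1 where Sublime uses $1.

```python
import re

def keep_code(text):
    """Replace each N_BBP_... name with just BBP (hypothetical helper)."""
    # \1 is Python's spelling of the $1 used in Sublime's replace field.
    return re.sub(r'\bN_(BBP)_\S*', r'\1', text)

print(keep_code('N_BBP_c_46137_n'))
```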
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
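The ${1} form matters when the new value starts with a digit. A Python sketch of the same replacement, where \g<1> plays the role of ${1}; the helper name replace_code is hypothetical, and backslashes in the new value are escaped so they are not read as group references.

```python
import re

PATTERN = re.compile(r'(N_)[^_]*(_c_\d+_n)')

def replace_code(text, new_value):
    """Swap the part between N_ and _c_<digits>_n for new_value.

    Hypothetical helper. \\g<1> is unambiguous even when new_value begins
    with a digit, much like ${1} in Sublime.
    """
    safe_value = new_value.replace('\\', r'\\')
    return PATTERN.sub(rf'\g<1>{safe_value}\2', text)

print(replace_code('N_BBP_c_46137_n', '42'))
```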
See the regex demo.

Need to fix the perl regex to handle multiple cases

I'm trying to handle some cases of strings with the regex:
(.*note(?:'|")?\s*=>\s*)("|')?(.*?)\2(.*)
Strings:
note => "note goes here",
note => 'note goes here',
note => $note,
note => "$note",
note => '$note',
note => '$note'
note => $note . $note2 (can go longer, think it as key value of the perl hash)
# note => '$note',
There can be multiple spaces at the start, at the end, or in between. I need to capture the quote (" or '), the value such as $note, and whatever is left after the note section. The line may begin with # if it is a comment, so I've included .* at the beginning. The given regex fails in case 3 (note => $note,) because nothing is captured in group 2, so \2 has nothing to refer to.
Edit:
Requirement is that I'm reading a file, and replacing the value of note with some tag say NOTETAG, and all other things around remain same, including inverted commas and spaces. For that,
we need to capture the everything from beginning till we start writing the value
We should capture inverted commas too, so that I can write it back exactly
We need to capture the value of the note
We should capture things after the note value as well.
e.g. note => "kamal" , will become note => "NOTETAG" , (notice that the trailing , is left untouched)
s{
    \b
    note
    \s*
    =>
    \s*
    \K
    (?: '[^']*'
    |   "[^"]*"
    |   (.*)
    )
}{
    defined($1)
        ? $1 =~ s{\$note\b}{"NOTETAG"}gr
        : '"NOTETAG"'
}exg;
You could try (note\s*=>\s*(?:"|')?)[^'",]+
Explanation:
(...) - capturing group
note - match note literally
\s* - match zero or more of whitespaces
=> - match => literally
(?:..) - non-capturing group
"|' - alternation: match either ' or "
? - match preceding pattern zero or one time
[^'",]+ - negated character class - match one or more characters (due to + operator) other than ', ", ,
As a replacement use \1NOTETAG, where \1 means first capturing group
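To see how this last pattern behaves on the sample lines, here is a small Python sketch (the thread itself is about Perl; Python is used only for a quick check, and tag_note is a made-up helper name). The pattern is copied unchanged from the answer above.

```python
import re

# Pattern from the answer above; group 1 keeps everything up to the value,
# including an opening quote when there is one.
NOTE_VALUE = re.compile(r'''(note\s*=>\s*(?:"|')?)[^'",]+''')

def tag_note(line, tag='NOTETAG'):
    """Replace the note value with tag, leaving quotes and spacing intact."""
    return NOTE_VALUE.sub(lambda m: m.group(1) + tag, line)

for line in ('note => "note goes here",', 'note => $note,',
             "# note => '$note',", 'note => "kamal" ,'):
    print(tag_note(line))
```

Because the value is matched with [^'",]+, a value that itself contains a comma or a quote would only be replaced up to that character.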

Regex: Regular expression to pick the nth parameter of the response

Consider the example below:
AT+CEREG?
+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"
The desired response would be to pick the nth parameter:
n=1 => "4s"
n=2 => 123
n=8 =>
n=10 => 10110100
In my case, I am enquiring some details from an LTE modem and above is the type of response I receive.
I have created this regex which captures the (n+1)th member under group 2 including the last member, however, I can't seem to work out how to pick the 1st parameter in the approach I have taken.
(?:([^,]*,)){5}([^,].*?(?=,|$))?
Could you suggest an alternative method or complete/correct mine?
You may start matching from : (or +CEREG: if it is a static piece of text) and use
:\s*(?:[^,]*,){min}([^,]*)
where min is n-1, n being the position of the expected value.
See the regex demo. This solution is std::regex compatible.
Details
: - a colon
\s* - 0+ whitespaces
(?:[^,]*,){min} - min occurrences of any 0+ chars other than , followed with ,
([^,]*) - Capturing group 1: 0+ chars other than ,.
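A minimal Python sketch of the first pattern with the repetition count filled in from n. The function name nth_param is an assumption made for illustration; the pattern itself is the one described above.

```python
import re

def nth_param(response, n):
    """Return the nth comma-separated parameter after the colon, or None.

    Hypothetical helper: n is 1-based, so the quantifier is {n-1}.
    """
    pattern = rf':\s*(?:[^,]*,){{{n - 1}}}([^,]*)'
    match = re.search(pattern, response)
    return match.group(1) if match else None

line = '+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"'
for n in (1, 2, 8, 10):
    print(n, '=>', repr(nth_param(line, n)))
```

Empty fields come back as an empty string, and quoted fields keep their double quotes.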
A boost::regex solution might look neater since you may easily capture substrings inside double quotes or substrings consisting of chars other than whitespace and commas using a branch reset group:
:\s*(?:[^,]*,){0}(?|"([^"]*)"|([^,\s]+))
See the regex demo
Details
:\s*(?:[^,]*,){min} - same as in the first pattern
(?| - start of a branch reset group where each alternative branch shares the same IDs:
"([^"]*)" - a ", then Group 1 holding any 0+ chars other than " and then a " is just matched
| - or
([^,\s]+)) - (still Group 1): one or more chars other than whitespace and ,.
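Python's re module has no branch reset group, so a rough equivalent uses two ordinary groups and picks whichever one took part in the match. This is a sketch under that assumption, with nth_value as a made-up name; it is not the boost::regex pattern itself.

```python
import re

def nth_value(response, n):
    """Return the nth parameter without surrounding quotes, or None.

    Hypothetical helper. Group 1 holds a quoted value, group 2 a bare one;
    (?|...) would have merged them into a single group in boost/PCRE.
    """
    pattern = rf':\s*(?:[^,]*,){{{n - 1}}}(?:"([^"]*)"|([^,\s]+))'
    match = re.search(pattern, response)
    if match is None:
        return None
    quoted, bare = match.groups()
    return quoted if quoted is not None else bare

line = '+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"'
for n in (1, 2, 10):
    print(n, '=>', nth_value(line, n))
```

Since the bare alternative needs at least one character, an empty field such as n=8 gives no match and the function returns None.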