Regex: Regular expression to pick the nth parameter of the response - c++

Consider the example below:
AT+CEREG?
+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"
The desired response would be to pick n
n=1 => "4s"
n=2 => 123
n=8 =>
n=10 => 10110100
In my case, I am enquiring some details from an LTE modem and above is the type of response I receive.
I have created this regex which captures the (n+1)th member under group 2 including the last member, however, I can't seem to work out how to pick the 1st parameter in the approach I have taken.
(?:([^,]*,)){5}([^,].*?(?=,|$))?
Could you suggest an alternative method or complete/correct mine?

You may start matching from : (or +CEREG: if it is a static piece of text) and use
:\s*(?:[^,]*,){min}([^,]*)
where min is the n-1 position of the expected value.
See the regex demo. This solution is std::regex compatible.
Details
: - a colon
\s* - 0+ whitespaces
(?:[^,]*,){min} - min occurrences of any 0+ chars other than , followed with ,
([^,]*) - Capturing group 1: 0+ chars other than ,.
A boost::regex solution might look neater since you may easily capture substrings inside double quotes or substrings consisting of chars other than whitespace and commas using a branch reset group:
:\s*(?:[^,]*,){0}(?|"([^"]*)"|([^,\s]+))
See the regex demo
Details
:\s*(?:[^,]*,){min} - same as in the first pattern
(?| - start of a branch reset group where each alternative branch shares the same IDs:
"([^"]*)" - a ", then Group 1 holding any 0+ chars other than " and then a " is just matched
| - or
([^,\s]+)) - (still Group 1): one or more chars other than whitespace and ,.

Related

Using regex replacement in Sublime 3

I am trying to use replace in Sublime using regular expressions but I'm stuck. I tried various combinations but don't seem to be getting there.
This is the input and my desired output:
Input: N_BBP_c_46137_n
Output : BBP
I tried combinations of:
[^BBP]+\b
\*BBP*+\g
But none of the above (and many others) don't seem to work.
To turn N_BBP_c_46137_n into BBP and according to the comment just want that entire long name such as N_BBP_ to be replaced by only BBP* you might also use a capture group to keep BBP.
\bN_(BBP)_\S*
\bN_ Match N preceded by a word boundary
(BBP) Capture group 1, match BBP (or use [A-Z]+ to match 1+ uppercase chars)
_\S* Match _ followed by 0+ times a non whitespace char
In the replacement use the first capturing group $1
Regex demo
You may use
(N_)[^_]*(_c_\d+_n)
Replace with ${1}some new value$2.
Details
(N_) - Group 1 ($1 or ${1} if the next char is a digit): N_
[^_]* - any 0 or more chars other than _
-(_c_\d+_n) - Group 2 ($2): _c_, 1 or more digits and then _n.
See the regex demo.

How to capture group no of every group in a repeated capturing group

My regex is something like this **(A)(([+-]\d{1,2}[YMD])*)** which is matching as expected like A+3M, A-3Y+5M+3D etc..
But I want to capture all the groups of this sub pattern**([+-]\d{1,2}[YMD])***
For the following example A-3M+2D, I can see only 4 groups. A-3M+2D (group 0), A(group 1), -3M+2D (group 2), +2D (group 3)
Is there a way I can get the **-3M** as a separate group?
Repeated capturing groups usually capture only the last iteration. This is true for Kotlin, as well as Java, as the languages do not have any method that would keep track of each capturing group stack.
What you may do as a workaround, is to first validate the whole string against a certain pattern the string should match, and then either extract or split the string into parts.
For the current scenario, you may use
val text = "A-3M+2D"
if (text.matches("""A(?:[+-]\d{1,2}[YMD])*""".toRegex())) {
val results = text.split("(?=[-+])".toRegex())
println(results)
}
// => [A, -3M, +2D]
See the Kotlin demo
Here,
text.matches("""A(?:[+-]\d{1,2}[YMD])*""".toRegex()) makes sure the whole string matches A and then 0 or more occurrences of + or -, 1 or 2 digits followed with Y, M or D
.split("(?=[-+])".toRegex()) splits the text with an empty string right before a - or +.
Pattern details
^ - implicit in .matches() - start of string
A - an A substring
(?: - start of a non-capturing group:
[+-] - a character class matching + or -
\d{1,2} - one to two digits
[YMD] - a character class that matches Y or M or D
)* - end of the non-capturing group, repeat 0 or more times (due to * quantifier)
\z - implicit in matches() - end of string.
When splitting, we just need to find locations before - or +, hence we use a positive lookahead, (?=[-+]), that matches a position that is immediately followed with + or -. It is a non-consuming pattern, the + or - matched are not added to the match value.
Another approach with a single regex
You may also use a \G based regex to check the string format first at the start of the string, and only start matching consecutive substrings if that check is a success:
val regex = """(?:\G(?!^)[+-]|^(?=A(?:[+-]\d{1,2}[YMD])*$))[^-+]+""".toRegex()
println(regex.findAll("A-3M+2D").map{it.value}.toList())
// => [A, -3M, +2D]
See another Kotlin demo and the regex demo.
Details
(?:\G(?!^)[+-]|^(?=A(?:[+-]\d{1,2}[YMD])*$)) - either the end of the previous successful match and then + or - (see \G(?!^)[+-]) or (|) start of string that is followed with A and then 0 or more occurrences of +/-, 1 or 2 digits and then Y, M or D till the end of the string (see ^(?=A(?:[+-]\d{1,2}[YMD])*$))
[^-+]+ - 1 or more chars other than - and +. We need not be too careful here since the lookahead did the heavy lifting at the start of string.

Filter out a expression from Regex match

I have a regex query which works fine for most of the input patterns but few.
Regex query I have is
("(?!([1-9]{1}[0-9]*)-(([1-9]{1}[0-9]*))-)^(([1-9]{1}[0-9]*)|(([1-9]{1}[0-9]*)( |-|( ?([1-9]{1}[0-9]*))|(-?([1-9]{1}[0-9]*)){1})*))$")
I want to filter out a certain type of expression from the input strings i.e except for the last character for the input string every dash (-) should be surrounded by the two separate integers i.e (integer)(dash)(integer).
Two dashes sharing 3 integers is not allowed even if they have integers on either side like (integer)(dash)(integer)(dash)(integer).
If the dash is the last character of input preceded by the integer that's an acceptable input like (integer)(dash)(end of the string).
Also, two consecutive dashes are not allowed. Any of the above-mentioned formats can have space(s) between them.
To give the gist, these dashes are used in my input string to provide a range.
Some example of expressions that I want to filter out are
1-5-10, 1 - 5 - 10, 1 - - 5, -5
Update - There are some rules which will drive the input string. My job is to make sure I allow only those input strings which follow the format. Rules for the format are -
1. Space (‘ ‘) delimited numbers. But dash line doesn’t need to have a space. For example, “10 20 - 30” or “10 20-30” are all valid values.
2. A dash line (‘-‘) is used to set range (from, to). It also can used to set to the end of job queue list. For example, “100-150 200-250 300-“ is a valid value.
3. A dash-line without start job number is not allowed. For example, “-10” is not allowed.
Thanks
You might use:
^(?:(?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*)(?: (?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*))*(?: [1-9][0-9]*-)?|[1-9][0-9]*-?)[ ]*$
Regex demo
Explanation
^ Assert start of the string
(?: Non capturing group
(?: Non capturing group
[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]* Match number > 0, an optional space, a dash, an optional space and number > 0. The space is in a character class [ ] for clarity.
| Or
[1-9][0-9]* Match number > 0
) Close non capturing group
(?:[ ] Non capturing group followed by a space
(?: Non capturing group
[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]* Match number > 0, an optional space, a dash, an optional space and number > 0.
| Or
[1-9][0-9]* Match number > 0
) close non capturing group
)* close non capturing group and repeat zero or more times
(?: [1-9][0-9]*-)? Optional part that matches a space followed by a number > 0
| Or
[1-9][0-9]*-? Match a number > 0 followed by an optional dash
) close non capturing group
[ ]*$ Match zero or more times a space and assert the end of the string
NoteIf you want to match zero or more times a space instead of an optional space, you could update [ ]? to [ ]*. You can write [1-9]{1} as [1-9]
After the update the question got quite a lot of complexity. Since some parts of the regex are reused multiple times I took the liberty of working this out in Ruby and cleaned it up afterwards. I'll show you the build process so the regex can be understood. Ruby uses #{variable} for regex and string interpolation.
integer = /[1-9][0-9]*/
integer_or_range = /#{integer}(?: *- *#{integer})?/
integers_or_ranges = /#{integer_or_range}(?: +#{integer_or_range})*/
ending = /#{integer} *-/
regex = /^(?:#{integers_or_ranges}(?: +#{ending})?|#{ending})$/
#=> /^(?:(?-mix:(?-mix:(?-mix:[1-9][0-9]*)(?: *- *(?-mix:[1-9][0-9]*))?)(?: +(?-mix:(?-mix:[1-9][0-9]*)(?: *- *(?-mix:[1-9][0-9]*))?))*)(?: +(?-mix:(?-mix:[1-9][0-9]*) *-))?|(?-mix:(?-mix:[1-9][0-9]*) *-))$/
Cleaning up the above regex leaves:
^(?:[1-9][0-9]*(?: *- *[1-9][0-9]*)?(?: +[1-9][0-9]*(?: *- *[1-9][0-9]*)?)*(?: +[1-9][0-9]* *-)?|[1-9][0-9]* *-)$
You can replace [0-9] with \d if you like, but since you used the [0-9] syntax in your question I used it for the answer as well. Keep in mind that if you do replace [0-9] with \d you'll have to escape the backslash in string context. eg. "[0-9]" equals "\\d"
You mention in your question that
Any of the above-mentioned formats can have space(s) between them.
I assumed that this means space(s) are not allowed before or after the actual content, only between the integers and -.
Valid:
15 - 10
1234 -
Invalid:
15 - 10
123
If this is not the case simply add * to the start and end.
^ *... *$
Where ... is the rest of the regex.
You can test the regex in my demo, but it should be clear from the build process what the regex does.
var
inputs = [
'1-5-10',
'1 - 5 - 10',
'1 - - 5',
'-5',
'15-10',
'15 - 10',
'15 - 10',
'1510',
'1510-',
'1510 -',
'1510 ',
' 1510',
' 15 - 10',
'10 20 - 30',
'10 20-30',
'100-150 200-250 300-',
'100-150 200-250 300- ',
'1-2526-27-28-',
'1-25 26-2728-',
'1-25 26-27 28-',
],
regex = /^(?:[1-9][0-9]*(?: *- *[1-9][0-9]*)?(?: +[1-9][0-9]*(?: *- *[1-9][0-9]*)?)*(?: +[1-9][0-9]* *-)?|[1-9][0-9]* *-)$/,
logInputAndMatch = input => {
console.log(`input: "${input}"`);
console.log(input.match(regex))
};
inputs.forEach(logInputAndMatch);

Regular Expression to parse group of strings with quotes separated by space

Given a line of string that does not have any linebreak, I want to get groups of strings which may consist of quotes and separated by space. Space is allowed only if it's within quotes. E.g.
a="1234" gg b=5678 c="1 2 3"
The result should have 4 groups:
a="1234"
gg
b=5678
c="1 2 3"
So far I have this
/[^\s]+(=".*?"|=".*?[^s]+|=[^\s]+|=)/g
but this cannot capture the second group "gg". I can't check if there is space before and after the text, as this will include the string that has space within quotes.
Any help will be greatly appreciated! Thanks.
Edited
This is for javascript
In JavaScript, you may use the following regex:
/\w+(?:=(?:"[^"]*"|\S+)?)?/g
See the regex demo.
Details
\w+ - 1+ letters, digits or/and _
(?:=(?:"[^"]*"|\S+)?)? - an optional sequence of:
= - an equal sign
(?:"[^"]*"|\S+)? - an optional sequence of:
"[^"]*" - a ", then 0+ chars other than " and then "
| - or
\S+ - 1+ non-whitespace chars
JS demo:
var rx = /\w+(?:=(?:"[^"]*"|\S+)?)?/g;
var s = 'a="1234" gg b=5678 c="1 2 3" d=abcd e=';
console.log(s.match(rx));
if I did not misunderstand what you are saying this is what you are looking for.
\w+=(?|"([^"]*)"|(\d+))|(?|[a-z]+)
think of the or works as a fallback option there for use more complex one in front of the more generic ones.
alternatively, you can remove second ?| and it will capture it as a different group so you can check that group (group 2)

R- regex extracting a string between a dash and a period

First of all I apologize if this question is too naive or has been repeated earlier. I tried to find it in the forum but I'm posting it as a question because I failed to find an answer.
I have a data frame with column names as follows;
head(rownames(u))
[1] "A17-R-Null-C-3.AT2G41240" "A18-R-Null-C-3.AT2G41240" "B19-R-Null-C-3.AT2G41240"
[4] "B20-R-Null-C-3.AT2G41240" "A21-R-Transgenic-C-3.AT2G41240" "A22-R-Transgenic-C-3.AT2G41240"
What I want is to use regex in R to extract the string in between the first dash and the last period.
Anticipated results are,
[1] "R-Null-C-3" "R-Null-C-3" "R-Null-C-3"
[4] "R-Null-C-3" "R-Transgenic-C-3" "R-Transgenic-C-3"
I tried following with no luck...
gsub("^[^-]*-|.+\\.","\\2", rownames(u))
gsub("^.+-","", rownames(u))
sub("^[^-]*.|\\..","", rownames(u))
Would someone be able to help me with this problem?
Thanks a lot in advance.
Shani.
Here is a solution to be used with gsub:
v <- c("A17-R-Null-C-3.AT2G41240", "A18-R-Null-C-3.AT2G41240", "B19-R-Null-C-3.AT2G41240", "B20-R-Null-C-3.AT2G41240", "A21-R-Transgenic-C-3.AT2G41240", "A22-R-Transgenic-C-3.AT2G41240")
gsub("^[^-]*-([^.]+).*", "\\1", v)
See IDEONE demo
The regex matches:
^[^-]* - zero or more characters other than -
- - a hyphen
([^.]+) - Group 1 matching and capturing one or more characters other than a dot
.* - any characters (even including a newline since perl=T is not used), any number of occurrences up to the end of the string.
This can easily be achieved with the following regex:
-([^.]+)
# look for a dash
# then match everything that is not a dot
# and save it to the first group
See a demo on regex101.com. Outputs are:
R-Null-C-3
R-Null-C-3
R-Null-C-3
R-Null-C-3
R-Transgenic-C-3
R-Transgenic-C-3
Regex
-([^.]+)\\.
Description
- matches the character - literally
1st Capturing group ([^\\.]+)
[^\.]+ match a single character not present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
. matches the character . literally
\\. matches the character . literally
Debuggex Demo
Output
MATCH 1
1. [4-14] `R-Null-C-3`
MATCH 2
1. [29-39] `R-Null-C-3`
MATCH 3
1. [54-64] `R-Null-C-3`
MATCH 4
1. [85-95] `R-Null-C-3`
MATCH 5
1. [110-126] `R-Transgenic-C-3`
MATCH 6
1. [141-157] `R-Transgenic-C-3`
This seems an appropriate case for lookarounds:
library(stringr)
str_extract(v, '(?<=-).*(?=\\.)')
where
(?<= ... ) is a positive lookbehind, i.e. it looks for a - immediately before the next captured group;
.* is any character . repeated 0 or more times *;
(?= ... ) is a positive lookahead, i.e. it looks for a period (escaped as \\.) following what is actually captured.
I used stringr::str_extract above because it's more direct in terms of what you're trying to do. It is possible to do the same thing with sub (or gsub), but the regex has to be uglier:
sub('.*?(?<=-)(.*)(?=\\.).*', '\\1', v, perl = TRUE)
.*? looks for any character . from 0 to as few as possible times *? (lazy evaluation);
the lookbehind (?<=-) is the same as above;
now the part we want .* is put in a captured group (...), which we'll need later;
the lookahead (?=\\.) is the same;
.* captures any character, repeated 0 to as many as possible times (here the end of the string).
The replacement is \\1, which refers to the first captured group from the pattern regex.