Need to fix the Perl regex to handle multiple cases

I'm trying to handle some cases of strings with the regex:
(.*note(?:'|")?\s*=>\s*)("|')?(.*?)\2(.*)
Strings:
note => "note goes here",
note => 'note goes here',
note => $note,
note => "$note",
note => '$note',
note => '$note'
note => $note . $note2 (it can be longer; think of it as a key/value pair in a Perl hash)
# note => '$note',
There can be multiple spaces at the start, at the end, or in between. I need to capture the " (or '), the $note value, and whatever is left after the note section. There can be a # at the beginning if the line is a comment, so I've included .* at the beginning. The regex fails in case 3 because the optional group 2 does not participate there, and in Perl a backreference (\2) to an unset group cannot match.
Edit:
The requirement: I'm reading a file and replacing the value of note with some tag, say NOTETAG, while everything around it stays the same, including the quotes and spaces. For that,
we need to capture everything from the beginning up to where the value starts;
we should capture the quotes too, so that I can write them back exactly;
we need to capture the value of the note;
we should capture what comes after the note value as well.
e.g. note => "kamal" , will become note => "NOTETAG" , (notice that the trailing , is kept)

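s{
   \b
   note
   \s*
   =>
   \s*
   \K
   (?: (['"]) (?: (?!\1) . )* \1   # a quoted literal; group 1 holds the quote
   |   (.*)                         # anything else is treated as an expression
   )
}{
   defined($2)
      ? $2 =~ s{\$note\b}{"NOTETAG"}gr
      : $1 . 'NOTETAG' . $1
}exg;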
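For comparison outside Perl, here is a rough Python sketch of the same two-branch idea. It is not the answer's own code. A quoted literal has its contents replaced, and this sketch keeps the original quote character, as the question asks. Any other value is treated as an expression in which only a whole-word $note is replaced. Python's re has no \K, so the note => prefix is captured and written back instead. tag_note and NOTE_RX are hypothetical names.

```python
import re

# Group 1: the key and arrow with their original spacing.
# Group 2: the opening quote of a quoted literal (closed again by \2).
# Group 3: any other value, treated as an expression.
NOTE_RX = re.compile(
    r"""(\bnote\s*=>\s*)
        (?: (['"]) (?:(?!\2).)* \2
          | (.*) )""",
    re.VERBOSE,
)


def tag_note(line, tag="NOTETAG"):
    """Replace the value of `note => ...` with tag, keeping everything else."""
    def repl(m):
        prefix, quote, expr = m.group(1, 2, 3)
        if quote is not None:
            # Quoted literal: swap the contents, keep the same quote character.
            return prefix + quote + tag + quote
        # Expression: replace $note only as a whole word, so $note2 survives.
        return prefix + re.sub(r"\$note\b", '"' + tag + '"', expr)
    return NOTE_RX.sub(repl, line)


print(tag_note('note => $note . $note2'))  # note => "NOTETAG" . $note2
```

The \b before note lets the commented form (# note => ...) match as well.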

You could try (note\s*=>\s*(?:"|')?)[^'",]+
Explanation:
(...) - capturing group
note - match note literally
\s* - match zero or more of whitespaces
=> - match => literally
(?:..) - non-capturing group
"|' - alternation: match either ' or "
? - match preceding pattern zero or one time
[^'",]+ - negated character class: match one or more characters (due to the + operator) other than ', " and ,
As the replacement, use \1NOTETAG, where \1 refers to the first capturing group (in a Perl s/// replacement, write it as ${1}NOTETAG).
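Here is a quick Python check of this pattern. Python's re accepts the same syntax, and \g<1> is Python's unambiguous spelling of a group reference in a replacement. The check also shows the trade-offs. An unquoted value is replaced without gaining quotes. A whole unquoted expression such as $note . $note2 is replaced up to the next quote or comma.

```python
import re

PATTERN = r"""(note\s*=>\s*(?:"|')?)[^'",]+"""

lines = [
    'note => "note goes here",',
    "note => '$note',",
    "note => $note,",
    "note => $note . $note2",
]
# Group 1 (key, arrow, optional opening quote) is written back before the tag.
results = [re.sub(PATTERN, r"\g<1>NOTETAG", line) for line in lines]
for r in results:
    print(r)
```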


How do I write a regex for this...?

Given a string in the format someName_v0001, how do I write a regex that can give me the name (the bit before _v) and version (the bit after _v), where the version suffix is optional.
e.g. for the input
input => (name, version)
abc_v0001 => (abc, 0001)
abc_v10000 => (abc, 10000)
abc_vwx_v0001 => (abc_vwx,1)
abc => (abc, null)
I've tried this...
(.*)_v(\d*)
... but I don't know how to handle the case of the optional version suffix.
You can use
^(.*?)(?:_v0*(\d+))?$
See the regex demo.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible
(?:_v0*(\d+))? - an optional sequence of
_v - a _v substring
0* - zero or more 0 chars
(\d+) - Capturing group 2: any one or more digits
$ - end of string.
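Here is a quick Python check of the pattern above. Python's re accepts the same syntax, and split_name is a hypothetical helper name. Note that 0* strips leading zeros, so abc_v0001 yields 1, as in the third example. Drop 0* if you want 0001 back, as in the first example.

```python
import re

NAME_VERSION = re.compile(r"^(.*?)(?:_v0*(\d+))?$")


def split_name(s):
    m = NAME_VERSION.match(s)
    # Group 2 is None when the optional _v suffix is absent.
    return m.group(1), m.group(2)


for s in ["abc_v0001", "abc_v10000", "abc_vwx_v0001", "abc"]:
    print(s, "=>", split_name(s))
```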
To say that a character (or group of characters) is optional, use a ? after it.
For example ABC? would match both “AB” and “ABC”, while AB(CD)? would match both “AB” and “ABCD”.
I assume you want to make the “_v” part of the version optional as well. In that case, you need to enclose it in a non-capturing group, (?: ), so that you can make it optional using ? without also capturing it.
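The correct regex for your scenario is ^(.*?)(?:_v(\d+))?$. Capture group 1 will be the name and capture group 2 will be the version, if it exists. Group 1 has to be lazy and the pattern anchored with $: with a greedy (.*), the name would swallow the whole string and the optional group would never capture anything.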
You could try:
(.*)_v([\d]*)
The first capture group is name, the second is version.
() => capture result
.* => zero or more of any character
_v => the string "_v"; this should compensate for the special cases
[\d]* => zero or more digits

Regex: Regular expression to pick the nth parameter of the response

Consider the example below:
AT+CEREG?
+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"
The desired result is to pick the nth parameter:
n=1 => "4s"
n=2 => 123
n=8 =>
n=10 => 10110100
In my case, I am querying some details from an LTE modem, and above is the type of response I receive.
I have created this regex, which captures the (n+1)th member in group 2, including the last member. However, I can't work out how to pick the 1st parameter with the approach I have taken.
(?:([^,]*,)){5}([^,].*?(?=,|$))?
Could you suggest an alternative method or complete/correct mine?
You may start matching from : (or +CEREG: if it is a static piece of text) and use
:\s*(?:[^,]*,){min}([^,]*)
where min is n-1 for the nth value (so {0} picks the 1st parameter).
See the regex demo. This solution is std::regex compatible.
Details
: - a colon
\s* - 0+ whitespaces
(?:[^,]*,){min} - min occurrences of any 0+ chars other than , followed with ,
([^,]*) - Capturing group 1: 0+ chars other than ,.
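Here is a Python sketch of the same std::regex-compatible pattern. It builds {min} from n at run time, and nth_param is a hypothetical name. Fields come back exactly as they appear, so quoted fields keep their quotes and an empty field gives an empty string.

```python
import re

RESPONSE = '+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"'


def nth_param(response, n):
    """Return the raw nth (1-based) comma-separated field after the colon."""
    # {n-1} skips the first n-1 fields; group 1 is the nth field as-is.
    m = re.search(r":\s*(?:[^,]*,){%d}([^,]*)" % (n - 1), response)
    return m.group(1) if m else None


for n in (1, 2, 8, 10):
    print(n, repr(nth_param(RESPONSE, n)))
```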
A boost::regex solution might look neater since you may easily capture substrings inside double quotes or substrings consisting of chars other than whitespace and commas using a branch reset group:
:\s*(?:[^,]*,){0}(?|"([^"]*)"|([^,\s]+))
See the regex demo
Details
:\s*(?:[^,]*,){min} - same as in the first pattern
(?| - start of a branch reset group where each alternative branch shares the same group IDs:
"([^"]*)" - a ", then Group 1 capturing any 0+ chars other than ", then a closing " that is just matched
| - or
([^,\s]+)) - (still Group 1) one or more chars other than whitespace and ,; the last ) closes the branch reset group.
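Python's built-in re has no branch reset group (?|...), so this sketch emulates it with two ordinary groups and returns whichever one participated. nth_param_unquoted is a hypothetical name. As with the branch reset pattern, an empty field does not match, because [^,\s]+ needs at least one character.

```python
import re

RESPONSE = '+CEREG: "4s",123,"7021","28","8B7200B",8,,,"00000010","10110100"'


def nth_param_unquoted(response, n):
    """Return the nth field without surrounding quotes, or None if empty."""
    # Group 1: contents of a quoted field; group 2: a bare field.
    pattern = r':\s*(?:[^,]*,){%d}(?:"([^"]*)"|([^,\s]+))' % (n - 1)
    m = re.search(pattern, response)
    if m is None:
        return None  # e.g. the empty 8th field
    return m.group(1) if m.group(1) is not None else m.group(2)


print(nth_param_unquoted(RESPONSE, 10))
```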

Explode string with comma when comma is not inside any brackets

I have the string "xyz(text1,(text2,text3)),asd" and I want to explode it by commas, with one condition: the split should happen only on commas that are not inside any brackets (here, ()).
I saw many such solutions on Stack Overflow, but they didn't work with my pattern.
What is correct regex for my pattern?
In my case xyz(text1,(text2,text3)),asd
result should be
xyz(text1,(text2,text3)) and asd.
You may use a matching approach using a regex with a subroutine:
preg_match_all('~\w+(\((?:[^()]++|(?1))*\))?~', $s, $m)
See the regex demo
Details
\w+ - 1+ word chars
(\((?:[^()]++|(?1))*\))? - an optional capturing group matching
\( - a (
(?:[^()]++|(?1))* - zero or more occurrences of
[^()]++ - 1+ chars other than ( and )
| - or
(?1) - the whole Group 1 pattern
\) - a ).
PHP demo:
$rx = '/\w+(\((?:[^()]++|(?1))*\))?/';
$s = 'xyz(text1,(text2,text3)),asd';
if (preg_match_all($rx, $s, $m)) {
print_r($m[0]);
}
Output:
Array
(
[0] => xyz(text1,(text2,text3))
[1] => asd
)
If the requirement is to split at , but only outside nested parentheses, another idea is to use preg_split and skip the parenthesized parts, also by means of a recursive pattern.
$res = preg_split('/(\((?>[^)(]*(?1)?)*\))(*SKIP)(*F)|,/', $str);
See this pattern demo at regex101 or a PHP demo at eval.in
The left side of the pipe character matches and skips what is inside the parentheses.
The right side matches the remaining commas, those outside the parentheses.
The pattern used is a variant of common patterns for matching nested parentheses.
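Python's standard re module supports neither recursion ((?1)) nor (*SKIP)(*F). This sketch therefore swaps in a different technique, a plain depth counter, rather than porting the PHP patterns. It splits on a comma only when the parenthesis depth is zero, and it assumes the parentheses are balanced. split_top_level is a hypothetical name.

```python
def split_top_level(s, sep=","):
    """Split s on sep, ignoring separators nested inside (...)."""
    parts, current, depth = [], [], 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: close the current part.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(split_top_level("xyz(text1,(text2,text3)),asd"))
# ['xyz(text1,(text2,text3))', 'asd']
```

Unlike the \w+ matching approach, this keeps every character between top-level commas, not just word characters and parenthesized groups.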

Vim complex regex

I have these strings in a file:
a b
a-b
a / b / c
I want to replace these with:
"a b" => a_b
"a-b" => a_b
"a / b / c" => a_b_c
How do I write the regex? Please also explain the regex and name the concepts involved.
Yet another way:
:g/^/co.|-s/.*/"&" =>/|+s/\W\+/_/g|-j
Overview:
For every line (:g/^/), copy the line (:copy), then substitute on the first line to wrap it as "..." =>, and on the next line replace each run of non-word characters with _. Then join the two lines (-j).
Glory of Details:
:g/{pat}/{cmd} - run {cmd} on each line matching {pat}. Use ^ to match every line
copy . - copy the current line below the current line (.). Short: co.
-1s/.*/.../ - :s the line above (-1). Replace entire line, .*
"&" => - & is the entire match (or \0 in PCRE)
+s/\W\+/_/g - do a global :s on the next line (+1), replacing each run of non-word characters with _
-j - do a :join starting from the line above with the next line
For more help:
:h :g
:h :copy
:h :s
:h :j
:h :range
This is beyond simple capturing and reordering in the replacement. Turning the non-alphabetic characters into _ requires a nested substitution on the match, which can be done via :help sub-replace-expr:
:%substitute/.*/\='"' . submatch(0) . '" => ' . substitute(submatch(0), '\A\+', '_', 'g')/
Basically, this matches entire lines, then replaces with the match in double quotes, followed by =>, followed by the match with non-alphabetic character sequences (\A\+) replaced with a single _.
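The same expression-replacement idea can be sketched in Python, with [^A-Za-z]+ standing in for Vim's \A\+ (non-alphabetic characters). quote_and_underscore is a hypothetical helper name. It mirrors the Vim command's output rather than Vim itself.

```python
import re


def quote_and_underscore(line):
    # Like \A\+ in Vim: each run of non-alphabetic ASCII characters becomes one _.
    return '"' + line + '" => ' + re.sub(r"[^A-Za-z]+", "_", line)


converted = [quote_and_underscore(l) for l in ["a b", "a-b", "a / b / c"]]
for c in converted:
    print(c)
```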
Alternative
You can also do this in two separate steps. First, duplicate and quote the line:
:%substitute/.*/"&" => &/
Then, the second copy needs to be modified. To apply the substitution to only match after the => separator, a positive lookbehind (must match after => + any characters) must be given:
:%substitute/\%(=> .*\)\@<=\A\+/_/g
This achieves what you're asking for, although the question is somewhat ambiguous:
%s/\(\a\)\A\+/\1_/g
%s/[find_pattern]/[replace_pattern]/g does a find and replace on every line (%) in a file, replacing every match in the line (g) instead of the default behaviour of replacing only the first one.
\(\a\) captures a group (the parentheses have to be escaped) containing an alphabetic character.
\A\+ means one or more non-alphabetic characters.
\1 is a backreference to the first captured group in the pattern, in this case the alphabetic character in parentheses.
_ is just the literal.
So together it replaces every letter followed by 1 or more non-letters with that letter followed by _. So this only works when the line ends with the last letter.
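Here is a Python sketch of the same backreference substitution. [A-Za-z] stands in for Vim's \a, and underscore_after_letters is a hypothetical name. The last sample shows the caveat: a trailing non-letter also turns into _.

```python
import re


def underscore_after_letters(line):
    # (\a)\A\+ -> \1_ : keep the letter, replace the following non-letters with _
    return re.sub(r"([A-Za-z])[^A-Za-z]+", r"\1_", line)


outputs = [underscore_after_letters(l) for l in ["a b", "a-b", "a / b / c", "a b "]]
print(outputs)
```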
One way of doing this:
:%s/[\ -]\/*\ */_/g
[\ -] looks for either a space \ (note the space between \ and -) or a dash -.
The asterisk * means zero or more occurrences. So \/* matches zero or more slashes (/), and \ * zero or more spaces. Finally, g replaces all occurrences in the line.
[Edit]
I had misunderstood the question. Your problem can be solved using multiple sub-expressions in 2 steps.
step 1) Put an underscore before the c
:%s/c/_c/g
step 2) find and replace
:%s/a\([\ -]\/*\ *\)b\(\1\)*\(_\)*\(c\)*/"a\1b\2\4" => a_b\3\4/g
This will give you
"a b" => a_b
"a-b" => a_b
"a / b / c" => a_b_c
Explanation:
\(\) denotes a sub-expression, order of appearance matters so \1 matches to sub-expression one and so forth.
The trick is to add a _ somewhere so we can use it and, at the same time, get information about the length. Because it only appears before c, the subexpression \3 will only match _ for that line.
Now, by replacing with "a\1b\2\4", we skip \3 and avoid adding an underscore.
:%s:[\ /-]\+:_:g
Explanation:
s: : : - Substitute command (with delimiter `:`)
[\ /-] - Match a ` ` (space), `/`, or `-` character
\+ - Match one or more of the previous group consecutively
_ - Replace with one `_` character
g - Replace all matches in line
% - Execute command on every line in file (optional)
I interpreted your question to be very generic. If you need to match more specific patterns, please indicate exactly what needs to be matched.
[Edit]
If you need to match ' / ' exactly, use:
:%s:\ /\ \|[\ -]:_:g
s: : : - Substitute command (with delimiter `:`)
\| - Match left pattern OR right pattern
\ /\ - Match ` / ` exactly
[\ -] - Match a ` ` (space) or `-` character
_ - Replace with one `_` character
g - Replace all matches in line
% - Execute command on every line in file (optional)
[Edit 2]
I misunderstood what you wanted to substitute.
You're making your life very difficult if you're trying to do this with a single regex. It gets so complicated that you're better off writing a small function, like some of the other answers do. But you should be able to get away with two substitution commands without it getting too crazy: one for the first two strings (a b and a-b), and one for the third (a / b / c).
%s:\v(\a+)[\ -](\a+):"\0"\ =>\ \1_\2
%s:\v(\a+)\s*/\s*(\a+)\s*/\s*(\a+):"\0"\ =>\ \1_\2_\3
Explanation:
%s:\v(\a+)[\ -](\a+):"\0"\ =>\ \1_\2
s: : - Substitute command (with delimiter `:`)
\v - Very Magic mode *
( ) ( ) - Capture contained matches into numbered sub-expressions
\a+ \a+ - Match at least one alphabetic character
[\ -] - Match either ` ` (space) or `-`
" "\ =>\ _ - Literal text
\0 - Replace with entire matched text
\1 \2 - Replace with first and second `()` sub-expression, respectively
% - Execute command on every line in file (optional)
%s:\v(\a+)\s*/\s*(\a+)\s*/\s*(\a+):"\0"\ =>\ \1_\2_\3
s: : - Substitute command (with delimiter `:`)
\v - Very Magic mode *
( ) ( ) ( ) - Capture contained matches into numbered sub-expressions
\a+ \a+ \a+ - Match at least one alphabetic character
\s*/\s* \s*/\s* - Match a `/` and any surrounding spaces
" "\ =>\ _ _ - Literal text
\0 - Replace with entire matched text
\1 \2 \3 - Replace with first, second, and third `()` sub-expression, respectively
% - Execute command on every line in file (optional)
* This eliminates the need for a lot of ugly backslashes.
See `:h /magic` and `:h /\v`
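The two commands can be mirrored in Python to check what each one produces. This is a sketch: convert is a hypothetical helper, [A-Za-z] stands in for Vim's \a, \g<0> plays the role of \0, and count=1 mirrors the missing g flag.

```python
import re

# (pattern, replacement) pairs mirroring the two :%s commands.
TWO_WORDS = (re.compile(r"([A-Za-z]+)[ -]([A-Za-z]+)"), r'"\g<0>" => \1_\2')
THREE_WORDS = (
    re.compile(r"([A-Za-z]+)\s*/\s*([A-Za-z]+)\s*/\s*([A-Za-z]+)"),
    r'"\g<0>" => \1_\2_\3',
)


def convert(line):
    for rx, repl in (TWO_WORDS, THREE_WORDS):
        line = rx.sub(repl, line, count=1)  # first match only, like :s without g
    return line


results = [convert(l) for l in ["a b", "a-b", "a / b / c"]]
print(results)
```

The first command cannot touch a / b / c, because [ -] is never followed directly by a letter there. The second command cannot touch the other two lines, because they contain no /.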

Regex to capture string until another string is encountered

I want to match string1 and anything that appears in the following lines:
['string1','string2','string3']
['string1' , 'string2' , 'string3']
['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
Until it encounters the following:
string2
So with the right regex in the above 4 cases the results in bold would be matched:
['string1','string2','string3']
['string1' , 'string2' , 'string3']
['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
I tried to solve my issue on https://regex101.com/ with the help of the following thread.
The regex I tried is from Question 8020848, but was not successful with matching the string correctly:
((^|\.lpdomain\.com:8080' , ')(string1))+$
But I was not successful in only matching the part I wanted to in this text:
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
The following is what I received using the regex that you suggested
@@ -108,7 +108,7 @@ node stringA, stringB, stringC,stringD inherits default {
'ssl_certificate_file' => 'test.domain.net_sha2_n.crt',
'ssl_certificate_key_file'=> 'test.domain.net_sha2.key' }
},
- service_upstream_members => ['string1.domain.com:8080', 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
+ service_upstream_members => [ 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
service2_upstream_members => ['string9:8080','string10:8080'],
service3_upstream_members => ['string11.domain.com:8080','string12.domain.com:8080','string13.domain.com:8080'],
service_name => 'test_web_nginx_z1',
As you can see, there is a space before 'string2 that for some reason wasn't removed, even though regex101.com shows that all the whitespace is matched by the regex
'string1[^']*'\s*,\s*
This is what I'm currently using (where server is a variable already defined in the script):
sed -i '' "s/'${server}[^']*'\s*,\s*//"
To match a string starting with ' then having string1, then any chars other than ', 0 or more occurrences, and then optional number of whitespaces, a comma and again 0+ whitespaces, you may use
'string1[^']*'\s*,\s*
See the regex demo.
Breakdown:
'string1 - a literal char sequence 'string1
[^']* - zero or more (*) characters other than ' (due to the negated character class [^...])
' - an apostrophe
\s* - 0+ whitespaces
, - a comma
\s* - 0+ whitespaces.
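Here is a Python check of the pattern on the sample lines and on the line from the diff. STRIP_FIRST is an illustrative name, and count=1 mirrors the sed command, which has no g flag. In each case the space after the comma is consumed along with the first entry.

```python
import re

STRIP_FIRST = re.compile(r"'string1[^']*'\s*,\s*")

samples = [
    "['string1','string2','string3']",
    "['string1' , 'string2' , 'string3']",
    "['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']",
    "['string1.domain.com:8080', 'string2.domain.com:8080', 'string3.domain.com:8080']",
]
stripped = [STRIP_FIRST.sub("", s, count=1) for s in samples]
for s in stripped:
    print(s)
```

If the leftover space only shows up with sed, one likely cause (an assumption, suggested by the BSD/macOS-style -i '') is that \s is a GNU sed extension that other seds may not recognize. In that case \s* could match zero literal s characters and leave the space behind. The POSIX class [[:space:]] is the portable spelling: sed -i '' "s/'${server}[^']*'[[:space:]]*,[[:space:]]*//"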
This should match what you ask (according to your bold highlights) allowing for an unknown amount of spaces, etc.
(?:…) is a non-capturing group.
…+? is a non-greedy (lazy) match: as few repetitions as possible of the preceding item.
(string1.+?)(?:'string2)
(string1.+?)'string2
See example: https://regex101.com/r/lFPSEM/3
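Here is a small Python check of the lazy pattern. Python's re accepts it unchanged, and captured is just an illustrative variable. Group 1 starts at string1 itself, so neither the opening ' nor the text matched by 'string2 is part of the capture.

```python
import re

lines = [
    "['string1','string2','string3']",
    "['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']",
]
# .+? stops at the first position where 'string2 follows.
captured = [re.search(r"(string1.+?)'string2", s).group(1) for s in lines]
print(captured)
```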