Regex to capture string until another string is encountered


I want to match string1 and anything that appears in the following lines:
['string1','string2','string3']
['string1' , 'string2' , 'string3']
['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
Until it encounters the following:
string2
So with the right regex in the above 4 cases the results in bold would be matched:
['string1','string2','string3']
['string1' , 'string2' , 'string3']
['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
I tried adapting the regex from Question 8020848 and tested it on https://regex101.com/, but it did not match the string correctly:
((^|\.lpdomain\.com:8080' , ')(string1))+$
In particular, it did not match only the part I wanted in this text:
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
The following is what I received using the regex that you suggested:
@@ -108,7 +108,7 @@ node stringA, stringB, stringC,stringD inherits default {
'ssl_certificate_file' => 'test.domain.net_sha2_n.crt',
'ssl_certificate_key_file'=> 'test.domain.net_sha2.key' }
},
- service_upstream_members => ['string1.domain.com:8080', 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
+ service_upstream_members => [ 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
service2_upstream_members => ['string9:8080','string10:8080'],
service3_upstream_members => ['string11.domain.com:8080','string12.domain.com:8080','string13.domain.com:8080'],
service_name => 'test_web_nginx_z1',
As you can see, a leading space was left behind for some reason, even though regex101.com shows that all the whitespace is matched by the regex:
'string1[^']*'\s*,\s*
This is what I'm currently using (where server is a variable already defined in the script):
sed -i '' "s/'${server}[^']*'\s*,\s*//"

To match a ' followed by string1, then zero or more characters other than ', a closing ', and finally a comma surrounded by optional whitespace, you may use
'string1[^']*'\s*,\s*
See the regex demo.
Breakdown:
'string1 - a literal char sequence 'string1
[^']* - zero or more (*) characters other than ' (due to the negated character class [^...])
' - an apostrophe
\s* - 0+ whitespaces
, - a comma
\s* - 0+ whitespaces.
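As a hedged illustration of the pattern above, here is a minimal Python sketch. Python's re module is only my choice for the demo, and the function name remove_member is hypothetical. It escapes the server name before building the pattern, which the sed command in the question does not do.

```python
import re


def remove_member(line: str, server: str) -> str:
    """Drop the first quoted entry that starts with `server`,
    together with the comma and whitespace that follow it."""
    # re.escape guards against regex metacharacters in the server name.
    pattern = re.compile(r"'" + re.escape(server) + r"[^']*'\s*,\s*")
    return pattern.sub("", line, count=1)


line = "['string1.domain.com:8080', 'string2.domain.com:8080', 'string3.domain.com:8080']"
print(remove_member(line, "string1"))
# prints ['string2.domain.com:8080', 'string3.domain.com:8080']
```

If the leftover space in the diff comes from sed itself, one possible cause (an assumption, not verified against your system) is that \s is not part of POSIX regular expressions, so some sed builds read \s* as zero or more literal s characters. [[:space:]]* is the portable spelling.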

This should match what you ask (according to your bold highlights) allowing for an unknown amount of spaces, etc.
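(?:…) is a non-capturing group.
…+? is a non-greedy (lazy) match: one or more characters, as few as possible.
Either of the following works; they are equivalent, because the non-capturing group around 'string2 only groups and changes nothing else:
(string1.+?)(?:'string2)
(string1.+?)'string2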
See example: https://regex101.com/r/lFPSEM/3
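To make the lazy match concrete, here is a small Python sketch (Python is my assumption; the question does not name a language) that applies the second pattern to the four sample lines and collects group 1. Note that the capture includes the closing quote, comma and spaces that sit before 'string2.

```python
import re

# Lazy capture: start at string1 and stop right before the first 'string2.
pattern = re.compile(r"(string1.+?)'string2")

samples = [
    "['string1','string2','string3']",
    "['string1' , 'string2' , 'string3']",
    "['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']",
    "['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']",
]

captures = [pattern.search(sample).group(1) for sample in samples]
for capture in captures:
    print(repr(capture))
```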

Related

split string on commas, ignore single quote and comma in single quote

I have a string
10013751290,STUBBY'S GYM,HELLO ( Mate (, HEY 'duran,duran',this is [ possible[ ] ]possible ,1232424
I want to split by , using regex in java, so the result I would be expecting is like
10013751290
STUBBY'S GYM
HELLO ( Mate (
HEY 'duran,duran'
this is [ possible[ ] ]possible
1232424
I tried the following expression ,(?=(?:[^']*\'[^']*\')*[^']*$)
I am getting only 4 matches, whereas I should be getting 6.
Can someone please help? I have checked, and it's not a duplicate; it's slightly different.

You want to ignore a ' that is an apostrophe, thus, you need to add this as an alternative to the pattern that matches any char but ', that is, [^'] -> (?:[^']|\b'\b). Or, (?:[^']|(?<=[a-zA-Z])'(?=[a-zA-Z])) or a fully Unicode (?:[^']|(?<=\p{L})'(?=\p{L})) (if supported) to only match ' in between alphabetic chars, letters.
\s*,\s*(?=(?:(?:(?:[^']|\b'\b)*'){2})*(?:[^']|\b'\b)*$)
See the regex demo. Details:
\s*,\s* - a comma enclosed with zero or more whitespace chars
(?=(?:(?:(?:[^']|\b'\b)*'){2})*(?:[^']|\b'\b)*$) - a positive lookahead that requires, immediately to the right of the current location:
(?:(?:(?:[^']|\b'\b)*'){2})* - zero or more occurrences of two repetitions of
(?:[^']|\b'\b)* - zero or more occurrences of any char other than a ' or a ' in between any word chars
' - a single quotation mark
(?:[^']|\b'\b)* - zero or more occurrences of any char other than a ' or a ' in between any word chars
$ - end of string.
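The question is about Java, but the same pattern can be tried quickly in Python, whose re module treats \b and lookaheads the same way for this input. The following is only a sketch under that assumption; the names SPLIT_RE and fields are mine.

```python
import re

# Split on commas that are followed by an even number of "real" quotes.
# An apostrophe between two word characters (\b'\b) is not counted as a quote.
SPLIT_RE = re.compile(r"\s*,\s*(?=(?:(?:(?:[^']|\b'\b)*'){2})*(?:[^']|\b'\b)*$)")

text = ("10013751290,STUBBY'S GYM,HELLO ( Mate (, HEY 'duran,duran',"
        "this is [ possible[ ] ]possible ,1232424")

fields = SPLIT_RE.split(text)
for field in fields:
    print(field)
```

The comma inside 'duran,duran' is skipped because only one quote follows it, so the lookahead cannot pair the quotes up.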

You can try something like this:
(?<=^|,)(((?:\s|\w)*'(?:\w|,)*')|.*?)(?=$|,)
result
10013751290
STUBBY'S GYM
HELLO ( Mate (
HEY 'duran,duran'
this is [ possible[ ] ]possible
1232424
See the demo.

Need to fix the perl regex to handle multiple cases

I'm trying to handle some cases of strings with the regex:
(.*note(?:'|")?\s*=>\s*)("|')?(.*?)\2(.*)
Strings:
note => "note goes here",
note => 'note goes here',
note => $note,
note => "$note",
note => '$note',
note => '$note'
note => $note . $note2 (can go longer, think it as key value of the perl hash)
# note => '$note',
There can be multiple spaces at the start, at the end, or in between. I need to capture the " (or '), the $note, the comma, or whatever is left after the note section. The line may begin with # if it is a comment, so I've included .* at the beginning. The given regex fails in case 3, where group 2 captures nothing and \2 is therefore unset.
Edit:
The requirement is that I'm reading a file and replacing the value of note with some tag, say NOTETAG, while everything around it stays the same, including quotation marks and spaces. For that:
we need to capture everything from the beginning up to the point where the value starts
we should capture the quotation marks too, so that I can write them back exactly
we need to capture the value of the note
we should capture whatever comes after the note value as well.
E.g. note => "kamal" , will become note => "NOTETAG" , (note that the trailing comma is preserved)

s{
\b
note
\s*
=>
\s*
\K
(?: (.*)
| '[^']*'
| "[^"]*"
)
}{
defined($1)
? $1 =~ s{\$note\b}{"NOTETAG"}gr
: '"NOTETAG"'
}exg;

You could try (note\s*=>\s*(?:"|')?)[^'",]+
Explanation:
(...) - capturing group
note - match note literally
\s* - match zero or more of whitespaces
=> - match => literally
(?:..) - non-capturing group
"|' - alternation: match either ' or "
? - match preceding pattern zero or one time
[^'",]+ - negated character class - match one or more characters (due to the + operator) other than ', ", ,
Demo
As a replacement use \1NOTETAG, where \1 means first capturing group
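The question uses Perl, but the pattern and the \1NOTETAG replacement can be checked in any PCRE-like engine. Below is a minimal Python sketch of that substitution; the helper name tag_note is hypothetical. It shows that quotes, spacing and the trailing comma survive, and also that an unquoted expression such as $note . $note2 is replaced as a whole.

```python
import re

# Group 1 keeps "note =>", the spacing and an optional opening quote;
# the value itself (anything up to a quote or comma) is what gets replaced.
NOTE_RE = re.compile(r"""(note\s*=>\s*(?:"|')?)[^'",]+""")


def tag_note(line: str) -> str:
    return NOTE_RE.sub(r"\1NOTETAG", line)


for sample in ['note => "kamal" ,', "note => $note,", "# note => '$note',"]:
    print(tag_note(sample))
```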

Options matching in a command

I'm creating a Discord bot and trying to match some command options, but I have a problem getting the value between the square brackets (if there is one).
I've already tried adding a ? to make the bracketed part optional, but it's not working. I also searched for how to match between two characters but found nothing that helped.
Here is the pattern I've got so far: https://regexr.com/4icgi
and here it is as text: /[+|-](.+)(\[(.+)\])?/g
Given an option like +user[someRandomPeople],
I expect it to extract the parameter user and the value someRandomPeople; if there are no square brackets, it should extract only the parameter.

You may use
^[+-](.*?)(?:\[(.*?)\])?$
Or, if there should be no square brackets inside the optional [...] substring at the end:
^[+-](.*?)(?:\[([^\][]*)\])?$
Or, if the matches are searched for on different lines:
^[+-](.*?)(?:\[([^\][\r\n]*)\])?$
See the regex demo.
Details
^ - start of string
[+-] - + or - (note that | inside square brackets matches a literal | char)
(.*?) - Group 1: any 0 or more chars other than line break chars as few as possible
(?:\[(.*?)\])? - an optional sequence of
\[ - a [ char
(.*?) - Group 2: any 0 or more chars other than line break chars as few as possible ([^\][]* matches 0 or more chars other than [ and ])
\] - a ] char
$ - end of string.
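Here is a short Python sketch of the second variant (the one that forbids brackets inside [...]). The question is about a JavaScript bot, so treat the Python port and the helper name parse_option as assumptions made for illustration only.

```python
import re

# Group 1: the option name (lazy). Group 2: the optional value in [...].
OPTION_RE = re.compile(r"^[+-](.*?)(?:\[([^\][]*)\])?$")


def parse_option(option: str):
    """Return (name, value) or None if the text is not an option.
    value is None when there are no square brackets."""
    match = OPTION_RE.match(option)
    return match.groups() if match else None


print(parse_option("+user[someRandomPeople]"))  # ('user', 'someRandomPeople')
print(parse_option("-verbose"))                 # ('verbose', None)
```

The lazy (.*?) is what lets the optional group take the brackets; a greedy (.+), as in the original pattern, swallows them first.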

Filter out a expression from Regex match

I have a regex query that works fine for most input patterns, but not for all of them.
Regex query I have is
("(?!([1-9]{1}[0-9]*)-(([1-9]{1}[0-9]*))-)^(([1-9]{1}[0-9]*)|(([1-9]{1}[0-9]*)( |-|( ?([1-9]{1}[0-9]*))|(-?([1-9]{1}[0-9]*)){1})*))$")
I want to filter out a certain type of expression from the input strings: except when it is the last character of the input string, every dash (-) must sit between two separate integers, i.e. (integer)(dash)(integer).
Two dashes sharing 3 integers is not allowed even if they have integers on either side like (integer)(dash)(integer)(dash)(integer).
If the dash is the last character of input preceded by the integer that's an acceptable input like (integer)(dash)(end of the string).
Also, two consecutive dashes are not allowed. Any of the above-mentioned formats can have space(s) between them.
To give the gist, these dashes are used in my input string to provide a range.
Some example of expressions that I want to filter out are
1-5-10, 1 - 5 - 10, 1 - - 5, -5
Update - Some rules drive the input string. My job is to make sure I allow only input strings that follow the format. The rules for the format are:
1. Space (' ') delimited numbers. But a dash does not need a space around it. For example, "10 20 - 30" and "10 20-30" are both valid values.
2. A dash ('-') is used to set a range (from, to). It can also be used to run to the end of the job queue list. For example, "100-150 200-250 300-" is a valid value.
3. A dash without a start job number is not allowed. For example, "-10" is not allowed.
Thanks

You might use:
^(?:(?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*)(?: (?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*))*(?: [1-9][0-9]*-)?|[1-9][0-9]*-?)[ ]*$
Regex demo
Explanation
^ Assert start of the string
(?: Non capturing group
(?: Non capturing group
[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]* Match number > 0, an optional space, a dash, an optional space and number > 0. The space is in a character class [ ] for clarity.
| Or
[1-9][0-9]* Match number > 0
) Close non capturing group
(?:[ ] Non capturing group followed by a space
(?: Non capturing group
[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]* Match number > 0, an optional space, a dash, an optional space and number > 0.
| Or
[1-9][0-9]* Match number > 0
) close non capturing group
)* close non capturing group and repeat zero or more times
(?: [1-9][0-9]*-)? Optional part that matches a space followed by a number > 0 and a dash
| Or
[1-9][0-9]*-? Match a number > 0 followed by an optional dash
) close non capturing group
[ ]*$ Match zero or more times a space and assert the end of the string
Note: If you want to match zero or more spaces instead of an optional space, you could update [ ]? to [ ]*. You can also write [1-9]{1} as [1-9].
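To check the pattern against the examples from the question, here is a small Python sketch. The regex is the one from this answer, split over several string literals for readability; the names JOB_LIST_RE and is_valid are mine, and Python is just an assumed host language.

```python
import re

JOB_LIST_RE = re.compile(
    r"^(?:(?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*)"       # first number or range
    r"(?: (?:[1-9][0-9]*[ ]?-[ ]?[1-9][0-9]*|[1-9][0-9]*))*"     # more, space separated
    r"(?: [1-9][0-9]*-)?"                                        # optional open-ended range
    r"|[1-9][0-9]*-?)[ ]*$"                                      # or a single number / "N-"
)


def is_valid(job_list: str) -> bool:
    return JOB_LIST_RE.match(job_list) is not None


for text in ["10 20 - 30", "100-150 200-250 300-", "1-5-10", "1 - - 5", "-5"]:
    print(repr(text), is_valid(text))
```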

After the update the question got quite a lot of complexity. Since some parts of the regex are reused multiple times I took the liberty of working this out in Ruby and cleaned it up afterwards. I'll show you the build process so the regex can be understood. Ruby uses #{variable} for regex and string interpolation.
integer = /[1-9][0-9]*/
integer_or_range = /#{integer}(?: *- *#{integer})?/
integers_or_ranges = /#{integer_or_range}(?: +#{integer_or_range})*/
ending = /#{integer} *-/
regex = /^(?:#{integers_or_ranges}(?: +#{ending})?|#{ending})$/
#=> /^(?:(?-mix:(?-mix:(?-mix:[1-9][0-9]*)(?: *- *(?-mix:[1-9][0-9]*))?)(?: +(?-mix:(?-mix:[1-9][0-9]*)(?: *- *(?-mix:[1-9][0-9]*))?))*)(?: +(?-mix:(?-mix:[1-9][0-9]*) *-))?|(?-mix:(?-mix:[1-9][0-9]*) *-))$/
Cleaning up the above regex leaves:
^(?:[1-9][0-9]*(?: *- *[1-9][0-9]*)?(?: +[1-9][0-9]*(?: *- *[1-9][0-9]*)?)*(?: +[1-9][0-9]* *-)?|[1-9][0-9]* *-)$
You can replace [0-9] with \d if you like, but since you used the [0-9] syntax in your question, I used it in the answer as well. Keep in mind that if you do replace [0-9] with \d, you'll have to escape the backslash in a string context, e.g. "[0-9]" equals "\\d".
You mention in your question that
Any of the above-mentioned formats can have space(s) between them.
I assumed that this means space(s) are not allowed before or after the actual content, only between the integers and -.
Valid:
15 - 10
1234 -
Invalid (leading or trailing spaces):
15 - 10
123
If this is not the case, simply add a space followed by * to the start and end:
^ *... *$
Where ... is the rest of the regex.
You can test the regex in my demo, but it should be clear from the build process what the regex does.
const inputs = [
  '1-5-10',
  '1 - 5 - 10',
  '1 - - 5',
  '-5',
  '15-10',
  '15 - 10',
  '15 - 10',
  '1510',
  '1510-',
  '1510 -',
  '1510 ',
  ' 1510',
  ' 15 - 10',
  '10 20 - 30',
  '10 20-30',
  '100-150 200-250 300-',
  '100-150 200-250 300- ',
  '1-2526-27-28-',
  '1-25 26-2728-',
  '1-25 26-27 28-',
];
const regex = /^(?:[1-9][0-9]*(?: *- *[1-9][0-9]*)?(?: +[1-9][0-9]*(?: *- *[1-9][0-9]*)?)*(?: +[1-9][0-9]* *-)?|[1-9][0-9]* *-)$/;
// Log each input followed by its match result (null when it does not match).
const logInputAndMatch = input => {
  console.log(`input: "${input}"`);
  console.log(input.match(regex));
};
inputs.forEach(logInputAndMatch);

Replace commas enclosed in curly braces (but not quite)

This seems to be a duplicate question of an already asked one but not really. What I'm looking for is one or more regular expressions without the help of any programming language to change the following text String.Concat( new string[] { "some", "random", "text", string1, string2, "end" }) into "some" + "random" + "text" + string1 + string2 + "end".
I was thinking of using two regular expressions, replacing the commas with pluses, and then removing the String.Concat( new string[] { ... }). The second part is quite easy, but I am struggling with the first regular expression. I used a positive look-behind expression, but it matches only the first comma: (?<=String\.Concat\(new string\[\] \{[^,}]*),
I'm not an expert but I think that this is a limitation of the regular expression engine. Once the first comma is matched, the regular expression engine moves the starting matching index after the comma and it doesn't match anymore the look-behind group before it.
Is there a regular expression to make this substitution, pluses instead of commas, without the help of any programming language?

Just like you said: first replace all commas with plus signs:
Regex 1: /,/g
Replacement 1: " +"
Then remove all unnecessary stuff, capture what you need and use a backreference to the captured group as replacement:
Regex 2: /String\.Concat\(\s*new\s*string\s*\[\]\s*{\s*(.*?)\s*}\)/g
Replacement 2: "$1" (or however you can specify backreferences).
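The question asks for pure regex replacements, so the Python below is only a way to sanity-check the two steps; the variable names are mine, and I escaped the braces in the second regex, which does not change what it matches. Keep in mind that step 1 replaces every comma, including any that appear inside string literals.

```python
import re

source = 'String.Concat( new string[] { "some", "random", "text", string1, string2, "end" })'

# Step 1: every comma becomes " +".
with_pluses = re.sub(r",", " +", source)

# Step 2: strip the String.Concat( new string[] { ... }) wrapper, keep group 1.
WRAPPER_RE = re.compile(r"String\.Concat\(\s*new\s*string\s*\[\]\s*\{\s*(.*?)\s*\}\)")
result = WRAPPER_RE.sub(r"\1", with_pluses)

print(result)
# prints "some" + "random" + "text" + string1 + string2 + "end"
```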

I'm assuming you're using a text editor:
,
substitution:
+
See the demo
Second:
.*?{(.*?)}.*
Replacement
$1
See the demo

You should do it in two steps.
First substitution
s/^\s*String\s*\.Concat\s*\(\s*new\s+string\[\]\s*\{\s*("[^\}]*")\s*\}\)\s*$/$1/i
gets you
"some", "random", "text", string1, string2, "end"
Second one
s/\s*("[^"]*"|\b\w+\b)\s*,\s*/$1 + /g
returns your desired output
"some" + "random" + "text" + string1 + string2 + "end"
Check this: https://ideone.com/QuRKmt (sample code in Perl).
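For readers without Perl at hand, the second substitution can be sketched in Python as follows (an assumed port; the function name join_with_plus is hypothetical). Because it matches a whole quoted string or a whole identifier before the comma, a comma inside a string literal is left alone.

```python
import re

# A quoted string or a bare identifier, followed by a separating comma.
ITEM_COMMA_RE = re.compile(r'\s*("[^"]*"|\b\w+\b)\s*,\s*')


def join_with_plus(items: str) -> str:
    return ITEM_COMMA_RE.sub(r"\1 + ", items)


print(join_with_plus('"some", "random", "text", string1, string2, "end"'))
# prints "some" + "random" + "text" + string1 + string2 + "end"
```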

Update: Notepad++ single-pass solution
(Note you can also do this in RegexFormat8 using the Boost extended replacement option)
Find (?:String\.Concat\(\s*new\s*string\s*\[\]\s*{\s*([^,})]*?)\s*(?=,|}\))|\G(?!^)\s*,\s*([^,})]*?)\s*(?=,|}\)))(?:}\)|(?=(?:\s*,\s*[^,})]*)+}\)))
Replace (?1$1: + $2)
(without conditional replacement, it's + $1$2: https://regex101.com/r/Vqe44r/1)
Formatted-expand mode
(?:
String \. Concat\( \s* new \s* string \s* \[\] \s* { \s*
( [^,})]*? ) # (1)
\s*
(?= , | }\) )
|
\G
(?! ^ )
\s* , \s*
( [^,})]*? ) # (2)
\s*
(?= , | }\) )
)
(?:
}\)
|
(?=
(?:
\s* , \s*
[^,})]*
)+
}\)
)
)
Output
"some" + "random" + "text" + string1 + string2 + "end"
