Perl regular expressions to capture parts of command output - regex

Edit: I have added a third function, with a regex that seems to be correct. (The first and second functions now work as expected.)
I have written a couple of functions in a library, and I call them from my test script. I'm having some issues with the regular expressions. Can somebody help me with them?
Function 1:
sub ipsec_version {
my ($self) = @_;
my $cmd = 'sudo -s ipsec version ';
my $version = 0;
#execute the command
$self->execute($cmd);
foreach my $line ( @{ $self->get_stdout() } ) {
if ( $line =~ m/strongSwan/msx ) {
$version = $1;
}
}
return $version;
}
Function call:
$self->{'ipsec_version'} = $self->{'ipsec_obj'}->ipsec_version();
INFO('[Startup] ipsec version is : ' . $self->{'ipsec_version'} );
Actual output:
Use of uninitialized value in concatenation (.) or string at ... line 37.
ipsec version is :
Expected output:
strongSwan U5.1.2/K3.16.0-30-generic
Command output:
I need the script to capture the expected output string from this
Linux strongSwan U5.1.2/K3.16.0-30-generic
Institute for Internet Technologies and Applications
University of Applied Sciences Rapperswil, Switzerland
See 'ipsec --copyright' for copyright information.
Function 2:
sub ipsec_status {
my ($self,$connection_name) = @_;
my $cmd = 'sudo -s ipsec status ' . $connection_name;
my $status = 0;
#execute the command
$self->execute($cmd);
foreach my $line ( @{ $self->get_stdout() } ) {
if ( $line =~ m/Security\sassociations\d\()/ ) {
$status = $1;
}
}
return $status;
}
Function call:
$self->{'ipsec_status'} = $self->{'ipsec_obj'}->ipsec_status('connection');
INFO('[Startup] ipsec status is : ' . $self->{'ipsec_status'} );
Actual output:
INFO [Startup] ipsec status is : 0
Expected output:
Security Associations (1 up, 0 connecting)
Command output:
I need the script to capture the expected output string from this
Security Associations (1 up, 0 connecting):
connection[3]: ESTABLISHED 3 seconds ago, 1.1.1.19[1.1.1.19]...10.81.1.50[10.81.1.50]
connection{3}: INSTALLED, TUNNEL, ESP in UDP SPIs: cb343e86_i abf6d1f2_o
Function 3 :
sub ipsec_restart {
my ($self) = @_;
my $cmd = 'sudo -s ipsec restart';
my $restart = 0;
$self->execute($cmd);
foreach my $line ( @{ $self->get_stdout() } ) {
if ( $line =~ /(Starting strongSwan.*IPsec$)/ ) {
$restart = $1;
last;
}
}
return $restart;
}
Function call :
$self->{'ipsec_restart'} = $self->{'ipsec_obj'}->ipsec_restart();
INFO('[Startup] ipsec restart status is : ' . $self->{'ipsec_restart'} );
Expected output:
Starting strongSwan 5.1.2 IPsec
I checked at https://regex101.com/ and my regex, /(Starting strongSwan.*IPsec$)/, seems to be correct.
Actual output: 0

The problem is that you're using $1, which is set only if the regular expression you are matching contains capturing parentheses. Also, you should add a last so that you don't go on searching once you have found a matching line. Lastly, it is wrong to habitually add the /m, /s and /x modifiers to every regex match regardless of whether they make any difference, and a leading m is only necessary if you are using something other than the default slash delimiters.
The first function should contain
for my $line ( @{ $self->get_stdout } ) {
if ( $line =~ /(strongSwan.*\S)/i ) {
$version = $1;
last;
}
}
and the second function is similar, although it looks like you're trying to match Security associations immediately followed by a digit, which isn't what is in the text. You also have a small a instead of a capital, which is fine if you also use an /i modifier. This captures Security Associations up to and including the last closing parenthesis on the same line. Is that what you need?
for my $line ( @{ $self->get_stdout } ) {
if ( $line =~ /(Security\s+Associations.*\))/i ) {
$status = $1;
last;
}
}
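To make the point about $1 concrete, here is a minimal sketch using the version line from the question. The variable names are made up for illustration; only the sample line and the regex come from the thread.

```perl
use strict;
use warnings;

my $line = 'Linux strongSwan U5.1.2/K3.16.0-30-generic';

# No parentheses in the pattern: the match succeeds, but there is nothing
# to capture, so list context just returns a true value (1).
my @no_capture = $line =~ /strongSwan/;

# With parentheses: the captured text is returned in list context
# (and is also available in $1 right after the match).
my ($version) = $line =~ /(strongSwan.*\S)/i;

print "without capture: @no_capture\n";
print "with capture:    $version\n";
```

Assigning the match in list context, as in the second statement, avoids reading a stale $1 left over from an earlier successful match.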

For function 1, assuming that all that changes is the version number, we can use
m/(strongSwan.*?)\s+Institute/ms
as the regular expression. This will match everything from strongSwan to Institute, in a non-greedy way, and store all of it except Institute (and the whitespace before it) in $1. Because the match crosses a line break, it has to be applied to the whole output as a single string rather than to each line in turn.
For function 2, we will use the fact that the unknown data is contained in parentheses.
m/(Security Associations \(.*?\))/
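A small sketch of how these two patterns behave on the sample output from the question. The arrays stand in for whatever get_stdout returns (assumed here to be one element per line), the status pattern is written with a capital A to match the sample output, and \s+ is used before Institute because the two words are separated by a line break.

```perl
use strict;
use warnings;

# Sample output copied from the question, one element per line
my @version_out = (
    'Linux strongSwan U5.1.2/K3.16.0-30-generic',
    'Institute for Internet Technologies and Applications',
);
my @status_out = (
    'Security Associations (1 up, 0 connecting):',
    'connection[3]: ESTABLISHED 3 seconds ago, 1.1.1.19[1.1.1.19]...10.81.1.50[10.81.1.50]',
);

# Function 1: join the lines first, because the match crosses a line break
my ($version) = join("\n", @version_out) =~ /(strongSwan.*?)\s+Institute/s;

# Function 2: the parenthesised part sits on a single line
my ($status) = map { /(Security Associations \(.*?\))/ ? $1 : () } @status_out;

print "$version\n$status\n";
```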

Dynamic regular expression for nested brackets fails due to unknown bug

Recently I ran into a strange bug when using a dynamic regular expression in Perl to match nested brackets. The original string is " {...test{...}...} ", and I want to extract the brace pair beginning with test, i.e. "test{...}". There may be many pairs of braces before and after this group, and I don't know how deeply they are nested.
The following is my matching script, nesting_parser.pl:
#! /usr/bin/env perl
use Getopt::Long;
use Data::Dumper;
my %args = @ARGV;
if(exists$args{'-help'}) {printhelp();}
unless ($args{'-file'}) {printhelp();}
unless ($args{'-regex'}) {printhelp();}
my $OpenParents;
my $counts;
my $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;$counts++; print "\nLeft:".$OpenParents." ;"})
| \} (?(?{$OpenParents ne 0; $counts++}) (?{$OpenParents--;print "Right: ".$OpenParents." ;"})) (?(?{$OpenParents eq 0}) (?!))
)*
)
}x;
my $string = `cat $args{'-file'}`;
my $partten = $args{'-regex'} ;
print "####################################################\n";
print "Grep [$partten\{...\}] from $args{'-file'}\n";
print "####################################################\n";
while ($string =~ /($partten$NestedGuts)/xmgs){
print $1."}\n";
print $2."####\n";
}
print "Regex has seen $counts brackts\n";
sub printhelp{
print "Usage:\n";
print "\t./nesting_parser.pl -file [file] -regex '[regex expression]'\n";
print "\t[file] : file path\n";
print "\t[regex] : regex string\n";
exit;
}
Actually my regex is:
our $OpenParents;
our $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;})
| \} (?(?{$OpenParents ne 0}) (?{$OpenParents--})) (?(?{$OpenParents eq 0}) (?!))
)*
)
}x;
I added the brace counter to nesting_parser.pl for debugging.
I also wrote a string generator for debugging, gen_nesting.pl:
#! /usr/bin/env perl
use strict;
my $buffer = "{{{test{";
unless ($ARGV[0]) {print "Please specify the nest pair number!\n"; exit}
for (1..$ARGV[0]){
$buffer.= "\n\{\{\{\{$_\}\}\}\}";
#$buffer.= "\n\{\{\{\{\{\{\{\{\{$_\}\}\}\}\}\}\}\}\}";
}
$buffer .= "\n\}}}}";
open TEXT, ">log_$ARGV[0]";
print TEXT $buffer;
close TEXT;
You can generate a test file by
./gen_nesting.pl 1000
This creates a file named log_1000 containing 1000 lines of brace pairs.
Now we test the matching script:
./nesting_parser.pl -file log_1000 -regex "test" > debug_1000
debug_1000 looks like a perfect result: the match succeeded. But when I generate a 4000-line test file and match again, it seems to crash:
./gen_nesting.pl 4000
./nesting_parser.pl -file log_4000 -regex "test" > debug_4000
The end of debug_4000 shows
{{{{3277}
####
Regex has seen 26213 brackts
I don't know what's wrong with the regular expression. It mostly works well for paired brackets, and I only recently found that it crashes when I try to match a text file of more than 600,000 lines.
I'm really confused by this problem and hope to solve it.
Thank you all!
First for matching nested brackets I normally use Regexp::Common.
Next, I'm guessing that your problem is that Perl's regular expression engine breaks after matching 32767 groups. You can verify this by turning on warnings and looking for a message like Complex regular subexpression recursion limit (32766) exceeded.
If so, you can rewrite your code using /g and \G and pos. The idea being that you match the brackets in a loop like this untested code:
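my $start = pos($string) // 0;    # pos() is undef before the first /g match
my $open_brackets = 0;
my $failed;
while (0 < $open_brackets or $start == (pos($string) // 0)) {
    # /c keeps pos() where it was if the match fails
    if ($string =~ m/\G[^{}]*(\{|\})/gc) {
        if ($1 eq '{') {
            $open_brackets++;
        }
        else {
            $open_brackets--;
        }
    }
    else {
        $failed = 1;
        last; # WE FAILED TO MATCH
    }
}
my $matched;
if (not $failed and 0 == $open_brackets) {
    # the third argument of substr is a length, not an end offset
    $matched = substr($string, $start, pos($string) - $start);
}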
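As a self-contained sketch of the same loop-based idea: the helper name balanced_group and the generated sample are hypothetical, chosen to resemble the output of gen_nesting.pl, and this is only one way to arrange the loop. Each iteration runs a small, non-nested match, so the depth of the braces is tracked in an ordinary counter rather than inside the regex engine.

```perl
use strict;
use warnings;

# Return the balanced "{...}" group that starts at the first "{" found at
# offset $from (after any non-brace text), or undef if it never balances.
sub balanced_group {
    my ($text, $from) = @_;
    pos($text) = $from;
    return unless $text =~ /\G[^{}]*\{/gc;    # skip to the opening brace
    my $start = pos($text) - 1;               # offset of that brace
    my $depth = 1;
    while ($depth > 0) {
        # advance to the next brace of either kind; give up at end of text
        return unless $text =~ /\G[^{}]*([{}])/gc;
        $depth += $1 eq '{' ? 1 : -1;
    }
    return substr($text, $start, pos($text) - $start);
}

# Hypothetical test input: "test" followed by 50,000 nested groups
my $sample = '{{{test{'
           . join('', map { '{{{{' . $_ . '}}}}' } 1 .. 50_000)
           . '}}}}';
my $at    = index($sample, 'test');
my $group = balanced_group($sample, $at + length 'test');
print 'matched ', length($group), " characters\n";
```

The text before the group (here the literal test) is located separately with index; a regex match for the prefix followed by pos would work equally well.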

Perl Regex - Print the matched Conditional Regex

I am trying to extract some patterns out of a log file but I am unable to print them properly.
Examples of log strings :
1) sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
2) sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
I want to extract 3 things :
A = "FPJ.INV_DOM_16_PRD"
B = "47269"
C = 9616 or 2644666 (if the line has endid then C = 2644666, else it's 9616)
A log line can be of either type 1 or type 2. I am able to extract A and B, but I am stuck on C: I need a conditional for it and I am not able to extract it properly. Here is my code:
my $string='/sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666';
if ($string =~ /sequence_history\/buckets\/(.*)/){
my $line = $1;
print "$line\n";
if($line =~ /(FPJ.*PRD)\.(\d*)\./){
my $topic_type_string = $1;
my $topic_id = $2;
print "$1\n$2\n";
}
if($string =~ /(?(?=endid=)\d*$)/){
# how to print match pattern here?
print "match\n";
}
Thanks in advance!
This will do the job:
use Modern::Perl;
use Data::Dumper;
my $re = qr/(FPJ.+?PRD)\.(\d+)\..*?(\d+)$/;
while(<DATA>) {
chomp;
my (@l) = $_ =~ /$re/g;
say Dumper\@l;
}
__DATA__
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
Output:
$VAR1 = [
'FPJ.INV_DOM_16_PRD',
'47269',
'2644666'
];
$VAR1 = [
'FPJ.INV_DOM_16_PRD',
'41987',
'9616'
];
Explanation:
( : start group 1
FPJ : literally FPJ
.+? : 1 or more any character but newline, not greedy
PRD : literally PRD
) : end group 1
\. : a dot
( : start group 2
\d+ : 1 or more digit
) : end group 2
\. : a dot
.*? : 0 or more any character not greedy
( : start group 3
\d+ : 1 or more digit
) : end group 3
$ : end of string
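For comparison, here is a hedged variation that makes the "endid if present, otherwise the bucket number" rule explicit with an optional group instead of relying on backtracking. It is a sketch, not the answer's regex: the fourth capture group and the variable names are assumptions added for illustration.

```perl
use strict;
use warnings;

my $re = qr/
    (FPJ.+?PRD)                    # group 1: topic string (A)
    \. (\d+)                       # group 2: first number (B)
    \. (\d+)                       # group 3: bucket number, used when there is no endid
    (?: \? .* \b endid= (\d+) )?   # group 4: endid, only if a query string carries one
    $                              # end of string
/x;

my @rows;
for my $url (
    'sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666',
    'sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616',
) {
    my ($topic, $id, $bucket, $endid) = $url =~ $re or next;
    my $c = $endid // $bucket;     # prefer endid when it was captured
    push @rows, [$topic, $id, $c];
}

print join(' ', @$_), "\n" for @rows;
```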
If you are trying to fetch entries from a log file, you can use file handles in Perl. The code below fetches the entries from a log file named test.log.
The entries in the log are as follows.
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.2644?startid=2644000&endid=2644666
sequence_history/buckets/FPJ.INV_DOM_16_PRD.41987.9616
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.69886?startid=2644000&endid=26765849
sequence_history/buckets/FPJ.INV_DOM_16_PRD.47269.24465?startid=2644000&endid=836783741
Below is the perl script to fetch required data.
#!/usr/bin/perl
use strict;
use warnings;
open (FH, "test.log") || die "Not able to open test.log $!";
my ($a,$b,$c);
while (my $line=<FH>)
{
if ($line =~ /sequence_history\/buckets\/.*endid=(\d*)/)
{
$c= $1;
if ($line =~ /(FPJ.*PRD)\.(\d*)\.(\d*)\?/)
{
$a=$1;
$b=$2;
}
}
else
{
if ($line =~ /sequence_history\/buckets\/(FPJ.*PRD)\.(\d*)\.(\d*)/)
{
$a=$1;
$b=$2;
$c=$3;
}
}
print "\n \$a=$a\n \$b=$b\n \$c=$c \n";
}
Output:
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=2644666
$a=FPJ.INV_DOM_16_PRD
$b=41987
$c=9616
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=26765849
$a=FPJ.INV_DOM_16_PRD
$b=47269
$c=836783741
You can use the above code by replacing "test.log" by log file name (along with its path) from which you want to fetch data as shown below.
open (FH, "/path/to/log/file/test.log") || die "Not able to open test.log $!";

Data extract from MQ output issue in perl code

I'm having a problem with my Perl code, which is supposed to grab some information from the output of the MQ command dis ql(*) all.
Below is an example of the output of that command:
AMQ8409: Display Queue details.
QUEUE(XXX.DATATYPE.NETSTATVM) TYPE(QLOCAL)
ACCTQ(QMGR) ALTDATE(2016-08-01)
ALTTIME(18.33.19) BOQNAME( )
BOTHRESH(0) CLUSNL( )
CLUSTER( ) CLCHNAME( )
CLWLPRTY(0) CLWLRANK(0)
CLWLUSEQ(QMGR) CRDATE(2016-08-01)
CRTIME(18.33.19) CURDEPTH(0)
CUSTOM( ) DEFBIND(OPEN)
DEFPRTY(0) DEFPSIST(YES)
DEFPRESP(SYNC) DEFREADA(NO)
DEFSOPT(SHARED) DEFTYPE(PREDEFINED)
DESCR(Queue for XXX.DataType.netstatvm)
DISTL(NO) GET(ENABLED)
HARDENBO INITQ( )
IPPROCS(1) MAXDEPTH(20000)
MAXMSGL(33554432) MONQ(QMGR)
MSGDLVSQ(PRIORITY) NOTRIGGER
NPMCLASS(NORMAL) OPPROCS(0)
PROCESS( ) PUT(ENABLED)
PROPCTL(COMPAT) QDEPTHHI(80)
QDEPTHLO(20) QDPHIEV(DISABLED)
QDPLOEV(DISABLED) QDPMAXEV(ENABLED)
QSVCIEV(NONE) QSVCINT(999999999)
RETINTVL(999999999) SCOPE(QMGR)
SHARE STATQ(QMGR)
TRIGDATA( ) TRIGDPTH(1)
TRIGMPRI(0) TRIGTYPE(FIRST)
USAGE(NORMAL)
The above output is for just one of the queues; the command lists all of them.
From it, I want to extract the values of QUEUE, CURDEPTH and MAXDEPTH, as below:
QUEUE(XXX.DATATYPE.NETSTATVM)
CURDEPTH(0)
MAXDEPTH(20000)
So I wrote the following Perl code to obtain the values of QUEUE, CURDEPTH and MAXDEPTH:
my $qm = XXX;
open (CHS_OUT, "echo 'dis ql(*) all'|runmqsc $qm|");
while (<CHS_OUT>) {
if ( /QUEUE\(/ ){
my $QueueName =~ /QUEUE/(/\S+)/g;
}
if ( /CURDEPTH\(/ ){
my $CurDepth =~ s/\D//g;
chomp $CurDepth;
print "$CurDepth \n";
}
if ( /MAXDEPTH\(/ ){
my $MaxDepth =~ s/\D//g;
chomp $MaxDepth;
print "$MaxDepth \n";
}
}
The output is supposed to be as below:
XXX.DATATYPE.NETSTATVM
0
20000
However, I get multiple errors when extracting these three pieces of information. Some of them are shown below:
Use of uninitialized value $MaxDepth in substitution (s///) at mq_test.pl line 26, line 7361.
Use of uninitialized value $MaxDepth in scalar chomp at mq_test.pl line 27, line 7361.
Use of uninitialized value $MaxDepth in concatenation (.) or string at mq_test.pl line 28, line 7361.
This confuses me, since I have already made multiple changes to this code without success.
You could use the following regular Expression
(?:QUEUE|CURDEPTH|MAXDEPTH)\(\K[^()]+
See a demo on regex101.com.
That is
(?:QUEUE|CURDEPTH|MAXDEPTH) # one of the alternatives
\( # an opening bracket
\K # "forget" everything
[^()]+ # not (), at least once
In Perl this would be:
my @matches = $str =~ /(?:QUEUE|CURDEPTH|MAXDEPTH)\(\K[^()]+/g;
print "$_\n" for @matches;
# XXX.DATATYPE.NETSTATVM
# 0
# 20000
=~ is the binding operator. It binds the string on the left-hand side to the match on the right-hand side. But you have my $variable on the left-hand side, so the string being matched is a freshly declared, undefined variable. What you want is to match against the implicit variable $_, and possibly store a part of the match. This is done by normal assignment in list context:
#!/usr/bin/perl
use warnings;
use strict;
while (<>) {
if ( /QUEUE\(/ ) {
my ($QueueName) = /QUEUE\((\S+)\)/;
print "QN: $QueueName\n";
}
if ( /CURDEPTH\(/ ) {
my ($CurDepth) = /CURDEPTH\((\d+)/;
print "CD: $CurDepth\n";
}
if ( /MAXDEPTH\(/ ) {
my ($MaxDepth) = /MAXDEPTH\((\d+)/;
print "MD: $MaxDepth\n";
}
}
You can combine all the regexes into one, too, and use a hash to store the values keyed by the word before the parenthesis:
#!/usr/bin/perl
use warnings;
use strict;
my %info;
while (<>) {
if (my ($key, $value)
= / ( QUEUE | CURDEPTH | MAXDEPTH ) \( ( [^)]+ ) /x
) {
$info{$key} = $value;
}
}
for my $key (keys %info) {
print "$key: $info{$key}\n";
}
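The single hash above keeps only the last value seen for each key, which is fine for one queue. Since dis ql(*) lists many queues, one possible extension is to start a new record whenever a QUEUE( field appears. This is a sketch under that assumption (that QUEUE is the first field of each block, as in the sample output); the second queue in the sample data is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: a trimmed fragment of runmqsc output for two queues
my @lines = (
    'AMQ8409: Display Queue details.',
    '   QUEUE(XXX.DATATYPE.NETSTATVM)           TYPE(QLOCAL)',
    '   CRTIME(18.33.19)                        CURDEPTH(0)',
    '   IPPROCS(1)                              MAXDEPTH(20000)',
    'AMQ8409: Display Queue details.',
    '   QUEUE(YYY.DATATYPE.NETSTATVM)           TYPE(QLOCAL)',
    '   CRTIME(18.33.19)                        CURDEPTH(50)',
    '   IPPROCS(1)                              MAXDEPTH(10000)',
);

my @queues;    # one hash reference per queue, in input order
for my $line (@lines) {
    # /g in a while loop visits every KEY(value) pair on the line
    while ($line =~ /\b(QUEUE|CURDEPTH|MAXDEPTH)\(([^)]*)\)/g) {
        my ($key, $value) = ($1, $2);
        push @queues, {} if $key eq 'QUEUE';    # QUEUE starts a new record
        next unless @queues;                    # skip fields before the first QUEUE
        $queues[-1]{$key} = $value;
    }
}

print join(' ', @{$_}{qw(QUEUE CURDEPTH MAXDEPTH)}), "\n" for @queues;
```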
I use the following to do something similar with awk:
echo "DIS QL(*) CURDEPTH MAXDEPTH"|runmqsc $qm | grep -o '^\w\+:\|\w\+[(][^)]\+[)]' | awk -F '[()]' -v OFS='\n' 'function printValues() { if ("QUEUE" in p) { print p["QUEUE"], p["CURDEPTH"], p["MAXDEPTH"], "" } } /^\w+:/ { printValues(); delete p; next } { p[$1] = $2 } END { printValues() }'
Output would look like this:
XXX.DATATYPE.NETSTATVM
0
20000
YYY.DATATYPE.NETSTATVM
50
10000

Perl: count lines in a nested loop, incrementing when a regular expression matches

I read a file into an array of lines. Some lines in the file contain duplicate values, identified by a regular expression, and I want to count the lines whose captured value is the same. The regular expression looks like this:
$b =~ /\/([^\/]+)##/. I want to compare the $1 values.
my @array = do
{
open my $FH, '<', 'abc.txt' or die 'unable to open the file\n';
<$FH>;
};
Below is how I am doing it now, but it just finds the same line in my file. Thanks for any help.
foreach my $b (@array)
{
$conflictTemp = 0;
$b =~ /\/([^\/]+)##/;
$b = $1;
#print "$b\n";
foreach my $c (@array)
{
$c =~ /\/([^\/]+)##/;
$c = $1;
if($b eq $c)
{
$conflictTemp ++;
#print "$b , $c \n"
#if($conflictTemp > 1)
#{
# $conflict ++;
#}
}
}
}
Below is some sample data, in which two of the lines are duplicated:
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
It looks like you're trying to iterate each element of the array, select some data via pattern match, and then count dupes. Is that correct?
Would it not be easier to:
my %count_of;
while ( <$FH> ) {
my ( $val ) = /\/([^\/]+)##/;
$count_of{$val}++;
}
And then, to list the values that occur more than once (i.e. there's a duplicate):
print join "\n", grep { $count_of{$_} > 1 } keys %count_of;
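Put together as a runnable sketch, with a few of the sample lines from the question held in an array instead of being read from a file handle (the array and the @dupes variable are only there to make the example self-contained):

```perl
use strict;
use warnings;

# A few of the sample lines from the question
my @lines = (
    '/a/b/c/d/code/.project##/main/CSS5/5',
    '/a/b/c/d/code/.cdtproject##/main/place.09/0',
    '/a/b/c/d/code/.cdtproject##/main/place.09/0',
    '/a/b/c/d/code/.cdtbuild##/main/CSS5/2',
    '/a/b/c/d/code/.cdtbuild##/main/CSS5/2',
);

my %count_of;
for my $line (@lines) {
    # capture the last path component before "##"; skip lines without one
    my ($val) = $line =~ m{/([^/]+)##} or next;
    $count_of{$val}++;
}

# keep only the values seen more than once, in a stable order
my @dupes = grep { $count_of{$_} > 1 } sort keys %count_of;
print "$_: $count_of{$_}\n" for @dupes;
```

One pass over the data replaces the nested loop, so the work grows with the number of lines rather than with its square.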
Alternatively, if you're just wanting to play 'spot the dupe':
#!/usr/bin/env perl
use strict;
use warnings;
my %seen;
my $match = qr/\/([^\/]+)##/;
while ( <DATA> ) {
my ( $value ) = m/$match/ or next;
print if $seen{$value}++;
}
__DATA__
/a/b/c/d/code/Debug/atlantis_digital/c/d/code/Debug/atlantis_digital.map##/main/place.09/2
/a/b/c/d/code/C5537_mem_map.cmd##/main/place.09/0
/a/b/c/d/code/.settings/org.eclipse.cdt.managedbuilder.core.prefs##/main/4
/a/b/c/d/code/.project_initial##/main/2
/a/b/c/d/code/.project##/main/CSS5/5
/a/b/c/d/code/.cproject##/main/CSS5/10
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtproject##/main/place.09/0
/a/b/c/d/code/.cdtbuild_initial##/main/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.cdtbuild##/main/CSS5/2
/a/b/c/d/code/.ccsproject##/main/CSS5/3
The problem has been solved by the previous answer. I just want to offer an alternate flavour that:
Spells out the regex
Uses a hash (%line_num below) to record the line on which each pattern first appears, to enable slightly more detailed reporting
use v5.12;
use warnings;
my $regex = qr/
\/ # A literal slash followed by
( # Capture to $1 ...
[^\/]+ # ... anything that's not a slash
) # close capture to $1
\#\# # Must be immediately followed by a literal ## (escaped, since # starts a comment under /x)
/x;
my %line_num ;
while (<>) {
next unless /$regex/ ;
my $pattern = $1 ;
if ( $line_num{ $pattern } ) {
say "'$pattern' appears on lines ", $line_num{ $pattern }, " and $." ;
next ;
}
$line_num{ $pattern } = $. ; # Record the line number
}
# Run on the data above, this produces:
# '.cdtproject' appears on lines 7 and 8
# '.cdtbuild' appears on lines 10 and 11

Finding results from and between groups of parentheses with regexp

Text format:
(Superships)
Eirik Raude - olajkutató fúrósziget
(Eirik Raude - Oil Patch Explorer)
I need a regex to match the text between the first set of parentheses. Result: text1.
I need a regex to match the text between the first and second sets of parentheses. Result: text2.
I need a regex to match the text between the second set of parentheses. Result: text3.
text1: Superships, the English title,
text2: Eirik Raude - olajkutató fúrósziget, the Hungarian subtitle,
text3: Eirik Raude - Oil Patch Explorer, the English subtitle.
I need regexes for a Perl script to match this title and these subtitles. Example script:
($anchor) = $tree->look_down(_tag=>"h1", class=>"blackbigtitle");
if ($anchor) {
$elem = $anchor;
my ($engtitle, $engsubtitle, $hunsubtitle, @tmp);
while (($elem = $elem->right()) &&
((ref $elem) && ($elem->tag() ne "table"))) {
@tmp = get_all_text($elem);
push @lines, @tmp;
$line = join(' ', @tmp);
if (($engtitle) = $line =~ m/**regex need that return text1**/) {
push @{$prog->{q(title)}}, [$engtitle, 'en'];
t "english-title added: $engtitle";
}
elsif (($engsubtitle) = $line =~ m/**regex need that return text3**/) {
push @{$prog->{q(sub-title)}}, [$subtitle, 'en'];
t "english_subtitle added: $engsubtitle";
}
elsif (($hunsubtitle) = $line =~ m/**regex need that return text2**/) {
push @{$prog->{q(hun-subtitle)}}, [$hunsubtitle, 'hu'];
t "hungarinan_subtitle added: $hunsubtitle";
}
}
}
Considering your comment, you can do something like :
if (!$found_english_title && (($english_title) = $line =~ m/^\(([^)]+)\)$/)) {
$found_english_title = 1;
# do stuff
} elsif (($hungarian_subtitle) = $line =~ m/^([^()]+)$/) {
# do stuff
} elsif ($found_english_title && (($english_subtitle) = $line =~ m/^\(([^)]+)\)$/)) {
# do stuff
}
If you need to match them all in one expression:
\(([^)]+)\)([^(]+)\(([^)]+)\)
This matches (, then anything that's not ), then ), then anything that's not (, then, (, ... I think you get the picture.
First group will be text1, second group will be text2, third group will be text3.
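A short sketch of the single-expression approach applied to the sample text from the question. It assumes the three lines arrive as one string; the variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;                               # the sample text contains non-ASCII letters
binmode STDOUT, ':encoding(UTF-8)';

my $text = "(Superships)\n"
         . "Eirik Raude - olajkutató fúrósziget\n"
         . "(Eirik Raude - Oil Patch Explorer)";

my ($text1, $text2, $text3) = $text =~ /\(([^)]+)\)([^(]+)\(([^)]+)\)/
    or die "no match\n";

# [^(]+ also swallows the line breaks around the middle part, so trim it
$text2 =~ s/^\s+|\s+$//g;

print "text1: $text1\ntext2: $text2\ntext3: $text3\n";
```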
You can also just make a more generic regex that matches something like "(text1)", "(text1)text2(text3)" or "text1(text2)" when applied several times:
(?:^|[()])([^()]+)(?=[()]|$)
This matches the beginning of the string or ( or ), then one or more characters that are not ( or ), and then requires ( or ) or the end of the string to follow. ?: marks a non-capturing group, so the first group will have the string; the closing delimiter is checked with a lookahead, (?=...), so that it is not consumed and can also open the next match. Something more complex is necessary to pair ( with ) every time; this regex can also match "(text1(".
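A minimal sketch of applying such a generic regex repeatedly with /g. It assumes the variant in which the closing delimiter is tested with a lookahead rather than consumed, so that each delimiter can also start the next piece; the input strings are the illustrative ones from the text.

```perl
use strict;
use warnings;

# opening delimiter (or start), a run of non-parentheses, then a delimiter
# or the end of the string must follow without being consumed
my $piece = qr/(?:^|[()])([^()]+)(?=[()]|$)/;

my ($s1, $s2, $s3) = ('(text1)text2(text3)', 'text1(text2)', '(text1(');

my @a = $s1 =~ /$piece/g;
my @b = $s2 =~ /$piece/g;
my @c = $s3 =~ /$piece/g;    # unbalanced input still yields a piece

print "@a\n@b\n@c\n";
```

The last case shows the limitation mentioned above: nothing here checks that an opening parenthesis is closed by a matching one.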