How can I find sentences nested deeper than one bracket '()' set?

How can I find sentences nested deeper than one bracket '()' set? - regex

I want to print sentences from text file placed in () brackets deeper than one pair of brackets.
For example for this text file :
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
blabla (blabla(hhhhh))
the output should be :
print me
eeee(bbbb(cccc)bbb)
bbbb(cccc)bbb
cccc
hhhhh
This is what I've done so far:
#!/usr/bin/perl -w
open(FILE, "<", $ARGV[0]) or die "file open error";
if ( #ARGV ) #if there are args
{
if ( -f $ARGV[0] ) #if its regular file
{
while(<FILE>)
{
my #array = split('\)',$_);
foreach(#array)
{
if ($_ =~ /.*\((.*)/)
{
print "$1\n";
}
}
}
close(FILE);
}
else{
print "Arg is not a file\n";}
}
else{
print "no args\n";}
My code can't separate the sentences placed in deeper brackets.

Assuming brackets are balanced:
use strict;
use warnings;
my #a;
while (<DATA>) {
while (/\(([^()]*(?:\(((?1))\)[^()]*(?{push #a, $2}))*+)\)/g){}
}
print join "\n", #a;
__DATA__
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bb(xxxx)b))aa)
blabla (blabla(hhhhh))
It returns:
print me
cccc
xxxx
bbbb(cccc)bb(xxxx)b
eeee(bbbb(cccc)bb(xxxx)b)
hhhhh
The idea is to store the capture group 2 content after each recursion, using the (?{...}) construct to execute code in the pattern.
Note that the order of results isn't ideal since the innermost content appears first. Unfortunately, I didn't find a way to change the order of results.
Pattern details:
\( # opening bracket level 1
( # open capture group 1
[^()]* # all that is not a bracket
(?:
\( # opening bracket for level 2 (or more when a recursion occurs)
( # capture group 2: to store the result
(?1) # recursion
)
\) # closing bracket for level 2 (or more ...)
[^()]* #
(?{push #a, $2}) # store the capture group 2 content in #a
)*+ # repeat when needed
)
\) # closing bracket level 1
EDIT: This pattern assumes that brackets are balanced, but if it isn't the case, this may cause problems of unwanted results for certain strings. The reason is that results are stored before the whole pattern succeeds.
Example with the string 1234 ( 5678 (abcd(efgh)ijkl) where a closing bracket is missing:
1234 ( 5678 (abcd(efgh)ijkl)
# ^ ^---- second attempt succeeds, "efgh" is stored
# '---- first attempt fails, but "efgh", "abcd(efgh)ijkl" are stored
To solve the problem, you can choose between two default behaviours:
the strict behaviour that only accepts balanced brackets. All you need is to store the results in a temporary array and to reset this array in the while loop or when a closing bracket is missing. In this case the result will only be "efgh":
my #a;
my #b;
while (<DATA>) {
while (/\(([^()]*(?:\(((?1))\)[^()]*(?{push #b, $2}))*+)(?:\)|(?{undef #b})(*F))/g) {
push #a, #b;
undef #b;
}
}
a more tolerant behaviour that doesn't make mandatory the closing bracket. To do that you must replace each \) with (?:\)|$). In this case, the first attempt succeeds and consumes characters until the end of the string (in other words, there isn't a second attempt). The results are "efgh" and "abcd(efgh)ijkl"

This is probably easiest, and the most maintainable with a two-pass solution.
The initial pass captures all first level parentheses. The second pass captures all enclosed parenthesis groups, only advancing a single character in order to match every level of embedded paren groups:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;
my $data = do { local $/; <DATA> };
my $parens_content_re = qr{
\(
(
(?:
[^()]*+
|
\( (?1) \)
)*
)
\)
}x;
say for map {/(?=$parens_content_re)\(/g} map {/$parens_content_re/g} $data;
__DATA__
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
blabla (blabla(hhhhh))
----(----(aaaa(123)bbbb(456)cccc)----)----
Outputs:
$ perl parens.pl
print me
eeee(bbbb(cccc)bbb)
bbbb(cccc)bbb
cccc
hhhhh
aaaa(123)bbbb(456)cccc
123
456

This code works by capturing levels recursively, using a simple regex for ) and split-ing by ( for the opening paren. It first prepares by peeling off the two starting layers of nesting. It works for shown examples, and a few others. However, there are other ways to nest pairs, for which rules are not specified. Also, this is probably rough around the edges. There is no magic of any kind involved and adjusting code for new cases should be feasible.
use warnings;
use strict;
my ($lev, #el, #res, $rret);
while (my $str = <DATA>)
{
print "\nString: $str\n";
#res = ();
# Drop two layers to start: strip last two ), split by ( and drop 0,1
$str =~ s/ (.*) \) [^)]* \) [^)]* $/$1/x;
#el = split '\(', $str;
#el = #el[2..$#el];
# Edge case: may have one element and be done, but with extra )
if (#el > 1) { $lev = join '(', #el }
else { ($lev = $el[0]) =~ s|\)||g }
push #res, $lev;
# Get next level and join string back, recursively
while ( $rret = nest_one($lev) ) {
$lev = join '(', #$rret;
push #res, $lev;
last if #$rret == 1;
}
print "\t$_\n" for #res;
}
# Strip last ) and past it, split by ( and drop first element
sub nest_one {
(my $lev = $_[0]) =~ s/(.*) \) [^)]* $/$1/x;
my #el = split '\(', $lev;
shift #el;
return (#el) ? \#el : undef;
}
__DATA__
blabla(nothing(print me)) nanana (nanan)
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
blabla (blabla(hhhhh))
It prints
blabla(nothing(print me)) nanana (nanan)
print me
blablabla(aaaaaaa(eeee(bbbb(cccc)bbb))aa)
eeee(bbbb(cccc)bbb)
bbbb(cccc)bbb
cccc
blabla (blabla(hhhhh))
hhhhh

Related

Dynamic regular expression for Nesting brackets failed due to unknow bugs

rencently I have met a strange bug when use a dynamic regular expressions in perl for Nesting brackets' match. The origin string is " {...test{...}...} ", I want to grep the pair brace begain with test, "test{...}". actually there are probably many pairs of brace before and end this group , I don't really know the deepth of them.
Following is my match scripts: nesting_parser.pl
#! /usr/bin/env perl
use Getopt::Long;
use Data::Dumper;
my %args = #ARGV;
if(exists$args{'-help'}) {printhelp();}
unless ($args{'-file'}) {printhelp();}
unless ($args{'-regex'}) {printhelp();}
my $OpenParents;
my $counts;
my $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;$counts++; print "\nLeft:".$OpenParents." ;"})
| \} (?(?{$OpenParents ne 0; $counts++}) (?{$OpenParents--;print "Right: ".$OpenParents." ;"})) (?(?{$OpenParents eq 0}) (?!))
)*
)
}x;
my $string = `cat $args{'-file'}`;
my $partten = $args{'-regex'} ;
print "####################################################\n";
print "Grep [$partten\{...\}] from $args{'-file'}\n";
print "####################################################\n";
while ($string =~ /($partten$NestedGuts)/xmgs){
print $1."}\n";
print $2."####\n";
}
print "Regex has seen $counts brackts\n";
sub printhelp{
print "Usage:\n";
print "\t./nesting_parser.pl -file [file] -regex '[regex expression]'\n";
print "\t[file] : file path\n";
print "\t[regex] : regex string\n";
exit;
}
Actually my regex is:
our $OpenParents;
our $NestedGuts = qr {
(?{$OpenParents = 0})
(?>
(?:
[^{}]+
| \{ (?{$OpenParents++;})
| \} (?(?{$OpenParents ne 0}) (?{$OpenParents--})) (?(?{$OpenParents eq 0} (?!))
)*
)
}x;
I have add brace counts in nesting_parser.pl
I also write a string generator for debug: gen_nesting.pl
#! /usr/bin/env perl
use strict;
my $buffer = "{{{test{";
unless ($ARGV[0]) {print "Please specify the nest pair number!\n"; exit}
for (1..$ARGV[0]){
$buffer.= "\n\{\{\{\{$_\}\}\}\}";
#$buffer.= "\n\{\{\{\{\{\{\{\{\{$_\}\}\}\}\}\}\}\}\}";
}
$buffer .= "\n\}}}}";
open TEXT, ">log_$ARGV[0]";
print TEXT $buffer;
close TEXT;
You can generate a test file by
./gen_nesting.pl 1000
It will create a log file named log_1000, which include 1000 lines brace pairs
Now we test our match scripts:
./nesting_parser.pl -file log_1000 -regex "test" > debug_1000
debug_1000 looks like a great perfect result, matched successfully! But when I gen a 4000 lines test log file and match it again, it seem crashed:
./gen_nesting.pl 4000
./nesting_parser.pl -file log_4000 -regex "test" > debug_4000
The end of debug_4000 shows
{{{{3277}
####
Regex has seen 26213 brackts
I don't know what's wrong with the regex expresions, mostly it works well for paired brackets, untill recently I found it crashed when I try to match a text file more than 600,000 lines.
I'm really confused by this problems,
I really hope to solve this problem.
thank you all!

First for matching nested brackets I normally use Regexp::Common.
Next, I'm guessing that your problem is that Perl's regular expression engine breaks after matching 32767 groups. You can verify this by turning on warnings and looking for a message like Complex regular subexpression recursion limit (32766) exceeded.
If so, you can rewrite your code using /g and \G and pos. The idea being that you match the brackets in a loop like this untested code:
my $start = pos($string);
my $open_brackets = 0;
my $failed;
while (0 < $open_brackets or $start == pos($string)) {
if ($string =~ m/\G[^{}]*(\{|\})/g) {
if ($1 eq '{') {
$open_brackets++;
}
else {
$open_brackets--;
}
}
else {
$failed = 1;
break; # WE FAILED TO MATCH
}
}
if (not $failed and 0 == $open_brackets) {
my $matched = substr($string, $start, pos($string));
}

Perl Parsing CSV file with embedded commas

I'm parsing a CSV file with embedded commas, and obviously, using split() has a few limitations due to this.
One thing I should note is that the values with embedded commas are surrounded by parentheses, double quotes, or both...
for example:
(Date, Notional),
"Date, Notional",
"(Date, Notional)"
Also, I'm trying to do this without using any modules for certain reasons I don't want to go into right now...
Can anyone help me out with this?

This should do what you need. It works in a very similar way to the code in Text::CSV_PP, but doesn't allow for escaped characters within the field as you say you have none
use strict;
use warnings;
use 5.010;
my $re = qr/(?| "\( ( [^()""]* ) \)" | \( ( [^()]* ) \) | " ( [^"]* ) " | ( [^,]* ) ) , \s* /x;
my $line = '(Date, Notional 1), "Date, Notional 2", "(Date, Notional 3)"';
my #fields = "$line," =~ /$re/g;
say "<$_>" for #fields;
output
<Date, Notional 1>
<Date, Notional 2>
<Date, Notional 3>
Update
Here's a version for older Perls (prior to version 10) that don't have the regex branch reset construct. It produces identical output to the above
use strict;
use warnings;
use 5.010;
my $re = qr/(?: "\( ( [^()""]* ) \)" | \( ( [^()]* ) \) | " ( [^"]* ) " | ( [^,]* ) ) , \s* /x;
my $line = '(Date, Notional 1), "Date, Notional 2", "(Date, Notional 3)"';
my #fields = grep defined, "$line," =~ /$re/g;
say "<$_>" for #fields;

I know you already have a working solution with Borodin's answer, but for the record there is also a simple solution with split (see the results at the bottom of the online demo). This situation sounds very similar to regex match a pattern unless....
#!/usr/bin/perl
$regex = '(?:\([^\)]*\)|"[^"]*")(*SKIP)(*F)|\s*,\s*';
$subject = '(Date, Notional), "Date, Notional", "(Date, Notional)"';
#splits = split($regex, $subject);
print "\n*** Splits ***\n";
foreach(#splits) { print "$_\n"; }
How it Works
The left side of the alternation | matches complete (parentheses) and (quotes), then deliberately fails. The right side matches commas, and we know they are the right commas because they were not matched by the expression on the left.
Possible Refinements
If desired, the parenthess-matching portion could be made recursive to match (nested(parens))
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...

I know that this is quite old question, but for completeness I would like to add solution from great book "Mastering Regular Expressions" by Jeffrey Friedl (page 271):
sub parse_csv {
my $text = shift; # record containing comma-separated values
my #fields = ( );
my $field;
chomp($text);
while ($text =~ m{\G(?:^|,)(?:"((?>[^"]*)(?:""[^"]*)*)"|([^",]*))}gx) {
if (defined $2) {
$field = $2;
} else {
$field = $1;
$field =~ s/""/"/g;
}
# print "[$field]";
push #fields, $field;
}
return #fields;
}
Try it against test row:
my $line = q(Ten Thousand,10000, 2710 ,,"10,000",,"It's ""10 Grand"", baby",10K);
my #fields = parse_csv($line);
my $i;
for ($i = 0; $i < #fields; $i++) {
print "$fields[$i],";
}
print "\n";

How to remove strings which do not start or end with specific substring?

Unfortunately, I'm not a regex expert, so I need a little help.
I'm looking for the solution how to grep an array of strings to get two lists of strings which do not start (1) or end (2) with the specific substring.
Let's assume we have an array with strings matching to the following rule:
[speakerId]-[phrase]-[id].txt
i.e.
10-phraseone-10.txt 11-phraseone-3.txt 1-phraseone-2.txt
2-phraseone-1.txt 3-phraseone-1.txt 4-phraseone-1.txt
5-phraseone-3.txt 6-phraseone-2.txt 7-phraseone-2.txt
8-phraseone-10.txt 9-phraseone-2.txt 10-phrasetwo-1.txt
11-phrasetwo-1.txt 1-phrasetwo-1.txt 2-phrasetwo-1.txt
3-phrasetwo-1.txt 4-phrasetwo-1.txt 5-phrasetwo-1.txt
6-phrasetwo-3.txt 7-phrasetwo-10.txt 8-phrasetwo-1.txt
9-phrasetwo-1.txt 10-phrasethree-10.txt 11-phrasethree-3.txt
1-phrasethree-1.txt 2-phrasethree-11.txt 3-phrasethree-1.txt
4-phrasethree-3.txt 5-phrasethree-1.txt 6-phrasethree-3.txt
7-phrasethree-1.txt 8-phrasethree-1.txt 9-phrasethree-1.txt
Let's introduce variables:
$speakerId
$phrase
$id1, $id2
I would like to grep a list and obtain an array:
with elements which contain specific $phrase but we exclude those strigns which simultaneously start with specific $speakerId AND end with one of specified id's (for instance $id1 or $id2)
with elements which have specific $speakerId and $phrase but do NOT contain one of specific ids at the end (warning: remember to not exclude the 10 or 11 for $id=1 , etc.)
Maybe someone coulde use the following code to write the solution:
#AllEntries = readdir(INPUTDIR);
#Result1 = grep(/blablablahere/, #AllEntries);
#Result2 = grep(/anotherblablabla/, #AllEntries);
closedir(INPUTDIR);

Assuming a basic pattern to match your example:
(?:^|\b)(\d+)-(\w+)-(?!1|2)(\d+)\.txt(?:\b|$)
Which breaks down as:
(?:^|\b) # starts with a new line or a word delimeter
(\d+)- # speakerid and a hyphen
(\w+)- # phrase and a hyphen
(\d+) # id
\.txt # file extension
(?:\b|$) # end of line or word delimeter
You can assert exclusions using negative look-ahead. For instance, to include all matches that do not have the phrase phrasetwo you can modify the above expression to use a negative look-ahead:
(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)
Note how I include (?!phrasetwo). Alternatively, you find all phrasethree entries that end in an even number by using a look-behind instead of a look-ahead:
(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)
(?<![13579]) just makes sure the last number of the ID falls on an even number.

It sounds a bit like you're describing a query function.
#!/usr/bin/perl -Tw
use strict;
use warnings;
use Data::Dumper;
my ( $set_a, $set_b ) = query( 2, 'phrasethree', [ 1, 3 ] );
print Dumper( { a => $set_a, b => $set_b } );
# a) fetch elements which
# 1. match $phrase
# 2. exclude $speakerId
# 3. match #ids
# b) fetch elements which
# 1. match $phrase
# 2. match $speakerId
# 3. exclude #ids
sub query {
my ( $speakerId, $passPhrase, $id_ra ) = #_;
my %has_id = map { ( $_ => 0 ) } #{$id_ra};
my ( #a, #b );
while ( my $filename = glob '*.txt' ) {
if ( $filename =~ m{\A ( \d+ )-( .+? )-( \d+ ) [.] txt \z}xms ) {
my ( $_speakerId, $_passPhrase, $_id ) = ( $1, $2, $3 );
if ( $_passPhrase eq $passPhrase ) {
if ( $_speakerId ne $speakerId
&& exists $has_id{$_id} )
{
push #a, $filename;
}
if ( $_speakerId eq $speakerId
&& !exists $has_id{$_id} )
{
push #b, $filename;
}
}
}
}
return ( \#a, \#b );
}

I like the approach with pure regular expressions using negative lookaheads and -behinds. However, it's a little bit hard to read. Maybe code like this could be more self-explanatory. It uses standard perl idioms that are readable like english in some cases:
my #all_entries = readdir(...);
my #matching_entries = ();
foreach my $entry (#all_entries) {
# split file name
next unless /^(\d+)-(.*?)-(\d+).txt$/;
my ($sid, $phrase, $id) = ($1, $2, $3);
# filter
next unless $sid eq "foo";
next unless $id == 42 or $phrase eq "bar";
# more readable filter rules
# match
push #matching_entries, $entry;
}
# do something with #matching_entries
If you really want to express something that complex in a grep list transformation, you could write code like this:
my #matching_entries = grep {
/^(\d)-(.*?)-(\d+).txt$/
and $1 eq "foo"
and ($3 == 42 or $phrase eq "bar")
# and so on
} readdir(...)

Regular expressions to match protected separated values

I'd like to have a regular expression to match a separated values with some protected values that can contain the separator character.
For instance:
"A,B,{C,D,E},F"
would give:
"A"
"B"
"{C,D,E}"
"F"
Please note the protected values can be nested, as follows:
"A,B,{C,D,{E,F}},G"
would give:
"A"
"B"
"{C,D,{E,F}}"
"G"
I already coded that feature with a character iteration as follow:
sub Parse
{
my #item;
my $curly;
my $string;
foreach(split //)
{
$_ eq "{" and ++$curly;
$_ eq "}" and --$curly;
if(!$curly && /[,:]/)
{
push #item, $string;
undef $string;
next;
}
$string .= $_;
}
push #item, $string;
return #item;
}
But it would definitively be so much nicer with a regexp.

A regex that supports nesting would look as follows:
my #items;
push #items, $1 while
/
(?: ^ | \G , )
(
(?: [^,{}]+
| (
\{
(?: [^{}]
| (?2)
)*
\}
)
| # Empty
)
)
/xg;
$ perl -E'$_ = shift; ... say for #items;' 'A,B,{C,D,{E,F}},G'
A
B
{C,D,{E,F}}
G
Assumes valid input since it can't extract and validate at the same time. (Well, not without making things really messy.)

Improved from nhahtdh's answer.
$_ = "A,B,{C,D,E},F";
while ( m/(\{.*?\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "[$&]\n";
}
Improved it again. Please look at this one!
$_ = "A,B,{C,D,{E,F}},G";
while ( m/(\{.*\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "$&\n";
}
It will get:
A
B
{C,D,{E,F}}
G

$a = "A,B,{C,D,E},F";
while ($a =~ s/(\{[\{\}\w,]+\}|\w)//) {
push (#res, $1);
}
print "\#res: #res\n"
Result:
#res: A B {C,D,E} F
Explanation : we try to match either the protected block \{[\{\}\w,]+\} or just a single character \w successively in a loop, deleting it from the original string if there is a match. Every time there is a match, we store it (meaning the $1) in the array, et voilà!

Here is a regex in bash:
chronos#localhost / $ echo "A,B,{C,D,E},F" | grep -oE "(\{[^\}]*\}|[A-Z])"
A
B
{C,D,E}
F

Try this regex. Use the regex to match and extract the token.
/(\{.*?\}|(?<=,|^).*?(?=,|$))/
I have not tested this code in Perl.
There is an assumption about on how the regex engine works here (I assume that it will try to match the first part \{.*?\} before the second part). I also assume that there are no nested curly bracket, and badly paired curly brackets.

$s = "A,B,{C,D,E},F";
#t = split /,(?=.*{)|,(?!.*})/, $s;

Matching balanced parenthesis in Perl regex

I have an expression which I need to split and store in an array:
aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"
It should look like this once split and stored in the array:
aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }
aaa="bbb{}" { aa="b}b" }
aaa="bbb,ccc"
I use Perl version 5.8 and could someone resolve this?

Use the perl module "Regexp::Common". It has a nice balanced parenthesis Regex that works well.
# ASN.1
use Regexp::Common;
$bp = $RE{balanced}{-parens=>'{}'};
#genes = $l =~ /($bp)/g;

There's an example in perlre, using the recursive regex features introduced in v5.10. Although you are limited to v5.8, other people coming to this question should get the right solution :)
$re = qr{
( # paren group 1 (full function)
foo
( # paren group 2 (parens)
\(
( # paren group 3 (contents of parens)
(?:
(?> [^()]+ ) # Non-parens without backtracking
|
(?2) # Recurse to start of paren group 2
)*
)
\)
)
)
}x;

I agree with Scott Rippey, more or less, about writing your own parser. Here's a simple one:
my $in = 'aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, ' .
'aaa="bbb{}" { aa="b}b" }, ' .
'aaa="bbb,ccc"'
;
my #out = ('');
my $nesting = 0;
while($in !~ m/\G$/cg)
{
if($nesting == 0 && $in =~ m/\G,\s*/cg)
{
push #out, '';
next;
}
if($in =~ m/\G(\{+)/cg)
{ $nesting += length $1; }
elsif($in =~ m/\G(\}+)/cg)
{
$nesting -= length $1;
die if $nesting < 0;
}
elsif($in =~ m/\G((?:[^{}"]|"[^"]*")+)/cg)
{ }
else
{ die; }
$out[-1] .= $1;
}
(Tested in Perl 5.10; sorry, I don't have Perl 5.8 handy, but so far as I know there aren't any relevant differences.) Needless to say, you'll want to replace the dies with something application-specific. And you'll likely have to tweak the above to handle cases not included in your example. (For example, can quoted strings contain \"? Can ' be used instead of "? This code doesn't handle either of those possibilities.)

To match balanced parenthesis or curly brackets, and if you want to take under account backslashed (escaped) ones, the proposed solutions would not work. Instead, you would write something like this (building on the suggested solution in perlre):
$re = qr/
( # paren group 1 (full function)
foo
(?<paren_group> # paren group 2 (parens)
\(
( # paren group 3 (contents of parens)
(?:
(?> (?:\\[()]|(?![()]).)+ ) # escaped parens or no parens
|
(?&paren_group) # Recurse to named capture group
)*
)
\)
)
)
/x;

Try something like this:
use strict;
use warnings;
use Data::Dumper;
my $exp=<<END;
aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } } , aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"
END
chomp $exp;
my #arr = map { $_ =~ s/^\s*//; $_ =~ s/\s* $//; "$_}"} split('}\s*,',$exp);
print Dumper(\#arr);

Although Recursive Regular Expressions can usually be used to capture "balanced braces" {}, they won't work for you, because you ALSO have the requirement to match "balanced quotes" ".
This would be a very tricky task for a Perl Regular Expression, and I'm fairly certain it's not possible. (In contrast, it could probably be done with Microsoft's "balancing groups" Regex feature).
I would suggest creating your own parser. As you process each character, you count each " and {}, and only split on , if they are "balanced".

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How can I find sentences nested deeper than one bracket '()' set? - regex

Related

Dynamic regular expression for Nesting brackets failed due to unknow bugs

Perl Parsing CSV file with embedded commas

How to remove strings which do not start or end with specific substring?

Regular expressions to match protected separated values

Matching balanced parenthesis in Perl regex

Categories

Resources