Related
I would like to find a regex that will pick out all commas that fall outside quote sets.
For example:
'foo' => 'bar',
'foofoo' => 'bar,bar'
This would pick out the single comma on line 1, after 'bar',
I don't really care about single vs double quotes.
Has anyone got any thoughts? I feel like this should be possible with readaheads, but my regex fu is too weak.
This will match any string up to and including the first non-quoted ",". Is that what you are wanting?
/^([^"]|"[^"]*")*?(,)/
If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:
/(,)(?=(?:[^"]|"[^"]*")*$)/
which will match all of them. Thus
'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';')
replaces all the commas not inside quotes with semicolons, and produces:
'test; a "comma,"; bob; ",sam,";here'
If you need it to work across line breaks just add the m (multiline) flag.
The below regexes would match all the comma's which are present outside the double quotes,
,(?=(?:[^"]*"[^"]*")*[^"]*$)
DEMO
OR(PCRE only)
"[^"]*"(*SKIP)(*F)|,
"[^"]*" matches all the double quoted block. That is, in this buz,"bar,foo" input, this regex would match "bar,foo" only. Now the following (*SKIP)(*F) makes the match to fail. Then it moves on to the pattern which was next to | symbol and tries to match characters from the remaining string. That is, in our output , next to pattern | will match only the comma which was just after to buz . Note that this won't match the comma which was present inside double quotes, because we already make the double quoted part to skip.
DEMO
The below regex would match all the comma's which are present inside the double quotes,
,(?!(?:[^"]*"[^"]*")*[^"]*$)
DEMO
While it's possible to hack it with a regex (and I enjoy abusing regexes as much as the next guy), you'll get in trouble sooner or later trying to handle substrings without a more advanced parser. Possible ways to get in trouble include mixed quotes, and escaped quotes.
This function will split a string on commas, but not those commas that are within a single- or double-quoted string. It can be easily extended with additional characters to use as quotes (though character pairs like « » would need a few more lines of code) and will even tell you if you forgot to close a quote in your data:
function splitNotStrings(str){
var parse=[], inString=false, escape=0, end=0
for(var i=0, c; c=str[i]; i++){ // looping over the characters in str
if(c==='\\'){ escape^=1; continue} // 1 when odd number of consecutive \
if(c===','){
if(!inString){
parse.push(str.slice(end, i))
end=i+1
}
}
else if(splitNotStrings.quotes.indexOf(c)>-1 && !escape){
if(c===inString) inString=false
else if(!inString) inString=c
}
escape=0
}
// now we finished parsing, strings should be closed
if(inString) throw SyntaxError('expected matching '+inString)
if(end<i) parse.push(str.slice(end, i))
return parse
}
splitNotStrings.quotes="'\"" // add other (symmetrical) quotes here
Try this regular expression:
(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*=>\s*(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*,
This does also allow strings like “'foo\'bar' => 'bar\\',”.
MarkusQ's answer worked great for me for about a year, until it didn't. I just got a stack overflow error on a line with about 120 commas and 3682 characters total. In Java, like this:
String[] cells = line.split("[\t,](?=(?:[^\"]|\"[^\"]*\")*$)", -1);
Here's my extremely inelegant replacement that doesn't stack overflow:
private String[] extractCellsFromLine(String line) {
List<String> cellList = new ArrayList<String>();
while (true) {
String[] firstCellAndRest;
if (line.startsWith("\"")) {
firstCellAndRest = line.split("([\t,])(?=(?:[^\"]|\"[^\"]*\")*$)", 2);
}
else {
firstCellAndRest = line.split("[\t,]", 2);
}
cellList.add(firstCellAndRest[0]);
if (firstCellAndRest.length == 1) {
break;
}
line = firstCellAndRest[1];
}
return cellList.toArray(new String[cellList.size()]);
}
#SocialCensus, The example you gave in the comment to MarkusQ, where you throw in ' alongside the ", doesn't work with the example MarkusQ gave right above that if we change sam to sam's: (test, a "comma,", bob, ",sam's,",here) has no match against (,)(?=(?:[^"']|["|'][^"']")$). In fact, the problem itself, "I don't really care about single vs double quotes", is ambiguous. You have to be clear what you mean by quoting either with " or with '. For example, is nesting allowed or not? If so, to how many levels? If only 1 nested level, what happens to a comma outside the inner nested quotation but inside the outer nesting quotation? You should also consider that single quotes happen by themselves as apostrophes (ie, like the counter-example I gave earlier with sam's). Finally, the regex you made doesn't really treat single quotes on par with double quotes since it assumes the last type of quotation mark is necessarily a double quote -- and replacing that last double quote with ['|"] also has a problem if the text doesn't come with correct quoting (or if apostrophes are used), though, I suppose we probably could assume all quotes are correctly delineated.
MarkusQ's regexp answers the question: find all commas that have an even number of double quotes after it (ie, are outside double quotes) and disregard all commas that have an odd number of double quotes after it (ie, are inside double quotes). This is generally the same solution as what you probably want, but let's look at a few anomalies. First, if someone leaves off a quotation mark at the end, then this regexp finds all the wrong commas rather than finding the desired ones or failing to match any. Of course, if a double quote is missing, all bets are off since it might not be clear if the missing one belongs at the end or instead belongs at the beginning; however, there is a case that is legitimate and where the regex could conceivably fail (this is the second "anomaly"). If you adjust the regexp to go across text lines, then you should be aware that quoting multiple consecutive paragraphs requires that you place a single double quote at the beginning of each paragraph and leave out the quote at the end of each paragraph except for at the end of the very last paragraph. This means that over the space of those paragraphs, the regex will fail in some places and succeed in others.
Examples and brief discussions of paragraph quoting and of nested quoting can be found here http://en.wikipedia.org/wiki/Quotation_mark .
When I run the code below I get the unexpected result where \* also captures É. Is there a way to make it only capture * like I wanted?
let s =
"* A
ÉR
* B"
let result = System.Text.RegularExpressions.Regex.Replace(s, "\n(?!\*)", "", Text.RegularExpressions.RegexOptions.Multiline)
printfn "%s" result
Result After Running Replace
* AÉR
* B
Expected Result
"* A
ÉR
* B"
UPDATE
This seems to be working, when I use a pattern like so \n(?=\*). I guess I needed a positive lookahead.
You're right that you need to use positive lookahead instead of negative lookahead to get the result you want. However, to clarify an issue that came up in the comments, in F# a string delimited by just "" is not quite like either a plain C# string delimited by "" or a C# string delimited by #"" - if you want the latter you should also use #"" in F#. The difference is that in a normal F# string, backslashes will be treated as escape sequences only when used in front of a valid character to escape (see the table towards the top of Strings (F#)). Otherwise, it is treated as a literal backslash character. So, since '*' is not a valid character to escape, you luckily see the behavior you expected (in C#, by contrast, this would be a syntax error because it's an unrecognized escape). I would recommend that you not rely on this and should use a verbatim #"" string instead.
In other words, in F# the following three strings are all equivalent:
let s1 = "\n\*"
let s2 = "\n\\*"
let s3 = #"
\*"
I think that the C# design is more sensible because it prevents confusion on what exactly is being escaped.
I would like to find a regex that will pick out all commas that fall outside quote sets.
For example:
'foo' => 'bar',
'foofoo' => 'bar,bar'
This would pick out the single comma on line 1, after 'bar',
I don't really care about single vs double quotes.
Has anyone got any thoughts? I feel like this should be possible with readaheads, but my regex fu is too weak.
This will match any string up to and including the first non-quoted ",". Is that what you are wanting?
/^([^"]|"[^"]*")*?(,)/
If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:
/(,)(?=(?:[^"]|"[^"]*")*$)/
which will match all of them. Thus
'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';')
replaces all the commas not inside quotes with semicolons, and produces:
'test; a "comma,"; bob; ",sam,";here'
If you need it to work across line breaks just add the m (multiline) flag.
The below regexes would match all the comma's which are present outside the double quotes,
,(?=(?:[^"]*"[^"]*")*[^"]*$)
DEMO
OR(PCRE only)
"[^"]*"(*SKIP)(*F)|,
"[^"]*" matches all the double quoted block. That is, in this buz,"bar,foo" input, this regex would match "bar,foo" only. Now the following (*SKIP)(*F) makes the match to fail. Then it moves on to the pattern which was next to | symbol and tries to match characters from the remaining string. That is, in our output , next to pattern | will match only the comma which was just after to buz . Note that this won't match the comma which was present inside double quotes, because we already make the double quoted part to skip.
DEMO
The below regex would match all the comma's which are present inside the double quotes,
,(?!(?:[^"]*"[^"]*")*[^"]*$)
DEMO
While it's possible to hack it with a regex (and I enjoy abusing regexes as much as the next guy), you'll get in trouble sooner or later trying to handle substrings without a more advanced parser. Possible ways to get in trouble include mixed quotes, and escaped quotes.
This function will split a string on commas, but not those commas that are within a single- or double-quoted string. It can be easily extended with additional characters to use as quotes (though character pairs like « » would need a few more lines of code) and will even tell you if you forgot to close a quote in your data:
function splitNotStrings(str){
var parse=[], inString=false, escape=0, end=0
for(var i=0, c; c=str[i]; i++){ // looping over the characters in str
if(c==='\\'){ escape^=1; continue} // 1 when odd number of consecutive \
if(c===','){
if(!inString){
parse.push(str.slice(end, i))
end=i+1
}
}
else if(splitNotStrings.quotes.indexOf(c)>-1 && !escape){
if(c===inString) inString=false
else if(!inString) inString=c
}
escape=0
}
// now we finished parsing, strings should be closed
if(inString) throw SyntaxError('expected matching '+inString)
if(end<i) parse.push(str.slice(end, i))
return parse
}
splitNotStrings.quotes="'\"" // add other (symmetrical) quotes here
Try this regular expression:
(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*=>\s*(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*,
This does also allow strings like “'foo\'bar' => 'bar\\',”.
MarkusQ's answer worked great for me for about a year, until it didn't. I just got a stack overflow error on a line with about 120 commas and 3682 characters total. In Java, like this:
String[] cells = line.split("[\t,](?=(?:[^\"]|\"[^\"]*\")*$)", -1);
Here's my extremely inelegant replacement that doesn't stack overflow:
private String[] extractCellsFromLine(String line) {
List<String> cellList = new ArrayList<String>();
while (true) {
String[] firstCellAndRest;
if (line.startsWith("\"")) {
firstCellAndRest = line.split("([\t,])(?=(?:[^\"]|\"[^\"]*\")*$)", 2);
}
else {
firstCellAndRest = line.split("[\t,]", 2);
}
cellList.add(firstCellAndRest[0]);
if (firstCellAndRest.length == 1) {
break;
}
line = firstCellAndRest[1];
}
return cellList.toArray(new String[cellList.size()]);
}
#SocialCensus, The example you gave in the comment to MarkusQ, where you throw in ' alongside the ", doesn't work with the example MarkusQ gave right above that if we change sam to sam's: (test, a "comma,", bob, ",sam's,",here) has no match against (,)(?=(?:[^"']|["|'][^"']")$). In fact, the problem itself, "I don't really care about single vs double quotes", is ambiguous. You have to be clear what you mean by quoting either with " or with '. For example, is nesting allowed or not? If so, to how many levels? If only 1 nested level, what happens to a comma outside the inner nested quotation but inside the outer nesting quotation? You should also consider that single quotes happen by themselves as apostrophes (ie, like the counter-example I gave earlier with sam's). Finally, the regex you made doesn't really treat single quotes on par with double quotes since it assumes the last type of quotation mark is necessarily a double quote -- and replacing that last double quote with ['|"] also has a problem if the text doesn't come with correct quoting (or if apostrophes are used), though, I suppose we probably could assume all quotes are correctly delineated.
MarkusQ's regexp answers the question: find all commas that have an even number of double quotes after it (ie, are outside double quotes) and disregard all commas that have an odd number of double quotes after it (ie, are inside double quotes). This is generally the same solution as what you probably want, but let's look at a few anomalies. First, if someone leaves off a quotation mark at the end, then this regexp finds all the wrong commas rather than finding the desired ones or failing to match any. Of course, if a double quote is missing, all bets are off since it might not be clear if the missing one belongs at the end or instead belongs at the beginning; however, there is a case that is legitimate and where the regex could conceivably fail (this is the second "anomaly"). If you adjust the regexp to go across text lines, then you should be aware that quoting multiple consecutive paragraphs requires that you place a single double quote at the beginning of each paragraph and leave out the quote at the end of each paragraph except for at the end of the very last paragraph. This means that over the space of those paragraphs, the regex will fail in some places and succeed in others.
Examples and brief discussions of paragraph quoting and of nested quoting can be found here http://en.wikipedia.org/wiki/Quotation_mark .
As stated in the title, is there a way, using regular expressions, to match a text pattern for text that appears outside of quotes. Ideally, given the following examples, I would want to be able to match the comma that is outside of the quotes, but not the one in the quotes.
This is some text, followed by "text, in quotes!"
or
This is some text, followed by "text, in quotes" with more "text, in quotes!"
Additionally, it would be nice if the expression would respect nested quotes as in the following example. However, if this is technically not feasible with regular expressions then it wold simply be nice to know if that is the case.
The programmer looked up from his desk, "This can't be good," he exclaimed, "the system is saying 'File not found!'"
I have found some expressions for matching something that would be in the quotes, but nothing quite for something outside of the quotes.
Easiest is matching both commas and quoted strings, and then filtering out the quoted strings.
/"[^"]*"|,/g
If you really can't have the quotes matching, you could do something like this:
/,(?=[^"]*(?:"[^"]*"[^"]*)*\Z)/g
This could become slow, because for each comma, it has to look at the remaining characters and count the number of quotes. \Z matches the end of the string. Similar to $, but will never match line ends.
If you don't mind an extra capture group, it could be done like this instead:
/\G((?:[^"]*"[^"]*")*?[^"]*?)(,)/g
This will only scan the string once. It counts the quotes from the beginning of the string instead. \G will match the position where last match ended.
The last pattern could need an example.
Input String: 'This is, some text, followed by "text, in quotes!" and more ,-as'
Matches:
1. ['This is', ',']
2. [' some text', ',']
3. [' and followed by "text, in quotes!" and more ', ',']
It matches the string leading up to the comma, as well as the comma.
This can be done with modern regexes due to the massive number of hacks to regex engines that exist, but let me be the one to post the "Don't Do This With Regular Expressions" answer.
This is not a job for regular expressions. This is a job for a full-blown parser. As an example of something you can't do with (classical) regular expressions, consider this:
()(())(()())
No (classical) regex can determine if those parenthesis are matched properly, but doing so without a regex is trivial:
/* C code */
char string[] = "()(())(()())";
int parens = 0;
for(char *tmp = string; tmp; tmp++)
{
if(*tmp == '(') parens++;
if(*tmp == ')') parens--;
}
if(parens > 0)
{
printf("%s too many open parenthesis.\n", parens);
}
else if(parens < 0)
{
printf("%s too many closing parenthesis.\n", -parens);
}
else
{
printf("Parenthesis match!\n");
}
# Perl code
my $string = "()(())(()())";
my $parens = 0;
for(split(//, $string)) {
$parens++ if $_ eq "(";
$parens-- if $_ eq ")";
}
die "Too many open parenthesis.\n" if $parens > 0;
die "Too many closing parenthesis.\n" if $parens < 0;
print "Parenthesis match!";
See how simple it was to write some non-regex code to do the job for you?
EDIT: Okay, back from seeing Adventureland. :) Try this (written in Perl, commented to help you understand what I'm doing if you don't know Perl):
# split $string into a list, split on the double quote character
my #temp = split(/"/, $string);
# iterate through a list of the number of elements in our list
for(0 .. $#temp) {
# skip odd-numbered elements - only process $list[0], $list[2], etc.
# the reason is that, if we split on "s, every other element is a string
next if $_ & 1;
if($temp[$_] =~ /regex/) {
# do stuff
}
}
Another way to do it:
my $bool = 0;
my $str;
my $match;
# loop through the characters of a string
for(split(//, $string)) {
if($_ eq '"') {
$bool = !$bool;
if($bool) {
# regex time!
$match += $str =~ /regex/;
$str = "";
}
}
if(!$bool) {
# add the current character to our test string
$str .= $_;
}
}
# get trailing string match
$match += $str =~ /regex/;
(I give two because, in another language, one solution may be easier to implement than the other, not just because There's More Than One Way To Do It™.)
Of course, as your problems grow in complexity, there will arise certain benefits of constructing a full-blown parser, but that's a different horse. For now, this will suffice.
As mentioned before, regexp cannot match any nested pattern, since it is not a Context-free language.
So if you have any nested quotes, you are not going to solve this with a regex.
(Except with the "balancing group" feature of a .Net regex engine - as mentioned by Daniel L in the comments - , but I am not making any assumption of the regex flavor here)
Except if you add further specification, like a quote within a quote must be escaped.
In that case, the following:
text before string "string with \escape quote \" still
within quote" text outside quote "within quote \" still inside" outside "
inside" final outside text
would be matched successfully with:
(?ms)((?:\\(?=")|[^"])+)(?:"((?:[^"]|(?<=\\)")+)(?<!\\)")?
group1: text preceding a quoted text
group2: text within double quotes, even if \" are present in it.
Here is an expression that gets the match, but it isn't perfect, as the first match it gets is the whole string, removing the final ".
[^"].*(,).*[^"]
I have been using my Free RegEx tester to see what works.
Test Results
Group Match Collection # 1
Match # 1
Value: This is some text, followed by "text, in quotes!
Captures: 1
Match # 2
Value: ,
Captures: 1
You should better build yourself a simple parser (pseudo-code):
quoted := False
FOR char IN string DO
IF char = '"'
quoted := !quoted
ELSE
IF char = "," AND !quoted
// not quoted comma found
ENDIF
ENDIF
ENDFOR
This really depends on if you allow nested quotes or not.
In theory, with nested quotes you cannot do this (regular languages can't count)
In practice, you might manage if you can constrain the depth. It will get increasingly ugly as you add complexity. This is often how people get into grief with regular expressions (trying to match something that isn't actually regular in general).
Note that some "regex" libraries/languages have added non-regular features.
If this sort of thing gets complicated enough, you'll really have to write/generate a parser for it.
You need more in your description. Do you want any set of possible quoted strings and non-quoted strings like this ...
Lorem ipsum "dolor sit" amet, "consectetur adipiscing" elit.
... or simply the pattern you asked for? This is pretty close I think ...
(?<outside>.*?)(?<inside>(?=\"))
It does capture the "'s however.
Maybe you could do it in two steps?
First you replace the quoted text:
("[^"]*")
and then you extract what you want from the remaining string
,(?=(?:[^"]*"[^"]*")*[^"]*\z)
Regexes may not be able to count, but they can determine whether there's an odd or even number of something. After finding a comma, the lookahead asserts that, if there are any quotation marks ahead, there's an even number of them, meaning the comma is not inside a set of quotes.
This can be tweaked to handle escaped quotes if needed, though the original question didn't mention that. Also, if your regex flavor supports them, I would add atomic groups or possessive quantifiers to keep backtracking in check.
I would like to find a regex that will pick out all commas that fall outside quote sets.
For example:
'foo' => 'bar',
'foofoo' => 'bar,bar'
This would pick out the single comma on line 1, after 'bar',
I don't really care about single vs double quotes.
Has anyone got any thoughts? I feel like this should be possible with readaheads, but my regex fu is too weak.
This will match any string up to and including the first non-quoted ",". Is that what you are wanting?
/^([^"]|"[^"]*")*?(,)/
If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:
/(,)(?=(?:[^"]|"[^"]*")*$)/
which will match all of them. Thus
'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';')
replaces all the commas not inside quotes with semicolons, and produces:
'test; a "comma,"; bob; ",sam,";here'
If you need it to work across line breaks just add the m (multiline) flag.
The below regexes would match all the comma's which are present outside the double quotes,
,(?=(?:[^"]*"[^"]*")*[^"]*$)
DEMO
OR(PCRE only)
"[^"]*"(*SKIP)(*F)|,
"[^"]*" matches all the double quoted block. That is, in this buz,"bar,foo" input, this regex would match "bar,foo" only. Now the following (*SKIP)(*F) makes the match to fail. Then it moves on to the pattern which was next to | symbol and tries to match characters from the remaining string. That is, in our output , next to pattern | will match only the comma which was just after to buz . Note that this won't match the comma which was present inside double quotes, because we already make the double quoted part to skip.
DEMO
The below regex would match all the comma's which are present inside the double quotes,
,(?!(?:[^"]*"[^"]*")*[^"]*$)
DEMO
While it's possible to hack it with a regex (and I enjoy abusing regexes as much as the next guy), you'll get in trouble sooner or later trying to handle substrings without a more advanced parser. Possible ways to get in trouble include mixed quotes, and escaped quotes.
This function will split a string on commas, but not those commas that are within a single- or double-quoted string. It can be easily extended with additional characters to use as quotes (though character pairs like « » would need a few more lines of code) and will even tell you if you forgot to close a quote in your data:
function splitNotStrings(str){
var parse=[], inString=false, escape=0, end=0
for(var i=0, c; c=str[i]; i++){ // looping over the characters in str
if(c==='\\'){ escape^=1; continue} // 1 when odd number of consecutive \
if(c===','){
if(!inString){
parse.push(str.slice(end, i))
end=i+1
}
}
else if(splitNotStrings.quotes.indexOf(c)>-1 && !escape){
if(c===inString) inString=false
else if(!inString) inString=c
}
escape=0
}
// now we finished parsing, strings should be closed
if(inString) throw SyntaxError('expected matching '+inString)
if(end<i) parse.push(str.slice(end, i))
return parse
}
splitNotStrings.quotes="'\"" // add other (symmetrical) quotes here
Try this regular expression:
(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*=>\s*(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*,
This does also allow strings like “'foo\'bar' => 'bar\\',”.
MarkusQ's answer worked great for me for about a year, until it didn't. I just got a stack overflow error on a line with about 120 commas and 3682 characters total. In Java, like this:
String[] cells = line.split("[\t,](?=(?:[^\"]|\"[^\"]*\")*$)", -1);
Here's my extremely inelegant replacement that doesn't stack overflow:
private String[] extractCellsFromLine(String line) {
List<String> cellList = new ArrayList<String>();
while (true) {
String[] firstCellAndRest;
if (line.startsWith("\"")) {
firstCellAndRest = line.split("([\t,])(?=(?:[^\"]|\"[^\"]*\")*$)", 2);
}
else {
firstCellAndRest = line.split("[\t,]", 2);
}
cellList.add(firstCellAndRest[0]);
if (firstCellAndRest.length == 1) {
break;
}
line = firstCellAndRest[1];
}
return cellList.toArray(new String[cellList.size()]);
}
#SocialCensus, The example you gave in the comment to MarkusQ, where you throw in ' alongside the ", doesn't work with the example MarkusQ gave right above that if we change sam to sam's: (test, a "comma,", bob, ",sam's,",here) has no match against (,)(?=(?:[^"']|["|'][^"']")$). In fact, the problem itself, "I don't really care about single vs double quotes", is ambiguous. You have to be clear what you mean by quoting either with " or with '. For example, is nesting allowed or not? If so, to how many levels? If only 1 nested level, what happens to a comma outside the inner nested quotation but inside the outer nesting quotation? You should also consider that single quotes happen by themselves as apostrophes (ie, like the counter-example I gave earlier with sam's). Finally, the regex you made doesn't really treat single quotes on par with double quotes since it assumes the last type of quotation mark is necessarily a double quote -- and replacing that last double quote with ['|"] also has a problem if the text doesn't come with correct quoting (or if apostrophes are used), though, I suppose we probably could assume all quotes are correctly delineated.
MarkusQ's regexp answers the question: find all commas that have an even number of double quotes after it (ie, are outside double quotes) and disregard all commas that have an odd number of double quotes after it (ie, are inside double quotes). This is generally the same solution as what you probably want, but let's look at a few anomalies. First, if someone leaves off a quotation mark at the end, then this regexp finds all the wrong commas rather than finding the desired ones or failing to match any. Of course, if a double quote is missing, all bets are off since it might not be clear if the missing one belongs at the end or instead belongs at the beginning; however, there is a case that is legitimate and where the regex could conceivably fail (this is the second "anomaly"). If you adjust the regexp to go across text lines, then you should be aware that quoting multiple consecutive paragraphs requires that you place a single double quote at the beginning of each paragraph and leave out the quote at the end of each paragraph except for at the end of the very last paragraph. This means that over the space of those paragraphs, the regex will fail in some places and succeed in others.
Examples and brief discussions of paragraph quoting and of nested quoting can be found here http://en.wikipedia.org/wiki/Quotation_mark .