regex to remove comma between double quotes notepad++ - regex

I am trying to remove commas inside double quotes from a csv file in notepad++, this is what I have:
1070,17,2,GN3-670,"COLLAR B, M STAY","2,606.45"
and I need this:
1070,17,2,GN3-670,"COLLAR B M STAY","2606.45"
I ma trying to use notepad find/replace option with a reg exp. pattern.
I tried all kind of combination but didn't manage to do :( The file contains 1 million rows.
After whole today I am not anymore sure if a simple regex can do? Maybe I should go with a script...python?

mrki, this will do what you want (tested in N++):
Search: ("[^",]+),([^"]+")
Replace: $1$2 or \1\2
How does this work? The first parentheses capture the beginning of the string up to (but not including) the comma into Group 1. The second parentheses capture the end of the string after the comma into Group 2. The replacement substitutes the string with a concatenation of Group 1 and Group 2.
In more detail: in the first parentheses, we match the opening double quotes then any number of characters that are not a comma. That is the meaning of [^,]+. In the second parentheses, we match any number of characters that are not a double quote with [^"]+, then the closing double quotes .

Try the following
import re
print re.sub(',(?=[^"]*"[^"]*(?:"[^"]*"[^"]*)*$)',"",string)
This will remove comma between quotes

Just an update to #zx81's brilliant solution.
Lets say you have 2commas between quotes
Then the regex search has to be modified as follows:
("[^",]+),([^",]+),([^"]+")
Replace needs to be modified as
$1$2$3
So on modify it depending on the # of commas.
I tried exploring to see if recursive regex was possible but the does not seem to be possible as of now

For a line with multiple instances of "comma within double quotes", I can think of the following perl script - you need to have a header line without this kind of instance so that you know how many comma-separated fields there should be.
#! /usr/bin/perl -w
use strict;
my $n_fields = "";
while (<>) {
s/\s+$//;
if (/^\#/) { # header line
my #t = split(/,/);
$n_fields = scalar(#t); # total number of fields
} else { # actual data
my $n_commas = $_ =~s/,/,/g; # total number of commas
foreach my $i (0 .. $n_commas - $n_fields) { # iterate ($n_commas - $n_fields + 1) times
s/(\"[^",]+),([^"]+\")/$1\\x2c$2/g; # single replacement per previous answers
}
s/\"//g; # removal of double quotes (if you want)
}
print "$_\n";
}

Related

Replace double quoted strings by single quoted except for GStrings

My OCD has gotten the better of me and I'm going through my groovy codebase replacing simple strings with double quotes around them into single quoted strings.
However, I want to avoid GStrings that actually contain dollar symbols and variables.
I'm using IntelliJ to do the substitution, and the following almost works:
From: "([^$\"\n\r]+)"
To: '$1'
It captures strings without any dollars in, but only partially skips any strings that contain them.
For example it matches the quotes between two double quoted strings in this case:
foo("${var}": "bar")
^^^^
Is it possible to create a regex that would skip a whole string that contained dollars, so in the above case it skips "${var}" and selects "bar", instead of erroneously selecting ": "?
EDIT: Here's a section of code to try against
table.columns.elements.each{ columnName, column ->
def columnText = "${columnName} : ${column.dataType}"
cols += "${columnText}\n"
if (columnText.length() > width) {
width = columnText.length()
}
height++
}
builder."node"("id": table.elementName) {
builder."data"("key": "d0") {
builder."y:ShapeNode"()
}
}
def foo() {
def string = """
a multiline quote using triple quotes with ${var} gstring vars in.
"""
}
Do single and triple quote replacements separately.
Single quotes:
Use a look ahead for an even number of quotes after your hit. A negative look behind stops it matching the inner quotes of triple quoted strings.
Find: (?<!")"([^"$]*)"(?=(?:(?:[^"\r\n]*"){2})*[^"]*$)
Replace: '$1'
See live demo.
Triple quotes:
Use a simpler match for triple quoted strings, since they are on their own lines.
Find: """([^"$]*?)"""
Replace: '''$1'''
See live demo, which includes a triple-quoted string that contains a variable.
You need to make sure the first quote comes after even number of quotes:
^[^\n\r"]*(?:(?:"[^"\n\r]*){2})*"([^$\"\n\r]+)"
Here you can play with it.
Explanation:
^[^"\n\r]* - some non-quotes at the beginning
"[^"\n\r]* - a quote, then some more non-quotes
(?:"[^"\n\r]*){2} - let's have two of this
(?:(?:...)) - actually, let's have 0, 2, 4, 6, ... whatever amount of this
Then your regex comes to match the right string: "([^$\"\n\r]+)"
If intellij supports that, then you can make it faster by replacing the non-capturing groups (?:...) with atomic groups (?>...).
This regex finds the last string in the line so you'll have to run the replace several times.
Update
Updated the negated character classes with the newline characters. Now it works well for multi-line texts too. Still, you'll have to run it several times because it finds only one string per line.

perl Regex replace for specific string length

I am using Perl to do some prototyping.
I need an expression to replace e by [ee] if the string is exactly 2 chars and finishes by "e".
le -> l [ee]
me -> m [ee]
elle -> elle : no change
I cannot test the length of the string, I need one expression to do the whole job.
I tried:
`s/(?=^.{0,2}\z).*e\z%/[ee]/g` but this is replacing the whole string
`s/^[c|d|j|l|m|n|s|t]e$/[ee]/g` same result (I listed the possible letters that could precede my "e")
`^(?<=[c|d|j|l|m|n|s|t])e$/[ee]/g` but I have no match, not sure I can use ^ on a positive look behind
EDIT
Guys you're amazing, hours of search on the web and here I get answers minutes after I posted.
I tried all your solutions and they are working perfectly directly in my script, i.e. this one:
my $test2="le";
$test2=~ s/^(\S)e$/\1\[ee\]/g;
print "test2:".$test2."\n";
-> test2:l[ee]
But I am loading these regex from a text file (using Perl for proto, the idea is to reuse it with any language implementing regex):
In the text file I store for example (I used % to split the line between match and replace):
^(\S)e$% \1\[ee\]
and then I parse and apply all regex like that:
my $test="le";
while (my $row = <$fh>) {
chomp $row;
if( $row =~ /%/){
my #reg = split /%/, $row;
#if no replacement, put empty string
if($#reg == 0){
push(#reg,"");
}
print "reg found, reg:".$reg[0].", replace:".$reg[1]."\n";
push #regs, [ #reg ];
}
}
print "orgine:".$test."\n";
for my $i (0 .. $#regs){
my $p=$regs[$i][0];
my $r=$regs[$i][1];
$test=~ s/$p/$r/g;
}
print "final:".$test."\n";
This technique is working well with my other regex, but not yet when I have a $1 or \1 in the replace... here is what I am obtaining:
final:\1\ee\
PS: you answered to initial question, should I open another post ?
Something like s/(?i)^([a-z])e$/$1[ee]/
Why aren't you using a capture group to do the replacement?
`s/^([c|d|j|l|m|n|s|t])e$/\1 [ee]/g`
If those are the characters you need and if it is indeed one word to a line with no whitespace before it or after it, then this will work.
Here's another option depending on what you are looking for. It will match a two character string consisting of one a-z character followed by one 'e' on its own line with possible whitespace before or after. It will replace this will the single a-z character followed by ' [ee]'
`s/^\s*([a-z])e\s*$/\1 [ee]/`
^(\S)e$
Try this.Replace by $1 [ee].See demo.
https://regex101.com/r/hR7tH4/28
I'd do something like this
$word =~ s/^(\w{1})(e)$/$1$2e/;
You can use following regex which match 2 character and then you can replace it with $1\[$2$2\]:
^([a-zA-Z])([a-zA-Z])$
Demo :
$my_string =~ s/^([a-zA-Z])([a-zA-Z])$/$1[$2$2]/;
See demo https://regex101.com/r/iD9oN4/1

Regular Expressions: querystring parameters matching

I'm trying to learn something about regular expressions.
Here is what I'm going to match:
/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
My expression should "grabs" abc123 and def456.
And now just an example about what I'm not going to match ("question mark" is missing):
/parent/child/firstparam=abc123&secondparam=def456
Well, I built the following expression:
^(?:/parent/child){1}(?:^(?:/\?|\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?
But that doesn't work.
Could you help me to understand what I'm doing wrong?
Thanks in advance.
UPDATE 1
Ok, I made other tests.
I'm trying to fix the previous version with something like this:
/parent/child(?:(?:\?|/\?)+(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?)?$
Let me explain my idea:
Must start with /parent/child:
/parent/child
Following group is optional
(?: ... )?
The previous optional group must starts with ? or /?
(?:\?|/\?)+
Optional parameters (I grab values if specified parameters are part of querystring)
(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*)?
End of line
$
Any advice?
UPDATE 2
My solution must be based just on regular expressions.
Just for example, I previously wrote the following one:
/parent/child(?:[?&/]*(?:firstparam=([^&]*)|secondparam=([^&]*)|[^&]*))*$
And that works pretty nice.
But it matches the following input too:
/parent/child/firstparam=abc123&secondparam=def456
How could I modify the expression in order to not match the previous string?
You didn't specify a language so I'll just usre Perl. So basically instead of matching everything, I just matched exactly what I thought you needed. Correct me if I am wrong please.
while ($subject =~ m/(?<==)\w+?(?=&|\W|$)/g) {
# matched text = $&
}
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
= # Match the character “=” literally
)
\\w # Match a single character that is a “word character” (letters, digits, and underscores)
+? # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
& # Match the character “&” literally
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
\\W # Match a single character that is a “non-word character”
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
\$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
Output:
This regex will work as long as you know what your parameter names are going to be and you're sure that they won't change.
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam)\=([\w]+)&?)?
Whilst regex is not the best solution for this (the above code examples will be far more efficient, as string functions are way faster than regexes) this will work if you need a regex solution with up to 3 parameters. Out of interest, why must the solution use only regex?
In any case, this regex will match the following strings:
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789
It will now only match those containing query string parameters, and put them into capture groups for you.
What language are you using to process your matches?
If you are using preg_match with PHP, you can get the whole match as well as capture groups in an array with
preg_match($regex, $string, $matches);
Then you can access the whole match with $matches[0] and the rest with $matches[1], $matches[2], etc.
If you want to add additional parameters you'll also need to add them in the regex too, and add additional parts to get your data. For example, if you had
/parent/child/?secondparam=def456&firstparam=abc123&fourthparam=jkl01112&thirdparam=ghi789
The regex will become
\/parent\/child\/?\?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?(?:(?:firstparam|secondparam|thirdparam|fourthparam)\=([\w]+)&?)?
This will become a bit more tedious to maintain as you add more parameters, though.
You can optionally include ^ $ at the start and end if the multi-line flag is enabled. If you also need to match the whole lines without query strings, wrap this whole regex in a non-capture group (including ^ $) and add
|(?:^\/parent\/child\/?\??$)
to the end.
You're not escaping the /s in your regex for starters and using {1} for a single repetition of something is unnecessary; you only use those when you want more than one repetition or a range of repetitions.
And part of what you're trying to do is simply not a good use of a regex. I'll show you an easier way to deal with that: you want to use something like split and put the information into a hash that you can check the contents of later. Because you didn't specify a language, I'm just going to use Perl for my example, but every language I know with regexes also has easy access to hashes and something like split, so this should be easy enough to port:
# I picked an example to show how this works.
my $route = '/parent/child/?first=123&second=345&third=678';
my %params; # I'm going to put those URL parameters in this hash.
# Perl has a way to let me avoid escaping the /s, but I wanted an example that
# works in other languages too.
if ($route =~ m/\/parent\/child\/\?(.*)/) { # Use the regex for this part
print "Matched route.\n";
# But NOT for this part.
my $query = $1; # $1 is a Perl thing. It contains what (.*) matched above.
my #items = split '&', $query; # Each item is something like param=123
foreach my $item (#items) {
my ($param, $value) = split '=', $item;
$params{$param} = $value; # Put the parameters in a hash for easy access.
print "$param set to $value \n";
}
}
# Now you can check the parameter values and do whatever you need to with them.
# And you can add new parameters whenever you want, etc.
if ($params{'first'} eq '123') {
# Do whatever
}
My solution:
/(?:\w+/)*(?:(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)?|\w+|)
Explain:
/(?:\w+/)* match /parent/child/ or /parent/
(?:\w+)?\?(?:\w+=\w+(?:&\w+=\w+)*)? match child?firstparam=abc123 or ?firstparam=abc123 or ?
\w+ match text like child
..|) match nothing(empty)
If you need only query string, pattern would reduce such as:
/(?:\w+/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)
If you want to get every parameter from query string, this is a Ruby sample:
re = /\/(?:\w+\/)*(?:\w+)?\?(\w+=\w+(?:&\w+=\w+)*)/
s = '/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789'
if m = s.match(re)
query_str = m[1] # now, you can 100% trust this string
query_str.scan(/(\w+)=(\w+)/) do |param,value| #grab parameter
printf("%s, %s\n", param, value)
end
end
output
secondparam, def456
firstparam, abc123
thirdparam, ghi789
This script will help you.
First, i check, is there any symbol like ?.
Then, i kill first part of line (left from ?).
Next, i split line by &, where each value splitted by =.
my $r = q"/parent/child
/parent/child?
/parent/child?firstparam=abc123
/parent/child?secondparam=def456
/parent/child?firstparam=abc123&secondparam=def456
/parent/child?secondparam=def456&firstparam=abc123
/parent/child?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child?thirdparam=ghi789
/parent/child/
/parent/child/?
/parent/child/?firstparam=abc123
/parent/child/?secondparam=def456
/parent/child/?firstparam=abc123&secondparam=def456
/parent/child/?secondparam=def456&firstparam=abc123
/parent/child/?thirdparam=ghi789&secondparam=def456&firstparam=abc123
/parent/child/?secondparam=def456&firstparam=abc123&thirdparam=ghi789
/parent/child/?thirdparam=ghi789";
for my $string(split /\n/, $r){
if (index($string,'?')!=-1){
substr($string, 0, index($string,'?')+1,"");
#say "string = ".$string;
if (index($string,'=')!=-1){
my #params = map{$_ = [split /=/, $_];}split/\&/, $string;
$"="\n";
say "$_->[0] === $_->[1]" for (#params);
say "######next########";
}
else{
#print "there is no params!"
}
}
else{
#say "there is no params!";
}
}

Regular expression help in Perl

I have following text pattern
(2222) First Last (ab-cd/ABC1), <first.last#site.domain.com> 1224: efadsfadsfdsf
(3333) First Last (abcd/ABC12), <first.last#site.domain.com> 1234, 4657: efadsfadsfdsf
I want the number 1224 or 1234, 4657 from the above text after the text >.
I have this
\((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\#\w*\.domain.com>\s\d+:
which will take the text before : But i want the one after email till :
Is there any easy regular expression to do this? or should I use split and do this
Thanks
Edit: The whole text is returned by a command line tool.
(3333) First Last (abcd/ABC12), <first.last#site.domain.com> 1234, 4657: efadsfadsfdsf
(3333) - Unique ID
First Last - First and last names
<first.last#site.domain.com> - Email address in format FirstName.LastName#sub.domain.com
1234, 4567 - database primary Keys
: xxxx - Headline
What I have to do is process the above and get hte database ID (in ex: 1234, 4567 2 separate ID's) and query the tables
The above is the output (like this I will get many entries) from the tool which I am calling via my Perl script.
My idea was to use a regular expression to get the database id's. Guess I could use regular expression for this
you can fudge the stuff you don't care about to make the expression easier, say just 'glob' the parts between the parentheticals (and the email delimiters) using non-greedy quantifiers:
/(\d+)\).*?\(.*?\),\s*<.*?>\s*(\d+(?:,\s*\d+)*):/ (not tested!)
there's only two captured groups, the (1234), and the (1234, 4657), the second one which I can only assume from your pattern to mean: "a digit string, followed by zero or more comma separated digit strings".
Well, a simple fix is to just allow all the possible characters in a character class. Which is to say change \d to [\d, ] to allow digits, commas and space.
Your regex as it is, though, does not match the first sample line, because it has a dash - in it (ab-cd/ABC1 does not match \w*\/\w+\d*\). Also, it is not a good idea to rely too heavily on the * quantifier, because it does match the empty string (it matches zero or more times), and should only be used for things which are truly optional. Use + otherwise, which matches (1 or more times).
You have a rather strict regex, and with slight variations in your data like this, it will fail. Only you know what your data looks like, and if you actually do need a strict regex. However, if your data is somewhat consistent, you can use a loose regex simply based on the email part:
sub extract_nums {
my $string = shift;
if ($string =~ /<[^>]*> *([\d, ]+):/) {
return $1 =~ /\d+/g; # return the extracted digits in a list
# return $1; # just return the string as-is
} else { return undef }
}
This assumes, of course, that you cannot have <> tags in front of the email part of the line. It will capture any digits, commas and spaces found between a <> tag and a colon, and then return a list of any digits found in the match. You can also just return the string, as shown in the commented line.
There would appear to be something missing from your examples. Is this what they're supposed to look like, with email?
(1234) First Last (ab-cd/ABC1), <foo.bar#domain.com> 1224: efadsfadsfdsf
(1234) First Last (abcd/ABC12), <foo.bar#domain.com> 1234, 4657: efadsfadsfdsf
If so, this should work:
\((\d+)\)\s\w*\s\w*\s\(\w*\/\w+\d*\),\s<\w*\.\w*\#\w*\.domain\.com>\s\d+(?:,\s(\d+))?:
$string =~ /.*>\s*(.+):.+/;
$numbers = $1;
That's it.
Tested.
With number catching:
$string =~ /.*>\s*(?([0-9]|,)+):.+/;
$numbers = $1;
Not tested but you get the idea.

Replace patterns that are inside delimiters using a regular expression call

I need to clip out all the occurances of the pattern '--' that are inside single quotes in long string (leaving intact the ones that are outside single quotes).
Is there a RegEx way of doing this?
(using it with an iterator from the language is OK).
For example, starting with
"xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
I should end up with:
"xxxx rt / $ 'dfdffggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g 'ggh' vcbcvb"
So I am looking for a regex that could be run from the following languages as shown:
+-------------+------------------------------------------+
| Language | RegEx |
+-------------+------------------------------------------+
| JavaScript | input.replace(/someregex/g, "") |
| PHP | preg_replace('/someregex/', "", input) |
| Python | re.sub(r'someregex', "", input) |
| Ruby | input.gsub(/someregex/, "") |
+-------------+------------------------------------------+
I found another way to do this from an answer by Greg Hewgill at Qn138522
It is based on using this regex (adapted to contain the pattern I was looking for):
--(?=[^\']*'([^']|'[^']*')*$)
Greg explains:
"What this does is use the non-capturing match (?=...) to check that the character x is within a quoted string. It looks for some nonquote characters up to the next quote, then looks for a sequence of either single characters or quoted groups of characters, until the end of the string. This relies on your assumption that the quotes are always balanced. This is also not very efficient."
The usage examples would be :
JavaScript: input.replace(/--(?=[^']*'([^']|'[^']*')*$)/g, "")
PHP: preg_replace('/--(?=[^\']*'([^']|'[^']*')*$)/', "", input)
Python: re.sub(r'--(?=[^\']*'([^']|'[^']*')*$)', "", input)
Ruby: input.gsub(/--(?=[^\']*'([^']|'[^']*')*$)/, "")
I have tested this for Ruby and it provides the desired result.
This cannot be done with regular expressions, because you need to maintain state on whether you're inside single quotes or outside, and regex is inherently stateless. (Also, as far as I understand, single quotes can be escaped without terminating the "inside" region).
Your best bet is to iterate through the string character by character, keeping a boolean flag on whether or not you're inside a quoted region - and remove the --'s that way.
If bending the rules a little is allowed, this could work:
import re
p = re.compile(r"((?:^[^']*')?[^']*?(?:'[^']*'[^']*?)*?)(-{2,})")
txt = "xxxx rt / $ 'dfdf--fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '--ggh--' vcbcvb"
print re.sub(p, r'\1-', txt)
Output:
xxxx rt / $ 'dfdf-fggh-dfgdfg' ghgh- dddd -- 'dfdf' ghh-g '-ggh-' vcbcvb
The regex:
( # Group 1
(?:^[^']*')? # Start of string, up till the first single quote
[^']*? # Inside the single quotes, as few characters as possible
(?:
'[^']*' # No double dashes inside theses single quotes, jump to the next.
[^']*?
)*? # as few as possible
)
(-{2,}) # The dashes themselves (Group 2)
If there where different delimiters for start and end, you could use something like this:
-{2,}(?=[^'`]*`)
Edit: I realized that if the string does not contain any quotes, it will match all double dashes in the string. One way of fixing it would be to change
(?:^[^']*')?
in the beginning to
(?:^[^']*'|(?!^))
Updated regex:
((?:^[^']*'|(?!^))[^']*?(?:'[^']*'[^']*?)*?)(-{2,})
Hm. There might be a way in Python if there are no quoted apostrophes, given that there is the (?(id/name)yes-pattern|no-pattern) construct in regular expressions, but it goes way over my head currently.
Does this help?
def remove_double_dashes_in_apostrophes(text):
return "'".join(
part.replace("--", "") if (ix&1) else part
for ix, part in enumerate(text.split("'")))
Seems to work for me. What it does, is split the input text to parts on apostrophes, and replace the "--" only when the part is odd-numbered (i.e. there has been an odd number of apostrophes before the part). Note about "odd numbered": part numbering starts from zero!
You can use the following sed script, I believe:
:again
s/'\(.*\)--\(.*\)'/'\1\2'/g
t again
Store that in a file (rmdashdash.sed) and do whatever exec magic in your scripting language allows you to do the following shell equivalent:
sed -f rmdotdot.sed < file containing your input data
What the script does is:
:again <-- just a label
s/'\(.*\)--\(.*\)'/'\1\2'/g
substitute, for the pattern ' followed by anything followed by -- followed by anything followed by ', just the two anythings within quotes.
t again <-- feed the resulting string back into sed again.
Note that this script will convert '----' into '', since it is a sequence of two --'s within quotes. However, '---' will be converted into '-'.
Ain't no school like old school.