Replace a double backslash followed by quote (\\') using sed?

I am unable to replace a double backslash followed by a quote (\\') using sed. This is my current command:
cat file.txt | sed "s:\\\\':\\':g"
The above command not only replaces \\' with \', it also replaces \' with '.
How can I replace only the exact match?
Input:
'one', 'two \\'change', 'three \'unchanged'
Expected:
'one', 'two \'change', 'three \'unchanged'
Actual:
'one', 'two \'change', 'three 'unchanged'

$ sed "s/\\\\\\\'/\\\'/g" file
'one', 'two \'change', 'three \'unchanged'
Here is a discussion on why sed needs 3 backslashes for one

You can also use:
sed "s/\\\\\'/\\\'/g"
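Because the script sits inside double quotes, the shell removes one layer of backslashes before sed sees it. A quick way to check, sketched here assuming a POSIX shell with printf, is to print the script text and compare it with a single-quoted form that needs no shell-level escaping. The variable names and the inline sample are illustrative only.

```shell
# Print what sed receives once the shell has processed the double quotes.
printf '%s\n' "s/\\\\\\\'/\\\'/g"    # prints: s/\\\\'/\\'/g
printf '%s\n' "s/\\\\\'/\\\'/g"      # prints: s/\\\'/\\'/g

# Same substitution as the first command, in single quotes; '\'' inserts a literal quote.
input="'one', 'two \\\\'change', 'three \\'unchanged'"
out=$(printf '%s\n' "$input" | sed 's/\\\\'\''/\\'\''/g')
printf '%s\n' "$out"
```

The second printed script contains \' inside the regex, which GNU sed documents as an anchor for the end of the pattern space, so that variant may not behave like the first one; it is worth testing against your own sed.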

Related

sed command to replace " with ' only in lines that doesn't match a pattern

My text looks like this:
'test' file:
123,James,123,"
hello "X"
this is a "string", cool.
another "string", here.
",7
My goal:
in every line that does not match the pattern [^number,string,number] or [^",number]
replace
" with '.
That is, the output should look like:
123,James,123,"
hello 'X'
this is a 'string', cool.
another 'string', here.
",7
My sed command so far is:
sed '/\(^[0-9]*,.*,[0-9]*\|^",[0-9]\)/!s/"/'/g' test
My problem is in the substitution part: I am trying to escape the ' but the shell does not accept it, and I can't seem to find a solution. If I replace ' with # for example, it works.
I tried ...!s/"/\'/g' but it doesn't work.
Would love your help! Thanks!
[SOLVED]:
For those who also have this problem: I switched the enclosing quotes from ' to " and also escaped the inner " and the !.
The solution:
sed "/\(^[0-9]*,.*,[0-9]*\|^\",[0-9]\)/\!s/\"/'/g"
With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk version.
awk '!/^"/ && !/^[0-9]+,[a-zA-Z]+,[0-9]+/{gsub(/"/,"\047")} 1' Input_file
Using sed
$ sed -E '/^(",)?[0-9]+($|,[[:alpha:]]+,[0-9])/!s/"/'"'"'/g' input_file
123,James,123,"
hello 'X'
this is a 'string', cool.
another 'string', here.
",7
You can use sed -E so that you don't have to escape the parentheses.
Note that:
[0-9]* matches optional digits (zero or more), whereas [0-9]+ matches 1 or more digits.
For the string part you use .*, but that is greedy and can match across commas up to the end of the line.
For example:
sed -E "/(^[0-9]+,.*,[0-9]|^\",[0-9])/!s/\"/'/g" test
Using awk, and assuming for example that the first alternative may contain only 2 commas:
awk '!/^(",[0-9]|[0-9]+,[^,]*,[0-9])/{gsub(/"/,"\047")}1' test
Both will output:
123,James,123,"
hello 'X'
this is a 'string', cool.
another 'string', here.
",7
This might work for you (GNU sed):
sed -E '/^("|[0-9]+,[^,]+),[0-9]+/!y/"/'\''/' file
If the line does not match the required strings, then translate every " to '.
N.B. The idiom '\'' closes the single-quoted string, adds a backslash-escaped single quote (\'), and then reopens the single-quoted string.
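The answers above reach a literal ' inside the sed script in different ways. As a minimal sketch (the variable names a, b, c, q and the sample line are illustrative, not from the answers), the three quoting styles below all hand sed the same script, s/"/'/g:

```shell
line='hello "X"'

# 1. '\'' : close the single quotes, add an escaped quote, reopen.
a=$(printf '%s\n' "$line" | sed 's/"/'\''/g')

# 2. '"'"' : close the single quotes, add a double-quoted ', reopen.
b=$(printf '%s\n' "$line" | sed 's/"/'"'"'/g')

# 3. Double-quoted script with the quote held in a variable.
q="'"
c=$(printf '%s\n' "$line" | sed "s/\"/$q/g")

printf '%s\n' "$a" "$b" "$c"
# hello 'X'
# hello 'X'
# hello 'X'
```

The third form avoids nested quoting, at the cost of having to escape every " that belongs to the sed script itself.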

Regex: select each occurrence of a character up until another character

I have a couple of lines in a document which looks something like that:
foo-bar-foo[Foo - Bar]
I'd like to select every - character up until the first [ bracket on every line. Thus the - in the square brackets shouldn't be selected.
How can I achieve that with a Regex?
I already have this regex /.+?(?=\[)/g, which selects every character until the first [ but I only want the -.
Edit: I want to replace these selected characters with the sed command (GNU).
You can use
sed -E ':a; s/^([^[-]+)-/\1/; ta'
See an online demo:
#!/bin/bash
s='foo-bar-foo[Foo - Bar]'
sed -E ':a; s/^([^[-]+)-/\1/; ta' <<< "$s"
# => foobarfoo[Foo - Bar]
Details:
-E - enabling POSIX ERE syntax (so that there is no need to escape capturing parentheses and the + quantifier)
:a - an a label
s/^([^[-]+)-/\1/ - finds one or more chars other than [ and - from the start of string capturing this substring into Group 1 (\1) and then matches a - char
ta - jumps to a label upon a successful replacement
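To see the loop at work on more than one line, here is a small sketch using the same substitution, split into separate -e expressions (a form that should also suit sed implementations that do not accept a label followed by a semicolon). The second input line is made up for illustration; it shows that every hyphen after the first [ is left alone, even one outside the brackets.

```shell
out=$(printf '%s\n' 'foo-bar-foo[Foo - Bar]' 'a-b[c-d]-e' |
  sed -E -e ':a' -e 's/^([^[-]+)-/\1/' -e 'ta')
printf '%s\n' "$out"
# foobarfoo[Foo - Bar]
# ab[c-d]-e
```

Each pass removes one hyphen, and ta restarts the substitution until no hyphen is left before the first [.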

How do I replace part of a string using regex

I am trying to clean up a mongo db dump. I want to replace all the '\"' in strings where an alphanumeric character is followed by '\"' with spaces. This is what I have so far
sed -e 's/[a-zA-Z0-9]\\"/ /g' a.txt
The problem is, this sed replaces not just '\"', but the one character immediately preceding it. So 'mystring\"' becomes 'mystrin '. I wanted the output 'mystring '
You may use a capturing group in the regex pattern and the \1 placeholder in the replacement part that would restore the alphanumeric char:
sed -e 's/\([a-zA-Z0-9]\)\\"/\1 /g' a.txt
          ^^           ^^    ^^
You may replace [a-zA-Z0-9] with [[:alnum:]] to make the regex a bit more idiomatic ([:alnum:] matches any alphanumeric chars).
Online sed demo:
s='mystring\"'
sed -e 's/\([a-zA-Z0-9]\)\\"/\1 /g' <<< "$s"
# => mystring
sed -e 's/\([[:alnum:]]\)\\"/\1 /g' <<< "$s"
# => mystring

Regex & Sed: How to suppress the first and the second comma in a string containing exactly 9 commas?

I would like to suppress the two first commas in a string containing 10 and only 10 commas (11 Fields). I don't want to erase the commas of the 9 commas line.
I tried this:
sed '/^\([^,]*,\)\{10\}[^,]*$/s/,//1;s/,//2'
But it deletes commas even in the sentences containing less than 10 commas and it deletes the first and the third commas.
Example:
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
Result expected:
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.ing rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
You may use
sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/'
Details
^ - start of a line
([^,]*) - Group 1 (\1): any 0+ chars other than ,
,([^,]*) - , and Group 2 (\2) matching any 0+ chars other than ,
,([^,]*) - , and Group 3 (\3) matching any 0+ chars other than ,
((,[^,]*){7}) - Group 4 (\4): seven occurrences of , followed by any 0+ chars other than ,
$ - end of string.
See the online sed demo:
s="Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.inrombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR"
sed -E 's/^([^,]*),([^,]*),([^,]*)((,[^,]*){7})$/\1\2\3\4/' <<< "$s"
# => Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.inrombach Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
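The pattern above uses {7}, so it only fires on lines with exactly 9 commas, like the demo string. If the target is instead the 10-comma (11-field) line from the question's sample, one possible adaptation, offered as an assumption about the intent, raises the count to {8} and puts a space before \3 so that Leon.ing and rombach stay separated. The variable names are illustrative, and the 9-comma line is an ASCII-only copy of the sample (BELGIE instead of BELGIË).

```shell
line10='Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR'
line9='DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIE,06346641,0636641,NL'

# Assumed variant: first two commas, then exactly eight more (10 in total).
script='s/^([^,]*),([^,]*),([^,]*)((,[^,]*){8})$/\1\2 \3\4/'

out10=$(printf '%s\n' "$line10" | sed -E "$script")
out9=$(printf '%s\n' "$line9" | sed -E "$script")
printf '%s\n' "$out10" "$out9"
# Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.ing rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
# DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIE,06346641,0636641,NL
```

Because the expression is anchored with ^ and $ and every field is [^,]*, a line with any other number of commas passes through unchanged.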
I guess you're using MacOS sed / BSD sed, try this:
sed -e '/^\([^,]*,\)\{10\}[^,]*$/s/,//; tLB' -e 'b' -e ':LB' -e 's/,/ /'
I used --posix to emulate, but not sure it will work on your OS:
$ cat file
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel, Ingendfdfdfieur - Leon.ing,rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
$ sed --posix -e '/^\([^,]*,\)\{10\}[^,]*$/s/,//; tLB' -e 'b' -e ':LB' -e 's/,/ /' file
DE, LAEIES,Vlzgstraat, 16,2260,NIJLEN,BELGIË,06346641,0636641,NL
Leonarfdsdy Dandfiel Ingendfdfdfieur - Leon.ing rombach, Hinderusen, 485,47580,SANKT VITH,BELGIQUE,0442345,2058560,FR
Note that I changed the second s command to replace the comma with a space: Leon.ing,rombach has no space inside, so simply stripping the , would give Leon.ingrombach.
This might work too:
sed -e '/^\([^,]*,\)\{10\}[^,]*$/{' -e 's/,//' -e 's/,/ /' -e '}'
Btw, I think it's high time for you to start using GNU sed:
brew install gnu-sed
ln -s /usr/local/bin/gsed /usr/local/bin/sed
This problem is also easier to solve with awk:
awk -F, 'NF==11{sub(",","");sub(","," ")}1' file
This replaces only when there are 11 comma-separated fields.
This might work for you (GNU sed):
sed 's/,/&/9;T;s//&/10;t;s///;s///' file
If there are not at least 9 ,'s leave line as is. If there are 10 or more ,'s leave line as is. Otherwise remove the first 2 ,'s.
An alternative:
sed -r 's/^([^,]*),([^,]*),(([^,]*,){7}[^,]*)$/\1\2\3/' file

Regex to find string without curly braces but "\{", "\}" is allowed

I have a regex to find a string without curly braces: "([^\{\}]+)". It can extract "cde" from the following string:
"ab{cde}f"
Now I need to allow escaped braces inside the string, with "{" written as "\{" and "}" written as "\}".
So if my original string is "ab{cd\{e\}}f" then I need to extract "cd{e}" or "cd\{e\}" (I can remove "\" later).
Thanks in advance.
This should work:
([^{}\\]|\\{|\\})+
To allow escapes inside your braces you can use:
{((?:[^\\{}]+|\\.)*)}
Perl example:
my $str = "ab{cd\\{e\\}} also foo{ad\\}ok\\{a\\{d}";
print "$str\n";
print join ', ', $str =~ /{((?:[^\\{}]+|\\.)*)}/g;
Output:
ab{cd\{e\}} also foo{ad\}ok\{a\{d}
cd\{e\}, ad\}ok\{a\{d
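Since other answers in this thread use sed, the same idea (a run of characters that are neither backslash nor brace, or a backslash followed by any character) can be sketched as a POSIX ERE for sed -E. This is an adaptation rather than part of the Perl answer: it extracts only the first braced group on a line, and the variable names are illustrative.

```shell
s='ab{cd\{e\}}f'

# Capture the body of the first {...}: ordinary chars or backslash-escaped chars.
inner=$(printf '%s\n' "$s" | sed -E 's/^[^{]*\{(([^\\{}]|\\.)*)\}.*$/\1/')

# Drop the backslash in front of each escaped brace.
plain=$(printf '%s\n' "$inner" | sed -E 's/\\([{}])/\1/g')

printf '%s\n' "$inner" "$plain"
# cd\{e\}
# cd{e}
```

The second sed call covers the "I can remove \ later" step mentioned in the question.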
Note that most regex special characters lose their special meaning when placed inside a bracket expression (i.e. square brackets). So:
[.] matches a literal period.
[[] matches a left square bracket.
[a] matches the letter a.
[{] matches a left curly brace.
So:
$ echo "ab{cde}f" | sed -r 's/[^{]*[{](.+)}.*/\1/'
cde
$ echo "ab{c\{d\}e}f" | sed -r 's/[^{]*[{](.+)}.*/\1/'
c\{d\}e
Or:
$ echo "ab{cde}f" | sed 's/[^{]*{//;s/}[^}]*$//'
cde
$ echo "ab{c\{d\}e}f" | sed 's/[^{]*{//;s/}[^}]*$//'
c\{d\}e
Or even:
$ php -r '$s="ab{cde}f"; print preg_replace("/[^{]*[{](.+)}.*/", "$1", $s) . "\n";'
cde
$ php -r '$s="ab{c\{d\}e}f"; print preg_replace("/[^{]*[{](.+)}.*/", "$1", $s) . "\n";'
c\{d\}e
Obviously, this does not handle escaped backslashes. :-)
\{(.+)\} would extract everything between the first and last curly bracket