Regex for string with double quotes and environment variable - regex

Using sed, I need to replace a string that contains double quotes with an environment variable:
BUCKET_FOLDER=\"dev\"
(or any derivative of 'dev') needs to convert to:
BUCKET_FOLDER=bucket1/$ID
where $ID = abcde, i.e.
BUCKET_FOLDER=bucket1/abcde
To expand the $ID environment variable, I need to put double quotes around the sed substitution expression:
sed -e "s/BUCKET_FOLDER=\\"(.*?)\\"/BUCKET_FOLDER=bucket1\/$ID/g" $string
but this is then preventing a match on the double quotes in the source string.
Would appreciate any advice. I can make it work with 2 steps, but would prefer 1.

ID=abcde
echo 'A=\"x\" BUCKET_FOLDER=\"dev\" B=\"y\"' |sed -r "s|(.*)(BUCKET_FOLDER=)([^ ]+)(.*)|\1\2bucket1/$ID\4|g"
A=\"x\" BUCKET_FOLDER=bucket1/abcde B=\"y\"
Using | as the separator in sed. As you mentioned, I used double quotes so that $ID expands, and captured BUCKET_FOLDER= as the second group (\2).

I believe you escaped the quote correctly in the sed command. On the other hand, the way you specified the rest of the regex isn't how it's supposed to be. Here is my take on the story:
echo BUCKET_FOLDER='"'dev'"' | \
sed -e "s/BUCKET_FOLDER=\"\([^\"]*\)\"/BUCKET_FOLDER=bucket1\/$ID/g"
Explanation:
I made the assumption that the \" parts in your $string variable are just escape sequences. Thus I used the echo BUCKET_FOLDER='"'dev'"' command. Its output is BUCKET_FOLDER="dev".
sed has no non-greedy quantifier, so .*? does not do what it does in Perl. (In GNU sed's basic syntax, \? means "zero or one", not "non-greedy".) To stop at the first closing quote, match [^"]* instead, written [^\"]* inside the double-quoted script.
You need to escape the parentheses too: \(...\). This should work without the group too, because you don't use backreferences like \1.
Alternatives:
If you want to eliminate the capturing group, then the sed expression becomes this:
sed -e "s/BUCKET_FOLDER=\"[^\"]*\"/BUCKET_FOLDER=bucket1\/$ID/g"
If the backslash is really part of your string, you can match it with the character class [\]:
echo BUCKET_FOLDER=\\'"'dev\\'"' | \
sed -e "s/BUCKET_FOLDER=[\]\"[^\"]*[\]\"/BUCKET_FOLDER=bucket1\/$ID/g"
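If you would rather keep the regex itself out of double quotes, here is a rough sketch of the same one-step replacement with only the ID exposed to the shell. The function name replace_bucket_folder is made up for illustration, and I am assuming the value may or may not carry literal backslashes before the quotes. It also escapes &, | and \ in the ID, since those are special in a sed replacement when | is the delimiter.

```shell
#!/bin/sh
# Hypothetical helper: reads lines on stdin and rewrites BUCKET_FOLDER="..."
# (with or without literal backslashes before the quotes) to
# BUCKET_FOLDER=bucket1/<id>, where <id> is the first argument.
replace_bucket_folder() {
  # Escape characters that are special in a sed replacement using | as delimiter.
  id_escaped=$(printf '%s' "$1" | sed 's/[&|\]/\\&/g')
  # The regex stays in single quotes; only the escaped ID is spliced in.
  sed 's|BUCKET_FOLDER=[\]*"[^\"]*[\]*"|BUCKET_FOLDER=bucket1/'"$id_escaped"'|g'
}

ID=abcde
printf '%s\n' 'A=\"x\" BUCKET_FOLDER=\"dev\" B=\"y\"' | replace_bucket_folder "$ID"
# prints: A=\"x\" BUCKET_FOLDER=bucket1/abcde B=\"y\"
```

Splicing "$id_escaped" between two single-quoted pieces means the shell never looks at the regex, so the double quotes in the pattern need no escaping at all.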

Related

Bash script to enclose words in single quotes

I'm trying to write a bash script to enclose words contained in a file with single quotes.
Word - Hello089
Result - 'Hello089',
I tried the following regex but it doesn't work. This works in Notepad++ with find and replace. I'm not sure how to tweak it to make it work in bash scripting.
sed "s/(.+)/'$1',/g" file.txt > result.txt
Replacement backreferences (also called placeholders) are defined with \n syntax, not $n (this is perl-like backreference syntax).
Note you do not need groups here, though, since you want to wrap the whole line with single quotation marks. This is also why you do not need the g flag: it is only necessary when you expect multiple matches on the same line.
You can use the following POSIX BRE and ERE (the one with -E) solutions:
sed "s/..*/'&',/" file.txt > result.txt
sed -E "s/.+/'&',/" file.txt > result.txt
In the POSIX BRE (first) solution, ..* matches any char and then any 0 or more chars (thus emulating .+ common PCRE pattern). The POSIX ERE (second) solution uses .+ pattern to do the same. The & in the right-hand side is used to insert the whole match (aka \0). Certainly, you may enclose the whole match with capturing parentheses and then use \1, but that is redundant:
sed "s/\(..*\)/'\1',/" file.txt > result.txt
sed -E "s/(.+)/'\1',/" file.txt > result.txt
Note the escaping: capturing parentheses must be backslash-escaped in POSIX BRE.
See the online sed demo.
s="Hello089";
sed "s/..*/'&',/" <<< "$s"
# => 'Hello089',
sed -E "s/.+/'&',/" <<< "$s"
# => 'Hello089',
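$1 is expanded by the shell before sed sees it, but it's the wrong back reference anyway. You need \1. You also need to escape the parentheses that define the capture group. Because the sed script is enclosed in double quotes, the shell only consumes a backslash that precedes $, `, ", \ or newline, so \( and \1 would reach sed intact; doubling the backslashes before the parentheses, as below, is harmless but not required.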
$ echo "Hello089" | sed "s/\\(.*\\)/'\1',/g"
'Hello089',
(I don't recall if there is a way to specify single quotes using an ASCII code instead of a literal ', which would allow you to use single quotes around the script.)
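On that last point: one portable way to keep the script in single quotes is to close the quote, add an escaped quote, and reopen it ('\''). A small sketch follows; the function name wrap_lines is just for illustration. GNU sed also understands \x27 as an escape for the single quote, but that is an extension, so it only appears in a comment here.

```shell
#!/bin/sh
# Hypothetical helper: wrap every non-empty input line as 'line',
wrap_lines() {
  # '\'' = close the single-quoted string, emit a literal ', reopen it.
  # sed receives the script: s/..*/'&',/
  # (GNU sed only: sed 's/..*/\x27&\x27,/' would avoid the quote juggling.)
  sed 's/..*/'\''&'\'',/'
}

printf 'Hello089\nWorld42\n' | wrap_lines
# prints:
# 'Hello089',
# 'World42',
```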

sed match dollar and single quote characters

I have the following string in my file:
"sequence A$_{0}$B$_{}$C$_{'0}$"
I want to move any single quotes that appear after a $_{ to go before it, i.e.
"sequence A$_{0}$B$_{}$C'$_{0}$"
This is my sed command (using # as a delimiter) for just the part with the quote:
$ echo "$_{'0}$" | sed "s#$_{'#'\$_{#g"
'$_{0}$
So this works. However my text contains strings that shouldn't be matched, e.g.
$ echo "$_{0}$" | sed "s#$_{'#'\$_{#g"
/home/ppatest/texlive/2010/texmf{0}$
I understand that $_ gives the last argument of previous command. I checked:
$ echo $_
/home/ppatest/texlive/2010/texmf
But I don't understand why $_{' matches "$_{0}$"
Furthermore, I found that to prevent the Unix shell from interpreting the dollar sign as a shell variable, the script should be put in single quotes. But I can't do that as I am also matching on single quotes.
Your current approach uses double quotes in sed to be able to handle the single quotes. However, as you can see, this produces the expansion of $, so that you can end up having broader problems.
What I recommend is to use a sed expression with single quotes. To match and replace single quotes, you need to close the leading ', then enclose the ' within " and then open the expression again:
$ echo "he'llo" | sed 's#'"'"'#X#'
heXllo
In your case:
sed 's#$_{'"'"'#'"'"'$_{#g' file
This way, you keep using single quotes and prevent the expansion of $.
Test
$ cat a
hello $_{'0}$ bye
$_{'0}$
yeah
$ sed 's#$_{'"'"'#'"'"'$_{#g' a
hello '$_{0}$ bye
'$_{0}$
yeah
echo "\$_{'0}\$" | sed "s#\(\$_{\)'#'\1#g"
Escape the $ when using double quotes.
Use a group to avoid repeating the confusing \$ where possible.
Use double quotes when single quotes are part of the pattern.
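Another way around the quoting fight, sketched here on the assumption that a POSIX shell is in use: put the sed script in a here-document with a quoted delimiter, so the shell expands nothing and both $ and ' can be written literally. The variable names are made up.

```shell
#!/bin/sh
# A quoted delimiter ('EOF') disables expansion inside the here-document,
# so $_ and the single quote reach sed untouched.
script=$(cat <<'EOF'
s#\$_{'#'$_{#g
EOF
)

# Input built with escaped dollars so the shell does not expand $_ here either.
input="sequence A\$_{0}\$B\$_{}\$C\$_{'0}\$"
result=$(printf '%s\n' "$input" | sed "$script")
printf '%s\n' "$result"
# prints: sequence A$_{0}$B$_{}$C'$_{0}$
```

The same script could also live in a file and be run with sed -f, which avoids shell quoting entirely.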

How do I correctly escape this search string for Perl pie

I use this classic perl one liner to replace strings in multiple files recursively
perl -pi -e 's/oldstring/newstring/g' `grep -irl oldstring *`
But this has failed me as I want to find the string:
'$user->primaryorganisation->id'
and replace with
$user->primaryorganisation->id
I can't seem to escape the string correctly for the line to run successfully.
Any help gratefully received!
Try this one. Lots of escapes. Go with TLP's suggestion and use a source file.
perl -pi -e "s/'\\\$user->primaryorganisation->id'/\\\$user->primaryorganisation->id/g" `grep -irl "'\$user->primaryorganisation->id'" *`
Explanation:
three backslashes: the first two tell the shell to produce a literal backslash; the third one escapes the $ for the shell; that yields \$ for Perl, which needs the backslash to prevent variable interpolation
double quotes " to put single quotes ' inside them
one backslash and a dollar \$ for grep so the shell passes on a literal dollar sign
When you want to represent a single quote in a Perl one-liner but can't because the one-liner is itself wrapped in single quotes, you can use \047, the octal code for a single quote. So, this should work:
s/\047(\$user->primaryorganisation->id)\047/$1/g
I recommend Minimal Perl by Maher for more-than-you-wanted-to-know about the art of one-lining perl.
To produce
...'...
you can generically use
'...'\''...'
As such,
s/'(\$user->primaryorganisation->id)'/$1/g
becomes
's/'\''(\$user->primaryorganisation->id)'\''/$1/g'
so
find -type f \
-exec perl -i -pe's/'\''(\$user->primaryorganisation->id)'\''/$1/g' {} +

What do I need to quote in sed command lines?

There are many questions on this site on how to escape various elements for sed, but I'm looking for a more general answer. I understand that I might want to escape some characters to avoid shell expansion:
Bash:
Single quoted [strings] ('') are used to preserve the literal value of each character enclosed within the quotes. [However,] a single quote may not occur between single quotes, even when preceded by a backslash.
The backslash retains its meaning [in double quoted strings] only when followed by dollar, backtick, double quote, backslash or newline. Within double quotes, the backslashes are removed from the input stream when followed by one of these characters. Backslashes preceding characters that don't have a special meaning are left unmodified for processing by the shell interpreter.
sh: (I hope you don't have history expansion)
Single quoted string behaviour: same as bash
Enclosing characters in double quotes preserves the literal value of
all characters within the quotes, with the exception of dollar, backquote, backslash, and,
when history expansion is enabled, exclamation mark.
The characters dollar and backquote retain their special meaning within double quotes.
The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline. A double quote may be quoted within double
quotes by preceding it with a backslash.
If enabled, history expansion will be performed unless an exclamation mark appearing in double quotes is escaped using a backslash. The backslash preceding the ! is not removed.
...but none of that explains why this stops working as soon as you remove any escaping:
sed -e "s#\(\w\+\) #\1\/#g" #find a sequence of characters in a line
# why? ↑ ↑ ↑ ↑ #replace the following space with a slash.
None of (, ), / or + (or [, or ]...) seem to have any special meaning that requires them to be escaped in order to work. Hell, even calling the command directly through Python makes sed not work properly, although the manpage doesn't seem to spell out anything about this (not when I search for backslash, anyway.)
$ lvdisplay -C --noheadings -o vg_name,name > test
$ python
>>> import os
>>> #Python requires backslash escaping of \1, even in triple quotes
>>> #lest \1 is read to mean "byte with value 0x01".
>>> output = os.execl("/bin/sed", "-e", "s#(\w+) #\\1/#g", "test")
(Output remains unchanged)
$ python
>>> import os
>>> output = os.execl("/bin/sed", "-e", "s#\(\w\+\) #\\1\/#g", "test")
(Correct output)
$ WHAT THE HELL
If I understood you right, your problem is not about bash/sh, it is about the regex flavour sed uses by default: BRE.
The other [= anything but dot, star, caret and dollar] BRE metacharacters require a backslash to give them their special meaning. The reason is that the oldest versions of UNIX grep did not support these.
Grouping parentheses must be escaped, \(...\), to get their special meaning, and the same goes for \+; otherwise sed matches them as literal characters. That's why the pattern part of your s#\(\w\+\) #...# needs the backslashes. The replacement part doesn't need escaping, so:
sed 's#\(\w\+\) #\1 /#'
should work.
sed usually has an option to enable extended regular expressions (where ?, +, |, (), {m,n} are special without a backslash); e.g. GNU sed has -r, so your one-liner could be:
sed -r 's#(\w+) #\1 /#'
I paste some examples here that may help you understand what's going on:
kent$ echo "abcd "|sed 's#\(\w\+\) #\1 /#'
abcd /
kent$ echo "abcd "|sed -r 's#(\w+) #\1 /#'
abcd /
kent$ echo "(abcd+) "|sed 's#(\w*+) #&/#'
(abcd+) /
What you're observing is correct. Certain characters like ?, +, (, ), {, } need to be escaped when using basic regular expressions.
Quoting from the sed manual:
The only difference between basic and extended regular expressions is
in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces
(‘{}’). While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them to match a
literal character.
(Emphasis mine.) These don't need to be escaped, though, when using extended regexps, except when matching a literal character (as mentioned in the last line quoted above.)
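A quick side-by-side sketch of the rule quoted above, using [[:alnum:]_] rather than \w so it does not depend on GNU extensions. It assumes -E is available, which holds for GNU and BSD sed as far as I know. The sample inputs are made up.

```shell
#!/bin/sh
# BRE: \( \) and \{1,\} need backslashes to act as group and interval.
bre=$(printf 'abcd efgh\n' | sed 's#\([[:alnum:]_]\{1,\}\) #\1/#')

# ERE (-E): the same operators are written bare.
ere=$(printf 'abcd efgh\n' | sed -E 's#([[:alnum:]_]+) #\1/#')

# BRE again: unescaped ( ) + are plain literal characters.
lit=$(printf '(abcd+) x\n' | sed 's#(abcd+) #&/#')

printf '%s\n' "$bre" "$ere" "$lit"
# prints:
# abcd/efgh
# abcd/efgh
# (abcd+) /x
```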
If you want a general answer:
Shell metacharacters need to be quoted or escaped from the shell;
Regex metacharacters need to be escaped if you want a literal interpretation;
Some regex constructs are formed by a backslash escape; depending on context, these backslashes may need quoting.
So you have the following scenarios:
# Match a literal question mark
echo '?' | grep \?
# or equivalently
echo '?' | grep "?"
# or equivalently
echo '?' | grep '?'
# Match a literal asterisk
echo '*' | grep \\\*
# or equivalently
echo '*' | grep "\\*"
# or equivalently
echo '*' | grep '\*'
# Match a backreference: any character repeated twice
echo 'aa' | grep \\\(.\\\)\\1
# or equivalently
echo 'aa' | grep "\(.\)\\1"
# or equivalently
echo 'aa' | grep '\(.\)\1'
As you can see, single quotes probably make the most sense most of the time.
If you are embedding into a language which requires backslash quoting of its own, you have to add yet another set of backslashes, or avoid invoking a shell.
As others have pointed out, extended regular expressions obey a slightly different syntax, but the general pattern is the same. Bottom line, to minimize interference from the shell, use single quotes whenever you can.
For literal characters, you can avoid some backslashitis by using a character class instead.
echo '*' | grep \[\*\]
# or equivalently
echo '*' | grep "[*]"
# or equivalently
echo '*' | grep '[*]'
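When in doubt about what the shell actually hands to grep or sed, print the argument first. A minimal sketch (show_arg is a made-up helper) showing that the three spellings of the backreference pattern above arrive as the same string:

```shell
#!/bin/sh
# Hypothetical helper: print its first argument exactly as the shell passed it.
show_arg() {
  printf '%s\n' "$1"
}

unquoted=$(show_arg \\\(.\\\)\\1)
double=$(show_arg "\(.\)\\1")
single=$(show_arg '\(.\)\1')

printf '%s\n' "$unquoted" "$double" "$single"
# prints \(.\)\1 three times
```

Only the single-quoted form reads the same on the command line as it does to the program, which is the practical argument for defaulting to single quotes.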
FreeBSD sed, which is also used on Mac OS X, uses -E instead of -r for extended regular expressions.
Therefore, to have it portable, use basic regular expressions. + in extended-regular-expression mode, for example, would have to be replaced with \{1,\} in basic-regular-expression mode.
In basic- as well as extended-regular-expression mode, FreeBSD sed does not seem to recognize \w which has to be replaced with [[:alnum:]_] (cf. man re_format).
# using FreeBSD sed (on Mac OS X)
# output: Hello, world!
echo 'hello world' | sed -e 's/h/H/' -e 's/ \{1,\}/, /g' -e 's/\([[:alnum:]_]\{1,\}\)$/\1!/'
echo 'hello world' | sed -E -e 's/h/H/' -e 's/ +/, /g' -e 's/([[:alnum:]_]+)$/\1!/'
echo 'hello world' | sed -E -e 's/h/H/' -e 's/ +/, /g' -e 's/(\w+)$/\1!/' # does not work
# find a sequence of characters in a line
# replace the following space with a slash
# output: abcd+/abcd+/
echo 'abcd+ abcd+ ' > test
python
import os
# execl replaces the Python process and never returns; the first argument after the path is argv[0]
os.execl('/usr/bin/sed', 'sed', '-e', 's#\([[:alnum:]_+]\{1,\}\) #\\1/#g', 'test')
To use a single quote as part of a sed regular expression while keeping your outer single quotes for the sed regular expression, you can concatenate three separate strings each enclosed in single quotes to avoid possible shell expansion.
# man bash:
# "A single quote may not occur between single quotes, even when preceded by a backslash."
# cf. http://stackoverflow.com/a/9114512 & http://unix.stackexchange.com/a/82757
# concatenate: 's/doesn' + \' + 't/does not/'
echo "sed doesn't work for me" | sed -e 's/doesn'\''t/does not/'

Sed subexpressions not working as expected

I am trying to make a simple wikitext parser using sed/bash. When I run
echo "London has [[public transport]]" | sed s/\\[\\[[A-Za-z0-9\ ]*\\]\\]/link/
it gives me London has link
but when I try to use marked subexpressions to get the contents of the brackets using
sed s/\\[\\[\([A-Za-z0-9\ ]*\)\\]\\]/\1/
it just gives me London has [[public transport]]
That's because the regex doesn't match.
Since you're not surrounding your sed expression in quotes, you have to double the backslashes for the shell - that's why you have \\[ instead of \[.
Now in sed's default regex syntax (basic regular expressions), capturing parentheses are denoted by \( and \). Since you're typing this into the shell without surrounding quote marks, you need to escape the backslash. And since bash interprets parentheses, you have to escape them too:
echo "London has [[public transport]]" | sed s/\\[\\[\\\([A-Za-z0-9\ ]*\\\)\\]\\]/\\1/
I strongly recommend you just enclose your sed expression in single quotes for ease of writing:
echo "London has [[public transport]]" | sed 's/\[\[\([A-Za-z0-9\ ]*\)\]\]/\1/'
Much easier right?
echo "London has [[public transport]]" | sed 's#[[][[]\([A-Za-z0-9\ ]*\)[]][]]#\1#'
output
London has public transport
works on my machine.
I hope this helps.
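For completeness, a sketch that handles several links per line. It assumes link text never contains ], and uses [^]]* instead of listing the allowed characters; unlink_wikitext is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: replace every [[text]] with text.
# \[ matches a literal [; a bare ] outside a bracket expression is already literal.
# [^]]* matches any run of characters other than ].
unlink_wikitext() {
  sed 's/\[\[\([^]]*\)]]/\1/g'
}

printf '%s\n' 'London has [[public transport]] and [[black cabs]]' | unlink_wikitext
# prints: London has public transport and black cabs
```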