How does escaping a literal dot work for sed - regex

The following sed command
echo '.' | sed "s/\\./foo/"
substitutes . with foo, as expected. However, if we escape the non-alphanumeric . in the above command
echo '.' | sed "s/\\\./foo/"
prints just ., whereas foo is expected. sed should match the character . literally, but it doesn't. I cannot understand what is happening with the dot. I believe that I should simply put a backslash in front of every non-alphanumeric character in bash if a string is double-quoted. A dot is a non-alphanumeric character, so what is wrong with escaping it, and why does it produce a different result?

This is because of how backslash escaping works inside bash double quotes.
echo "\\"
\
echo "\."
\.
echo "s/\\./foo/"
s/\./foo/
echo "s/\\\./foo/"
s/\\./foo/
From man bash:
Within double quotes, the backslash retains its special meaning only when
followed by one of the following characters: $, `, ", \,or <newline>.
So in the first case, sed gets s/\./foo/ and interprets it as "replace a dot with foo". In the second case, sed gets s/\\./foo/ and interprets it as "replace a backslash followed by any one character with foo".
You are better off using single quotes in this case:
echo 's/./foo/'
s/./foo/
echo 's/\./foo/'
s/\./foo/
which is probably what you wanted.
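To check what sed really receives, it can help to print each quoted argument and then run the substitution. The following is a minimal sketch; the variable names are only illustrative, and printf '%s\n' is used instead of echo because some echo implementations interpret backslashes themselves.

```shell
# What the shell hands to sed for each quoting style.
two_backslashes=$(printf '%s\n' "s/\\./foo/")     # \\ collapses to \      -> s/\./foo/
three_backslashes=$(printf '%s\n' "s/\\\./foo/")  # \\ -> \, and \. stays  -> s/\\./foo/
single_quoted=$(printf '%s\n' 's/\./foo/')        # nothing is rewritten   -> s/\./foo/

printf '%s\n' "$two_backslashes" "$three_backslashes" "$single_quoted"

# The effect on an input consisting of a single dot.
dot_two=$(printf '%s\n' '.' | sed "s/\\./foo/")     # pattern \.  : literal dot
dot_three=$(printf '%s\n' '.' | sed "s/\\\./foo/")  # pattern \\. : backslash + any char
dot_single=$(printf '%s\n' '.' | sed 's/\./foo/')   # pattern \.  : literal dot

printf '%s\n' "$dot_two" "$dot_three" "$dot_single"
```

Only the three-backslash form leaves the dot untouched, because its pattern needs a literal backslash in the input before it can match.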

In your second one, the . is still a literal dot, but the regular expression only replaces the two-character sequence \. (not . by itself) with foo:
$ echo '=\.=' | sed "s/\\\./foo/"
=foo=

Related

Plus quantifier not working as expected using regex for substitution in sed

The input is #PermitRootLogin no. Why doesn't the following sed expression work with sed?
echo "#PermitRootLogin no" | sed 's/^#PermitRootLogin\s+.*/PermitRootLogin yes/'
but after I remove the + after the keyword it works?
echo "#PermitRootLogin no" | sed 's/^#PermitRootLogin\s.*/PermitRootLogin yes/'
I thought the + after a \s would mean one or more of the previous token.
PS: Works either way with regex101.com
You have to escape the + sign:
In GNU sed, with basic regular expression syntax these characters ?, +, parentheses, braces ({}), and | do not have special meaning unless prefixed with a backslash \.
In your case, the unescaped plus sign + matches a literal +, so it would match the plus in #PermitRootLogin +no. You have to escape it, as in \s\+, to match one or more whitespace characters, as in #PermitRootLogin no:
echo "#PermitRootLogin no" | sed 's/^#PermitRootLogin\s\+.*/PermitRootLogin yes/'
Output:
PermitRootLogin yes
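The three behaviours can be compared side by side. This is a sketch under a couple of assumptions: [[:space:]] stands in for \s and the interval \{1,\} stands in for \+, since both are defined by POSIX, and -E is taken to be the extended-regex switch of the installed sed. The variable names are illustrative.

```shell
line='#PermitRootLogin no'

# Basic regex, bare +: the + is a literal character, so nothing matches.
bre_bare=$(printf '%s\n' "$line" |
  sed 's/^#PermitRootLogin[[:space:]]+.*/PermitRootLogin yes/')

# Basic regex, interval \{1,\}: one or more whitespace characters.
bre_interval=$(printf '%s\n' "$line" |
  sed 's/^#PermitRootLogin[[:space:]]\{1,\}.*/PermitRootLogin yes/')

# Extended regex (-E): a bare + is the quantifier.
ere_plus=$(printf '%s\n' "$line" |
  sed -E 's/^#PermitRootLogin[[:space:]]+.*/PermitRootLogin yes/')

printf '%s\n' "$bre_bare" "$bre_interval" "$ere_plus"
```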

sed not terminating at the end of this line (issue with escaping?)

The SQL application that I'm using isn't properly escaping all of the strings that I have, so I'm trying to use sed to replace these instances. The issue is I'll have this:
`some string of characters that may include hyphens'
and the quote at the end won't get escaped (yes that's supposed to be a ` not a quote).
My plan was to use this:
sed 's/[^\\]\'[^,]/&\\\'&/g' testfile.txt
Logic: anything that isn't a backslash, followed by a quote, followed by anything that isn't a comma, will be replaced by the same text with a backslash before the quote.
I would like testfile.txt to have all instances of ' replaced with \', but I just keep getting a > prompt, as if the command line isn't finished.
I tried this using GNU sed:
$ cat d
already escaped quote \' won't be escaped
$ sed -E "s/([^\\]|^)'([^,]|$)/\1\\\'\2/" d
already escaped quote \' won\'t be escaped
What you're looking for are called lookaround assertions, where you match any ' not preceded by a \ or followed by an end of line. Unfortunately, sed doesn't support these. But you can use Perl:
perl -pe 's/(?<!\\)'\''(?!$)/\\'\''/g' testfile.txt
In unescaped form, this would look like s/(?<!\\)'(?!$)/\\'/g but we have to make allowances for the shell. No escapes are recognized in single quoted strings, so your original problem was \' not being recognized, and the string terminating early.
See here for example and detailed regexp breakdown: https://regex101.com/r/k8sonu/1
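Another way to sidestep the shell quoting problem, offered as a sketch rather than as the method from the answers above, is to keep the sed script in a file and run it with sed -f, so the shell never parses the quote at all. The temporary file names come from mktemp and the sample line is the one used earlier. Unlike the Perl version, this basic-regex pattern needs one character on each side of the quote, so a quote at the very start or end of a line is left alone.

```shell
# Temporary files for the sed script and the sample input.
script=$(mktemp)
sample=$(mktemp)

# A quoted here-document delimiter ('EOF') stops the shell from touching the text.
cat > "$script" <<'EOF'
s/\([^\\]\)'\([^,]\)/\1\\'\2/g
EOF

cat > "$sample" <<'EOF'
already escaped quote \' won't be escaped
EOF

# Run the script file; the quote after a backslash is skipped, the other one is escaped.
escaped=$(sed -f "$script" "$sample")
printf '%s\n' "$escaped"

rm -f "$script" "$sample"
```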

find | sed - regex matching apostrophe instead of full stop in search phrase

Got this really odd thing happening with a search and replace script I am trying to implement. The following command...
find './files' -type f \( -iname "*.js" \) -exec sed -i '' s/\$stateProvider.state\(\'app\./\$stateProvider.state\(\'app.myap\./ {} +
Is matching $stateProvider.state('app'
When it should only be matching $stateProvider.state('app.' <-- you can see that when the full stop is missing it should not match (since it is included in the matching pattern).
The confusion arises from the fact that you have not quoted the sed code. As a result, all the escapes are interpreted by the shell. So the \. that you have included is interpreted by the shell as a plain ., and that's what sed sees. This is why sed matches it with any character. You can see what sed sees by typing the following in your shell:
$ echo s/\$stateProvider.state\(\'app\./\$stateProvider.state\(\'app.myap\./
s/$stateProvider.state('app./$stateProvider.state('app.myap./
or just try this:
$ echo \.
.
So you need to escape the escape character, i.e.
$ echo \\.
\.
Edit: To find the complete command, you have to think backwards. The command that we want to send to sed is the following:
s/\$stateProvider\.state('app\./$stateProvider.state('app.myap./
Notice that I have escaped the characters $ and . because they have special meaning in sed when used in the pattern. Now I have to escape the above string again, this time for bash (or whatever shell you use):
s/\\\$stateProvider\\.state\(\'app\\./\$stateProvider.state\(\'app.myap\./
Notice that I have escaped the characters \, $, (, and ' because they have special meaning in bash.
So the complete command would be
sed -i '' s/\\\$stateProvider\\.state\(\'app\\./\$stateProvider.state\(\'app.myap\./
Alternatively, for the last step, I could have simply used quoting:
's/\$stateProvider\.state('\''app\./$stateProvider.state('\''app.myap./'
Notice that I only had to take special care for ', which has to be written as '\'' inside single quotes.
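The difference between the unquoted and the single-quoted script can be checked on two sample lines. This is a minimal sketch; the variable names and the app.home sample are made up for illustration.

```shell
# Unquoted: the shell strips the backslashes before sed sees the script.
loose_script=s/\$stateProvider.state\(\'app\./\$stateProvider.state\(\'app.myap\./
# Single-quoted: only ' needs care, written as '\''.
strict_script='s/\$stateProvider\.state('\''app\./$stateProvider.state('\''app.myap./'

printf '%s\n' "$loose_script" "$strict_script"

# Two sample lines: one without and one with the dot after app.
input=$(printf '%s\n' "\$stateProvider.state('app')" "\$stateProvider.state('app.home')")

loose_result=$(printf '%s\n' "$input" | sed "$loose_script")
strict_result=$(printf '%s\n' "$input" | sed "$strict_script")

printf '%s\n' "$loose_result" "$strict_result"
```

With the loose script the first line is rewritten as well, because the unescaped . matches the closing quote after app; with the strict script only the second line changes.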

What do I need to quote in sed command lines?

There are many questions on this site on how to escape various elements for sed, but I'm looking for a more general answer. I understand that I might want to escape some characters to avoid shell expansion:
Bash:
Single quoted [strings] ('') are used to preserve the literal value of each character enclosed within the quotes. [However,] a single quote may not occur between single quotes, even when preceded by a backslash.
The backslash retains its meaning [in double quoted strings] only when followed by dollar, backtick, double quote, backslash or newline. Within double quotes, the backslashes are removed from the input stream when followed by one of these characters. Backslashes preceding characters that don't have a special meaning are left unmodified for processing by the shell interpreter.
sh: (I hope you don't have history expansion)
Single quoted string behaviour: same as bash
Enclosing characters in double quotes preserves the literal value of
all characters within the quotes, with the exception of dollar, backtick, backslash, and,
when history expansion is enabled, exclamation mark.
The characters dollar and backtick retain their special meaning within double quotes.
The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline. A double quote may be quoted within double
quotes by preceding it with a backslash.
If enabled, history expansion will be performed unless an exclamation mark appearing in double quotes is escaped using a backslash. The backslash preceding the ! is not removed.
...but none of that explains why this stops working as soon as you remove any escaping:
sed -e "s#\(\w\+\) #\1\/#g" #find a sequence of characters in a line
# why? ↑ ↑ ↑ ↑ #replace the following space with a slash.
None of (, ), / or + (or [, or ]...) seem to have any special meaning that requires them to be escaped in order to work. Hell, even calling the command directly through Python makes sed not work properly, although the manpage doesn't seem to spell out anything about this (not when I search for backslash, anyway.)
$ lvdisplay -C --noheadings -o vg_name,name > test
$ python
>>> import os
>>> #Python requires backslash escaping of \1, even in triple quotes
>>> #lest \1 is read to mean "byte with value 0x01".
>>> output = os.execl("/bin/sed", "-e", "s#(\w+) #\\1/#g", "test")
(Output remains unchanged)
$ python
>>> import os
>>> output = os.execl("/bin/sed", "-e", "s#\(\w\+\) #\\1\/#g", "test")
(Correct output)
$ WHAT THE HELL
Have you tried using jQuery? It's perfect and it does all the things.
If I understood you right, your problem is not about bash/sh, it is about the regex flavour sed uses by default: BRE.
The other [= anything but dot, star, caret and dollar] BRE metacharacters require a backslash to give them their special meaning. The reason is that the oldest versions of UNIX grep did not support these.
Grouping with (...) must be escaped to give it its special meaning, and the same goes for +; otherwise sed will try to match them as literal characters. That's why the pattern in your s#\(\w\+\) #...# has to be escaped. The replacement part doesn't need escaping, so:
sed 's#\(\w\+\) #\1 /#'
should work.
sed usually has an option to use extended regular expressions (where ?, +, |, (), and {m,n} are special without a backslash); e.g. GNU sed has -r, so your one-liner could be:
sed -r 's#(\w+) #\1 /#'
I paste some examples here that may help you understand what's going on:
kent$ echo "abcd "|sed 's#\(\w\+\) #\1 /#'
abcd /
kent$ echo "abcd "|sed -r 's#(\w+) #\1 /#'
abcd /
kent$ echo "(abcd+) "|sed 's#(\w*+) #&/#'
(abcd+) /
What you're observing is correct. Certain characters like ?, +, (, ), {, } need to be escaped when using basic regular expressions.
Quoting from the sed manual:
The only difference between basic and extended regular expressions is
in the behavior of a few characters: ‘?’, ‘+’, parentheses, and braces
(‘{}’). While basic regular expressions require these to be escaped if
you want them to behave as special characters, when using extended
regular expressions you must escape them if you want them to match a
literal character.
(Emphasis mine.) These don't need to be escaped, though, when using extended regexps, except when matching a literal character (as mentioned in the last line quoted above.)
If you want a general answer,
Shell metacharacters need to be quoted or escaped from the shell;
Regex metacharacters need to be escaped if you want a literal interpretation;
Some regex constructs are formed by a backslash escape; depending on context, these backslashes may need quoting.
So you have the following scenarios;
# Match a literal question mark
echo '?' | grep \?
# or equivalently
echo '?' | grep "?"
# or equivalently
echo '?' | grep '?'
# Match a literal asterisk
echo '*' | grep \\\*
# or equivalently
echo '*' | grep "\\*"
# or equivalently
echo '*' | grep '\*'
# Match a backreference: any character repeated twice
echo 'aa' | grep \\\(.\\\)\\1
# or equivalently
echo 'aa' | grep "\(.\)\\1"
# or equivalently
echo 'aa' | grep '\(.\)\1'
As you can see, single quotes probably make the most sense most of the time.
If you are embedding into a language which requires backslash quoting of its own, you have to add yet another set of backslashes, or avoid invoking a shell.
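A nested shell is perhaps the simplest case of such embedding. In this sketch, which assumes sh -c as the outer layer and uses made-up variable names, the outer double quotes consume one level of backslashes before the inner shell and grep see the pattern.

```shell
# One level: single quotes hand \* to grep untouched.
direct=$(printf '%s\n' 'a*b' | grep -c '\*')

# Two levels: the outer double quotes turn \\* into \* before the
# inner sh parses its own single-quoted arguments.
nested=$(sh -c "printf '%s\n' 'a*b' | grep -c '\\*'")

# Both commands count the lines containing a literal asterisk.
printf '%s\n' "$direct" "$nested"
```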
As others have pointed out, extended regular expressions obey a slightly different syntax, but the general pattern is the same. Bottom line, to minimize interference from the shell, use single quotes whenever you can.
For literal characters, you can avoid some backslashitis by using a character class instead.
echo '*' | grep \[\*\]
# or equivalently
echo '*' | grep "[*]"
# or equivalently
echo '*' | grep '[*]'
FreeBSD sed, which is also used on Mac OS X, uses -E instead of -r for extended regular expressions.
Therefore, for portability, use basic regular expressions. For example, + in extended-regular-expression mode would have to be replaced with \{1,\} in basic-regular-expression mode.
In basic- as well as extended-regular-expression mode, FreeBSD sed does not seem to recognize \w which has to be replaced with [[:alnum:]_] (cf. man re_format).
# using FreeBSD sed (on Mac OS X)
# output: Hello, world!
echo 'hello world' | sed -e 's/h/H/' -e 's/ \{1,\}/, /g' -e 's/\([[:alnum:]_]\{1,\}\)$/\1!/'
echo 'hello world' | sed -E -e 's/h/H/' -e 's/ +/, /g' -e 's/([[:alnum:]_]+)$/\1!/'
echo 'hello world' | sed -E -e 's/h/H/' -e 's/ +/, /g' -e 's/(\w+)$/\1!/' # does not work
# find a sequence of characters in a line
# replace the following space with a slash
# output: abcd+/abcd+/
echo 'abcd+ abcd+ ' > test
python
import os
output = os.execl('/usr/bin/sed', '-e', 's#\([[:alnum:]_+]\{1,\}\) #\\1/#g', 'test')
To use a single quote as part of a sed regular expression while keeping your outer single quotes for the sed regular expression, you can concatenate three separate strings each enclosed in single quotes to avoid possible shell expansion.
# man bash:
# "A single quote may not occur between single quotes, even when preceded by a backslash."
# cf. http://stackoverflow.com/a/9114512 & http://unix.stackexchange.com/a/82757
# concatenate: 's/doesn' + \' + 't/does not/'
echo "sed doesn't work for me" | sed -e 's/doesn'\''t/does not/'

using sed to replace ^[(s3B with blank space

I'm trying to use sed with perl to replace ^[(s3B with an empty string in several files.
s/^[(s3B// isn't working though, so I'm wondering what else I could try.
You need to quote the special characters:
$ echo "^[(s3B AAA ^[(s3B"|sed 's/\^\[[(]s3B//g'
AAA
$ echo "^[(s3B AAA ^[(s3B" >file.txt
$ perl -p -i -e 's/\^\[[(]s3B//g' file.txt
$ cat file.txt
AAA
The problem is that there are several characters that have a special meaning in regular expressions. ^ is a start-of-line anchor, [ opens a character class, and ( opens a capture.
You can escape all non-alphanumerics in a Perl string by preceding it with \Q, so you can safely use
s/\Q^[(s3B//
which is equivalent to, and more readable than
s/\^\[\(s3B//
If you're dealing with ANSI sequences (xterm color sequences, escape sequences), then ^[ is not '^' followed by '[' but rather an unprintable character ESC, ASCII code 0x1B.
To put that character into a sed expression you need to use \x1B in GNU sed, or see http://www.cyberciti.biz/faq/unix-linux-sed-ascii-control-codes-nonprintable/ . You can also insert special characters directly into your command line using ctrl+v in Bash line editing.
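Where \x1B is not available, one portable alternative is to let printf produce the ESC byte and interpolate it into the sed script. This is a sketch of that idea, not the approach from the linked page; the variable names and the sample text are assumptions.

```shell
# Hold a real ESC byte (octal 033, hex 0x1B) in a variable.
esc=$(printf '\033')

# Sample text with the sequence ESC ( s 3 B on both sides of AAA.
sample=$(printf '\033(s3BAAA\033(s3B\n')

# Double quotes so the shell expands ${esc}; in a basic regex ( is a literal.
cleaned=$(printf '%s\n' "$sample" | sed "s/${esc}(s3B//g")

printf '%s\n' "$cleaned"
```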
In a regex, "^", "[" and "(" (and many others) are special characters used for special regex features; if you mean the characters themselves, you should precede them with "\".
The correct substitution regex would be:
$string =~ s/\^\[\(s3B//g
if you want to replace all occurrences.