sed substitution with user-specified replacement string - regex

The general form of the substitution command in sed is:
s/regexp/replacement/flags
where the '/' characters may be uniformly replaced by any other single character. But how do you choose this separator character when the replacement string is being fed in by an environment variable and might contain any printable character? Is there a straightforward way to escape the separator character in the variable using bash?
The values are coming from trusted administrators so security is not my main concern. (In other words, please don't answer with: "Never do this!") Nevertheless, I can't predict what characters will need to appear in the replacement string.

You can use control character as regex delimiters also like this:
s^Aregexp^Areplacement^Ag
Where ^A is CTRLva pressed together.
Or else use awk and don't worry about delimiters:
awk -v s="search" -v r="replacement" '{gsub(s, r)} 1' file

Here isn't (easy) solution for the following using the sed.
while read -r string from to wanted
do
echo "in [$string] want replace [$from] to [$to] wanted result: [$wanted]"
final=$(echo "$string" | sed "s/$from/$to/")
[[ "$final" == "$wanted" ]] && echo OK || echo WRONG
echo
done <<EOF
=xxx= xxx === =====
=abc= abc /// =///=
=///= /// abc =abc=
EOF
what prints
in [=xxx=] want replace [xxx] to [===] wanted result: [=====]
OK
in [=abc=] want replace [abc] to [///] wanted result: [=///=]
sed: 1: "s/abc/////": bad flag in substitute command: '/'
WRONG
in [=///=] want replace [///] to [abc] wanted result: [=abc=]
sed: 1: "s/////abc/": bad flag in substitute command: '/'
WRONG
Can't resists: Never do this! (with sed). :)
Is there a straightforward way to escape the separator character in
the variable using bash?
No, because you passing the strings from variables, you can't easily escape the separator character, because in "s/$from/$to/" the separator can appear not only in the $to part but in the $from part too. E.g. when you escape the separator it in the $from part it will not do the replacement at all, because will not find the $from.
Solution: use something other as sed
1.) Using pure bash. In the above script instead of the sed use the
final=${string//$from/$to}
2.) If the bash's substitutions are not enough, use something to what you can pass the $from and $to as variables.
as #anubhava already said, can use: awk -v f="$from" -v t="$to" '{gsub(f, t)} 1' file
or you can use perl and passing values as environment variables
final=$(echo "$string" | perl_from="$from" perl_to="$to" perl -pe 's/$ENV{perl_from}/$ENV{perl_to}/')
or passing the variables to perl via the command line arguments
final=$(echo "$string" | perl -spe 's/$f/$t/' -- -f="$from" -t="$to")

2 options:
1) take a char not in the string (need a pre process on content check and possible char without warranty that a char is available)
# Quick and dirty sample using `'/_##|!%=:;,-` arbitrary sequence
Separator="$( printf "%sa%s%s" '/_##|!%=:;,-' "${regexp}" "${replacement}" \
| sed -n ':cycle
s/\(.\)\(.*a.*\1.*\)\1/\1\2/g;t cycle
s/\(.\)\(.*a.*\)\1/\2/g;t cycle
s/^\(.\).*a.*/\1/p
' )"
echo "Separator: [ ${Separator} ]"
sed "s${Separator}${regexp}${Separator}${replacement}${Separator}flag" YourFile
2) escape the wanted char in the string patterns (need a pre process to escape char).
# Quick and dirty sample using # arbitrary with few escape security check
regexpEsc="$( printf "%s" "${regexp}" | sed 's/#/\\#/g' )"
replacementEsc"$( printf "%s" "${replacement}" | sed 's/#/\\#/g' )"
sed 's#regexpEsc#replacementEsc#flags' YourFile

From man sed
\cregexpc
Match lines matching the regular expression regexp. The c may be any
character.
When working with paths i often use # as separator:
sed s\#find/path#replace/path#
No need to escape / with ugly \/.

Related

Is there a better way to escape slashes in a string in POSIX sh [duplicate]

In my bash script I have an external (received from user) string, which I should use in sed pattern.
REPLACE="<funny characters here>"
sed "s/KEYWORD/$REPLACE/g"
How can I escape the $REPLACE string so it would be safely accepted by sed as a literal replacement?
NOTE: The KEYWORD is a dumb substring with no matches etc. It is not supplied by user.
Warning: This does not consider newlines. For a more in-depth answer, see this SO-question instead. (Thanks, Ed Morton & Niklas Peter)
Note that escaping everything is a bad idea. Sed needs many characters to be escaped to get their special meaning. For example, if you escape a digit in the replacement string, it will turn in to a backreference.
As Ben Blank said, there are only three characters that need to be escaped in the replacement string (escapes themselves, forward slash for end of statement and & for replace all):
ESCAPED_REPLACE=$(printf '%s\n' "$REPLACE" | sed -e 's/[\/&]/\\&/g')
# Now you can use ESCAPED_REPLACE in the original sed statement
sed "s/KEYWORD/$ESCAPED_REPLACE/g"
If you ever need to escape the KEYWORD string, the following is the one you need:
sed -e 's/[]\/$*.^[]/\\&/g'
And can be used by:
KEYWORD="The Keyword You Need";
ESCAPED_KEYWORD=$(printf '%s\n' "$KEYWORD" | sed -e 's/[]\/$*.^[]/\\&/g');
# Now you can use it inside the original sed statement to replace text
sed "s/$ESCAPED_KEYWORD/$ESCAPED_REPLACE/g"
Remember, if you use a character other than / as delimiter, you need replace the slash in the expressions above wih the character you are using. See PeterJCLaw's comment for explanation.
Edited: Due to some corner cases previously not accounted for, the commands above have changed several times. Check the edit history for details.
The sed command allows you to use other characters instead of / as separator:
sed 's#"http://www\.fubar\.com"#URL_FUBAR#g'
The double quotes are not a problem.
The only three literal characters which are treated specially in the replace clause are / (to close the clause), \ (to escape characters, backreference, &c.), and & (to include the match in the replacement). Therefore, all you need to do is escape those three characters:
sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
Example:
$ export REPLACE="'\"|\\/><&!"
$ echo fooKEYWORDbar | sed "s/KEYWORD/$(echo $REPLACE | sed -e 's/\\/\\\\/g; s/\//\\\//g; s/&/\\\&/g')/g"
foo'"|\/><&!bar
Based on Pianosaurus's regular expressions, I made a bash function that escapes both keyword and replacement.
function sedeasy {
sed -i "s/$(echo $1 | sed -e 's/\([[\/.*]\|\]\)/\\&/g')/$(echo $2 | sed -e 's/[\/&]/\\&/g')/g" $3
}
Here's how you use it:
sedeasy "include /etc/nginx/conf.d/*" "include /apps/*/conf/nginx.conf" /etc/nginx/nginx.conf
It's a bit late to respond... but there IS a much simpler way to do this. Just change the delimiter (i.e., the character that separates fields). So, instead of s/foo/bar/ you write s|bar|foo.
And, here's the easy way to do this:
sed 's|/\*!50017 DEFINER=`snafu`#`localhost`\*/||g'
The resulting output is devoid of that nasty DEFINER clause.
It turns out you're asking the wrong question. I also asked the wrong question. The reason it's wrong is the beginning of the first sentence: "In my bash script...".
I had the same question & made the same mistake. If you're using bash, you don't need to use sed to do string replacements (and it's much cleaner to use the replace feature built into bash).
Instead of something like, for example:
function escape-all-funny-characters() { UNKNOWN_CODE_THAT_ANSWERS_THE_QUESTION_YOU_ASKED; }
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A="$(escape-all-funny-characters 'KEYWORD')"
B="$(escape-all-funny-characters '<funny characters here>')"
OUTPUT="$(sed "s/$A/$B/g" <<<"$INPUT")"
you can use bash features exclusively:
INPUT='some long string with KEYWORD that need replacing KEYWORD.'
A='KEYWORD'
B='<funny characters here>'
OUTPUT="${INPUT//"$A"/"$B"}"
Use awk - it is cleaner:
$ awk -v R='//addr:\\file' '{ sub("THIS", R, $0); print $0 }' <<< "http://file:\_THIS_/path/to/a/file\\is\\\a\\ nightmare"
http://file:\_//addr:\file_/path/to/a/file\\is\\\a\\ nightmare
Here is an example of an AWK I used a while ago. It is an AWK that prints new AWKS. AWK and SED being similar it may be a good template.
ls | awk '{ print "awk " "'"'"'" " {print $1,$2,$3} " "'"'"'" " " $1 ".old_ext > " $1 ".new_ext" }' > for_the_birds
It looks excessive, but somehow that combination of quotes works to keep the ' printed as literals. Then if I remember correctly the vaiables are just surrounded with quotes like this: "$1". Try it, let me know how it works with SED.
These are the escape codes that I've found:
* = \x2a
( = \x28
) = \x29
" = \x22
/ = \x2f
\ = \x5c
' = \x27
? = \x3f
% = \x25
^ = \x5e
sed is typically a mess, especially the difference between gnu-sed and bsd-sed
might just be easier to place some sort of sentinel at the sed side, then a quick pipe over to awk, which is far more flexible in accepting any ERE regex, escaped hex, or escaped octals.
e.g. OFS in awk is the true replacement ::
date | sed -E 's/[0-9]+/\xC1\xC0/g' |
mawk NF=NF FS='\xC1\xC0' OFS='\360\237\244\241'
1 Tue Aug 🤡 🤡:🤡:🤡 EDT 🤡
(tested and confirmed working on both BSD-sed and GNU-sed - the emoji isn't a typo that's what those 4 bytes map to in UTF-8 )
There are dozens of answers out there... If you don't mind using a bash function schema, below is a good answer. The objective below was to allow using sed with practically any parameter as a KEYWORD (F_PS_TARGET) or as a REPLACE (F_PS_REPLACE). We tested it in many scenarios and it seems to be pretty safe. The implementation below supports tabs, line breaks and sigle quotes for both KEYWORD and replace REPLACE.
NOTES: The idea here is to use sed to escape entries for another sed command.
CODE
F_REVERSE_STRING_R=""
f_reverse_string() {
: 'Do a string reverse.
To undo just use a reversed string as STRING_INPUT.
Args:
STRING_INPUT (str): String input.
Returns:
F_REVERSE_STRING_R (str): The modified string.
'
local STRING_INPUT=$1
F_REVERSE_STRING_R=$(echo "x${STRING_INPUT}x" | tac | rev)
F_REVERSE_STRING_R=${F_REVERSE_STRING_R%?}
F_REVERSE_STRING_R=${F_REVERSE_STRING_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/2705678/3223785 ]
F_POWER_SED_ECP_R=""
f_power_sed_ecp() {
: 'Escape strings for the "sed" command.
Escaped characters will be processed as is (e.g. /n, /t ...).
Args:
F_PSE_VAL_TO_ECP (str): Value to be escaped.
F_PSE_ECP_TYPE (int): 0 - For the TARGET value; 1 - For the REPLACE value.
Returns:
F_POWER_SED_ECP_R (str): Escaped value.
'
local F_PSE_VAL_TO_ECP=$1
local F_PSE_ECP_TYPE=$2
# NOTE: Operational characters of "sed" will be escaped, as well as single quotes.
# By Questor
if [ ${F_PSE_ECP_TYPE} -eq 0 ] ; then
# NOTE: For the TARGET value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[]\/$*.^[]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
else
# NOTE: For the REPLACE value. By Questor
F_POWER_SED_ECP_R=$(echo "x${F_PSE_VAL_TO_ECP}x" | sed 's/[\/&]/\\&/g' | sed "s/'/\\\x27/g" | sed ':a;N;$!ba;s/\n/\\n/g')
fi
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R%?}
F_POWER_SED_ECP_R=${F_POWER_SED_ECP_R#?}
}
# [Ref(s).: https://stackoverflow.com/a/24134488/3223785 ,
# https://stackoverflow.com/a/21740695/3223785 ,
# https://unix.stackexchange.com/a/655558/61742 ,
# https://stackoverflow.com/a/11461628/3223785 ,
# https://stackoverflow.com/a/45151986/3223785 ,
# https://linuxaria.com/pills/tac-and-rev-to-see-files-in-reverse-order ,
# https://unix.stackexchange.com/a/631355/61742 ]
F_POWER_SED_R=""
f_power_sed() {
: 'Facilitate the use of the "sed" command. Replaces in files and strings.
Args:
F_PS_TARGET (str): Value to be replaced by the value of F_PS_REPLACE.
F_PS_REPLACE (str): Value that will replace F_PS_TARGET.
F_PS_FILE (Optional[str]): File in which the replacement will be made.
F_PS_SOURCE (Optional[str]): String to be manipulated in case "F_PS_FILE" was
not informed.
F_PS_NTH_OCCUR (Optional[int]): [1~n] - Replace the nth match; [n~-1] - Replace
the last nth match; 0 - Replace every match; Default 1.
Returns:
F_POWER_SED_R (str): Return the result if "F_PS_FILE" is not informed.
'
local F_PS_TARGET=$1
local F_PS_REPLACE=$2
local F_PS_FILE=$3
local F_PS_SOURCE=$4
local F_PS_NTH_OCCUR=$5
if [ -z "$F_PS_NTH_OCCUR" ] ; then
F_PS_NTH_OCCUR=1
fi
local F_PS_REVERSE_MODE=0
if [ ${F_PS_NTH_OCCUR} -lt -1 ] ; then
F_PS_REVERSE_MODE=1
f_reverse_string "$F_PS_TARGET"
F_PS_TARGET="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_REPLACE"
F_PS_REPLACE="$F_REVERSE_STRING_R"
f_reverse_string "$F_PS_SOURCE"
F_PS_SOURCE="$F_REVERSE_STRING_R"
F_PS_NTH_OCCUR=$((-F_PS_NTH_OCCUR))
fi
f_power_sed_ecp "$F_PS_TARGET" 0
F_PS_TARGET=$F_POWER_SED_ECP_R
f_power_sed_ecp "$F_PS_REPLACE" 1
F_PS_REPLACE=$F_POWER_SED_ECP_R
local F_PS_SED_RPL=""
if [ ${F_PS_NTH_OCCUR} -eq -1 ] ; then
# NOTE: We kept this option because it performs better when we only need to replace
# the last occurrence. By Questor
# [Ref(s).: https://linuxhint.com/use-sed-replace-last-occurrence/ ,
# https://unix.stackexchange.com/a/713866/61742 ]
F_PS_SED_RPL="'s/\(.*\)$F_PS_TARGET/\1$F_PS_REPLACE/'"
elif [ ${F_PS_NTH_OCCUR} -gt 0 ] ; then
# [Ref(s).: https://unix.stackexchange.com/a/587924/61742 ]
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/$F_PS_NTH_OCCUR'"
elif [ ${F_PS_NTH_OCCUR} -eq 0 ] ; then
F_PS_SED_RPL="'s/$F_PS_TARGET/$F_PS_REPLACE/g'"
fi
# NOTE: As the "sed" commands below always process literal values for the "F_PS_TARGET"
# so we use the "-z" flag in case it has multiple lines. By Quaestor
# [Ref(s).: https://unix.stackexchange.com/a/525524/61742 ]
if [ -z "$F_PS_FILE" ] ; then
F_POWER_SED_R=$(echo "x${F_PS_SOURCE}x" | eval "sed -z $F_PS_SED_RPL")
F_POWER_SED_R=${F_POWER_SED_R%?}
F_POWER_SED_R=${F_POWER_SED_R#?}
if [ ${F_PS_REVERSE_MODE} -eq 1 ] ; then
f_reverse_string "$F_POWER_SED_R"
F_POWER_SED_R="$F_REVERSE_STRING_R"
fi
else
if [ ${F_PS_REVERSE_MODE} -eq 0 ] ; then
eval "sed -i -z $F_PS_SED_RPL \"$F_PS_FILE\""
else
tac "$F_PS_FILE" | rev | eval "sed -z $F_PS_SED_RPL" | tac | rev > "$F_PS_FILE"
fi
fi
}
MODEL
f_power_sed "F_PS_TARGET" "F_PS_REPLACE" "" "F_PS_SOURCE"
echo "$F_POWER_SED_R"
EXAMPLE
f_power_sed "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" "[ ]+|$/,\"\0\"" "" "Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate that concatenation of the final \", \" then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,\"\0\"); print; }' <<<\"$string\") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readar"
echo "$F_POWER_SED_R"
IF YOU JUST WANT TO ESCAPE THE PARAMETERS TO THE SED COMMAND
MODEL
# "TARGET" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 0
echo "$F_POWER_SED_ECP_R"
# "REPLACE" value.
f_power_sed_ecp "F_PSE_VAL_TO_ECP" 1
echo "$F_POWER_SED_ECP_R"
IMPORTANT: If the strings for KEYWORD and/or replace REPLACE contain tabs or line breaks you will need to use the "-z" flag in your "sed" command. More details here.
EXAMPLE
f_power_sed_ecp "{ gsub(/,[ ]+|$/,\"\0\"); print }' ./ and eliminate" 0
echo "$F_POWER_SED_ECP_R"
f_power_sed_ecp "[ ]+|$/,\"\0\"" 1
echo "$F_POWER_SED_ECP_R"
NOTE: The f_power_sed_ecp and f_power_sed functions above was made available completely free as part of this project ez_i - Create shell script installers easily!.
Standard recommendation here: use perl :)
echo KEYWORD > /tmp/test
REPLACE="<funny characters here>"
perl -pi.bck -e "s/KEYWORD/${REPLACE}/g" /tmp/test
cat /tmp/test
don't forget all the pleasure that occur with the shell limitation around " and '
so (in ksh)
Var=">New version of \"content' here <"
printf "%s" "${Var}" | sed "s/[&\/\\\\*\\"']/\\&/g' | read -r EscVar
echo "Here is your \"text\" to change" | sed "s/text/${EscVar}/g"
If the case happens to be that you are generating a random password to pass to sed replace pattern, then you choose to be careful about which set of characters in the random string. If you choose a password made by encoding a value as base64, then there is is only character that is both possible in base64 and is also a special character in sed replace pattern. That character is "/", and is easily removed from the password you are generating:
# password 32 characters log, minus any copies of the "/" character.
pass=`openssl rand -base64 32 | sed -e 's/\///g'`;
If you are just looking to replace Variable value in sed command then just remove
Example:
sed -i 's/dev-/dev-$ENV/g' test to sed -i s/dev-/dev-$ENV/g test
I have an improvement over the sedeasy function, which WILL break with special characters like tab.
function sedeasy_improved {
sed -i "s/$(
echo "$1" | sed -e 's/\([[\/.*]\|\]\)/\\&/g'
| sed -e 's:\t:\\t:g'
)/$(
echo "$2" | sed -e 's/[\/&]/\\&/g'
| sed -e 's:\t:\\t:g'
)/g" "$3"
}
So, whats different? $1 and $2 wrapped in quotes to avoid shell expansions and preserve tabs or double spaces.
Additional piping | sed -e 's:\t:\\t:g' (I like : as token) which transforms a tab in \t.
An easier way to do this is simply building the string before hand and using it as a parameter for sed
rpstring="s/KEYWORD/$REPLACE/g"
sed -i $rpstring test.txt

Is it possible to escape regex metacharacters reliably with sed

I'm wondering whether it is possible to write a 100% reliable sed command to escape any regex metacharacters in an input string so that it can be used in a subsequent sed command. Like this:
#!/bin/bash
# Trying to replace one regex by another in an input file with sed
search="/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3"
replace="/xyz\n\t[0-9]\+\([^ ]\)\{2,3\}\3"
# Sanitize input
search=$(sed 'script to escape' <<< "$search")
replace=$(sed 'script to escape' <<< "$replace")
# Use it in a sed command
sed "s/$search/$replace/" input
I know that there are better tools to work with fixed strings instead of patterns, for example awk, perl or python. I would just like to prove whether it is possible or not with sed. I would say let's concentrate on basic POSIX regexes to have even more fun! :)
I have tried a lot of things but anytime I could find an input which broke my attempt. I thought keeping it abstract as script to escape would not lead anybody into the wrong direction.
Btw, the discussion came up here. I thought this could be a good place to collect solutions and probably break and/or elaborate them.
Note:
If you're looking for prepackaged functionality based on the techniques discussed in this answer:
bash functions that enable robust escaping even in multi-line substitutions can be found at the bottom of this post (plus a perl solution that uses perl's built-in support for such escaping).
#EdMorton's answer contains a tool (bash script) that robustly performs single-line substitutions.
Ed's answer now has an improved version of the sed command used below, corrected in calestyo's answer, which is needed if you want to escape string literals for potential use with other regex-processing tools, such as awk and perl. In short: for cross-tool use, \ must be escaped as \\ rather than as [\], which means: instead of the
sed 's/[^^]/[&]/g; s/\^/\\^/g' command used below, you must use
sed 's/[^^\]/[&]/g; s/[\^]/\\&/g;'
All snippets below assume bash as the shell (POSIX-compliant reformulations are possible):
SINGLE-line Solutions
Escaping a string literal for use as a regex in sed:
To give credit where credit is due: I found the regex used below in this answer.
Assuming that the search string is a single-line string:
search='abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3' # sample input containing metachars.
searchEscaped=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"$search") # escape it.
sed -n "s/$searchEscaped/foo/p" <<<"$search" # Echoes 'foo'
Every character except ^ is placed in its own character set [...] expression to treat it as a literal.
Note that ^ is the one char. you cannot represent as [^], because it has special meaning in that location (negation).
Then, ^ chars. are escaped as \^.
Note that you cannot just escape every char by putting a \ in front of it because that can turn a literal char into a metachar, e.g. \< and \b are word boundaries in some tools, \n is a newline, \{ is the start of a RE interval like \{1,3\}, etc.
The approach is robust, but not efficient.
The robustness comes from not trying to anticipate all special regex characters - which will vary across regex dialects - but to focus on only 2 features shared by all regex dialects:
the ability to specify literal characters inside a character set.
the ability to escape a literal ^ as \^
Escaping a string literal for use as the replacement string in sed's s/// command:
The replacement string in a sed s/// command is not a regex, but it recognizes placeholders that refer to either the entire string matched by the regex (&) or specific capture-group results by index (\1, \2, ...), so these must be escaped, along with the (customary) regex delimiter, /.
Assuming that the replacement string is a single-line string:
replace='Laurel & Hardy; PS\2' # sample input containing metachars.
replaceEscaped=$(sed 's/[&/\]/\\&/g' <<<"$replace") # escape it
sed -n "s/.*/$replaceEscaped/p" <<<"foo" # Echoes $replace as-is
MULTI-line Solutions
Escaping a MULTI-LINE string literal for use as a regex in sed:
Note: This only makes sense if multiple input lines (possibly ALL) have been read before attempting to match.
Since tools such as sed and awk operate on a single line at a time by default, extra steps are needed to make them read more than one line at a time.
# Define sample multi-line literal.
search='/abc\n\t[a-z]\+\([^ ]\)\{2,3\}\3
/def\n\t[A-Z]\+\([^ ]\)\{3,4\}\4'
# Escape it.
searchEscaped=$(sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$search" | tr -d '\n') #'
# Use in a Sed command that reads ALL input lines up front.
# If ok, echoes 'foo'
sed -n -e ':a' -e '$!{N;ba' -e '}' -e "s/$searchEscaped/foo/p" <<<"$search"
The newlines in multi-line input strings must be translated to '\n' strings, which is how newlines are encoded in a regex.
$!a\'$'\n''\\n' appends string '\n' to every output line but the last (the last newline is ignored, because it was added by <<<)
tr -d '\n then removes all actual newlines from the string (sed adds one whenever it prints its pattern space), effectively replacing all newlines in the input with '\n' strings.
-e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines a loop, therefore leaving subsequent commands to operate on all input lines at once.
If you're using GNU sed (only), you can use its -z option to simplify reading all input lines at once:
sed -z "s/$searchEscaped/foo/" <<<"$search"
Escaping a MULTI-LINE string literal for use as the replacement string in sed's s/// command:
# Define sample multi-line literal.
replace='Laurel & Hardy; PS\2
Masters\1 & Johnson\2'
# Escape it for use as a Sed replacement string.
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$replace")
replaceEscaped=${REPLY%$'\n'}
# If ok, outputs $replace as is.
sed -n "s/\(.*\) \(.*\)/$replaceEscaped/p" <<<"foo bar"
Newlines in the input string must be retained as actual newlines, but \-escaped.
-e ':a' -e '$!{N;ba' -e '}' is the POSIX-compliant form of a sed idiom that reads all input lines a loop.
's/[&/\]/\\&/g escapes all &, \ and / instances, as in the single-line solution.
s/\n/\\&/g' then \-prefixes all actual newlines.
IFS= read -d '' -r is used to read the sed command's output as is (to avoid the automatic removal of trailing newlines that a command substitution ($(...)) would perform).
${REPLY%$'\n'} then removes a single trailing newline, which the <<< has implicitly appended to the input.
bash functions based on the above (for sed):
quoteRe() quotes (escapes) for use in a regex
quoteSubst() quotes for use in the substitution string of a s/// call.
both handle multi-line input correctly
Note that because sed reads a single line at at time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
Also, using command substitutions ($(...)) to call the functions won't work for strings that have trailing newlines; in that event, use something like IFS= read -d '' -r escapedValue <(quoteSubst "$value")
# SYNOPSIS
# quoteRe <text>
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
# SYNOPSIS
# quoteSubst <text>
quoteSubst() {
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
printf %s "${REPLY%$'\n'}"
}
Example:
from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You & I'$'\n''eating A\1 sauce.' # sample replacement string with metachars.
# Should print the unmodified value of $to
sed -e ':a' -e '$!{N;ba' -e '}' -e "s/$(quoteRe "$from")/$(quoteSubst "$to")/" <<<"$from"
Note the use of -e ':a' -e '$!{N;ba' -e '}' to read all input at once, so that the multi-line substitution works.
perl solution:
Perl has built-in support for escaping arbitrary strings for literal use in a regex: the quotemeta() function or its equivalent \Q...\E quoting.
The approach is the same for both single- and multi-line strings; for example:
from=$'Cost\(*):\n$3.' # sample input containing metachars.
to='You owe me $1/$& for'$'\n''eating A\1 sauce.' # sample replacement string w/ metachars.
# Should print the unmodified value of $to.
# Note that the replacement value needs NO escaping.
perl -s -0777 -pe 's/\Q$from\E/$to/' -- -from="$from" -to="$to" <<<"$from"
Note the use of -0777 to read all input at once, so that the multi-line substitution works.
The -s option allows placing -<var>=<val>-style Perl variable definitions following -- after the script, before any filename operands.
Building upon #mklement0's answer in this thread, the following tool will replace any single-line string (as opposed to regexp) with any other single-line string using sed and bash:
$ cat sedstr
#!/bin/bash
old="$1"
new="$2"
file="${3:--}"
escOld=$(sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g' <<< "$old")
escNew=$(sed 's/[&/\]/\\&/g' <<< "$new")
sed "s/$escOld/$escNew/g" "$file"
To illustrate the need for this tool, consider trying to replace a.*/b{2,}\nc with d&e\1f by calling sed directly:
$ cat file
a.*/b{2,}\nc
axx/bb\nc
$ sed 's/a.*/b{2,}\nc/d&e\1f/' file
sed: -e expression #1, char 16: unknown option to `s'
$ sed 's/a.*\/b{2,}\nc/d&e\1f/' file
sed: -e expression #1, char 23: invalid reference \1 on `s' command's RHS
$ sed 's/a.*\/b{2,}\nc/d&e\\1f/' file
a.*/b{2,}\nc
axx/bb\nc
# .... and so on, peeling the onion ad nauseum until:
$ sed 's/a\.\*\/b{2,}\\nc/d\&e\\1f/' file
d&e\1f
axx/bb\nc
or use the above tool:
$ sedstr 'a.*/b{2,}\nc' 'd&e\1f' file
d&e\1f
axx/bb\nc
The reason this is useful is that it can be easily augmented to use word-delimiters to replace words if necessary, e.g. in GNU sed syntax:
sed "s/\<$escOld\>/$escNew/g" "$file"
whereas the tools that actually operate on strings (e.g. awk's index()) cannot use word-delimiters.
NOTE: the reason to not wrap \ in a bracket expression is that if you were using a tool that accepts [\]] as a literal ] inside a bracket expression (e.g. perl and most awk implementations) to do the actual final substitution (i.e. instead of sed "s/$escOld/$escNew/g") then you couldn't use the approach of:
sed 's/[^^]/[&]/g; s/\^/\\^/g'
to escape \ by enclosing it in [] because then \x would become [\][x] which means \ or ] or [ or x. Instead you'd need:
sed 's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'
So while [\] is probably OK for all current sed implementations, we know that \\ will work for all sed, awk, perl, etc. implementations and so use that form of escaping.
It should be noted that the regular expression used in some answers above among this and that one:
's/[^^\\]/[&]/g; s/\^/\\^/g; s/\\/\\\\/g'
seems to be wrong:
Doing first s/\^/\\^/g followed by s/\\/\\\\/g is an error, as any ^ escaped first to \^ will then have its \ escaped again.
A better way seems to be: 's/[^\^]/[&]/g; s/[\^]/\\&/g;'.
[^^\\] with sed (BRE/ERE) should be just [^\^] (or [^^\]). \ has no special meaning inside a bracket expression and needs not to be quoted.
Bash parameter expansion can be used to escape a string for use as a Sed replacement string:
# Define a sample multi-line literal. Includes a trailing newline to test corner case
replace='a&b;c\1
d/e
'
# Escape it for use as a Sed replacement string.
: "${replace//\\/\\\\}"
: "${_//&/\\\&}"
: "${_//\//\\\/}"
: "${_//$'\n'/\\$'\n'}"
replaceEscaped=$_
# Output should match "$replace"
sed -n "s/.*/$replaceEscaped/p" <<<''
In bash 5.2+, it can be simplified further:
# Define a sample multi-line literal. Includes a trailing newline to test corner case
replace='a&b;c\1
d/e
'
# Escape it for use as a Sed replacement string.
shopt -s extglob
shopt -s patsub_replacement # An & in the replacement will expand to what matched. bash 5.2+
: "${replace//#(&|\\|\/|$'\n')/\\&}"
replaceEscaped=$_
# Output should match "$replace"
sed -n "s/.*/$replaceEscaped/p" <<<''
Encapsulate it in a bash function:
##
# escape_replacement -v var replacement
#
# Escape special characters in _replacement_ so that it can be
# used as the replacement part in a sed substitute command.
# Store the result in _var_.
escape_replacement() {
if ! [[ $# = 3 && $1 = '-v' ]]; then
echo "escape_replacement: invalid usage" >&2
echo "escape_replacement: usage: escape_replacement -v var replacement" >&2
return 1
fi
local -n var=$2 # nameref (requires Bash 4.3+)
# We use the : command (true builtin) as a dummy command as we
# trigger a sequence of parameter expansions
# We exploit that the $_ variable (last argument to the previous command
# after expansion) contains the result of the previous parameter expansion
: "${3//\\/\\\\}" # Backslash-escape any existing backslashes
: "${_//&/\\\&}" # Backslash-escape &
: "${_//\//\\\/}" # Backslash-escape the delimiter (we assume /)
: "${_//$'\n'/\\$'\n'}" # Backslash-escape newline
var=$_ # Assign to the nameref
# To support Bash older than 4.3, the following can be used instead of nameref
#eval "$2=\$_" # Use eval instead of nameref https://mywiki.wooledge.org/BashFAQ/006
}
# Test the function
# =================
# Define a sample multi-line literal. Include a trailing newline to test corner case
replace='a&b;c\1
d/e
'
escape_replacement -v replaceEscaped "$replace"
# Output should match "$replace"
sed -n "s/.*/$replaceEscaped/p" <<<''

replace more than one special character with sed

I´m a nooby in regex so i have my headache with sed.
I need help to replace all special characters from the given company names with "-".
So this is the given string:
FML Finanzierungs- und Mobilien Leasing GmbH & Co. KG
I want the result:
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
I tried the following:
nr = $(echo "$name" | sed -e 's/ /-/g'))
so this replace all whitespaces with -, but what the right expression to replace the others? My one search via google are not very successful.
That depends on what you consider to be a special character -- I say this because you appear to consider & a regular character but not ., which seems a bit odd. Anyway, I imagine something of the form
nr=$(echo "$name" | sed 's/[^[:alnum:]&]\+/-/g')
would serve you best. Here [^[:alnum:]&] matches any character that is not alphanumeric or &, and [^[:alnum:]&]\+ matches a sequence of one or more such characters, so the sed call replaces all such sequences in $name with a hyphen. If there are other characters that you consider regular, add them to the set. Note that the handling of umlauts and suchlike depends on your locale.
Also note that echo may cause trouble if $name begins with a hyphen (it could be parsed as options for echo), so if you can tether yourself to bash,
nr=$(sed 's/[^[:alnum:]&]\+/-/g' <<< "$name")
might be more robust.
Apparently you wan to remove - and . and then replace spaces with -.
This would do it, by saying sed -e 'one thing' -e 'another thing':
$ echo "$name" | sed -e 's/[-\.]//g' -e 's/ /-/g'
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
Note we enclose within square backets all the characters that we want to treat equally: [-\.] means either - or . (we need to escape it, otherwise it would match any character).
Do this help you:
awk -vOFS=- '{gsub(/[.-]/,"");$1=$1}1' <<< "$name"
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG
gsub(/[.-]/,"") Removes . and _
-vOFS=- sets new field separator to -
$1=$1 reconstruct the line so it uses new field separator
1 print the line.
To get it to a variable
nr=$(awk -vOFS=- '{gsub(/[.-]/,"");$1=$1}1' <<< "$name")
Try this way also
echo "name" | sed 's/ \|- \|\. /-/g'
OutPut :
FML-Finanzierungs-und-Mobilien-Leasing-GmbH-&-Co-KG

pipe sed command to create multiple files

I need to get X to Y in the file with multiple occurrences, each time it matches an occurrence it will save to a file.
Here is an example file (demo.txt):
\x00START how are you? END\x00
\x00START good thanks END\x00
sometimes random things\x00\x00 inbetween it (ignore this text)
\x00START thats nice END\x00
And now after running a command each file (/folder/demo1.txt, /folder/demo2.txt, etc) should have the contents between \x00START and END\x00 (\x00 is null) in addition to 'START' but not 'END'.
/folder/demo1.txt should say "START how are you? ", /folder/demo2.txt should say "START good thanks".
So basicly it should pipe "how are you?" and using 'echo' I can prepend the 'START'.
It's worth keeping in mind that I am dealing with a very large binary file.
I am currently using
sed -n -e '/\x00START/,/END\x00/ p' demo.txt > demo1.txt
but that's not working as expected (it's getting lines before the '\x00START' and doesn't stop at the first 'END\x00').
If you have GNU awk, try:
awk -v RS='\0START|END\0' '
length($0) {printf "START%s\n", $0 > ("folder/demo"++i".txt")}
' demo.txt
RS='\0START|END\0' defines a regular expression acting as the [input] Record Separator which breaks the input file into records by strings (byte sequences) between \0START and END\0 (\0 represents NUL (null char.) here).
Using a multi-character, regex-based record separate is NOT POSIX-compliant; GNU awk supports it (as does mawk in general, but seemingly not with NUL chars.).
Pattern length($0) ensures that the associated action ({...}) is only executed if the records is nonempty.
{printf "START%s\n", $0 > ("folder/demo"++i)} outputs each nonempty record preceded by "START", into file folder/demo{n}.txt", where {n} represent a sequence number starting with 1.
You can use grep for that:
grep -Po "START\s+\K.*?(?=END)" file
how are you?
good thanks
thats nice
Explanation:
-P To allow Perl regex
-o To extract only matched pattern
-K Positive lookbehind
(?=something) Positive lookahead
EDIT: To match \00 as START and END may appear in between:
echo -e '\00START hi how are you END\00' | grep -aPo '\00START\K.*?(?=END\00)'
hi how are you
EDIT2: The solution using grep would only match single line, for multi-line it's better use perl instead. The syntax will be very similar:
echo -e '\00START hi \n how\n are\n you END\00' | perl -ne 'BEGIN{undef $/ } /\A.*?\00START\K((.|\n)*?)(?=END)/gm; print $1'
hi
how
are
you
What's new here:
undef $/ Undefine INPUT separator $/ which defaults to '\n'
(.|\n)* Dot matches almost any character, but it does not match
\n so we need to add it here.
/gm Modifiers, g for global m for multi-line
I would translate the nulls into newlines so that grep can find your wanted text on a clean line by itself:
tr '\000' '\n' < yourfile.bin | grep "^START"
from there you can take it into sed as before.

Replace slash in Bash

Let's suppose I have this variable:
DATE="04\Jun\2014:15:54:26"
Therein I need to replace \ with \/ in order to get the string:
"04\/Jun\/2014:15:54:26"
I tried tr as follows:
echo "04\Jun\2014:15:54:26" | tr '\' '\\/'
But this results in: "04\Jun\2014:15:54:26".
It does not satisfy me. Can anyone help?
No need to use an echo + a pipe + sed.
A simple substitution variable is enough and faster:
echo ${DATE//\//\\/}
#> 04\/Jun\/2014:15:54:26
Use sed for substitutions:
sed 's#/#\\/#g' < filename.txt > newfilename.txt
You usually use "/" instead of the "#", but as long as it is there, it doesn't matter.
I am writing this on a windows PC so I hope it is right, you may have to escape the slashes with another slash.
sed explained, the -e lets you edit the file in place. You can use -i to create a backup automatically.
sed -e s/STRING_TO_REPLACE/STRING_TO_REPLACE_IT/g index.html
here you go:
kent$ echo "04/Jun/2014:15:54:26"|sed 's#/#\\/#g'
04\/Jun\/2014:15:54:26
your tr line was not correct, you may mis-understand what tr does, tr 'abc' 'xyz' will change a->x, b->y, c->z,not changing whole abc->xyz..
You can also escape the slashes, with a slightly less readable solution than with hashes:
echo "04/Jun/2014:15:54:26" | sed 's/\//\\\//g'
This has not been said in other answers so I thought I'd add some clarifications:
tr uses two sets of characters for replacement, and the characters from the first set are replaced with those from the second set in a one-to-one correspondance. The manpage states that
SET2 is extended to length of SET1 by repeating its last character as necessary. Excess characters of SET2 are ignored.
Example:
echo abca | tr ab de # produces decd
echo abca | tr a de # produces dbcd, 'e' is ignored
echo abca | tr ab d # produces ddcd, 'd' is interpreted as a replacement for 'b' too
When using sed for substitutions, you can use another character than '/' for the delimiter, which will make your expression clearer (I like to use ':', #n34_panda proposed '#' in their answer). Don't forget to use the /g modifier to replace all occurences: sed 's:/:\\/:g' with quotes or sed s:/:\\\\/:g without (backslashes have to be escaped twice).
Finally your shortest solution will probably be #Luc-Olivier's answer, involving substitution, in the following form (don't forget to escape forward slashes too when part of the expected pattern):
echo ${variable/expected/replacement} # will replace one occurrence
echo ${variable//expected/replacement} # will replace all occurrences