sed escape user input string [duplicate] - regex

I am using sed for string replacement in a config file.
The user enters a salt string, and I then write that string into the config file:
Sample config file myconfig.conf
CONFIG_SALT_VALUE=SOME_DUMMY_VALUE
I use this command to replace the dummy value with the salt entered by the user:
sed -i "s/^CONFIG_SALT_VALUE.*/CONFIG_SALT_VALUE=$salt/g" ./myconfig.conf
Issue: the value of $salt can contain any character, so if $salt contains / (like 12d/dfs), the sed command above breaks.
I can change the delimiter to !, but then the command breaks again if $salt contains ! (like amgh!fhf).
How should I approach this problem?

You can use almost any character as the sed delimiter. However, as you mention in your question, changing it every time is fragile.
It may be more useful to use awk instead and parse the line a little:
awk 'BEGIN{repl=ARGV[1]; ARGV[1]=""; FS=OFS="="}
$1 == "CONFIG_SALT_VALUE" {$2=repl}
1' "$salt" file
As one liner:
awk 'BEGIN{repl=ARGV[1]; ARGV[1]=""; FS=OFS="="} $1 == "CONFIG_SALT_VALUE" {$2=repl}1' "$salt" file
This sets = as the field separator. It then checks whether the parameter name on a line is CONFIG_SALT_VALUE and, if so, replaces the value with the one given.
To prevent values in $salt like foo\\bar from being interpreted, as that other guy commented in my original answer, we have the trick:
awk 'BEGIN{repl=ARGV[1]; ARGV[1]=""} ...' "$var" file
This uses the answer in How to use variable including special symbol in awk? where Ed Morton says that
The way to pass a shell variable to awk without backslashes being
interpreted is to pass it in the arg list instead of populating an awk
variable outside of the script.
and then
You need to set ARGV[1]="" after populating the awk variable to
avoid the shell variable value also being treated as a file name.
Unlike any other way of passing in a variable, ALL characters used in
a variable this way are treated literally with no "special" meaning.
This does not do in-place editing, but you can redirect to another file and then replace the original:
awk '...' file > tmp_file && mv tmp_file file
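The two steps above (a literal value passed through ARGV, then a temporary file written back over the original) could be wrapped in a small function. This is only a sketch: the name set_config_value and the demo file are made up for illustration. Unlike the one-liner, it rewrites the whole matching line, so an old value that itself contains = leaves no fragments behind.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and interface are assumptions, not from the answer):
#   set_config_value KEY VALUE FILE
# KEY and VALUE reach awk through ARGV, so awk never interprets backslashes,
# slashes, ampersands or any other character in them.
set_config_value() {
    local key=$1 value=$2 file=$3 tmp
    tmp=$(mktemp) || return 1
    awk 'BEGIN {
             key = ARGV[1]; repl = ARGV[2]
             ARGV[1] = ARGV[2] = ""      # empty entries are not opened as files
             FS = "="
         }
         $1 == key { $0 = key "=" repl } # rewrite the whole line
         { print }' "$key" "$value" "$file" > "$tmp" &&
        cat "$tmp" > "$file"             # keeps the permissions of FILE
    local status=$?
    rm -f "$tmp"
    return "$status"
}

# Demo on a throwaway copy of the sample config.
demo_conf=$(mktemp)
printf '%s\n' 'OTHER=1' 'CONFIG_SALT_VALUE=SOME_DUMMY_VALUE' > "$demo_conf"
set_config_value CONFIG_SALT_VALUE '12d/dfs!amgh&x\ny' "$demo_conf"
cat "$demo_conf"
```

The demo value deliberately mixes /, !, & and a backslash, the characters that break the sed version.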


Sed finds backslash but doesn't replace it (escape character problem)? [duplicate]

I want to replace \emph{G. fortis} with \emph{G. fortis}\index{\emph{Geospiza fortis}} to add terms to an index in a TeX document. I have a list of words in a file called gfortis that I will pass through the sed command in a while read -r command.
PREFTXt="\emph{G. fortis}" # text to search
REPLACETXT="$PREFTXt\index{\emph{Geospiza fortis}}" # text to replace
sed -e "s/${PREFTXt}/${REPLACETXT}/" path/chapt1.tex
The result is this:
\emph{G. fortis}index{emph{Geospiza fortis}}
But it should be:
\emph{G. fortis}\index{\emph{Geospiza fortis}}
The final command looks like this:
while read -r RP
do
echo "Adding $RP to the index"
PREFTXt="$RP"
ADDTXt="\index{\emph{Geospiza fortis}}"
REPLACETXT="$PREFTXt$ADDTXt"
echo "Replaced $RP with $REPLACETXT"
sed -e "s/${PREFTXt}/${REPLACETXT}/" path/chapt1.tex # should replace the text within this file.
done < path/words_index/gfortis # input the words file to replace with a certain \index command
The cap1.txt contains this:
\chapter{Another chapter in the wall}
NICE other\index{other} to be added to the index.
\emph{Geospiza fortis}
All of the stuff that I put here shall be into the index.
\emph{Geospiza fortis}
This index will be gigantic, but I won't be making multiple indexes.
\emph{G. fortis}
Other cool stuff here
\emph{G. fortis}
I'm using bash in macOS Mojave
sed must receive a double backslash for every literal backslash. Inside double quotes bash passes "\e" through unchanged, but sed then treats the backslash as an escape character and turns "\e" into "e".
To avoid having to escape those in the script, you could put the text in a file, for instance preftxt.cfg, and do something similar for the other string.
It would contain:
\emph{G. fortis}
And you could use it like this:
PREFTXt="$(cat preftxt.cfg)"
Use three backslashes (\\\) instead of one:
REPLACETXT="$PREFTXt\\\index{\\\emph{Geospiza fortis}}"
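One way to avoid counting backslashes inside double quotes is to write both strings in single quotes, where bash leaves every backslash alone, and to double each backslash for sed. A minimal sketch on one sample line (the variable names are illustrative, and & stands for the matched text):

```shell
#!/usr/bin/env bash
# Single quotes: bash passes these strings to sed untouched.
search='\\emph{G\. fortis}'                   # \\ matches one backslash, \. a literal dot
replace='&\\index{\\emph{Geospiza fortis}}'   # & = matched text, \\ = one backslash

result=$(printf '%s\n' '\emph{G. fortis}' | sed -e "s/${search}/${replace}/")
printf '%s\n' "$result"
# prints: \emph{G. fortis}\index{\emph{Geospiza fortis}}
```

Words read from a file with while read -r would still need their backslashes doubled before being used as the search pattern.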

Is there a way to use sed to remove only the exact string match?

I have recently started learning bash and ran into a problem with an assignment. I have a text file that contains something like:
foo:abc:200:1:1:1
foobar:asd:100:3:2:1
bar:test:100:2:2:2
The first column is the title of the book, followed by the author name, price, quantity available and quantity sold, all separated by the delimiter ":".
The goal is to remove a book based on the title and author the user types in.
I searched around and found that sed might be able to help with this, so I tested it by deleting based on the title alone with
sed /"foo"/d Book.txt
I expected the output to be
foobar:asd:100:3:2:1
bar:test:100:2:2:2
however the output was
bar:test:100:2:2:2
which tells me that any line in the text file containing "foo" gets deleted.
Hence I would like to ask:
Is there any way to use sed so it deletes the exact match only instead of lines containing foo?
Is there any way to use delimiters with sed so I can use both title and author?
Should I be using something other than sed?
Using sed it is better to use:
sed -E '/(^|:)foo(:|$)/d' file
foobar:asd:100:3:2:1
bar:test:100:2:2:2
This makes sure foo is preceded by the start of the line or a : and followed by a : or the end of the line.
However this job is more suitable for awk as data is delimited by colon:
awk -F: '$1 != "foo"' file
Is there any way to use sed so it deletes the exact match only instead of lines containing foo?
Yes, for the given example you can: if you anchor the search pattern so that it matches exactly foo:, only that line is deleted. For example:
sed '/^foo:/d' file
The ^ anchors the match to the start of the line, so the pattern matches lines starting with foo followed by a colon, which fits your use case. This assumes foo should be matched in the first column only.
Is there any way to use delimiters with sed so I can use both title and author?
Should I be using something other than sed?
If your input file has a fixed delimiter like : that never forms part of valid column content, awk or perl are better suited, as they split text easily once a delimiter is set.
As an example, if you want to change the quantity in the fourth column for one particular book named foobar, with awk you can just do
awk -F: 'BEGIN { OFS = FS } $1 == "foobar" { $4 = 6 }1' input-file
To decode the line above: the content within '..' is left untouched by the shell and passed literally to the command, which is why we wrap it in single quotes. The statements inside it are not meaningful to the shell.
The -F: sets the input field separator to :, so when the command reads the file line by line, each line is broken into tokens separated by :. The first column is referred to as $1, and so on up to $NF, the last column of the line. The part BEGIN { OFS = FS } sets the output field separator to the same value as the input one, i.e. it retains the : delimiter when awk writes the output.
The part $1 == "foobar" { $4 = 6 } is almost self-explanatory: if the first column equals the string within quotes, do the action inside {..}, which sets the fourth column to 6. The trailing 1 is an always-true pattern whose default action is print, so {..}1 behaves like {...; print} and writes the line reconstructed with the output field/record separators defined.
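For the original goal (deleting a book by title and author typed in by the user), the same field-based approach might look like the sketch below. The function name delete_book is hypothetical. The user input is passed through ARGV so that awk takes it literally, and the comparisons are forced to be string comparisons.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete_book TITLE AUTHOR FILE
# Prints FILE without the lines whose first two fields equal TITLE and AUTHOR.
delete_book() {
    awk -F: 'BEGIN { title = ARGV[1]; author = ARGV[2]; ARGV[1] = ARGV[2] = "" }
             # Appending "" forces a string comparison, so "100" != "100.0".
             !(($1 "") == (title "") && ($2 "") == (author ""))' "$1" "$2" "$3"
}

books=$(mktemp)
printf '%s\n' 'foo:abc:200:1:1:1' 'foobar:asd:100:3:2:1' 'bar:test:100:2:2:2' > "$books"
remaining=$(delete_book foo abc "$books")
printf '%s\n' "$remaining"
```

With the sample data this prints the foobar and bar lines; a title of foo with a different author removes nothing.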
This might work for you (GNU sed):
sed '/\<foo\>/d' file
Or
sed '/\bfoo\b/d' file
The first solution uses \< start word and \> end word. The second solution uses the \b word boundary.
P.S. The dual of \b is \B so to delete lines that contain foobar or foobaz but not foo only, use:
sed '/\bfoo\B/d' file

BASH escaping double quotes within single quotes

I'm trying to write a bash function that would escape all double quotes within single quotes, eg:
'I need to escape "these" quotes with backslashes'
would become
'I need to escape \"these\" quotes with backslashes'
My take on it was:
Find pairs of single quotes in the input and extract them with grep
Pipe into sed, escape double quotes
Sed again the whole input and replace grep match with sedded match
I managed to get it working to the part of having correctly escaped quotes section, but replacing it in the whole input fails.
The script code copypaste:
# $1 - Full name, $2 - minified name
adjust_quotes ()
{
SINGLE_QUOTES=`grep -Eo "'.*'" $2`
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
sed -r "s|'.*'|$ESCAPED_QUOTES|g" "$2" > "$2.escaped"
mv "$2.escaped" $2
echo "Quotes escaped within single quotes on $2"
}
Random additional questions:
In the console, escaping the quote with only two backslashes works, but when the code is put in a script I need four. I'd love to know why.
Could I modify this code into a loop to escape all pairs of single quotes, one after another until EOF?
Thanks!
P.S. I know this would probably be easier to do in eg. python, but I really need to keep it in bash.
Using BASH string replacement:
s='I need to escape "these" quotes with backslashes'
r="${s//\"/\\\"}"
echo "$r"
I need to escape \"these\" quotes with backslashes
Here's a pure bash solution, which does the transformation on stdin, printing to stdout. It reads the entire input into memory, so it won't work with really enormous files.
escape_enclosed_quotes() (
IFS=\'
read -d '' -r -a fields
for ((i=1; i<${#fields[@]}; i+=2)); do
fields[i]=${fields[i]//\"/\\\"}
done
printf %s "${fields[*]}"
)
I deliberately enclosed the body of the function in parentheses rather than braces, in order to force the body to run in a subshell. That limits the modification of IFS to the body, as well as implicitly making the variables used local.
The function uses the read builtin to read the entire input (since the line delimiter is set to NUL with -d '') into an array (-a) using a single quote as the field separator (IFS=\'). The result is that the parts of the input surrounded with single quotes are in the odd positions of the array, so the function loops over the odd indices to do the substitution only for those fields. I use bash's find-and-replace syntax instead of deferring to an external utility like sed.
This being bash, there are a couple of gotchas:
If the file contains a NUL, the rest of the file will be ignored.
If the last line of the file does not end with a newline, and the last character of that line is a single quote, it will not be output.
Both of the above conditions are impossible in a portable text file, so it's probably OK. All the same, worth taking note.
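A short usage sketch of the function, with a made-up input line. The function is repeated so that the example runs on its own.

```shell
#!/usr/bin/env bash
# Same function as above, repeated so the example is self-contained.
escape_enclosed_quotes() (
    IFS=\'
    read -d '' -r -a fields
    for ((i = 1; i < ${#fields[@]}; i += 2)); do
        fields[i]=${fields[i]//\"/\\\"}
    done
    printf %s "${fields[*]}"
)

# Only the double quotes between the single quotes are escaped.
escaped=$(printf '%s\n' "say 'a \"b\" c' and \"d\"" | escape_enclosed_quotes)
printf '%s\n' "$escaped"
# prints: say 'a \"b\" c' and "d"
```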
The supplementary question: why are the extra backslashes needed in
ESCAPED_QUOTES=`echo $SINGLE_QUOTES | sed 's|"|\\\\"|g'`
Answer: It has nothing to do with that line being in a script. It has to do with your use of backticks (`...`) for command substitution, and the idiosyncratic and often unpredictable handling of backslashes inside backticks. This syntax is deprecated. Do not use it. (Not even if you see someone else using it in some random example on the internet.) If you had used the recommended $(...) syntax for command substitution, it would have worked as expected:
ESCAPED_QUOTES=$(echo $SINGLE_QUOTES | sed 's|"|\\"|g')
(More information is in the Bash FAQ linked above.)
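The difference can be seen by running the same sed command three ways. This is a small sketch with a made-up input string; the comments show what sed actually receives in each case.

```shell
#!/usr/bin/env bash
# Inside backticks the shell collapses \\ to \ before the command is even
# parsed, single quotes or not, so sed only sees half of the backslashes.
two_in_backticks=`printf '%s\n' 'a"b' | sed 's|"|\\"|g'`       # sed gets s|"|\"|g
four_in_backticks=`printf '%s\n' 'a"b' | sed 's|"|\\\\"|g'`    # sed gets s|"|\\"|g
two_in_dollar=$(printf '%s\n' 'a"b' | sed 's|"|\\"|g')         # sed gets s|"|\\"|g

printf '%s\n' "$two_in_backticks"    # prints: a"b
printf '%s\n' "$four_in_backticks"   # prints: a\"b
printf '%s\n' "$two_in_dollar"       # prints: a\"b
```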

"sed" special characters handling

We have a sed command in our script that replaces placeholders in a file with values from variables.
For example:
export value="dba01upc\Fusion_test"
sed -i "s%{"sara_ftp_username"}%$value%g" /home_ldap/user1/placeholder/Sara.xml
The sed command drops special characters such as '\' and inserts the string "dba01upcFusion_test" without the '\'.
It works if I do the export like export value='dba01upc\Fusion_test' (with '\' surrounded with ‘’), but unfortunately our client wants to export the original text dba01upc\Fusion_test with single/double quotes and does not want to add any extra characters to the text.
Can anyone tell me how to make sed insert text that contains special characters?
Before Replacement : Sara.xml
<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account >
<ser:description/>
<ser:static-account>
<con:username>{sara_ftp_username}</con:username>
</ser:static-account>
</ser:service-account>
After Replacement : Sara.xml
<?xml version="1.0" encoding="UTF-8"?>
<ser:service-account>
<ser:description/>
<ser:static-account>
<con:username>dba01upcFusion_test</con:username>
</ser:static-account>
</ser:service-account>
Thanks in advance
You cannot robustly solve this problem with sed. Just use awk instead:
awk -v old="string1" -v new="string2" '
idx = index($0,old) {
$0 = substr($0,1,idx-1) new substr($0,idx+length(old))
}
1' file
Ah, @mklement0 has a good point: to stop escapes from being interpreted you need to pass in the values in the arg list along with the file names and then assign the variables from that, rather than assigning values to the variables with -v (see the summary I wrote a LONG time ago for the comp.unix.shell FAQ at http://cfajohnson.com/shell/cus-faq-2.html#Q24 but apparently had forgotten!).
The following will robustly make the desired substitution (a\ta -> e\tf) on every search string found on every line:
$ cat tst.awk
BEGIN {
old=ARGV[1]; delete ARGV[1]
new=ARGV[2]; delete ARGV[2]
lgthOld = length(old)
}
{
head = ""; tail = $0
while ( idx = index(tail,old) ) {
head = head substr(tail,1,idx-1) new
tail = substr(tail,idx+lgthOld)
}
print head tail
}
$ cat file
a\ta a a a\ta
$ awk -f tst.awk 'a\ta' 'e\tf' file
e\tf a a e\tf
The white space in file is tabs. You can shift ARGV[3] down and adjust ARGC if you like but it's not necessary in most cases.
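The difference between -v and the arg list is easy to check with a string that contains an escape sequence. A minimal sketch, using a demo value chosen for illustration:

```shell
#!/usr/bin/env bash
value='a\tb'    # four characters: a, backslash, t, b

# -v processes escape sequences, so \t becomes a real tab (3 characters).
len_v=$(awk -v s="$value" 'BEGIN { print length(s) }')

# Taken from ARGV, the string arrives unchanged (4 characters).
len_argv=$(awk 'BEGIN { s = ARGV[1]; print length(s) }' "$value")

printf '%s %s\n' "$len_v" "$len_argv"   # prints: 3 4
```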
Update with the benefit of hindsight, to present options:
Update 2: If you're intent on using sed, see the - somewhat cumbersome, but now robust and generic - solution below.
If you want a robust, self-contained awk solution that also properly handles both arbitrary search and replacement strings (but cannot incorporate regex features such as word-boundary assertions), see Ed Morton's answer.
If you want a pure bash solution and your input files are small and preserving multiple trailing newlines is not important, see Charles Duffy's answer.
If you want a full-fledged third-party templating solution, consider, for instance, j2cli, a templating CLI for Jinja2 - if you have Python and pip, install with sudo pip install j2cli.
Simple example (note that since the replacement string is provided via a file, this may not be appropriate for sensitive data; note the double braces ({{...}})):
value='dba01upc\Fusion_test'
echo "sara_ftp_username=$value" >data.env
echo '<con:username>{{sara_ftp_username}}</con:username>' >tmpl.xml
j2 tmpl.xml data.env # -> <con:username>dba01upc\Fusion_test</con:username>
If you use sed, careful escaping of both the search and the replacement string is required, because:
As Ed Morton points out in a comment elsewhere, sed doesn't support use of literal strings as replacement strings - it invariably interprets special characters/sequences in the replacement string.
Similarly, the search string literal must be escaped in a way that its characters aren't mistaken for special regular-expression characters.
The following uses two generic helper functions that perform this escaping (quoting) that apply techniques explained at "Is it possible to escape regex characters reliably with sed?":
#!/usr/bin/env bash
# SYNOPSIS
# quoteRe <text>
# DESCRIPTION
# Quotes (escapes) the specified literal text for use in a regular expression,
# whether basic or extended - should work with all common flavors.
quoteRe() { sed -e 's/[^^]/[&]/g; s/\^/\\^/g; $!a\'$'\n''\\n' <<<"$1" | tr -d '\n'; }
# '
# SYNOPSIS
# quoteSubst <text>
# DESCRIPTION
# Quotes (escapes) the specified literal string for safe use as the substitution string (the 'new' in `s/old/new/`).
quoteSubst() {
IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
printf %s "${REPLY%$'\n'}"
}
# The search string.
search='{sara_ftp_username}'
# The replacement string; a demo value with characters that need escaping.
value='&\1%"'\'';<>/|dba01upc\Fusion_test'
# Use the appropriately escaped versions of both strings.
sed "s/$(quoteRe "$search")/$(quoteSubst "$value")/g" <<<'<el>{sara_ftp_username}</el>'
# -> <el>&\1%"';<>/|dba01upc\Fusion_test</el>
Both quoteRe() and quoteSubst() correctly handle multi-line strings.
Note, however, given that sed reads a single line at a time by default, use of quoteRe() with multi-line strings only makes sense in sed commands that explicitly read multiple (or all) lines at once.
quoteRe() is always safe to use with a command substitution ($(...)), because it always returns a single-line string (newlines in the input are encoded as '\n').
By contrast, if you use quoteSubst() with a string that has trailing newlines, you mustn't use $(...), because the latter will remove the last trailing newline and therefore break the encoding (since quoteSubst() \-escapes actual newlines, the string returned would end in a dangling \).
Thus, for strings with trailing newlines, use IFS= read -d '' -r escapedValue < <(quoteSubst "$value") to read the escaped value into a separate variable first, then use that variable in the sed command.
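The trailing-newline case could be exercised as follows. This is a sketch only: the demo value and the {placeholder} search string are assumptions, and quoteSubst() is repeated from above so the block runs on its own.

```shell
#!/usr/bin/env bash
# quoteSubst() as defined above, repeated so the sketch is self-contained.
quoteSubst() {
  IFS= read -d '' -r < <(sed -e ':a' -e '$!{N;ba' -e '}' -e 's/[&/\]/\\&/g; s/\n/\\&/g' <<<"$1")
  printf %s "${REPLY%$'\n'}"
}

value=$'line 1\nline 2\n'   # demo value (an assumption) ending in a newline

# read -d '' keeps the trailing newline that $(...) would strip.
IFS= read -d '' -r escapedValue < <(quoteSubst "$value")

result=$(sed "s/{placeholder}/$escapedValue/" <<<'<el>{placeholder}</el>')
printf '%s\n' "$result"
```

The closing </el> tag should end up on a line of its own, because the newline at the end of the value is preserved.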
This can be done with bash builtins alone -- no sed, no awk, etc.
orig='{sara_ftp_username}' # put the original value into a variable
new='dba01upc\Fusion_test' # ...no need to 'export'!
contents=$(<Sara.xml) # read the file's content into a variable
new_contents=${contents//"$orig"/$new} # use parameter expansion to replace
printf '%s' "$new_contents" >Sara.xml # write new content to disk
See the relevant part of BashFAQ #100 for information on using parameter expansion for string substitution.
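The same idea could be packaged as a function. Two things in this sketch are my own assumptions rather than part of the answer: the name replace_in_file is invented, and the replacement is quoted inside the expansion because bash 5.2 and later (with the default patsub_replacement option) otherwise treat an unquoted & in it as "the matched text". Since $(<file) drops trailing newlines, the sketch writes exactly one back.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace_in_file SEARCH REPLACEMENT FILE
replace_in_file() {
    local orig=$1 new=$2 file=$3 contents new_contents
    contents=$(<"$file") || return 1            # trailing newlines are dropped here
    # Quoting both operands makes bash treat them as literal text.
    new_contents=${contents//"$orig"/"$new"}
    printf '%s\n' "$new_contents" > "$file"     # write back with one final newline
}

# Demo on a throwaway file; the value is made up to include & and a backslash.
xml=$(mktemp)
printf '%s\n' '<con:username>{sara_ftp_username}</con:username>' > "$xml"
replace_in_file '{sara_ftp_username}' 'R&D\Fusion_test' "$xml"
cat "$xml"
```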

Last Occurrence of Character Field Separator AWK

I'm using a find command to find all files of a certain format, that command has been golden. I'm piping that output into an awk command and I want to use the last underscore as a field separator. The problem being that depending on the path the file is in, there could be one or two underscores before the fact.
find . -regex ".*prob[0-9]*_.*" | awk 'BEGIN { FS = "_.*$" } { print $1 " " $2 }'
I get what's wrong with the regular expression in my field separator: it separates on the underscore and whatever follows. Is there a way to specify just the single character itself? Moreover, how do I use a field separator on the last occurrence of a character?
This is somewhat an extension of a question I asked earlier:
Suppress output to StdOut when piping echo
The files I get are generally like this, the wrinkle being that the directory can have an underscore as well:
/the/directory/probXXXXX_XX
where X is any integer.
A workaround I've been thinking of is separating at every underscore and then print every column... I'd rather like to get it working in the method above though.
A trick of awk that is not obvious is that $ is an operator; you can use it with a variable or even an expression, and in particular with expressions involving the predefined variable NF: $NF gets the last field, $(NF - 1) the second last field.
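Building on $NF, one possible way to split each path at its last underscore is sketched below. The function name and sample paths are made up for illustration. With _ as the field separator, $NF is whatever follows the last underscore, and the rest of the line is recovered by cutting that suffix off.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads paths on stdin and prints "<before last _> <after last _>".
split_on_last_underscore() {
    awk -F_ 'NF > 1 {
        last = $NF                                            # text after the last underscore
        head = substr($0, 1, length($0) - length(last) - 1)   # everything before it
        print head, last
    }'
}

split_out=$(printf '%s\n' './the_directory/prob12345_67' './plain/prob1_2' |
            split_on_last_underscore)
printf '%s\n' "$split_out"
# prints:
# ./the_directory/prob12345 67
# ./plain/prob1 2
```

In the original pipeline the function would simply replace the awk command after find.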