Bash sed multi-word find and replace

I am getting the error "unterminated substitute pattern" on macOS when attempting to replace multiple words with a different set of multiple words separated by spaces. I am doing this in a bash script that reads from a CSV to replace a set of strings in files.
Example:
while IFS=, read col1 col2 col3
#$col1=FOO BAR
#$col2=another set of words
#$col3=file
do
REGX="'s|$col2|$col3|g'"
sed -i -e $REGX $col1
done < $config_file
I want the output to be "another set of words", but I can't find out how to allow spaces in the expression.
Thanks

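You are storing the substitution in a variable so that you can use it later on:
REGX="'s|$col2|$col3|g'"
sed -i -e $REGX $col1
This breaks for two reasons. The single quotes inside the double quotes are not removed by the shell, so they become part of the script that sed receives. And because $REGX is expanded unquoted, the shell splits it at every space, so sed gets only the first fragment as its script.
A variable does work when the expression contains no spaces: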
$ cat a
hello this is a test
$ REGX="s/this/that/g"
$ sed $REGX a
hello that is a test
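To make the word splitting visible, here is a minimal sketch. The helper count_args is hypothetical and only reports how many arguments it was given; the search and replacement strings are taken from the question.

```shell
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

REGX='s|FOO BAR|another set of words|g'

unquoted=$(count_args $REGX)    # word splitting breaks the script at each space
quoted=$(count_args "$REGX")    # double quotes keep it as a single argument

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
# prints: unquoted: 5 argument(s), quoted: 1 argument(s)
```

With the unquoted form, sed would be handed only s|FOO as its script, which is not a complete substitution.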
However, I would directly use the command as follows:
while IFS=, read -r col1 col2 col3
do
  sed -i.bak -e "s|$col2|$col3|g" "$col1"
done < "$config_file"
Notes:
Use -r in read to avoid weird situations on corner cases.
Use double quotes in sed so that the variables within the expression are evaluated. Otherwise, it will look for literal $col2 and change with literal $col3.
Use -i.bak to create backup files when using -i. Otherwise, if you try and fail... you will lose your original document.
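The notes above can be tried end to end with a throwaway setup. This is only a sketch under a few assumptions: the CSV column order (file, search, replace) and the names target.txt and config.csv are made up for the demo, the search text contains no | or regex metacharacters, and a temporary file is used instead of -i so that it behaves the same with GNU and BSD sed.

```shell
# Throwaway working directory so the demo does not touch real files.
workdir=$(mktemp -d)
printf 'FOO BAR is here\n' > "$workdir/target.txt"
printf '%s,%s,%s\n' "$workdir/target.txt" 'FOO BAR' 'another set of words' \
  > "$workdir/config.csv"

while IFS=, read -r file search replace; do
  # The double quotes keep the whole script as one argument despite the spaces.
  sed -e "s|$search|$replace|g" "$file" > "$file.new" && mv "$file.new" "$file"
done < "$workdir/config.csv"

result=$(cat "$workdir/target.txt")
echo "$result"    # prints: another set of words is here
rm -rf "$workdir"
```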

sed -i "s#foo bar#another set of words#g"
# OR
sed -i "s#foo|bar#another string#g"
sed uses | as the regex OR operator (with -E; in basic regex it is a literal character), so it is safer to use another separator. The default / works well here too.

Related

Replace placeholders with corresponding env variables

I have a configuration file with placeholders like this (stored in /tmp/var for this example)
ldap_bind_dn='${bind_dn}'
Now I'd like to replace ${bind_dn} with the environment variable of the same name (which is set inside Docker).
export bind_dn=CN=my-user,CN=Users,DC=example,DC=com
The expected result after processing the above test file would be
ldap_bind_dn='CN=my-user,CN=Users,DC=example,DC=com'
I tried sed but it doesn't replace it to the value of the env variable
$ sed "s#\${\(.*\)}#$\1#" /tmp/var
ldap_bind_dn='$bind_dn'
Why does sed replace it with $bind_dn instead of the value? I'd expect the variable to be expanded because I haven't escaped the $ sign.
The expression itself works, only the substitution doesn't:
$ sed "s#\${\(.*\)}#test123#" /tmp/var
ldap_bind_dn='test123'
The replacement is also done correctly when the target variable is hardcoded
$ sed "s#\${\(.*\)}#$bind_dn#" /tmp/var
ldap_bind_dn='CN=my-user,CN=Users,DC=example,DC=com'
But since we have a bunch of configuration variables, I'd like to replace all env variables in the format ${NAME} automatically.

Not the most elegant solution, but this one works in bash:
sed 's#\${\(.*\)}#`{echo,"$\1"}`#' /tmp/var | xargs -n1 -I{} echo echo "{}" | bash -s
It is a little bit tricky because you need bash execution for the variable replacement, that's why I'm piping it to bash -s

While the env variable could be parsed by executing eval with mapfile, this doesn't seem suitable for me because it's an INI configuration file. The sections marked with brackets like [general] would throw errors. I also have concerns about executing the WHOLE line, which allows executing any command.
This is fixed by the following awk (written to a temporary file first, because redirecting the output to the file being read would truncate it before awk reads it):
awk '{while(match($0,"[$]{[^}]*}")) {var=substr($0,RSTART+2,RLENGTH -3);gsub("[$]{"var"}",ENVIRON[var])}}1' < /tmp/var > /tmp/var.new && mv /tmp/var.new /tmp/var

Why does sed replace it with $bind_dn instead of the value?
Because sed is not supposed to do shell parameter expansion in its pattern space. That is mainly the shell's job.
Using GNU sed:
~> cat /tmp/var
ldap_bind_dn='${bind_dn}'
~> export bind_dn=CN=my-user,CN=Users,DC=example,DC=com
~> sed -E 's/^\w+=.*/echo "&"/e' /tmp/var
ldap_bind_dn='CN=my-user,CN=Users,DC=example,DC=com'
The e flag of the s command (a GNU extension) in
sed -E 's/^\w+=.*/echo "&"/e'
executes the command that is found in pattern space and replaces the pattern space with the output. For this example, pattern space is ldap_bind_dn='${bind_dn}', and is replaced with the output of echo "ldap_bind_dn='${bind_dn}'" (& references the whole matched portion of the pattern space in echo "&"). Since the argument of echo is in double quotes, it is subject to parameter expansion when it is executed by the shell.
Caveat: Make sure that the file (/tmp/var in this example) comes from a trusted source. Otherwise it may contain lines like foo='$(some_nasty_command)', which is executed when the sed command above runs.
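If the file cannot be fully trusted, a variant that never hands a line to the shell may be preferable. The following is a sketch, not a drop-in tool: the function name expand_placeholders is hypothetical, only names made of letters, digits and underscores are recognised, and unset variables are replaced with an empty string. It looks values up in awk's ENVIRON array and copies them into the output without rescanning them.

```shell
# Hypothetical helper: replace ${NAME} placeholders with values from the environment.
expand_placeholders() {
  awk '{
    rest = $0
    out = ""
    # [$][{] ... [}] spells the delimiters as bracket expressions so that
    # no awk implementation treats the braces as an interval expression.
    while (match(rest, /[$][{][A-Za-z_][A-Za-z0-9_]*[}]/)) {
      name = substr(rest, RSTART + 2, RLENGTH - 3)
      # Copy the text before the placeholder, then the value. The value is
      # never rescanned, so it cannot trigger further expansion.
      out = out substr(rest, 1, RSTART - 1) ENVIRON[name]
      rest = substr(rest, RSTART + RLENGTH)
    }
    print out rest
  }' "$@"
}

export bind_dn='CN=my-user,CN=Users,DC=example,DC=com'
result=$(printf '%s\n' '[general]' "ldap_bind_dn='\${bind_dn}'" | expand_placeholders)
printf '%s\n' "$result"
```

Lines without placeholders, such as the [general] section header, pass through unchanged, and a line like foo='$(some_nasty_command)' is printed as is rather than executed.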

extract substring using regex in shell script

The strings could be of form:
com.company.$(PRODUCT_NAME:rfc1034identifier)
$(PRODUCT_BUNDLE_IDENTIFIER)
com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)
I need help writing a regex that extracts all the strings inside $(..)
I created a regex like ([(])\w+([)]) but when I try to execute it in a shell script, it gives me an unmatched parenthesis error.
This is what I executed:
echo "com.io.$(sdfsdfdsf)"|grep -P '([(])\w+([)])' -o
I need to get all matching substrings.

The problem is the use of double quotes in the echo command, which makes the shell interpret $(...) as a command substitution.
You can use single quotes:
echo 'com.io.$(sdfsdfdsf)' | grep -oP '[(]\w+[)]'
Here is an alternative using builtin BASH regex:
$> re='[(][^)]+[)]'
$> [[ 'com.io.$(sdfsdfdsf)' =~ $re ]] && echo "${BASH_REMATCH[0]}"
(sdfsdfdsf)
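To collect every match on a line rather than only the first, and to drop the surrounding $( and ), something like the following sketch may help. The function name extract_vars is hypothetical, and grep -o is assumed to be available (it is not in POSIX, but both GNU and BSD grep provide it).

```shell
# Hypothetical helper: print the text inside every $(...) found on stdin, one per line.
extract_vars() {
  # grep -o prints each match on its own line; sed then strips the "$(" and ")".
  grep -oE '[$][(][^)]+[)]' | sed -e 's/^[$][(]//' -e 's/[)]$//'
}

result=$(printf '%s\n' \
  'com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)' \
  '$(PRODUCT_BUNDLE_IDENTIFIER)' | extract_vars)
printf '%s\n' "$result"
```

The single quotes around the sample strings matter for the same reason as above: they stop the shell from treating $(...) as a command substitution.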

You can do it quite simply with sed:
echo 'com.io.$(asdfasdf)'|sed -e 's/.*(\(.*\))/\1/g'
Gives
asdfasdf
For two fields:
echo 'com.io.$(asdfasdf).$(ddddd)'|sed -e 's/.*(\(.*\)).\$(\(.*\))/\1 \2/g'
Gives
asdfasdf ddddd
Explanation:
sed -e 's/.*(\(.*\))/\1/g'
          \_/\____/  \/
           |    |    |_ print the placeholder content
           |    |___ placeholder selecting the text inside the parentheses
           |____ select the text from the beginning, including the first parenthesis

Your question specifies "shell", but not "bash". So I'll start with a common shell-based tool (awk) rather than assuming you can use any particular set of non-POSIX built-ins.
$ cat inp.txt
com.company.$(PRODUCT_NAME:rfc1034identifier)
$(PRODUCT_BUNDLE_IDENTIFIER)
com.company.$(PRODUCT_NAME:rfc1034identifier).$(someRandomVariable)
$ awk -F'[()]' '{for(i=2;i<=NF;i+=2){print $i}}' inp.txt
PRODUCT_NAME:rfc1034identifier
PRODUCT_BUNDLE_IDENTIFIER
PRODUCT_NAME:rfc1034identifier
someRandomVariable
This awk one-liner defines a field separator that consists of opening or closing parentheses. With such a field separator, every even-numbered field will be the content you're looking for, assuming all lines of input are correctly formatted and there are no parentheses embedded inside other parentheses.
If you did want to do this in POSIX shell alone, the following would be an option:
#!/bin/sh
while read line; do
while expr "$line" : '.*(' >/dev/null; do
line="${line#*(}"
echo "${line%%)*}"
done
done < inp.txt
This steps through each line of input, slicing it up using the parentheses and printing each slice. Note that this uses expr, which is most likely an external binary, but is at least included in POSIX.1.

sed got error if no single quotes around the regular expression

I tried to do the following command in bash:
ls -1 | sed s/\(.*\)/"\1"/
which should add double quotes around each line of the ls output, but the result shows
sed: 1: "s/(.*)/\1/": \1 not defined in the RE
After I added single quotes around the regular expression, I got the right result. The right one is:
ls -1 | sed 's/\(.*\)/"\1"/'
Theoretically I do not need the outer quotes, right? Has anyone had the same experience?

Single quotes are used to disable shell parsing of various sequences including backslash escapes. If you don't use them, your sequences like \( are passed to sed as (. You may check that by adding echo to the beginning of your command.

Sending the command to echo will show you what sed sees
$ echo sed s/\(.*\)/"\1"/
sed
Hmm, the sed script disappeared altogether. The exposed "*" is forcing the shell to try to match files. Let's disable that:
$ set -f
$ echo sed s/\(.*\)/"\1"/
sed s/(.*)/\1/
The shell ate the quotes and the backslashes. Quoting the sed script:
$ echo sed 's/\(.*\)/"\1"/'
sed s/\(.*\)/"\1"/
That gives the right result: sed will see the script you want to give it. How can we do that without quotes?
$ echo sed s/\\\(.\*\\\)/\"\\1\"/
sed s/\(.*\)/"\1"/
And that's ugly.
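The same comparison can be made without echo. In this sketch, show_args is a hypothetical helper that prints each argument it receives between angle brackets, which makes it easy to confirm that the quoted and the backslash-escaped forms hand sed the identical script.

```shell
# Hypothetical helper: print each received argument on its own line, in <...>.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

quoted=$(show_args 's/\(.*\)/"\1"/')          # single quotes: passed through untouched
escaped=$(show_args s/\\\(.\*\\\)/\"\\1\"/)   # every special character escaped by hand

printf '%s\n%s\n' "$quoted" "$escaped"        # both lines read <s/\(.*\)/"\1"/>

# With the script intact, sed wraps the line in double quotes as intended.
wrapped=$(printf 'a b\n' | sed 's/\(.*\)/"\1"/')
printf '%s\n' "$wrapped"                      # prints: "a b"
```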

Perl substitution output drops characters from Bash script input

I have a variable in a bash script with a length of 64 characters
authkey=$(LC_ALL=C tr -cd 'a-zA-Z0-9,;.:_#*+~!@$%&()=?{[]}|><-' < /dev/random | head -c 64)
If I pass the variable to Perl to do a string substitution
perl -pi -e "s/'AUTH_KEY', 'put your unique phrase here'/'AUTH_KEY', '$authkey'/" test.txt
the length of the output differs depending on the selected random characters. The output looks like the following (the first string is the output in the resulting text file, the second string is the echoed output of the variable in the bash script):
q=dB7oUz59.IDBXI:i>ckW4oy3smX&k:-C.[rIf*9w}H=(N93yiB&nk{fP:y0_
q=dB7oUz59.IDBXI:i>ckW4oy3smX&k$s:-C.[rIf*9w}H=(N93yiB&nk{fP:y0_
5A+BwP~l3~<evp.ciTkMYtvmPjyMrL=):Qj1VaMI(,TSS,ZGMcd.m,4W
5A+BwP~l3~<evp.ciTkMYtvmPjyMrL=):Qj1VaMI(@Dk7UNgs,TSS,ZGMcd.m,4W
dX73}i5G1d;L*J=60WHHe<!61Ji_KJ)T5B~b2bCfaNDjBQr_N]}3HS=;GzAaX<gB
dX73}i5G1d;L*J=60WHHe<!61Ji_KJ)T5B~b2bCfaNDjBQr_N]}3HS=;GzAaX<gB
6Ndn(9+:>(6>*rh?B.m),3POp)>sfm8c1rh9vXr~fzZj;]!)kf3#60=M
6Ndn(9+:>(6>*rh?B.m),3POp)>sfm8c1rh9vXr~fzZj;]@YH!)kf3#$=$$ckt=M
FYMI,K|6WutC&dr-3]6)f(>QU-~{vBX>n!J-zq:kK84T|fZ7UW:{1&qU[nwYZLmC
FYMI,K|6WutC&dr-3]6)f(>QU-~{vBX>n!J-zq:kK84T|fZ7UW:{1&qU[nwYZLmC
5A+BwP~l3~<evp.ciTkMYtvmPjyMrL=):Qj1VaMI(,TSS,ZGMcd.m,4W
5A+BwP~l3~<evp.ciTkMYtvmPjyMrL=):Qj1VaMI(@Dk7UNgs,TSS,ZGMcd.m,4W
v1FR8c8}dZD(QGwOrr%M{FSUw*?h.JGI?Ay4tgRVp~l7C5eAxW<w<;c}emeX#S
v1FR8c8}dZD(QGwOrr%M{FSUw*?h.JGI?Ay4tgRVp@s~l7C5eAxW<w<;c}emeX#S
+MGg0=*NrhJ}.qPkk6v[lc)J.uiW1o?LL5t<HTC#Q-hSeqn%-ke!cwL5tk[e
+MGg$|=*NrhJ}.qPkk6v[lc)J.uiW1o?L$55L5t<HTC#Q-hSeqn%-ke!cwL5tk[e
Each character dropout was caused by either a $ or an @ at the beginning of the group of characters. Is there a way to prevent that behaviour? Best regards Ralf

Using a single quote ' as the delimiter instead of slash / for the substitution suppresses variable interpolation
$ foobar=\$bar; perl -p -e "s'foo'$foobar'"
xx
xx
foo
$bar
^C
$
Unfortunately the single quotes that are matched in the substitution now need escaping
foobar=\$bar; perl -p -e "s'\'foo'$foobar\''"
x
x
'foo
$bar'
^C
But that seems to get passed through to Perl OK, without munging the authkey contents with sed

You can escape $ and @ before calling perl:
authkey=$(echo -n "$authkey" | sed -re 's/(\$|\@)/\\\1/g')

Why is sed not recognizing \t as a tab?

sed "s/\(.*\)/\t\1/" $filename > $sedTmpFile && mv $sedTmpFile $filename
I am expecting this sed script to insert a tab in front of every line in $filename however it is not. For some reason it is inserting a t instead.

Not all versions of sed understand \t. Just insert a literal tab instead (press Ctrl-V then Tab).
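Whether a given sed understands \t can be probed from the shell. The sketch below is an illustration only: the variable names are made up, and the probe result depends on which sed is installed. The second half shows a route that should not depend on the sed version, because the shell puts a real tab into the script.

```shell
tab=$(printf '\t')

# Probe: does this sed turn \t in the replacement into a tab?
if [ "$(printf 'x\n' | sed 's/x/\t/')" = "$tab" ]; then
  sed_tab_support=yes
else
  sed_tab_support=no
fi
printf 'sed expands \\t: %s\n' "$sed_tab_support"

# Version-independent route: let the shell put a real tab into the sed script.
indented=$(printf 'line\n' | sed "s/^/${tab}/")
printf '%s\n' "$indented"
```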

Using Bash you may insert a TAB character programmatically like so:
TAB=$'\t'
echo 'line' | sed "s/.*/${TAB}&/g"
echo 'line' | sed 's/.*/'"${TAB}"'&/g' # use of Bash string concatenation

@sedit was on the right path, but it's a bit awkward to define a variable.
Solution (bash specific)
The way to do this in bash is to put a dollar sign in front of your single quoted string.
$ echo -e '1\n2\n3'
1
2
3
$ echo -e '1\n2\n3' | sed 's/.*/\t&/g'
t1
t2
t3
$ echo -e '1\n2\n3' | sed $'s/.*/\t&/g'
1
2
3
If your string needs to include variable expansion, you can put quoted strings together like so:
$ timestamp=$(date +%s)
$ echo -e '1\n2\n3' | sed "s/.*/$timestamp"$'\t&/g'
1491237958 1
1491237958 2
1491237958 3
Explanation
In bash $'string' causes "ANSI-C expansion". And that is what most of us expect when we use things like \t, \r, \n, etc. From: https://www.gnu.org/software/bash/manual/html_node/ANSI_002dC-Quoting.html#ANSI_002dC-Quoting
Words of the form $'string' are treated specially. The word expands
to string, with backslash-escaped characters replaced as specified by
the ANSI C standard. Backslash escape sequences, if present, are
decoded...
The expanded result is single-quoted, as if the dollar sign had not
been present.
Solution (if you must avoid bash)
I personally think most efforts to avoid bash are silly because avoiding bashisms does NOT* make your code portable. (Your code will be less brittle if you shebang it to bash -eu than if you try to avoid bash and use sh [unless you are an absolute POSIX ninja].) But rather than have a religious argument about that, I'll just give you the BEST* answer.
$ echo -e '1\n2\n3' | sed "s/.*/$(printf '\t')&/g"
1
2
3
* BEST answer? Yes, because one example of what most anti-bash shell scripters would do wrong in their code is use echo '\t' as in @robrecord's answer. That will work for GNU echo, but not BSD echo. That is explained by The Open Group at http://pubs.opengroup.org/onlinepubs/9699919799/utilities/echo.html#tag_20_37_16 And this is an example of why trying to avoid bashisms usually fails.

I've used something like this with a Bash shell on Ubuntu 12.04 (LTS):
To append a new line with tab,second when first is matched:
sed -i '/first/a \\t second' filename
To replace first with tab,second:
sed -i 's/first/\\t second/g' filename

Use $(echo '\t'). You'll need quotes around the pattern.
Eg. To remove a tab:
sed "s/$(echo '\t')//"

You don't need to use sed to do a substitution when in actual fact, you just want to insert a tab in front of the line. Substitution for this case is an expensive operation compared to just printing it out, especially when you are working with big files. It's easier to read too, as it's not a regex.
For example, using awk:
awk '{print "\t"$0}' $filename > temp && mv temp $filename

I used this on Mac:
sed -i '' $'$i\\\n\\\thello\n' filename
Used this link for reference

sed doesn't support \t, nor other escape sequences like \n for that matter. The only way I've found to do it was to actually insert the tab character in the script using sed.
That said, you may want to consider using Perl or Python. Here's a short Python script I wrote that I use for all stream regex'ing:
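#!/usr/bin/env python3
import re
import sys

def main(args):
    # Require a search pattern and a replacement expression.
    if len(args) < 2:
        print('Usage: <search-pattern> <replace-expr>', file=sys.stderr)
        raise SystemExit
    # MULTILINE and DOTALL let the pattern work across the whole input.
    p = re.compile(args[0], re.MULTILINE | re.DOTALL)
    s = sys.stdin.read()
    # Write the result without appending an extra newline.
    sys.stdout.write(p.sub(args[1], s))

if __name__ == '__main__':
    main(sys.argv[1:])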

Instead of BSD sed, i use perl:
ct@MBA45:~$ python -c "print('\t\t\thi')" |perl -0777pe "s/\t/ /g"
hi

I think others have clarified this adequately for other approaches (sed, AWK, etc.). However, my bash-specific answers (tested on macOS High Sierra and CentOS 6/7) follow.
1) If OP wanted to use a search-and-replace method similar to what they originally proposed, then I would suggest using perl for this, as follows. Notes: backslashes before parentheses for regex shouldn't be necessary, and this code line reflects how $1 is better to use than \1 with perl substitution operator (e.g. per Perl 5 documentation).
perl -pe 's/(.*)/\t$1/' $filename > $sedTmpFile && mv $sedTmpFile $filename
2) However, as pointed out by ghostdog74, since the desired operation is actually to simply add a tab at the start of each line before changing the tmp file to the input/target file ($filename), I would recommend perl again but with the following modification(s):
perl -pe 's/^/\t/' $filename > $sedTmpFile && mv $sedTmpFile $filename
## OR
perl -pe $'s/^/\t/' $filename > $sedTmpFile && mv $sedTmpFile $filename
3) Of course, the tmp file is superfluous, so it's better to just do everything 'in place' (adding -i flag) and simplify things to a more elegant one-liner with
perl -i -pe $'s/^/\t/' $filename

TAB=$(printf '\t')
sed "s/${TAB}//g" input_file
This works for me on Red Hat; it removes the tabs from the input file.

If you know that certain characters are not used, you can translate "\t" into something else.
cat my_file | tr "\t" "," | sed "s/\(.*\)/,\1/"
