How does bash expand escaped characters when dereferencing variables - regex

If I quit using variables and just write the regexes directly into the last sed command,
everything works. But as it is here, no substitutions are done?
#!/bin/bash
#html substitutions
ampP="\&"
ampR="&"
ltP="\<"
ltR="<"
gtP="\>"
gtR=">"
quotP="\""
quotP2='\“'
quotP3="\”"
quotR="\""
tripDotP="\&#8230"
tripDotR="..."
tickP="\’"
tickR="\´"
#get a random page, and filter out the quotes
#pick a random quote
#translate weird html symbols
curl "www.yodaquotes.net/page/$((RANDOM % 9 +1))/" -Ls | sed -nr 's/.*data-text=\"([^\"]+)\".*/\1/p' \
| sort -R | head -n1 \
| sed 's/"$ampP"/"$ampR"/g; s/$ltP/$ltR/g; s/$gtP/$gtR/g; s/$quotP/$quotR/g; s/"$quotP2"/"$quotR"/g; s/$quotP3/$quotR/g; s/$tripDotP/$tripDotR/g; s/$stickP/$stickR/g'

This sed isn't going to work:
sed 's/"$ampP"/"$ampR"/g'
because of wrong shell quoting. Your shell variables won't be expanded at all in single quotes. Try using this form:
sed "s~$ampP~$ampR~g"

Debugging 101: Let's just echo what sed receives:
echo 's/"$ampP"/"$ampR"/g; s/$ltP/$ltR/g; s/$gtP/$gtR/g; s/$quotP/$quotR/g; s/"$quotP2"/"$quotR"/g; s/$quotP3/$quotR/g; s/$tripDotP/$tripDotR/g; s/$stickP/$stickR/g'
s/"$ampP"/"$ampR"/g; s/$ltP/$ltR/g; s/$gtP/$gtR/g; s/$quotP/$quotR/g; s/"$quotP2"/"$quotR"/g; s/$quotP3/$quotR/g; s/$tripDotP/$tripDotR/g; s/$stickP/$stickR/g
That doesn't look right now, does it?
There's no variable substitution within single quotes in bash. That's why we have two different quotes, so you can decide which one is more appropriate for the task.
For readability I would suggest putting each sed command within double quotes.
Like this: "s/$ampP/$ampR/g"
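To make the quoting difference concrete, here is a minimal sketch. The sample line and the `&lt;` pattern are assumptions chosen for illustration (the entity strings in the question may have been different); only the quoting behaviour matters.

```bash
#!/usr/bin/env bash
# Assumed sample values, not taken from the question.
ltP='&lt;'
ltR='<'
line='Do or do not &lt;there is no try&gt;'

# Single quotes: sed receives the literal text $ltP and $ltR, so nothing matches.
single=$(printf '%s\n' "$line" | sed 's/$ltP/$ltR/g')

# Double quotes: the shell expands the variables before sed sees the script.
double=$(printf '%s\n' "$line" | sed "s/$ltP/$ltR/g")

printf 'single: %s\n' "$single"
printf 'double: %s\n' "$double"
```

Keep in mind that an unescaped & in the replacement part of s/// stands for the whole match, so a replacement variable holding a literal ampersand needs to contain \& instead.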

Related

Filtering a variable in bash script using regex tr or awk

row1=$('+00 00:30:07.880000')
rowX=$('row1 | tr -dc '0-9')
I basically want to filter out all the special characters and spaces.
I wish to have output as follows.
echo $'row1' = 003007.880000
You don't need regular expressions or external commands like tr for this. Bash's built-in parameter expansion can do it:
row1='+00 00:30:07.880000'
row1=${row1//[^0-9.]/}
echo "row1=$row1"
outputs row1=00003007.880000.
The output has two leading zeros that are not in the output suggested in the question. Maybe there's an unstated requirement to remove prefixes delimited by spaces. If that is the case, possible code is:
row1='+00 00:30:07.880000'
row1=${row1##* }
row1=${row1//[^0-9.]/}
echo "row1=$row1"
That outputs row1=003007.880000.
See How do I do string manipulations in bash? for explanations of ${row1//[^0-9.]/} and ${row1##* }.
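As a rough annotated sketch of what the two expansions do, step by step (the intermediate variable names last_field and digits are only for illustration):

```bash
#!/usr/bin/env bash
row1='+00 00:30:07.880000'

# ${var##pattern} removes the longest prefix matching the pattern.
# "* " matches everything up to and including the last space.
last_field=${row1##* }

# ${var//pattern/} deletes every match of the pattern.
# [^0-9.] matches any single character that is not a digit or a dot.
digits=${last_field//[^0-9.]/}

printf 'last_field=%s\n' "$last_field"
printf 'digits=%s\n' "$digits"
```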
This is arguably the easiest way to do that:
$ echo '+00 00:30:07.880000' | tr -dc '0-9.'
00003007.880000
Regards!

How to use sed in shell script to replace all environment value occurrences with their current values

I would like to have a shell script to iterate over all the occurrences of environment variable names in a file and replace them with their current values. I am not sure how this can be done by using sed command.
The file content:
values:
value1:
name: "something"
value: "$ENV_VAR1" # this could be any variable name
value2:
name: "something"
value: "$ENV_VAR2"
...
First, I need to find all occurrences of any variable (Using regex "\$(.*?)" ). Then, somehow, I need to replace it with the variable value from the shell. I am not sure how I can use the sed command to achieve the second part as the variable name is specified in the file itself.
Something like the following command:
sed -i "s/\"\$(.*?)\"/${Some_How_Get_Var_Name}/g" file.yaml
This is a problem that comes up often. envsubst is commonly given as a solution, but I find it's easier to just stick with perl and do something like:
perl -pe 'while (my ($k, $v) = each %ENV) { s/\$$k/$v/g }'
This is almost certainly not a robust solution (it will replace $FOO, but it won't do replacements of the form ${FOO}), but I find I'm always disappointed that envsubst doesn't do ${FOO-bar}, and envsubst seems less ubiquitous than perl.
Or, rather than doing the replacement for everything in the environment, you might prefer something like:
perl -pe 's/\$([[:alpha:]_][_[:alnum:]]*)/$ENV{$1}/g'
or
perl -pe 's/\$([[:alpha:]_]\w*)/$ENV{$1}/g'
These last two will replace '$FOO' with the empty string if FOO is not defined, while the first leaves it unreplaced. Which behavior you desire may drive the decision as to which to use.
I won't claim these are completely correct, but they are a reasonable approximation.
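If neither perl nor envsubst is an option, the same idea can be approximated in plain bash. This is a different technique from the answers here, not something they propose: a sketch using a regex match plus indirect expansion. The function name expand_vars is hypothetical, it only handles the $NAME form (not ${NAME}), and unset variables become empty strings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace each $NAME in the argument with the value
# of the shell/environment variable NAME (empty if it is unset).
expand_vars() {
  local line=$1 out='' match name
  local re='\$([A-Za-z_][A-Za-z0-9_]*)'
  while [[ $line =~ $re ]]; do
    match=${BASH_REMATCH[0]}       # e.g. $ENV_VAR1
    name=${BASH_REMATCH[1]}        # e.g. ENV_VAR1
    out+=${line%%"$match"*}        # text before the first reference
    out+=${!name-}                 # indirect expansion of NAME
    line=${line#*"$match"}         # continue after the reference
  done
  printf '%s\n' "$out$line"
}

ENV_VAR1='first value'
expand_vars 'value: "$ENV_VAR1" # this could be any variable name'
# prints: value: "first value" # this could be any variable name
```

To process a whole file, one could call the function per line, for example with while IFS= read -r l; do expand_vars "$l"; done < file.yaml.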
If you are using bash and the envsubst command is available, you can do:
envsubst < inputfile
E.g. (creating a temporary input to demonstrate it):
$ env | tail -2 | sed 's_^_$_'
$MANPATH=/home/linuxbrew/.linuxbrew/share/man:
$INFOPATH=/home/linuxbrew/.linuxbrew/share/info:
Then running this through envsubst:
$ env | tail -2 | sed 's_^_$_' | envsubst
/home/linuxbrew/.linuxbrew/share/man:=/home/linuxbrew/.linuxbrew/share/man:
/home/linuxbrew/.linuxbrew/share/info:=/home/linuxbrew/.linuxbrew/share/info:
This might work for you (GNU sed):
sed '/value:/{y/"/\n/;s/^.*/printf "&"/e;y/\n/"/}' file
On any line containing the string value: convert any "'s to newlines, use printf to convert the environmental variables to their real values and reconvert the introduced newlines back to "'s.
N.B. If the environmental variable can contain "'s, these will need to be quoted following the printf command, i.e. insert s/"/\\"/g before the last y command.

Sed | Variable containing regex causes invalid reference error

I'm having problems with sed and back-referencing when using variables containing regexes.
It is a parser written in bash. At a very early point, I want to use sed to clean every line into the needed data: the indentation, a key and a value (colon separated). The data is similar to YAML but uses an equals sign.
A basic example of the data:
overview = peparing 2016-10-22
license= sorted 2015-11-01
The function I'm having problems with does the logic in a while loop:
function prepare_parsing () {
local file=$1
# regex components:
local s='[[:space:]]*' \
w='[a-zA-Z0-9_]*' \
fs=':'
# regexes(NoQuotes, SingleQuotes, DoubleQuotes):
local searchNQ='^('$s')('$w')'$s'='$s'(.*)'$s'$' \
searchSQ='^('$s')('$w')'$s'='$s\''(.*)'\'$s'\$' \
searchDQ='^('$s')('$w')'$s'='$s'"(.*)"'$s'\$' \
replace="\1$fs\2$fs\3"
while IFS="$fs" read -r indentation key value; do
...
SOME CUSTOM LOGIC
...
done < <(sed -n "s/${searchNQ}/${replace}/p" $file)
}
When I call the function, I get the well-known invalid reference error for \3: invalid reference \3 on `s' command's RHS
To debug this, after the vars definition, I've printed their values using the printf and the %q option.
printf "%q\n" $searchNQ $searchSQ $searchDQ $replace
Getting these values:
\^\(\[\[:space:\]\]\*\)\(\[a-zA-Z0-9_\]\*\)\[\[:space:\]\]\*=\[\[:space:\]\]\*\(.\*\)\[\[:space:\]\]\*\$
\^\(\[\[:space:\]\]\*\)\(\[a-zA-Z0-9_\]\*\)\[\[:space:\]\]\*=\[\[:space:\]\]\*\'\(.\*\)\'\[\[:space:\]\]\*\\\$
\^\(\[\[:space:\]\]\*\)\(\[a-zA-Z0-9_\]\*\)\[\[:space:\]\]\*=\[\[:space:\]\]\*\"\(.\*\)\"\[\[:space:\]\]\*\\\$
$'\\1\034\\2\034\\3'
And maybe here's the problem: the excessive escape sequences when the shell (bash) expands the variables (for example, it seems to be escaping the *, the [], ...).
If I pass the -r option to sed, it works perfectly, but I have to avoid this since the system that will execute the script won't have this sed implementation: I have to use basic sed.
Do you have any idea on how to store the regex into variables and make them usable for the backreferencing on the RHS?
It works in these two cases:
When using a plain regex string:
sed -n "s/^\([[:space:]]*\)\([a-zA-Z0-9_]*\)[[:space:]]*=[[:space:]]*\(.*\)[[:space:]]*\$/\1:\2:\3/p" $file
And when I use just the vars s, w and fs:
sed -n "s/^\($s\)\($w\)$s=$s\(.*\)$s\$/\1$fs\2$fs\3/p" $file
Many thanks for the help!
perl, which supports extended regular expressions, may be used instead of sed, for example:
perl -n -e "s/${searchNQ}/${replace}/; print"
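If sed has to stay, one likely explanation fits the two working commands in the question: without -r, sed uses basic regular expressions, where a bare ( is a literal character and groups are written \( ... \). The search variables contain unescaped parentheses, so the pattern defines no groups and \3 has nothing to refer to. A minimal sketch that keeps the regex in variables but uses escaped parentheses (the sample input line is an assumption):

```bash
#!/usr/bin/env bash
s='[[:space:]]*'
w='[a-zA-Z0-9_]*'
fs=':'

# Basic regular expression: groups are \( ... \), not ( ... ).
searchNQ='^\('"$s"'\)\('"$w"'\)'"$s"'='"$s"'\(.*\)'"$s"'$'
replace="\\1${fs}\\2${fs}\\3"

# Assumed sample line in the question's key = value format.
result=$(printf '%s\n' '  overview = preparing 2016-10-22' |
  sed -n "s/${searchNQ}/${replace}/p")

printf '%s\n' "$result"
# prints:   :overview:preparing 2016-10-22
```

The printf "%q" output in the question is not what sed receives; %q only shows how the value would have to be quoted for reuse as shell input, so the extra backslashes there are not the cause.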

search and replace substring in string in bash

I have the following task:
I have to replace several links, but only the links that end with .do
Important: the files also contain other links, but those should stay untouched.
<li>Einstellungen verwalten</li>
to
<li>Einstellungen verwalten</li>
So I have to search for links with .do, take the part before it and remember it, for example as $a, then replace the whole link with
<s:url action=' '/>
and paste $a between the quotes.
I thought about sed, but as far as I know sed can only search for a whole string and replace it completely.
I also tried bash parameter expansions in combination with sed, but ran into several problems with the quotes and the variables.
cat ./src/main/webapp/include/stoBox2.jsp | grep -e '<a href=".*\.do">' | while read a;
do
b=${a#*href=\"};
c=${b%.do*};
sed -i 's/href=\"$a.do\"/href=\"<s:url action=\'$a\'/>\"/g' ./src/main/webapp/include/stoBox2.jsp;
done;
Any ideas?
Thanks a lot.
sed -i 's#href="\(.*\)\.do"#href="<s:url action='"'\1'"'/>"#g' ./src/main/webapp/include/stoBox2.jsp
Use a pattern with parentheses to capture the link without .do. The single and double quotes split the sed command into three parts (which the shell joins into one argument) so that the quotes in your text are passed through literally:
's#href="\(.*\)\.do"#href="<s:url action='
"'\1'"
'/>"#g'
The -i option modifies your file directly. If you don't want that, just remove it and save the result to a temporary file with > tmp.
Try this one:
sed -i "s%\(href=\"\)\([^\"]\+\)\.do%\1<s:url action='\2'/>%g" \
./src/main/webapp/include/stoBox2.jsp;
You can capture patterns with parentheses (\( and \)) and use them in the replacement.
Here I capture a string that contains no " and precedes .do (\([^\"]\+\)\.do), and insert it without the .do suffix (\2).
There is a / in the replacement, so I used % to delimit the expressions instead of the traditional /.
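A quick way to check this kind of substitution before touching the file is to run it on a sample line without -i. The sketch below uses an assumed input with a hypothetical link name (settings.do) and a slightly more portable variant of the pattern: [^"]* instead of the GNU-only \+, plus the closing quote so that only links ending in .do match.

```bash
#!/usr/bin/env bash
# Assumed sample line; the real link targets are not shown in the question.
input='<li><a href="settings.do">Einstellungen verwalten</a></li> <a href="other.html">x</a>'

output=$(printf '%s\n' "$input" |
  sed "s%\(href=\"\)\([^\"]*\)\.do\"%\1<s:url action='\2'/>\"%g")

printf '%s\n' "$output"
```

Only the first link is rewritten; the other.html link is left untouched because it does not end in .do.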

how to replace part of a string using sed

echo "/home/repository/tags/1.9.1/1.9.1.8/core" | sed "s/HELP/XXX/g"
I would like some HELP in replacing what is in between tags and core with let's say XXX. So my desired output would be /home/repository/tags/XXX/core.
The string is a directory path, where /home/repository/tags are the only constant parts. The path is always six levels deep. So it may not always be between tags and core.
echo "/home/repository/whatever/1.9.1/1.9.1.8/core/and/more/junk" \
| sed 's#\(/[^/]*/[^/]*/[^/]*\)/[^/]*/[^/]*#\1/XXX#'
yields ...
/home/repository/whatever/XXX/core/and/more/junk
By using repetition quantifiers, you can easily adjust where your replacement is made:
echo "/home/repository/tags/1.9.1/1.9.1.8/core" | \
sed -r 's|(/([^/]+/){3})([^/]+/){2}(.*)|\1XXX/\4|'
3 represents how many components to keep at the beginning
2 represents how many to replace
You could even use variables:
$ dirs='/one/two/three/four/five/six/seven/eight'
$ for keep in {0..3}; do for replace in {0..3}; do echo "$dirs" | \
sed -r "s|(/([^/]+/){$keep})([^/]+/){$replace}(.*)|\1XXX/\4|"; done; done
/XXX/one/two/three/four/five/six/seven/eight
/XXX/two/three/four/five/six/seven/eight
/XXX/three/four/five/six/seven/eight
/XXX/four/five/six/seven/eight
/one/XXX/two/three/four/five/six/seven/eight
/one/XXX/three/four/five/six/seven/eight
/one/XXX/four/five/six/seven/eight
/one/XXX/five/six/seven/eight
/one/two/XXX/three/four/five/six/seven/eight
/one/two/XXX/four/five/six/seven/eight
/one/two/XXX/five/six/seven/eight
/one/two/XXX/six/seven/eight
/one/two/three/XXX/four/five/six/seven/eight
/one/two/three/XXX/five/six/seven/eight
/one/two/three/XXX/six/seven/eight
/one/two/three/XXX/seven/eight
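The same command can be wrapped in a small function so that the two counts become arguments. This is only a sketch: the name replace_components is hypothetical, and it uses -E, which current GNU and BSD sed accept as the equivalent of -r.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper.
#   $1 = path, $2 = leading components to keep, $3 = components to replace
replace_components() {
  printf '%s\n' "$1" |
    sed -E "s|(/([^/]+/){$2})([^/]+/){$3}(.*)|\1XXX/\4|"
}

replace_components /home/repository/tags/1.9.1/1.9.1.8/core 3 2
# prints: /home/repository/tags/XXX/core
```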
If your directory is always 6 levels deep, this works (remember to escape the round brackets):
echo "/home/repository/tags/1.9.1/1.9.1.8/core" |
sed 's/\(\/home\/repository\/tags\/\).*\/.*\(\/.*\)/\1XXX\2/'
produces:
/home/repository/tags/XXX/core
Here, spare yourself some regex agony:
echo "/home/repository/tags/1.9.1/1.9.1.8/core" | sed 's#/home/repository/tags/.*/\(.\+\)$#/home/repository/tags/XXX/\1#'
No need to explicitly match the components if all you're really trying to do is strip out everything between tags/ and the last component. Note that I used + not *, so the component must be nonempty. That'll guard against having a trailing slash.