How to find and replace text within ".." using bash script - regex

I want to replace the line #discovery.seed_hosts: ["host1","host2"] with discovery.seed_hosts: ["${extraNode1}","${extraNode2}","${masterIP}"]. That means removing the #, replacing host1 and host2 with the values passed as arguments, and adding a third value to the array.
sudo sed -i "/#discovery.seed_hosts: ["host1","host2"]/s/#discovery.seed_hosts: ["host1","host2"]/discovery.seed_hosts: ["${extraNode1}","${extraNode2}","${masterIP}"]/" check.yml
I tried the command above, but it fails because of the ["host1","host2"] part. This is the error I get:
sed: -e expression #1, char 49: Invalid range end

You need to use the s (substitute) command, and escape ., in addition to [. Like this:
sudo sed -i 's/#\(discovery\.seed_hosts: \["\)host1","host2"]/\1${extraNode1}","${extraNode2}","${masterIP}"]/' check.yml
If you don't escape the ., it matches any character in that position, so the command would also match a line like #discovery7seed_hosts: ["host1","host2"].
The sed command is pretty straightforward. I put escaped parentheses, \( and \), around the part of the match that I wanted to reuse in the substitution, which creates a group. The \1 is replaced with "group 1", that is, whatever matched between the parentheses.
EDIT: The ", double quotes, don't need to be escaped because the sed command is in single quotes: 's/.../.../'. Also, the ], closing square bracket, doesn't need to be escaped as long as its corresponding [, opening square bracket, has been escaped. Finally, both parentheses ( and ) need to be escaped to create the group. (END OF EDIT)
Test:
$ cat check.yml
This is a test
Another line
#discovery.seed_hosts: ["host1","host2"]
#discovery7seed_hosts: ["host1","host2"]
OK. Good bye?
$ sed 's/#\(discovery\.seed_hosts: \["\)host1","host2"]/\1${extraNode1}","${extraNode2}","${masterIP}"]/' check.yml
This is a test
Another line
discovery.seed_hosts: ["${extraNode1}","${extraNode2}","${masterIP}"]
#discovery7seed_hosts: ["host1","host2"]
OK. Good bye?
$
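Note that the test above prints the placeholders literally, because single quotes stop the shell from expanding ${extraNode1} and the other variables. If the values should come from shell variables, one option is to close the single quotes around each variable. A minimal sketch, assuming the values contain no /, & or backslash (all special in a sed replacement); the addresses and the temporary file are made up for illustration:

```shell
#!/bin/sh
# Hypothetical values; in the real script they would come from the arguments.
extraNode1=10.0.0.1
extraNode2=10.0.0.2
masterIP=10.0.0.3

# Throwaway stand-in for check.yml.
tmp=$(mktemp)
printf '%s\n' '#discovery.seed_hosts: ["host1","host2"]' > "$tmp"

# The regex stays inside single quotes; only the variables sit in double quotes.
result=$(sed 's/#\(discovery\.seed_hosts: \["\)host1","host2"]/\1'"$extraNode1"'","'"$extraNode2"'","'"$masterIP"'"]/' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Once the output looks right, the same expression can be used with sed -i on the real file.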

You'll need to escape [s and "s with backslashes as:
sudo sed -i "/#discovery.seed_hosts: \[\"host1\",\"host2\"]/s/#discovery.seed_hosts: \[\"host1\",\"host2\"]/discovery.seed_hosts: [\""${extraNode1}\"",\""${extraNode2}\"",\""${masterIP}\""]/" check.yml


git grep <regex containing newline>

I'm trying to grep all line breaks after some binary operators in a project using git bash on a Windows machine.
Tried the following commands which did not work:
$ git grep "[+-*\|%]\ *\n"
fatal: command line, '[+-*\|%]\ *\n': Invalid range end
$ git grep "[+\-*\|%]\ *\n"
fatal: command line, '[+\-*\|%]\ *\n': Invalid range end
OK, I don't know how to include "-" in a character set, but still after removing it the \n matches the character n literally:
$ git grep "[+*%] *\n"
somefile.py: self[:] = '|' + name + '='
^^^
Escaping the backslash once (\\n) has no effect, and escaping it twice (\\\n) causes the regex to match \n (literally).
What is the correct way to grep here?
Regarding "I don't know how to include "-" in a character set":
There is no need to escape the dash (-) to include it in a character set. If it is the first or the last character in the set, it loses its special meaning.
There is also no need to escape | inside a character set. Apart from ^ (when it is the first character), - (when it is neither the first nor the last character) and ] (unless it comes first), all characters are literal inside a character set. That includes the backslash, which is why \-* in your second attempt was read as a range from \ to * and rejected.
There is also no need to put \n in the regexp. The grepping tools, by default, try to match the regexp against one line at a time, and git grep does the same. If you need to match the regexp only at the end of a line, put $ (the end-of-line anchor) as the last character of the regexp.
Your regexp should be [-+*|%] *$.
Put together, the complete command line is:
git grep '[-+*|%] *$'
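The same pattern can be tried outside a repository with plain grep, which uses the same basic regular expression syntax that git grep uses by default. A small sketch; the sample lines and the temporary file are invented for the demonstration:

```shell
#!/bin/sh
# Sample file: lines 1 and 3 end with an operator (line 3 also has a trailing space).
tmp=$(mktemp)
printf '%s\n' \
  "total = a +" \
  "    b" \
  "x = y - " \
  "self[:] = '|' + name + '='" > "$tmp"

# The dash comes first in the set, so it is literal; $ anchors at the end of the line.
matches=$(grep -n '[-+*|%] *$' "$tmp")
printf '%s\n' "$matches"

rm -f "$tmp"
```

Only lines 1 and 3 are reported; the last line contains operators but does not end with one.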
How to find a newline in the middle of a line
For lack of better option I think I'll start with:
sudo apt install pcregrep
git grep --cached -Il '' | xargs pcregrep -Mn 'y\nl'
this combines:
How to list all text (non-binary) files in a git repository?
https://unix.stackexchange.com/questions/112132/how-can-i-grep-patterns-across-multiple-lines/112134#112134
The output clearly shows the filename and line number, e.g.:
myfile.txt:123:my
love
myfile.txt:234:my
life
otherfile.txt:11:my
lion
Tested on Ubuntu 22.04.

White spaces in sed search string

I want to substitute a String from a file which is:
# - "server1"
My first attempt was something like this:
sed -i 's/#\ -\ "\server1"\.*/ChangedWord/g' file
But I get an error if I try it like this.
So there has to be another way to handle whitespace. I guess I have to use \s or [[:space:]], but somehow I am not able to make it work.
I think you are complicating the expression too much. This should be enough:
sed 's/^#[[:space:]]*-[[:space:]]*"server1".*/ChangedWord/' file
It looks for lines starting with #, followed by zero or more whitespace characters, a -, zero or more whitespace characters again, then "server1" and then anything. When a line matches, the whole line is replaced with ChangedWord.
Note I am using [[:space:]] to match the spaces, since it is a more compatible way (thanks Tom Fenech in comments).
Note also there is no need to use g in the sed expression, because the pattern can occur just once per line.
Test
$ cat a
hello
# - "server1"
hello# - "server1"
$ sed 's/^#[[:space:]]*-[[:space:]]*"server1".*/ChangedWord/' a
hello
ChangedWord
hello# - "server1"
The actual fault was the missing escaping of the double quotes:
ssh -i file root@IP sed 's/^#[[:space:]]*-[[:space:]]*\"server1\".*/ChangedWord/' file
That did it for me. Thanks for all your support
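The extra backslashes are needed because a command passed to ssh is parsed twice: once by the local shell and once by the shell on the remote side. The effect can be reproduced locally. In this sketch, remote is a hypothetical helper that stands in for ssh, on the assumption that ssh joins its arguments with spaces and hands the result to a shell:

```shell
#!/bin/sh
# remote is a hypothetical stand-in for "ssh host": it joins its arguments
# with spaces and lets a second shell parse the resulting string.
remote() { sh -c "$*"; }

# The local shell removes the single quotes; the second shell then removes
# the unescaped double quotes, so they never reach the command.
lost=$(remote echo 'say "hi"')

# Backslash-escaped double quotes survive the second round of parsing.
kept=$(remote echo 'say \"hi\"')

printf '%s\n' "$lost" "$kept"
```

The first call prints the text without the double quotes, the second one keeps them, which is what the sed pattern needs in order to match "server1" including its quotes.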
rghome is right, you don't need those backslashes in front of spaces as the expression is wrapped in quotes. In fact, they're causing the error: sed is telling you that \<Space> is not a valid option. Just remove them and it should work as expected:
sed -i 's/# - "server1"/ChangedWord/' file

Egrep expression: how to unescape single quotes when reading from file?

I need to use egrep to obtain an entry in an index file.
In order to find the entry, I use the following command:
egrep "^$var_name" index
$var_name is the variable read from a var list file:
while read var_name; do
egrep "^$var_name" index
done < list
One of the possible keys comes usually in this format:
$ERROR['SOME_VAR']
My index file is in the form:
$ERROR['SOME_VAR'] --> n
Where n is the line where the variable is found.
The problem is that $var_name is automatically escaped when read. When I enable the debug mode, I get the following command being executed:
+ egrep '^$ERRORS['\''SELECT_COUNTRY'\'']' index
The command above doesn't work, because egrep will try to interpret the pattern.
If I don't use the extended version, using grep or fgrep, the command will work only if I remove the ^ anchor:
grep -F "$var_name" index # this actually works
The problem is that I need to ensure that the match is made at the beginning of the line.
Ideas?
set -x shows the command being executed in shell notation.
The backslashes you see do not become part of the argument, they're just printed by set -x to show the executed command in a copypastable format.
Your problem is not too much escaping, but too little: $ in a regex means "end of line", so ^$ERROR will never match anything. Similarly, [ ] is a bracket expression that matches one of the listed characters, and will not match literal square brackets.
The correct regex to match your pattern would be ^\$ERROR\['SOME_VAR'], equivalent to the shell argument in egrep "^\\\$ERROR\['SOME_VAR']".
Your options to fix this are:
If you expect to be able to use regex in your input file, you need to include regex escapes like above, so that your patterns are valid.
If you expect to be able to use arbitrary, literal strings, use a tool that can match flexibly and literally. This requires jumping through some hoops, since UNIX tools for legacy reasons are very sloppy.
Here's one with awk:
while IFS= read -r line
do
export line
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
It passes the string in through the environment (because -v is sloppy: it would interpret backslash escapes in the value) and then compares it literally with the start of each input line.
Here's an example invocation:
$ cat script
while IFS= read -r line
do
export line
gawk 'BEGIN{var=ENVIRON["line"];} substr($0, 1, length(var)) == var' index
done < list
$ cat list
$ERRORS['SOME_VAR']
\E and \Q
'"'%##%*'
$ cat index
hello world
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%##%*' too
etc
$ bash script
$ERRORS['SOME_VAR'] = 'foo';
\E and \Q are valid strings
'"'%##%*' too
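The first option, keeping egrep and making the pattern a valid regex, can also be automated by escaping the regex metacharacters before the search. A minimal sketch, assuming the keys contain no newlines; the sample index and the temporary directory are made up, and the set of escaped characters is limited to the ones that are special in an extended regular expression:

```shell
#!/bin/sh
work=$(mktemp -d)

# Invented index: only the first line starts with the key.
printf '%s\n' \
  "\$ERRORS['SOME_VAR'] --> 12" \
  "see \$ERRORS['SOME_VAR'] --> 40" > "$work/index"

var_name="\$ERRORS['SOME_VAR']"

# Put a backslash in front of every ERE metacharacter: \ . [ ( ) * + ? { | ^ $
escaped=$(printf '%s\n' "$var_name" | sed 's/[\.[()*+?{|^$]/\\&/g')

# The anchor still works because the key itself is now matched literally.
found=$(grep -E "^$escaped" "$work/index")
printf '%s\n' "$found"

rm -rf "$work"
```

The closing ] is left alone on purpose: once its opening [ is escaped, it is an ordinary character.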
You can use printf "%q":
while read -r var_name; do
egrep "^$(printf "%q\n" "$var_name")" index
done < list
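Update: If your grep supports Perl-compatible regular expressions (the -P option of GNU grep), you can also do:
while read -r var_name; do
grep -P "^\Q$var_name\E" index
done < list
Here \Q and \E make the string between them literal, removing the special meaning of all regex symbols. They are PCRE features, so they do not work with egrep.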

Sed with both " and ' in insert string

I am using sed command in Ubuntu for making shell script.
I have a problem because the string I am inserting contains both single and double quotes, and dashes as well. This is the example:
sed -i "16i$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();" proba.txt
It should insert
$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();
in line 16 of the file proba.txt
but instead it inserts
$('#myTable td:contains(
because the string ends prematurely. How can I resolve this? I cannot find a solution on this site because my string contains both kinds of quotation marks, and the existing explanations cover only one kind.
2nd try
I set \ in front every double quote except the outermost ones but I still didn't get what I want. Result is:
.parents("tr").remove();
Then I put \ in front of every ' too but the result was an error in script. This is the 4th row:
sed -i "16i$(\'#myTable td:contains(\"QinQ tunnel - SCnet wireless\")\').parents(\"tr\").remove();" proba.txt
This is the error:
4: skripta.sh: Syntax error: "(" unexpected (expecting ")")
Maybe there is easier way to insert line into the file at the exact line if that line has ", ', /?
3rd time is a charm
While inserting many lines over the last day, I came across another problem with sed. I want to insert this text:
$(document).ready( function() {
with command:
sed -i "16i$(document).ready( function() {" proba.txt
and the result is this text, as if document were something special, or perhaps because of the $:
.ready( function() {
Any thoughts about that?
There are two ways around this. The easy way out is to put the script into a file and use that on the command line. For example, sed.script contains:
16i\
$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();
and you run:
sed -f sed.script ...
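The script file itself can be written from a shell script without any escaping by using a here-document with a quoted delimiter. A minimal sketch, assuming a three-line stand-in for proba.txt and inserting before line 2 instead of line 16; the temporary directory is only there to keep the example self-contained:

```shell
#!/bin/sh
work=$(mktemp -d)

# Stand-in for proba.txt.
printf '%s\n' one two three > "$work/proba.txt"

# The quoted delimiter 'EOF' turns off all expansion inside the here-document,
# so the quotes, the $ and the parentheses are written to the file unchanged.
cat > "$work/sed.script" <<'EOF'
2i\
$('#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")').parents("tr").remove();
EOF

result=$(sed -f "$work/sed.script" "$work/proba.txt")
printf '%s\n' "$result"

rm -rf "$work"
```

The inserted line appears between one and two, exactly as written in the here-document.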
If you want to do it without the file, then you have to decide whether to use single quotes or double quotes around your sed -e expression. Using single quotes is usually easier; there are no other special characters to worry about. Each embedded single quote is replaced by '\'':
sed -e '16i\
$('\''#myTable td:contains("Trunk do SW-BG-26, GigabitEthernet0/22")'\'').parents("tr").remove();' ...
If you want to use double quotes, then each embedded double quote needs to be replaced by \", but you also have to escape embedded back quotes `, dollar signs $ and backslashes \:
sed -e "16i\\
\$('#myTable td:contains(\"Trunk do SW-BG-26, GigabitEthernet0/22\")').parents(\"tr\").remove();" ...
(To the point: I forgot to escape the $ before I checked the script with double quotes; I got the script with single quotes right first time.)
Because of all the extra checking, I almost invariably use single quotes, unless I need to get shell variables substituted into the script.
sed -i "16 i\\
\$('#myTable td:contains(\"Trunk do SW-BG-26, GigabitEthernet0/22\")').parents(\"tr\").remove();" proba.txt
Escape the double quotes, add the backslash and newline needed after the i instruction, and escape the $, which the shell would otherwise interpret inside double quotes.

unix sed command regular expression

Can anyone explain to me how the regular expression works in this sed substitute command?
$ cat path.txt
/usr/kbos/bin:/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/local/sbin:/sbin:/bin/:/usr/sbin:/usr/bin:/opt/omni/bin:
/opt/omni/lbin:/opt/omni/sbin:/root/bin
$ sed 's/\(\/[^:]*\).**/\1/g' path.txt
/usr/kbos/bin
/usr/local/sbin
/opt/omni/lbin
The sed command above uses a back reference together with the save operator \( \).
Can anyone explain how the regular expression, especially /[^:]*, works in the substitute command to get only the first path in each line?
I think you wrote an extra asterisk * in your sed code, so it should be like this:
$ sed 's/\(\/[^:]*\).*/\1/g' file
/usr/kbos/bin
/usr/local/sbin
/opt/omni/lbin
Changing the delimiter helps to understand it a little better:
sed 's#\(/[^:]*\).*#\1#g'
The s#something#otherthing#g is a basic sed command that looks for something and replaces it with otherthing on every line of the input.
If you do s#\(something\)#\1#g then you "save" that something and can print it back with \1.
Hence, what it does is match a pattern like /[^:]* and then print it back. /[^:]* means / followed by any number of characters other than :. So it takes / plus everything up to the first colon :. It stores that piece of the string and then prints it back.
Small examples:
# get the leading lowercase letters
$ echo "hello123bye" | sed 's#\([a-z]*\).*#\1#g'
hello
# get everything until it finds the number 3
$ echo "hello123bye" | sed 's#\([^3]*\).*#\1#g'
hello12
[^:]*
in regex would match all characters except for :, so it would match until this:
/usr/kbos/bin
also it would match these,
/usr/local/bin
/usr/jbin
/usr/bin
/usr/sas/bin
As, these all contains characters, that are not :
.* match any character, zero or more times.
Thus, this regex [^:]*.*, would match all this expressions:
/usr/kbos/bin:/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/local/bin:/usr/jbin:/usr/bin:/usr/sas/bin
/usr/jbin:/usr/bin:/usr/sas/bin
/usr/bin:/usr/sas/bin
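However, you get only the first field (i.e. /usr/kbos/bin) because the regex matches at the leftmost possible position and [^:]* takes as many characters as it can before the first :. The .* then consumes the rest of the line, and the back reference \1 puts back only the part captured by the group.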
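To see which part of the line is matched and which part is put back, it can help to mark the group in the replacement. A small sketch with an invented PATH= prefix in front of a shortened version of the first line:

```shell
#!/bin/sh
line='/usr/kbos/bin:/usr/local/bin:/usr/jbin'

# Same command as above: the whole match is replaced by the captured group.
first=$(printf '%s\n' "$line" | sed 's#\(/[^:]*\).*#\1#')

# Square brackets in the replacement mark what \1 puts back. The text before
# the first / is not part of the match, so it is left untouched.
marked=$(printf '%s\n' "PATH=$line" | sed 's#\(/[^:]*\).*#[\1]#')

printf '%s\n' "$first" "$marked"
```

Everything from the first : onwards is matched by .* and therefore dropped, while anything before the first / is never matched and stays in place.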