Find multi-line text & replace it, using regex, in shell script

I am trying to find a pattern of two consecutive lines, where the first line is a fixed string and the second contains a substring I'd like to replace.
This is to be done in sh or bash on macOS.
If I had a regex tool at hand that would operate on the entire text, this would be easy for me. However, all I find is bash's simple text replacement - which doesn't work with regex, and sed, which is line oriented.
I suspect that I can use sed in a way where it first finds a matching first line, and only then looks to replace the following line if its pattern also matches, but I cannot figure this out.
Or are there other tools present on macOS that would let me do a regex-based search-and-replace over an entire file or a string? Maybe with Python (v2.7 and v3 is installed)?
Here's a sample text and how I'd like it modified:
keyA
value:474
keyB
value:474 <-- only this shall be replaced (follows "keyB")
keyC
value:474
keyB
value:474
Now, I want to find all occurrences where the first line is "keyB" and the following one is "value:474", and then replace that second line with another value, e.g. "value:888".
As a regex that ignores line separators, I'd write this:
Search: (\bkeyB\n\s*value):474
Replace: $1:888
So, basically, I find the pattern before the 474, and then replace it with the same pattern plus the new number 888, thereby preserving the original indentation (which is variable).

You can use
sed -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
# Or, to replace the contents of the file inline in FreeBSD sed:
sed -i '' -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
Details:
/keyB$/ - finds all lines that end with keyB
n - prints the current pattern space (unless -n is given) and replaces it with the next line of input
s/\(.*\):[0-9]*/\1:888/ - finds any text up to the last : followed by zero or more digits, captures that text into Group 1, and replaces the match with the contents of the group and :888.
The {...} create a block that is executed only once the /keyB$/ condition is met.
See an online sed demo.
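Since the question mentions that Python is installed, the same line-oriented logic can be sketched there as well. This is only an illustration of what the sed block does, not a drop-in replacement; the function name replace_after_key and its parameters are made up for this example.

```python
import re

def replace_after_key(lines, key="keyB", new_value="888"):
    """Rewrite the number on the line that directly follows a line ending in `key`."""
    out = []
    edit_next = False  # True when the previous line ended with the key
    for line in lines:
        if edit_next:
            # Same idea as s/\(.*\):[0-9]*/\1:888/ : keep everything up to the
            # last colon and replace the digits that follow it.
            line = re.sub(r"^(.*):[0-9]*", r"\g<1>:" + new_value, line, count=1)
            edit_next = False
        else:
            # Like /keyB$/ in sed: the edited line itself is not re-tested.
            edit_next = line.rstrip("\n").endswith(key)
        out.append(line)
    return out
```

Called with the lines of the sample file (for example the result of readlines()), it changes only the values that follow keyB and leaves the indentation untouched.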

Use a perl one-liner with -0777 to scan over multiple lines:
$ # inline edit:
$ perl -0777 -i -pe 's/(\bkeyB\s*value):\d*/$1:888/g' file.txt
$ # to stdout:
$ perl -0777 -pe 's/(\bkeyB\s*value):\d*/$1:888/g' file.txt
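For comparison, here is a rough Python sketch of the same whole-file approach, using the search pattern from the question unchanged. The names PATTERN and replace_value are invented for this illustration.

```python
import re

# The regex from the question: "keyB", a newline, optional indentation, "value",
# with everything before ":474" captured so it can be written back unchanged.
PATTERN = re.compile(r"(\bkeyB\n\s*value):474")

def replace_value(text, new_value="888"):
    """Replace :474 with :<new_value> wherever it directly follows a keyB line."""
    return PATTERN.sub(lambda m: m.group(1) + ":" + new_value, text)
```

To apply it to a file, read the whole file into a string, pass it through replace_value, and write the result back.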

In plain bash:
#!/bin/bash
keypattern='^[[:blank:]]*keyB$'
valpattern='(.*):'
replacement=888
while read -r; do
printf '%s\n' "$REPLY"
if [[ $REPLY =~ $keypattern ]]; then
read -r
if [[ $REPLY =~ $valpattern ]]; then
printf '%s%s\n' "${BASH_REMATCH[0]}" "$replacement"
else
printf '%s\n' "$REPLY"
fi
fi
done < file

Related

How can I replace * to #* with bash?

I need to deactivate certain lines in a file that starts with * by putting # at the front of the line.
At first, sed -i 's/*/#*/g' tmp.conf seems to work, but it adds another # every time I run the command.
user@host:/etc/security/limits.d:$ cat tmp.conf
#* soft nproc 4096
root soft nproc unlimited
user@host:/etc/security/limits.d:$ sudo sed -i 's/*/#*/g' tmp.conf
user@host:/etc/security/limits.d:$ cat tmp.conf
##* soft nproc 4096
root soft nproc unlimited
So it has to skip lines that already start with #, and otherwise put # at the front.
After searching I came up with sed -i 's/^(?!#)\*/#*/g' tmp.conf, which doesn't work.
What regex should I use to find *, not #*?
Or is there any other way to do this other than using sed?
Maybe with this?
sed 's/^\*/#&/'
Use this Perl one-liner:
perl -i.bak -pe 's{^[*]}{#*}' test.txt
It will not add extra # characters to lines that already have one. And it can be run multiple times on the file, and it will not add extra # characters.
Example:
$ printf '*1\n#*2\n3\n' > test.txt
$ cat test.txt
*1
#*2
3
$ perl -i.bak -pe 's{^[*]}{#*}' test.txt
$ cat test.txt
#*1
#*2
3
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.
s{^[*]}{#*} : replace a literal * at the beginning of the line (^) with #*. Note that * has a special meaning (0 or more repetitions of the preceding character) and must be either escaped like so: \* or placed inside a character class like so: [*].
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start
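As another way to do this without sed, here is a small Python sketch of the same anchored substitution. The function name comment_out_star_lines is hypothetical. The point it illustrates is that anchoring on ^ makes the edit safe to repeat.

```python
import re

def comment_out_star_lines(text):
    """Prefix '#' to every line that starts with a literal '*'."""
    # re.MULTILINE makes ^ match at the start of each line, not only at the
    # start of the string. Lines already starting with '#' do not match.
    return re.sub(r"^\*", "#*", text, flags=re.MULTILINE)
```

Running the function on its own output changes nothing, because no line starts with * after the first pass.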

Adding blank line spaces before and after pattern 'string' match

I am trying to add 5 blank line spaces in a text file (text.txt) before and after string pattern matches. I used the following to get spaces after the 'string' match which worked for me-
sed '/string/{G;G;G;G;G;}' text.txt
I want to apply the same sed command to obtain 5 blank lines before the 'string' as well. I don't want space characters, but rather blank lines before and after the match. Any suggestions?
sed -r 's/(^.*)(string)(.*$)/\1\n\n\n\n\n\2\n\n\n\n\n\3/' text.txt
Use -r or -E to enable extended regular expressions, split the line into three sections, and then replace the line with the first section, 5 newlines, the second section, 5 newlines, and finally the third section.
Use this Perl one-liner:
perl -pe 's/string/\n\n\n\n\n$&\n\n\n\n\n/' text.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s/PATTERN/REPLACEMENT/ : change PATTERN to REPLACEMENT.
$& : matched pattern.
\n : newline character.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
For a single string match:
$ sed -e '/string/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
For multiple strings, assuming same requirements:
$ sed -E '/(string1|string2|string3)/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
This might work for you:
sed '/string/{G;s/\(string\)\(.*\)\(.\)/\3\3\3\3\3\1\3\3\3\3\3\2/}' file
Match on string, append an empty line, pattern match using the newline to separate the match by 5 lines either side.
And an awk version:
awk '{if(/string1|string2|.../){printf "\n\n\n\n\n%s\n\n\n\n\n",$0}else{print}}' file
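The whole-line variant (the behaviour of the awk version, which keeps the matching line intact instead of splitting it around the match) could be sketched in Python roughly as follows. The name pad_matches and its parameters are assumptions made for this example, and lines are expected without their trailing newlines.

```python
def pad_matches(lines, needle="string", blank=5):
    """Surround every line containing `needle` with `blank` empty lines."""
    out = []
    for line in lines:
        if needle in line:
            out.extend([""] * blank)  # blank lines before the match
            out.append(line)
            out.extend([""] * blank)  # blank lines after the match
        else:
            out.append(line)
    return out
```

Joining the result with "\n" gives the padded text.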

using sed to delete lines containing slashes /

I know in some circumstances, other characters besides / can be used in a sed expression:
sed -e 's.//..g' file replaces // with the empty string in file since we're using . as the separator.
But what if you want to delete lines matching //comment in file?
sed -e './/comment.d' file returns
sed: -e expression #1, char 1: unknown command: `.'
You can still use an alternate delimiter:
sed '\~//~d' file
Just escape the opening delimiter with a backslash.
To delete lines with comments, select from these Perl one-liners below. They all use m{} form of regex delimiters instead of the more commonly used //. This way, you do not have to escape slashes like so: \/, which makes a double slash look less readable: /\/\//.
Create an example input file:
echo > in_file \
'no comment
// starts with comment
    // starts with whitespace, then has comment
foo // comment is anywhere in the line'
Remove the lines that start with a comment:
perl -ne 'print unless m{^//}' in_file > out_file
Output:
no comment
    // starts with whitespace, then has comment
foo // comment is anywhere in the line
Remove the lines that start with optional whitespace, followed by comment:
perl -ne 'print unless m{^\s*//}' in_file > out_file
Output:
no comment
foo // comment is anywhere in the line
Remove the lines that have a comment anywhere:
perl -ne 'print unless m{//}' in_file > out_file
Output:
no comment
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlrequick: Perl regular expressions quick start
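The three filters above differ only in the pattern, which a short Python sketch can make explicit. The function name drop_comment_lines and the mode labels are invented for illustration.

```python
import re

def drop_comment_lines(lines, mode="start"):
    """Return the lines that do NOT match the chosen // comment pattern."""
    patterns = {
        "start": r"^//",         # line begins with //
        "indented": r"^\s*//",   # optional leading whitespace, then //
        "anywhere": r"//",       # // at any position in the line
    }
    rx = re.compile(patterns[mode])
    return [line for line in lines if not rx.search(line)]
```

As in the Perl versions, no slash needs escaping because the pattern is an ordinary string rather than something wrapped in / delimiters.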

Troubles with regular expressions

I wanted some help on extended regular expressions.
I have been trying to figure this out, but in vain.
I have a file conflicts.txt which looks like this. Please note that this is only part of the file; there are many lines like these:
Server/core/wildSetting.json
Server/core
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
Client/arcade/src/Game/
I am writing a shell script which goes through this file line by line:
if [ -s "$CONFLICTS" ] ; then
count=0
while read LINE
do
let count++
echo -e "\n $LINE \n"
done < $CONFLICTS
fi
The above prints the file line by line. What I am trying to do now is redirect the lines that contain a certain text into another file. For that I have modified the echo line of the code to:
echo -e "\n $LINE \n" | grep -E "Server/game" > newfile.txt
My Query :
As we can see there are many lines of the form Server/Core...
I want to write a regular expression and use it in grep, which matches two kind of lines
1) lines containing ONLY the string "Server/core", preceded and succeeded by any number of spaces
2) all the lines containing the string "assets"
I have written a regular expression for the same but it doesn't work
here my regEx:
grep -E '[^' '*Server/core$] | [assets]'
can you please tell me what is the right way of doing it ?
Please note that there can be any number of spaces before and after "Server/core" as this file is a result of parsing a previous file.
Thanks !
Based on what's asked in the comments:
1) the lines containing the string "assets"
$ grep "assets" file
Client/arcade/src/assets
Client/arcade/src/assets/
Client/arcade/src/assets
2) lines that contain only the string "Server/core", preceded and succeeded by any amount of space
$ grep "^[ ]*Server/core[ ]*$" file
Server/core
sed (Stream EDitor) can solve your problem perfectly.
Try this command sed -n '/^ *Server\/core\|assets/p' conflicts.txt.
There is something wrong with your grep -E '[^' '*Server/core$] | [assets]'.
A ^ at the start of a bracket expression negates it: [^...] matches any single character that is not listed inside the brackets.
If you want to modify the file in place (GNU sed), add the -i option as a separate flag; -in would be read as -i with the backup suffix n:
sed -n -i '/^ *Server\/core\|assets/p' conflicts.txt
Your regex just needs to be this:
assets|^\s*Server/core\s*$
I think sed or awk would be a better tool than grep - you would need to escape the forward slash if you used one of these.
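To check a pattern like this against the sample lines before wiring it into grep, a quick Python sketch can help. It assumes the lowercase spelling Server/core from the sample file; the names WANTED and select_lines are hypothetical.

```python
import re

# Either the word "assets" anywhere in the line, or a line that consists of
# nothing but "Server/core" with optional surrounding whitespace.
WANTED = re.compile(r"assets|^\s*Server/core\s*$")

def select_lines(lines):
    """Keep only the lines that match one of the two alternatives."""
    return [line for line in lines if WANTED.search(line)]
```

Note that the anchors belong to the second alternative only, so "assets" may appear at any position while Server/core must be alone on its line.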

Problem with perl multiline matching

I'm trying to use a perl one-liner to update some code that spans multiple lines and am seeing some strange behavior. Here's a simple text file that shows the problem I'm seeing:
ABCD START
STOP EFGH
I expected the following to work but it doesn't end up replacing anything:
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
After doing some experimenting I found that the \s+ in the original regex will match the newline but not any of the whitespace on the 2nd line, and adding a second \s+ doesn't work either. So for now I'm doing the following workaround, which is to add an intermediate regex that only removes the newline:
perl -pi -e 's/START\s+/START/s' input.txt
This creates the following intermediate file:
ABCD START STOP EFGH
Then I can run the original regex (although the /s is no longer needed):
perl -pi -e 's/START\s+STOP/REPLACE/s' input.txt
This creates the final, desired file:
ABCD REPLACE EFGH
It seems like the intermediate step should not be necessary. Am I missing something?
You were close. You need either -00 or -0777:
perl -0777 -pi -e 's/START\s+STOP/REPLACE/' input.txt
perl -p processes the file one line at a time. The regex you have is correct, but it is never matched against the multi-line string.
A simple strategy, assuming the file will fit in memory, is to read the whole thing (do this without -p):
$/ = undef;
$file = <>;
$file =~ s/START\s+STOP/REPLACE/sg;
print $file;
Note, I have added the /g modifier to specify global replacement.
As a shortcut for all that extra boilerplate, you can use your existing script with the -0777 option: perl -0777pi -e 's/START\s+STOP/REPLACE/sg'. Adding /g is still needed if you may need to make multiple replacements within the file.
A hiccup that you might run into, although not with this regex: if the regex were START.+STOP, and a file contains multiple START/STOP pairs, greedy matching of .+ will eat everything from the first START to the last STOP. You can use non-greedy matching (match as little as possible) with .+?.
If you want to use the ^ and $ anchors for line boundaries anywhere in the string, then you also need the /m regex modifier.
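The two effects described above (a per-line loop never sees both words at once, and greedy .+ swallows everything between the first START and the last STOP) are not specific to Perl. Here is a small Python sketch of both; the sample strings are made up, and the indentation before STOP is an assumption about the original input file.

```python
import re

text = "ABCD START\n   STOP EFGH\n"

# Line at a time, as a per-line loop would do: no single line contains
# both START and STOP, so nothing is replaced.
per_line = [re.sub(r"START\s+STOP", "REPLACE", line)
            for line in text.splitlines(keepends=True)]

# Whole text at once: \s+ spans the newline and the indentation.
whole = re.sub(r"START\s+STOP", "REPLACE", text)

# Greedy versus non-greedy with two START/STOP pairs. re.DOTALL plays the
# role of Perl's /s modifier so that "." could also cross newlines.
pairs = "START a STOP x START b STOP"
greedy = re.sub(r"START.+STOP", "R", pairs, flags=re.DOTALL)
lazy = re.sub(r"START.+?STOP", "R", pairs, flags=re.DOTALL)

print("".join(per_line) == text)  # True
print(repr(whole))                # 'ABCD REPLACE EFGH\n'
print(greedy)                     # R
print(lazy)                       # R x R
```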
A relatively simple one-liner (reading the file in memory):
perl -pi -e 'BEGIN{undef $/;} s/START\s+STOP/REPLACE/sg;' input.txt
Another alternative (not so simple), not reading the file in memory:
perl -ni -e '$a.=$_;
if ( $a =~ s/START\s+STOP/REPLACE/s ) { print $a; $a=""; }
END{$a && print $a}' input.txt
perl -MFile::Slurp -e '$content = read_file(shift); $content =~ s/START\s+STOP/REPLACE/s; print $content' input.txt
Here's a one-liner that doesn't read the entire file into memory at once:
perl -i -ne 'if (($x = $last . $_) =~ s/START\n\s*STOP/REPLACE/)
{ print $x; $last = ""; } else { print $last; $last = $_; }
print $last if eof ARGV' input.txt