Adding blank line spaces before and after pattern 'string' match - regex

I am trying to add 5 blank lines to a text file (text.txt) before and after each line that matches a string pattern. I used the following to get blank lines after the 'string' match, which worked for me:
sed '/string/{G;G;G;G;G;}' text.txt
I want to adapt the same sed command to also get 5 blank lines before the 'string' line. I don't want spaces, but rather blank lines before and after the match. Any suggestions?

sed -r 's/(^.*)(string)(.*$)/\1\n\n\n\n\n\2\n\n\n\n\n\3/' text.txt
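Use -r or -E to enable extended regular expressions. The regex splits the line into three sections, and the replacement is the first section, 5 newlines, the second section (the matched string), 5 newlines and finally the third section. Note that this splits a matching line apart, putting the matched string on its own line.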
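For comparison, here is a sketch of the same three-group substitution using Python's standard re module. Using Python is an assumption, and split_around_match is a hypothetical name. Like sed, it works on a single line without its trailing newline:

```python
import re


def split_around_match(line: str, word: str, n: int = 5) -> str:
    """Rebuild one line as: prefix, n newlines, the match, n newlines, suffix.

    `word` is matched literally (re.escape), not as a regular expression.
    """
    pattern = re.compile(r"(^.*)(" + re.escape(word) + r")(.*$)")
    newlines = "\n" * n
    # A function replacement sidesteps backslash processing in the replacement.
    return pattern.sub(
        lambda m: m.group(1) + newlines + m.group(2) + newlines + m.group(3),
        line,
    )


print(repr(split_around_match("a string b", "string", n=2)))  # 'a \n\nstring\n\n b'
```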

Use this Perl one-liner:
perl -pe 's/string/\n\n\n\n\n$&\n\n\n\n\n/' text.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s/PATTERN/REPLACEMENT/ : change PATTERN to REPLACEMENT.
$& : matched pattern.
\n : newline character.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start

For a single string match:
$ sed -e '/string/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
For multiple strings, assuming same requirements:
$ sed -E '/(string1|string2|string3)/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
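Here is a rough Python sketch of the multi-string sed command. It surrounds every line that contains any of the given strings with n blank lines. The function name pad_matching_lines is hypothetical. Unlike sed -E, it matches the strings literally through re.escape rather than treating them as regular expressions:

```python
import re
from typing import Sequence


def pad_matching_lines(text: str, patterns: Sequence[str], n: int = 5) -> str:
    """Surround each line containing any of `patterns` with n blank lines."""
    # Literal alternation, similar in spirit to (string1|string2|string3).
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    out = []
    for line in text.splitlines():
        if regex.search(line):
            out.extend([""] * n)  # n blank lines before the match
            out.append(line)
            out.extend([""] * n)  # n blank lines after the match
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

For example, pad_matching_lines(open("text.txt").read(), ["string1", "string2", "string3"]) returns the padded text. As with the sed version, two adjacent matching lines end up with 2n blank lines between them.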

This might work for you:
sed '/string/{G;s/\(string\)\(.*\)\(.\)/\3\3\3\3\3\1\3\3\3\3\3\2/}' file
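On lines matching string, G appends a newline to the pattern space. The last group \(.\) then captures that newline (in GNU sed, . also matches an embedded newline), and the replacement uses it as \3 to put five newlines on either side of the match.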

And an awk version. Note that %s needs six \n after it, because the first one only ends the matched line:
awk '{if(/string1|string2|.../){printf "\n\n\n\n\n%s\n\n\n\n\n\n",$0}else{print}}' file

Related

Find multi-line text & replace it, using regex, in shell script

I am trying to find a pattern of two consecutive lines, where the first line is a fixed string and the second contains a substring I'd like to replace.
This is to be done in sh or bash on macOS.
If I had a regex tool at hand that would operate on the entire text, this would be easy for me. However, all I can find is bash's simple text replacement, which doesn't work with regex, and sed, which is line-oriented.
I suspect that I can use sed in a way where it first finds a matching first line, and only then looks to replace the following line if its pattern also matches, but I cannot figure this out.
Or are there other tools present on macOS that would let me do a regex-based search-and-replace over an entire file or a string? Maybe with Python (v2.7 and v3 are installed)?
Here's some sample text and how I'd like it modified:
keyA
value:474
keyB
value:474 <-- only this shall be replaced (follows "keyB")
keyC
value:474
keyB
value:474
Now, I want to find all occurrences where the first line is "keyB" and the following one is "value:474", and then replace that second line with another value, e.g. "value:888".
As a regex that ignores line separators, I'd write this:
Search: (\bkeyB\n\s*value):474
Replace: $1:888
So, basically, I find the pattern before the 474, and then replace it with the same pattern plus the new number 888, thereby preserving the original indentation (which is variable).
You can use
sed -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
# Or, to edit the file in place with FreeBSD (and macOS) sed:
sed -i '' -e '/keyB$/{n' -e 's/\(.*\):[0-9]*/\1:888/' -e '}' file
Details:
/keyB$/ - finds all lines that end with keyB
n - prints the current pattern space (sed is not run with -n here) and replaces it with the next line of input
s/\(.*\):[0-9]*/\1:888/ - matches any text up to the last : followed by zero or more digits, captures the text before the : into Group 1, and replaces the match with the contents of the group followed by :888.
The {...} creates a block that is executed only when the /keyB$/ condition is met.
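Use a perl one-liner with -0777, which slurps the whole file in as one string so that the regex can match across lines. The /g modifier replaces every occurrence, not just the first:
$ # inline edit:
$ perl -0777 -i -pe 's/(\bkeyB\s*value):\d*/$1:888/g' file.txt
$ # to stdout:
$ perl -0777 -pe 's/(\bkeyB\s*value):\d*/$1:888/g' file.txt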
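Since the question mentions Python, here is a minimal sketch of the same whole-text approach using Python 3's re module. It reuses the question's own search pattern. The names replace_after_key and rewrite_file are hypothetical. Reading the whole file into one string does the job of perl's -0777:

```python
import re
from pathlib import Path

# The question's pattern: group 1 keeps "keyB", the newline, any indentation
# and "value", so only the number after the colon changes.
PATTERN = re.compile(r"(\bkeyB\n\s*value):474")


def replace_after_key(text: str, new_value: str = "888") -> str:
    # A function replacement inserts new_value literally, with no backslash
    # processing; re.sub replaces every occurrence by default.
    return PATTERN.sub(lambda m: m.group(1) + ":" + new_value, text)


def rewrite_file(path: str, new_value: str = "888") -> None:
    """Apply the replacement to a whole file in place."""
    p = Path(path)
    p.write_text(replace_after_key(p.read_text(), new_value))
```

For example, rewrite_file("file.txt") edits file.txt in place. As in the perl version, \s* can also span blank lines between keyB and value.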
In plain bash:
#!/bin/bash
keypattern='^[[:blank:]]*keyB$'
valpattern='(.*):'
replacement=888
while read -r; do
    printf '%s\n' "$REPLY"
    if [[ $REPLY =~ $keypattern ]]; then
        read -r
        if [[ $REPLY =~ $valpattern ]]; then
            printf '%s%s\n' "${BASH_REMATCH[0]}" "$replacement"
        else
            printf '%s\n' "$REPLY"
        fi
    fi
done < file
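The sed n answer and the plain-bash answer rely on the same line-oriented idea: print each line, and when it is a keyB line, consume the next line and rewrite it. Here is a minimal Python sketch of that state machine. replace_next_line is a hypothetical name, and the key pattern comes from the bash answer:

```python
import re
from typing import Iterable, List

KEY = re.compile(r"^[ \t]*keyB$")   # same idea as '^[[:blank:]]*keyB$'
VALUE = re.compile(r"(.*):[0-9]*")  # same regex as the sed answer's s command


def replace_next_line(lines: Iterable[str], replacement: str = "888") -> List[str]:
    """Rewrite the line that follows each keyB line, keeping everything else."""
    out: List[str] = []
    it = iter(lines)
    for line in it:
        out.append(line)
        if KEY.match(line):
            nxt = next(it, None)  # like sed's n: fetch the next input line
            if nxt is None:       # keyB was the last line
                break
            out.append(VALUE.sub(lambda m: m.group(1) + ":" + replacement, nxt, count=1))
    return out
```

To use it on a file, split the text with splitlines() and join the result with "\n".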

Print the line matching 'pattern' string, excluding the 'pattern'

I have the following lines in a text file 'file.txt'
String1 ABCDEFGHIJKL
String2 DCEGIJKLQMAB
I want to print the characters corresponding to 'String1' in another text file 'text.txt' like this
ABCDEFGHIJKL
Here, I don't want to use any line numbers. Any suggestions using the 'sed' command? I tried printing the range between 'String1' and 'String2', but couldn't get a command that also excludes 'String1'. The following code only excludes 'String2':
sed -n '/^string1/,/^string2/{p;/^string2/q}' file.txt | sed '$d' > text.txt
awk '$1=="String1" { print $2 }' file.txt > text.txt
Where the first whitespace-delimited field equals "String1", print the second field, and redirect the output to text.txt.
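The awk logic translates almost directly to Python. This sketch uses str.split(), which splits on runs of whitespace, like awk's default field splitting. second_field_for is a hypothetical name. One difference: lines with no second field are skipped here, whereas awk would print an empty line for them:

```python
from typing import Iterable, Iterator


def second_field_for(lines: Iterable[str], key: str = "String1") -> Iterator[str]:
    """Yield field 2 of every line whose field 1 equals `key`."""
    for line in lines:
        fields = line.split()  # whitespace-separated, leading blanks ignored
        if len(fields) >= 2 and fields[0] == key:
            yield fields[1]
```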
Use GNU grep:
grep -Po 'String1\s+\K.*' in_file
Here, grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
\K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. Specifically, ignore the preceding part of the regex when printing the match.
SEE ALSO:
grep manual
perlre - Perl regular expressions
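Python's re module has no \K, and its lookbehinds must be fixed-width, so re rejects (?<=String1\s+). A capture group gives the same result. Here is a sketch with a hypothetical extract function:

```python
import re
from typing import Iterable, Iterator

# Everything after "String1" and the whitespace that follows it lands in
# group 1, which is the part grep prints after \K.
PATTERN = re.compile(r"String1\s+(.*)")


def extract(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        m = PATTERN.search(line)
        if m:
            yield m.group(1)
```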

using sed to delete lines containing slashes /

I know in some circumstances, other characters besides / can be used in a sed expression:
sed -e 's.//..g' file replaces // with the empty string in file since we're using . as the separator.
But what if you want to delete lines matching //comment in file?
sed -e './/comment.d' file returns
sed: -e expression #1, char 1: unknown command: `.'
You can still use an alternate delimiter:
sed '\~//~d' file
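Just escape the opening delimiter once. To use a custom delimiter in an address, you must precede its first occurrence with a backslash (\~...~). Without the backslash, sed reads './/comment.d' as the unknown command '.'.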
To delete lines with comments, select from these Perl one-liners below. They all use m{} form of regex delimiters instead of the more commonly used //. This way, you do not have to escape slashes like so: \/, which makes a double slash look less readable: /\/\//.
Create an example input file:
echo > in_file \
'no comment
// starts with comment
    // starts with whitespace, then has comment
foo // comment is anywhere in the line'
Remove the lines that start with a comment:
perl -ne 'print unless m{^//}' in_file > out_file
Output:
no comment
    // starts with whitespace, then has comment
foo // comment is anywhere in the line
Remove the lines that start with optional whitespace, followed by comment:
perl -ne 'print unless m{^\s*//}' in_file > out_file
Output:
no comment
foo // comment is anywhere in the line
Remove the lines that have a comment anywhere:
perl -ne 'print unless m{//}' in_file > out_file
Output:
no comment
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlrequick: Perl regular expressions quick start
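For comparison, here are the three filters as a Python sketch. drop_matching and the pattern names are hypothetical. In Python the regex is an ordinary string, so / needs no escaping and you don't need an alternate delimiter:

```python
import re
from typing import List

STARTS_WITH_COMMENT = re.compile(r"^//")          # like m{^//}
STARTS_AFTER_WHITESPACE = re.compile(r"^\s*//")  # like m{^\s*//}
COMMENT_ANYWHERE = re.compile(r"//")             # like m{//}


def drop_matching(lines: List[str], regex: re.Pattern) -> List[str]:
    """Keep only the lines the regex does not match (print unless m{...})."""
    return [line for line in lines if not regex.search(line)]
```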

How can I use regex to exclude lines with extra characters?

I have a bunch of email addresses:
abc#google.com
bdc#yahoo.com
\\ske#google.com
I'd like to delete the last line (\\ske#google.com) because it contains characters other than #, . and letters. How do I do this?
Through awk,
$ awk '/^\w+#\w+/{print}' file
abc#google.com
bdc#yahoo.com
Awk selects the lines that start with one or more word characters, followed by a # symbol and again by one or more word characters, and prints each of them.
The line \\ske#google.com doesn't start with a word character, so it isn't printed. Note that \w is a GNU awk extension; with other awks, use [[:alnum:]_] instead.
You can use this sed command, which edits the file in place and keeps the original as file.bak:
sed -i.bak -n '/^[[:alnum:]]*#/p' file
You can use vim to take care of it too:
vim -c 'v/^[[:alnum:]]*#/d' -c 'wq' file
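You could also use the Email::Valid module from CPAN. It only accepts syntactically valid addresses, so it expects @ rather than #:
perl -ne 'use Email::Valid; chomp; print "$_\n" if Email::Valid->address($_)' file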
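Another option is a whole-line check in Python, sketched below. It follows the question's rule (only #, . and letters) and assumes digits are also allowed. keep_clean is a hypothetical name. fullmatch requires the entire line to fit the pattern, so it also rejects trailing junk that the unanchored awk regex /^\w+#\w+/ would let through:

```python
import re
from typing import List

# Assumed rule: letters/digits, one '#', then letters, digits and dots only.
ALLOWED = re.compile(r"[A-Za-z0-9]+#[A-Za-z0-9.]+")


def keep_clean(lines: List[str]) -> List[str]:
    """Return only the lines made up entirely of the allowed characters."""
    return [line for line in lines if ALLOWED.fullmatch(line)]
```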

How do I replace multiple newlines with a single one with Perl's Regular Expressions?

I've got a document containing empty lines (\n\n). They can be removed with sed:
echo $'a\n\nb'|sed -e '/^$/d'
But how do I do that with an ordinary regular expression in Perl? Anything like the following just leaves the input unchanged:
echo $'a\n\nb'|perl -p -e 's/\n\n/\n/s'
Input is read one line at a time, so $_ never contains more than one newline. Instead, use s/^\n\z// to remove the lines that contain nothing but that newline, and invoke perl as:
perl -ne 's/^\n\z//; print'
There is no need for the /s modifier.
The narrower problem of not printing blank lines is more straightforward:
$(input) | perl -ne 'print if /\S/'
will output all lines except the ones that only contain whitespace.
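Here is the same line filter as a Python sketch (drop_blank_lines is a hypothetical name). str.strip() returns an empty string for a line that contains only whitespace, which mirrors print if /\S/:

```python
def drop_blank_lines(text: str) -> str:
    # keepends=True keeps each line's own newline, so kept lines are unchanged.
    return "".join(
        line for line in text.splitlines(keepends=True) if line.strip()
    )
```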
The input is three separate lines, and perl with the -p option only processes one line at a time.
The workaround is to tell perl to slurp in all of the input at once. One way to do it is:
echo $'a\n\nb' | perl -pe 'BEGIN{$/=undef}; s/\n\n/\n/'
Here $/ is the input record separator. It tells perl how to split the input stream into records, which are lines by default. Setting it to undef makes the whole input one record. To collapse every run of blank lines rather than only the first, use s/\n{2,}/\n/g instead.
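In Python, the equivalent of setting $/ to undef is simply to work on the whole text as one string. Below is a minimal sketch (squeeze_newlines is a hypothetical name). It collapses every run of newlines, not just the first pair. Lines that contain only spaces are not treated as empty here:

```python
import re


def squeeze_newlines(text: str) -> str:
    # The whole input is one string, so \n{2,} can match across line ends.
    return re.sub(r"\n{2,}", "\n", text)
```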