bash sed to remove all whitespace and blank lines from end of file - regex

I found "How to remove trailing whitespaces with sed?", which removes whitespace from the end of a script, but it doesn't quite do what I was hoping. When I think of removing all whitespace, I also want to remove any empty lines. I think this sed command only removes spaces and tabs; can it be expanded to also trim any empty lines from the end of the file? Maybe it's not possible to do this in one line, and maybe there are better ways to achieve it; any options are welcome.
Also, am I right in thinking that this should replace the file in place with the changes? I'm just not sure that's happening in my testing.
sed -i 's/[ \t]*$//' ~/.bashrc
# -i is in place, [ \t] applies to any number of spaces and tabs before the end of the file "*$"

To remove all whitespace at the end of the file:
perl -0777 -pe 's{\s+\z}{}m' foo > bar
To change the file in-place:
perl -i.bak -0777 -pe 's{\s+\z}{}m' foo
To replace all whitespace at the end of the file with a single newline:
perl -0777 -pe 's{\s+\z}{\n}m' foo > bar
To change the file in-place:
perl -i.bak -0777 -pe 's{\s+\z}{\n}m' foo
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.
-0777 : Slurp files whole.
\s+\z : one or more whitespace characters (including newline) at the end of the string (which happens to be the entire file).
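The regex uses this modifier:
/m : Treat the string as multiple lines, so that ^ and $ also match at line boundaries. It does not change the meaning of \z, so it is optional here.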
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)

This might work for you (GNU sed):
sed ':a;/\S/!{$d;N;ba}' file
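If the current line is empty or whitespace-only, keep appending the following lines until one contains a non-whitespace character.
If the end of the file is reached first, delete the pattern space, which then holds only the trailing blank lines.
Otherwise print the pattern space.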
To remove spaces from the end of all lines too:
sed ':a;/\S/!{$d;N;ba};s/ *$//mg' file
or:
sed 'H;$!d;x;s/.//;s/ *$//mg;s/\n*$//' file
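The second command can be tried on a small sample, as sketched here. It assumes GNU sed (for \S and the M flag); the input contents and variable names are invented for the example, and the final sed -n 'l' is only there to make line ends visible as $.

```shell
# Assumes GNU sed. Sample input (made up): trailing spaces on the first
# line, a blank line in the middle, and three blank lines at the end.
input=$(mktemp)
printf 'one  \n\ntwo\n\n\n\n' > "$input"

# Drop the blank lines at the end of the file and strip trailing spaces
# from the remaining lines; the blank line in the middle is kept.
result=$(sed ':a;/\S/!{$d;N;ba};s/ *$//mg' "$input" | sed -n 'l')
printf '%s\n' "$result"
```

With GNU sed, adding -i to the first sed call would write the result back to the file instead of printing it.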


How can I replace * to #* with bash?

I need to deactivate certain lines in a file that starts with * by putting # at the front of the line.
At first, sed -i 's/*/#*/g' tmp.conf seems to work. But it adds another # every time I run the command.
user#host:/etc/security/limits.d:$ cat tmp.conf
#* soft nproc 4096
root soft nproc unlimited
user#host:/etc/security/limits.d:$ sudo sed -i 's/*/#*/g' tmp.conf
user#host:/etc/security/limits.d:$ cat tmp.conf
##* soft nproc 4096
root soft nproc unlimited
So it has to ignore when the line starts with #, otherwise put # at the front.
I searched to come up with sed -i 's/^(?!#)\*/#*/g' tmp.conf, which doesn't work.
What regex should I use to find *, not #*?
Or is there any other way to do this other than using sed?
Maybe with this?
sed 's/^\*/#&/'
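A small check that this substitution is safe to repeat, as a sketch only; the file contents and the variable names conf, once and twice are made up for the example:

```shell
# Hypothetical stand-in for the limits file from the question.
conf=$(mktemp)
printf '* soft nproc 4096\n#* hard nproc 8192\nroot soft nproc unlimited\n' > "$conf"

# Anchoring at ^ means only lines that start with * are changed;
# & re-inserts the matched text (the literal *).
once=$(sed 's/^\*/#&/' "$conf")

# Running the same command on the result changes nothing further.
twice=$(printf '%s\n' "$once" | sed 's/^\*/#&/')

printf '%s\n' "$twice"
```

Because a commented line starts with # rather than *, the pattern no longer matches it on the second run.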
Use this Perl one-liner:
perl -i.bak -pe 's{^[*]}{#*}' test.txt
It does not add extra # characters to lines that already have one, so it can safely be run on the file multiple times.
Example:
$ printf '*1\n#*2\n3\n' > test.txt
$ cat test.txt
*1
#*2
3
$ perl -i.bak -pe 's{^[*]}{#*}' test.txt
$ cat test.txt
#*1
#*2
3
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak. If you want to skip writing a backup file, just use -i and skip the extension.
s{^[*]}{#*} : replace a literal * at the beginning of the line (^) with #*. Note that * has a special meaning (0 or more repetitions of the preceding character) and must be either escaped like so: \* or placed inside a character class like so: [*].
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlre: Perl regular expressions (regexes): Quantifiers; Character Classes and other Special Escapes; Assertions; Capture groups
perldoc perlrequick: Perl regular expressions quick start

Adding blank line spaces before and after pattern 'string' match

I am trying to add 5 blank line spaces in a text file (text.txt) before and after string pattern matches. I used the following to get spaces after the 'string' match which worked for me-
sed '/string/{G;G;G;G;G;}' text.txt
I want to apply the same sed command to obtain 5 blank lines before the 'string' as well. I don't want spaces, but rather blank lines before and after the match. Any suggestions?
sed -r 's/(^.*)(string)(.*$)/\1\n\n\n\n\n\2\n\n\n\n\n\3/' text.txt
Use -r or -E to enable extended regular expressions, split the line into three sections, and then substitute the line with the first section, 5 newlines, the second section, 5 newlines and finally the third section.
Use this Perl one-liner:
perl -pe 's/string/\n\n\n\n\n$&\n\n\n\n\n/' text.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
s/PATTERN/REPLACEMENT/ : change PATTERN to REPLACEMENT.
$& : matched pattern.
\n : newline character.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlrequick: Perl regular expressions quick start
For a single string match:
$ sed -e '/string/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
For multiple strings, assuming same requirements:
$ sed -E '/(string1|string2|string3)/{ s/^/\n\n\n\n\n/; s/$/\n\n\n\n\n/ }' text.txt
This might work for you:
sed '/string/{G;s/\(string\)\(.*\)\(.\)/\3\3\3\3\3\1\3\3\3\3\3\2/}' file
Match on string, append a newline with G (the hold space is empty), then capture that trailing newline as \3 and reuse it to place 5 newlines on either side of the match.
And an awk version:
awk '{if(/string1|string2|.../){printf "\n\n\n\n\n%s\n\n\n\n\n",$0}else{print}}' file
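If the number of blank lines needs to vary, the awk approach can take it as a parameter. The following is a sketch rather than a drop-in replacement: the function name pad and the variable n are hypothetical, and the sample input is made up.

```shell
# pad N: read stdin, surround each line matching /string/ with N blank lines.
# The function name and the awk variable n are hypothetical.
pad() {
  awk -v n="$1" '
    /string/ {
      for (i = 0; i < n; i++) print ""   # blank lines before the match
      print                              # the matching line itself
      for (i = 0; i < n; i++) print ""   # blank lines after the match
      next
    }
    { print }                            # all other lines pass through
  '
}

printf 'before\nhas string here\nafter\n' | pad 5
```

Unlike the substitution-based answers, this pads the whole matching line rather than splitting it around the word.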

using sed to delete lines containing slashes /

I know in some circumstances, other characters besides / can be used in a sed expression:
sed -e 's.//..g' file replaces // with the empty string in file since we're using . as the separator.
But what if you want to delete lines matching //comment in file?
sed -e './/comment.d' file returns
sed: -e expression #1, char 1: unknown command: `.'
You can still use an alternate delimiter:
sed '\~//~d' file
Just escape the opening delimiter with a backslash.
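Applied to the //comment case from the question, this could look like the sketch below. The sample lines and the variable names src, kept_any and kept_start are invented for illustration.

```shell
# Made-up source file with a comment-only line and an end-of-line comment.
src=$(mktemp)
printf 'int x = 1;\n//comment\nint y = 2; //comment at end\n' > "$src"

# \~ makes ~ the address delimiter, so the slashes need no escaping.
# This deletes every line that contains //comment anywhere.
kept_any=$(sed '\~//comment~d' "$src")

# Anchoring with ^ deletes only lines that start with //comment.
kept_start=$(sed '\~^//comment~d' "$src")

printf '%s\n---\n%s\n' "$kept_any" "$kept_start"
```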
To delete lines with comments, select from these Perl one-liners below. They all use m{} form of regex delimiters instead of the more commonly used //. This way, you do not have to escape slashes like so: \/, which makes a double slash look less readable: /\/\//.
Create an example input file:
echo > in_file \
'no comment
// starts with comment
  // starts with whitespace, then has comment
foo // comment is anywhere in the line'
Remove the lines that start with a comment:
perl -ne 'print unless m{^//}' in_file > out_file
Output:
no comment
  // starts with whitespace, then has comment
foo // comment is anywhere in the line
Remove the lines that start with optional whitespace, followed by comment:
perl -ne 'print unless m{^\s*//}' in_file > out_file
Output:
no comment
foo // comment is anywhere in the line
Remove the lines that have a comment anywhere:
perl -ne 'print unless m{//}' in_file > out_file
Output:
no comment
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlre: Perl regular expressions (regexes)
perldoc perlrequick: Perl regular expressions quick start

Using Sed to delete lines which contain non alphabets

The following Regex works as expected in Notepad++:
^.*[^a-z\r\n].*$
However, when I try to use it with sed, it won't work.
sed -r 's/\(^.*[^a-z\r\n].*$\)//g' wordlist.txt
You could use:
sed -i '/[^a-z]/d' wordlist.txt
This will delete each line that has a non-alphabet character (no need to specify linefeeds)
EDIT:
Your regex doesn't work because you are trying to match
( bracket
^ beginning of line
...
$ end of line
) bracket
As you won't have a bracket and then the beginning of the line, your regex simply doesn't match anything.
Note also that an expression of
s/\(^.*[^a-z\r\n].*$\)//g
wouldn't delete a line but would replace it with a blank line.
EDIT2:
Note that in sed the -r flag changes the meaning of \( and \): without -r they delimit a group, but with -r they match literal brackets.
Two things:
Sed is a stream editor. It processes one line of the input at a time. That means the search and replace commands, etc, can only see the current line. By contrast, Notepad++ has the whole file in memory and so its search expressions can span two or more lines.
Your command sed -r 's/\(^.*[^a-z\r\n].*$\)//g' wordlist.txt includes \( and \). These mean real (ie non-escaped) round brackets. So the command says find a line that starts with a ( and ends with a ) with some other characters between and replace it with nothing. Rewriting the command as sed -r 's/^.*[^a-z\r\n].*$//g' wordlist.txt should have the desired effect. You could also remove the \r\n to give sed -r 's/^.*[^a-z].*$//g' wordlist.txt. But neither of these will be exactly the same as the Notepad++ command as they will leave empty lines. So you may find the command sed -r '/^.*[^a-z].*$/d' wordlist.txt is closer to what you really want.
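The difference between deleting a line and emptying it can be seen on a tiny word list. This is a sketch with made-up words and variable names; LC_ALL=C is an added assumption so that [a-z] means exactly the 26 lowercase ASCII letters regardless of locale.

```shell
# Made-up word list: two all-lowercase words, one with a digit,
# one with an uppercase letter.
words=$(mktemp)
printf 'apple\nb4nana\ncherry\nDate\n' > "$words"

# d removes every line that contains a character outside a-z.
deleted=$(LC_ALL=C sed '/[^a-z]/d' "$words")
deleted_lines=$(LC_ALL=C sed '/[^a-z]/d' "$words" | wc -l | tr -d ' ')

# s/// empties the matching lines but leaves them in the output.
blanked_lines=$(LC_ALL=C sed 's/^.*[^a-z].*$//' "$words" | wc -l | tr -d ' ')

printf '%s\n' "$deleted"
echo "lines after d: $deleted_lines, lines after s///: $blanked_lines"
```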

how to trim trailing spaces after all delimiter in a text file

I need help removing the spaces that follow each delimiter in a text file.
I have a text file with the data below.
e.g.
ADDRESS_ID| COUNTRY_TP_CD| RESIDENCE_TP_CD| PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0| 76.0|||169 Park lane||Scottish||lane||KU|||||||2013-09-19 14:48:49.609000|
I want to remove the spaces between each delimiter and the first letter of the word that follows it.
Any regex or unix script that can do the same. Looking for output as below:
ADDRESS_ID|COUNTRY_TP_CD|RESIDENCE_TP_CD|PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0|76.0|||169 Park lane||Scottish||lane||KU||||||2013-09-19 14:48:49.609000|
Any help will be appreciated.
awk 'BEGIN{FS=OFS="|"} {for (i=1;i<=NF;i++) gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i)} 1' file
Using a perl one-liner to remove the spacing around every field. Assumes no embedded delimiters:
perl -i -lpe 's/\s*([^|]*?)\s*(?=\||$)/$1/g' file.txt
Switches:
-i: Edit <> files in place (makes backup if extension supplied)
-l: Enable line ending processing
-p: Creates a while(<>){...; print} loop for each “line” in your input file.
-e: Tells perl to execute the code on command line.
The Perl code below removes the spaces at the start of a line and the spaces that follow the delimiter |:
$ perl -pe 's/(?<=\|) +|^ +//g' file
ADDRESS_ID|COUNTRY_TP_CD|RESIDENCE_TP_CD|PROV_STATE_TP_CD|ADDR_LINE_ONE|P_ADDR_LINE_ONE
885637959852960985.0|76.0|||169 Park lane||Scottish||lane||KU|||||||2013-09-19 14:48:49.609000|
To save the changes made to that file,
perl -i -pe 's/(?<=\|) +|^ +//g' file
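If no field contains spaces of its own, deleting every space is enough (note that this would also turn 169 Park lane into 169Parklane):
sed 's/ //g' input.txt > output.txt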
With sed:
sed -r -e 's/(^|\|)\s+/\1/g' -e 's/\s+$//' filename
In the first expression:
(^|\|) matches the beginning of the line or a | character, and saves this in capture group 1.
\s+ matches a sequence of whitespace characters after that.
The replacement \1 substitutes capture group 1, so this deletes the whitespace at the beginning of the line and after the delimiter.
The g modifier makes it operate on all the matches in the line.
In the second expression:
\s+ again matches a sequence of whitespace
$ matches the end of the line
The replacement replaces the whole thing with an empty string, thus removing trailing spaces.
For POSIX sed (with GNU sed, add --posix):
sed 's/^[[:space:]]*//;s/|[[:space:]]*/|/g' YourFile
This uses 2 substitutions, because POSIX basic regular expressions have no alternation operator (|).
The first removes leading whitespace by replacing whitespace at the start of the line (^[[:space:]]*) with nothing.
The second replaces every pipe followed by any whitespace (|[[:space:]]*) with a single pipe.
[[:space:]] can be replaced by a single space character if the text contains only plain spaces (ASCII 32).
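The two-substitution approach can be tried on a shortened version of the question's data. This is a sketch only: the rows below are an abbreviated, partly invented sample, and data and cleaned are hypothetical variable names.

```shell
# Abbreviated, partly made-up sample in the same pipe-delimited layout.
data=$(mktemp)
printf 'ADDRESS_ID| COUNTRY_TP_CD|  ADDR_LINE_ONE\n  42.0| 76.0||169 Park lane|\n' > "$data"

# First substitution: strip whitespace at the start of the line.
# Second substitution: strip whitespace that follows any pipe.
cleaned=$(sed 's/^[[:space:]]*//;s/|[[:space:]]*/|/g' "$data")
printf '%s\n' "$cleaned"
```

Spaces inside a field, such as those in 169 Park lane, are preserved because they do not directly follow a delimiter.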