How to use sed to remove only double empty lines? - regex

I found this question and answer on how to remove triple empty lines. However, I need the same only for double empty lines. Ie. all double blank lines should be deleted completely, but single blank lines should be kept.
I know a bit of sed, but the proposed command for removing triple blank lines is over my head:
sed '1N;N;/^\n\n$/d;P;D'

This would be easier with cat:
cat -s

I've commented the sed command you don't understand:
sed '
## In first line: append second line with a newline character between them.
1N;
## Do the same with third line.
N;
## When found three consecutive blank lines, delete them.
## Here there are two newlines but you have to count one more deleted with last "D" command.
/^\n\n$/d;
## The combo "P+D+N" simulates a FIFO, "P+D" prints and deletes from one side while "N" appends
## a line from the other side.
P;
D
'
Remove 1N because we need only two lines in the 'stack' and it's enought with the second N, and change /^\n\n$/d; to /^\n$/d; to delete all two consecutive blank lines.
A test:
Content of infile:
1
2
3
4
5
6
7
Run the sed command:
sed '
N;
/^\n$/d;
P;
D
' infile
That yields:
1
2
3
4
5
6
7

sed '/^$/{N;/^\n$/d;}'
It will delete only two consecutive blank lines in a file. You can use this expression only in file then only you can fully understand. When a blank line will come that it will enter into braces.
Normally sed will read one line. N will append the second line to pattern space. If that line is empty line. the both lines are separated by newline.
/^\n$/ this pattern will match that time only the d will work. Else d not work. d is used to delete the pattern space whole content then start the next cycle.

This would be easier with awk:
awk -v RS='\n\n\n' 1

BUT the above solution only deletes first search of 3 consecutive blank line.
To delete all, 3 consecutive blank lines use below command
sed '1N;N;/^\n\n$/ { N;s/^\n\n//;N;D; };P;D' filename

As far as I can tell none of the solutions here work. cat -s as suggested by #DerMike isn't POSIX compliant (and it's less convenient if you're already using sed for another transformation), and sed 'N;/^\n$/d;P;D' as suggested by #Birei sometimes deletes more newlines than it should.
Instead, sed ':L;N;s/^\n$//;t L' works. For POSIX compliance use sed -e :L -e N -e 's/^\n$//' -e 't L', since POSIX doesn't specify using ; to separate commands.
Example:
$ S='foo\nbar\n\nbaz\n\n\nqux\n\n\n\nquxx\n';\
> paste <(printf "$S")\
> <(printf "$S" | sed -e 'N;/^\n$/d;P;D')\
> <(printf "$S" | sed -e ':L;N;s/^\n$//;t L')
foo foo foo
bar bar bar
baz baz baz
qux
qux
qux quxx
quxx
quxx
$
Here we can see the original file, #Birei's solution, and my solution side-by-side. #Birei's solution deletes all blank lines separating baz and qux, while my solution removes all but one as intended.
Explanation:
:L Create a new label called L.
N Read the next line into the current pattern space,
separated by an "embedded newline."
s/^\n$// Replace the pattern space with the empty pattern space,
corresponding to a single non-embedded newline in the output,
if the current pattern space only contains a single embedded newline,
indicating that a blank line was read into the pattern space by `N`
after a blank line had already been read from the input.
t L Branch to label L if the previous `s` command successfully
substituted text in the pattern space.
In effect, this deletes one recurrent blank line at a time, reading each into the pattern space as an embedded newline with N and deleting them with s.

BUT the above solution only deletes first search of 3 consecutive blank line. To delete all, 3 consecutive blank lines use below command
sed '1N;N;/^\n\n$/ { N;s/^\n\n//;N;D; };P;D' filename

Just pipe it to 'uniq' command and all empty lines regardless the number of them will be shrank to just one. Simpler is better.
Clarification: As Marlar stated this is not a solution if you have "other non-blank consecutive duplicated lines" that you do not want to get rid of. This is a solution in other cases like when trying to cleanup configuration files which was the solution I was after when I saw this question. I solved my problem indeed just using 'uniq'.

Related

Replace several lines by one using sed

I have an input like this:
This_is(A)
Goto(B,condition_1)
Goto(C,condition_2)
This_is(B)
Goto(A,condition_3)
This_is(C)
Goto(B,condition_1)
I want it to become like this
(A,B,condition_1)
(A,C,condition_2)
(B,A,condition_3)
(C,B,condition_1)
Anyone knows how to do this with sed?
Assuming you don't really need to do this with sed, this will work using any awk in any shell on every UNIX box:
$ awk -F'[()]' '/^[^[:space:]]/{s=$2; next} {sub(/[^[:space:]]*\(/,"("s",")} 1' file
(A,B,condition_1)
(A,C,condition_2)
(B,A,condition_3)
(C,B,condition_1)
This is a possible sed solution, where I have hardcoded a few bits, like This_is and Goto because the OP did not clarify if those strings change along the file in the actual file:
sed '/^This_is/{:a;N;s/\(^This_is(\(.\)).*\)\(\n *\)Goto(\([^)]*)\)$/\1\3(\2,\4/;$!ta;s/[^\n]*\n//}' input_file
(Unfortunately, with all these parenthesis, using the -E does not shorten the command much.)
The code is slightly more readable if split on more lines:
sed '/^This_is/{
:a
N
s/\(^This_is(\(.\)).*\)\(\n *\)Goto(\([^)]*)\)$/\1\3(\2,\4/
$!ta
s/[^\n]*\n//
}' os
Here you can see that the code takes action only on the lines starting with This_is; when the program hits those lines, it does the following.
It uses the N command to append the next line to the pattern space (interspersing \ns),
and it attempts a substitution with s/…/…/, which essentially tries to pick the x in This_is(x) and to put it just after the last Goto( on the multiline,
and it keeps doing this as long as the latter action is successful (ta branches to :a if s was successful) and the last line has not been read ($! matches all line but the last);
Indeed, this is a do-while loop, where :a marks the entry point, where the control jumps back if the while-condition is true, and ta is the command that evaluates the logical condition.
When the above while loop terminates, the shorter s/…/…/ command removes the leading line from the multiline pattern space, which is the This_is line.
This might work for you (GNU sed):
sed -E '/^\S.*\(.*\)/{h;d};G;s/\S+\((.*\))\n.*(\(.*)\).*/\2,\1/;P;d' file
If a line starts with a non-white space character and contains parens, copy it to the hold space (HS) and then delete it.
Otherwise, append the HS, remove non-white characters upto the opening paren, insert the value between parens from the stored value, add a comma and print the first line and then delete the whole of the pattern space.
N.B. Lines that do not meet the substitution criteria will be unchanged.
An alternative solution using GNU parallel and sed:
parallel --pipe --recstart T -kqN1 sed -E '1{h;d};G;s/\S+\((.*)\n.*(\(.*)\).*/\2,\1/;P;d' <file

How to add a new line to the beginning and end of every line matching a pattern?

How can sed be used to add a \n to the beginning and to the end of every line matching the pattern %%###? This is how far I got:
If foo.txt contains
foobar
%%## Foo
%%### Bar
foobar
then sed 's/^%%###/\n&\n/g' foo.txt gives
foobar
%%## Foo
%%###
Bar
foobar
instead of
foobar
%%## Foo
%%### Bar
foobar
Note: This seems related to this post
Update: I'm actually looking for case where lines starting with the pattern are considered only.
With GNU sed:
sed 's/.*%%###.*/\n&\n/' file
Output:
foobar
%%## Foo
%%### Bar
foobar
&: Refer to that portion of the pattern space which matched
If you want to edit your file "in place" use sed's option -i.
It is cumbersome to directly add newlines via sed. But here is one option if you have perl available:
$ foo.txt | perl -pe 's/(.*%%###.*)/\n$1\n/'
Here we capture every matching line, which is defined as any line containing the pattern %%### anywhere, and then we replace with that line surrounded by leading and trailing newlines.
This might work for you (GNU sed):
sed '/%%###/!b;G;i\\' file
For those lines that meet the criteria, append a newline from the hold space (the hold space by default contains a newline) and insert an empty line.
Another way:
sed -e '/%%###/!b;i\\' -e 'a\\' file
This time insert and then append empty lines.
N.B. The i and a must be followed by a newline, this can be achieved by putting them in separate -e invocations.
A third way:
sed '/%%###/!b;G;s/.*\(.\)/\1&/' file
As in the first way, append a newline from the hold space, then copy it i.e. the last character of the amended current line, and prepend it to the current line.
Yet another way:
sed '/%%###/{x;p;x;G}' file
Swap to the hold space, print the newline, swap back and append the newline.
N.B. If the hold space may not be empty (a previous, x,h,H,g or G command may have changed it) the buffer may be cleared before it is printed (p) by using the zap command z.
And of course:
sed '/%%###/s/^\|$/\n/g' file
Personally I'd use a string instead of regexp comparison since there's no repexp characters in the text you want to match and if there were you wouldn't want them treated as such, and just print newlines around the string instead of modifying the string itself:
awk '{print ($1=="^%%###" ? ORS $0 ORS : $0)}' file
The above will work with any awk in every shell on every UNIX box and is easily modified if you don't want multiple blank lines between consecutive %%### lines or don't want blank lines added if the existing surrounding lines are already blank or you need to do anything else.

Bash one liner - Get line group from file if matched by a string

I have a log file like;
A
some lines
some lines
Z
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
more lines
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
I only need lines between A-Z if it has the IMPORTANT word. So the desired output is;
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
The line count between A-Z is variable. I tried too many commands like:
grep 'IMPORTANT' -A 3 -B 3 x.log | sed -n '/^A$/,/^Z$/p'
grep 'IMPORTANT' -A 3 -B 3 x.log | grep -E '^Z$' -B 5 | grep -E '^A$' -A 5
Some printed not needed lines from another group, other printed lines without starting or ending points... And all failed.
Is there any way to do this with a one liner?
Using gnu-awk you can do:
awk 'BEGIN{RS=ORS="\nZ\n"} /^A/ && /IMPORTANT/' file
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
BEGIN{RS=ORS="\nZ\n"} sets input ad output record separators as Z with newlines on either side.
/^A/ && /IMPORTANT/ ensures that each record starts with A and has IMPORTANT in it.
each matching is printed as that is default action in awk
With sed:
sed -n '/^A$/{:a;N;/\nZ$/!ba;/IMPORTANT/p}' x.log
Explained:
/^A$/ { # If line matches ^A$...
:a # Label to branch to
N # Append next line to pattern space
/\nZ$/!ba # Branch to :a if pattern space doesn't end with \nZ
/IMPORTANT/p # Print if pattern space contains IMPORTANT
}
This basically appends lines until we have a complete block in the pattern space, then prints it if it matches IMPORTANT and just discards it otherwise.
The -n option prevents output when we reach the end of a cycle.
Some seds don't support oneliners with command grouping ({} with ;) or inline comments. For some seds, having p; instead of p works, and for others, this (basically the above minus comments) should work (and is POSIX compliant):
sed -n '/^A$/{
:a
N
/\nZ$/!ba
/IMPORTANT/p
}' x.log

SED: addressing two lines before match

Print line, which is situated 2 lines before the match(pattern).
I tried next:
sed -n ': loop
/.*/h
:x
{n;n;/cen/p;}
s/./c/p
t x
s/n/c/p
t loop
{g;p;}
' datafile
The script:
sed -n "1N;2N;/XXX[^\n]*$/P;N;D"
works as follows:
Read the first three lines into the pattern space, 1N;2N
Search for the test string XXX anywhere in the last line, and if found print the first line of the pattern space, P
Append the next line input to pattern space, N
Delete first line from pattern space and restart cycle without any new read, D, noting that 1N;2N is no longer applicable
This might work for you (GNU sed):
sed -n ':a;$!{N;s/\n/&/2;Ta};/^PATTERN\'\''/MP;$!D' file
This will print the line 2 lines before the PATTERN throughout the file.
This one with grep, a bit simpler solution and easy to read [However need to use one pipe]:
grep -B2 'pattern' file_name | sed -n '1,2p'
If you can use awk try this:
awk '/pattern/ {print b} {b=a;a=$0}' file
This will print two line before pattern
I've tested your sed command but the result is strange (and obviously wrong), and you didn't give any explanation. You will have to save three lines in a buffer (named hold space), do a pattern search with the newest line and print the oldest one if it matches:
sed -n '
## At the beginning read three lines.
1 { N; N }
## Append them to "hold space". In following iterations it will append
## only one line.
H
## Get content of "hold space" to "pattern space" and check if the
## pattern matches. If so, extract content of first line (until a
## newline) and exit.
g
/^.*\nsix$/ {
s/^\n//
P
q
}
## Remove the old of the three lines saved and append the new one.
s/^\n[^\n]*//
h
' infile
Assuming and input file (infile) with following content:
one
two
three
four
five
six
seven
eight
nine
ten
It will search six and as output will yield:
four
Here are some other variants:
awk '{a[NR]=$0} /pattern/ {f=NR} END {print a[f-2]}' file
This stores all lines in an array a. When pattern is found store line number.
At then end print that line number from the file.
PS may be slow with large files
Here is another one:
awk 'FNR==NR && /pattern/ {f=NR;next} f-2==FNR' file{,}
This reads the file twice (file{,} is the same as file file)
At first round it finds the pattern and store line number in variable f
Then at second round it prints the line two before the value in f

What does this sed expression from todo.sh do?

What does the sed expression: G; s/\n/&&/; /^\([ ~-]*\n\).*\n\1/d; s/\n//; h; P do? Exactly what does it match and how does it match it?
It's from todo.sh. In context:
archive()
{
#defragment blank lines
sed -i.bak -e '/./!d' "$TODO_FILE" ## delete all empty lines
[ $TODOTXT_VERBOSE -gt 0 ] && grep "^x " "$TODO_FILE" ## if verbose mode print completed tasks..
grep "^x " "$TODO_FILE" >> "$DONE_FILE" ## append completed tasks to $DONE_FILE
sed -i.bak '/^x /d' "$TODO_FILE" ## delete completed tasks
cp "$TODO_FILE" "$TMP_FILE"
sed -n 'G; s/\n/&&/; /^\([ ~-]*\n\).*\n\1/d; s/\n//; h; P' "$TMP_FILE" > "$TODO_FILE"
## G; Add a newline
## s/\n/&&/; Substitute newline with && (two newlines?)
## /^\([ ~-]*\n\).*\n\1/d; Delete duplicate lines???
## s/\n// Remove newlines
## h Hold: copy pattern space to buffer
## P Print first line of pattern space
if [ $TODOTXT_VERBOSE -gt 0 ]; then
echo "TODO: $TODO_FILE archived."
fi
}
Ok, you've got some of the story already. Recall that the sed expression is executed for each input line. So the G at the beginning appends the contents of the hold space to the current line (with a newline in between). The contents of the hold space is empty initially but expanded by the h command at the end of each input cycle.
Then s/\n/&&/ duplicates the first newline only, the one between the current line and what was grabbed from the hold space. This is in preparation for the next command. /^\([ -~]*\n\).*\n\1/ indeed matches if the current line is identical to a line in the hold space:
    ^\([ -~]*\n\) matches a line at the beginning of the buffer¹
        Note that this matches only if the line contains only printable ASCII characters.
        If your system supports locales, ^\([[:print:]]*\n\) would be better.
    .*\n matches at least one subsequent line
    \1 matches a line identical to the first line
The extra newline added by the previous s command takes care of the case when the duplicate is the very first line from the hold space. The point of the \n\1 is to “anchor” the duplicate at the beginning of a line, otherwise bar would be considered a duplicate of foobar. If the current line is a duplicate, the d command discards it and execution branches to the next line.
If the current line is not a duplicate, s/\n// discards that extra newline (again, no g modifier, so only the first newline is removed). Then the h command results in the hold space containing what it contained before, with the current line prepended. Finally P prints the current input line.
Ok, now what does the hold space contain? It starts empty, then gets each successive line prepended unless it's a duplicate. So the hold space contains the input lines, in reverse order, minus the duplicates.
¹ Uh, I don't know how you did that, but that should be [ -~], not [ ~-] which wouldn't make any sense.
Here's another way of doing this, if you have a POSIX-conforming set of tools (Single Unix v2 is good enough).
<"$TMP_FILE" \
nl -s: | # add line numbers
sort -t: -k2 -u | # sort, ignoring the line numbers, and remove duplicates
sort -t: -k1 -n | # sort by line number
cut -d: -f2- # cut out the line numbers
Oh, you wanted to do this legibly and concisely? Just use awk.
<"$TMP_FILE" awk '!seen[$0] {++seen[$0]; print}'
If the current line hasn't been seen yet, mark it as seen, and print it.
Note that like the sed method, the awk method essentially stores the whole file in memory. The method above using sort has the advantage that only sort needs to keep more than one line of input at a time, and it's designed for this.
Of course, if you don't care about the order of the lines, it's as simple as sort -u.
After Gilles presented his excellent answer I found Famous Sed One-Liners Explained, which includes this exact sed expression; adding here for reference:
70. Delete duplicate, nonconsecutive lines from a file.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
This is a very tricky one-liner. It
stores the unique lines in hold buffer
and at each newly read line, tests if
the new line already is in the hold
buffer. If it is, then the new line is
purged. If it's not, then it's saved
in hold buffer for future tests and
printed.
A more detailed description - at each
line this one-liner appends the
contents of hold buffer to pattern
space with "G" command. The appended
string gets separated from the
existing contents of pattern space by
"\n" character. Next, a substitution
is made to that substitutes the "\n"
character with two "\n\n". The
substitute command "s/\n/&&/" does
that. The "&" means the matched
string. As the matched string was
"\n", then "&&" is two copies of it
"\n\n". Next, a test "/^([
-~]\n).\n\1/" is done to see if the contents of group capture group 1 is
repeated. The capture group 1 is all
the characters from space " " to "~"
(which include all printable chars).
The "[ -~]" matches that. Replacing
one "\n" with two was the key idea
here. As "([ -~]\n)" is greedy
(matches as much as possible), the
double newline makes sure that it
matches as little text as possible. If
the test is successful, the current
input line was already seen and "d"
purges the whole pattern space and
starts script execution from the
beginning. If the test was not
successful, the doubled "\n\n" gets
replaced with a single "\n" by
"s/\n//" command. Then "h" copies the
whole string to hold buffer, and "P"
prints the new line.