Bash one liner - Get line group from file if matched by a string - regex

I have a log file like this:
A
some lines
some lines
Z
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
more lines
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
I only need the lines between A and Z if the block contains the word IMPORTANT. So the desired output is:
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
The number of lines between A and Z varies. I tried many commands, such as:
grep 'IMPORTANT' -A 3 -B 3 x.log | sed -n '/^A$/,/^Z$/p'
grep 'IMPORTANT' -A 3 -B 3 x.log | grep -E '^Z$' -B 5 | grep -E '^A$' -A 5
Some printed unneeded lines from another group, others printed blocks without their start or end markers. They all failed.
Is there any way to do this with a one liner?

Using gnu-awk you can do:
awk 'BEGIN{RS=ORS="\nZ\n"} /^A/ && /IMPORTANT/' file
A
some lines
some lines
IMPORTANT text
some lines
Z
A
some lines
IMPORTANT text
more lines
some lines
Z
BEGIN{RS=ORS="\nZ\n"} sets the input and output record separators to Z with a newline on either side.
/^A/ && /IMPORTANT/ ensures that each record starts with A and has IMPORTANT in it.
Each matching record is printed, as that is the default action in awk.
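A multi-character RS is a gawk extension, so here is a minimal sketch of the same filter for other awks, assuming the markers are exactly the lines A and Z. The variable names (buf, hit, inblock) and the shortened sample log are illustrative, not from the original answer.

```shell
# Shortened sample log (assumed content, modelled on the question).
log='A
some lines
Z
A
some lines
IMPORTANT text
more lines
Z'

# Buffer each A..Z block and print it at Z only if IMPORTANT was seen.
result=$(printf '%s\n' "$log" | awk '
  /^A$/ { buf = ""; hit = 0; inblock = 1 }                  # start a new block
  inblock { buf = buf $0 "\n"; if (/IMPORTANT/) hit = 1 }   # collect lines, note a hit
  /^Z$/ && inblock { if (hit) printf "%s", buf; inblock = 0 }
')
printf '%s\n' "$result"
```

Only the second block is printed, because the first one never sets hit.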

With sed:
sed -n '/^A$/{:a;N;/\nZ$/!ba;/IMPORTANT/p}' x.log
Explained:
/^A$/ { # If line matches ^A$...
:a # Label to branch to
N # Append next line to pattern space
/\nZ$/!ba # Branch to :a if pattern space doesn't end with \nZ
/IMPORTANT/p # Print if pattern space contains IMPORTANT
}
This basically appends lines until we have a complete block in the pattern space, then prints it if it matches IMPORTANT and just discards it otherwise.
The -n option prevents output when we reach the end of a cycle.
Some seds don't support oneliners with command grouping ({} with ;) or inline comments. For some seds, having p; instead of p works, and for others, this (basically the above minus comments) should work (and is POSIX compliant):
sed -n '/^A$/{
:a
N
/\nZ$/!ba
/IMPORTANT/p
}' x.log

Related

Can not replace multiple empty lines with one

Why does the following not replace multiple empty lines with one?
$ cat some_random_text.txt
foo
bar
test
and this does not work:
$ cat some_random_text.txt | perl -pe "s/\n+/\n/g"
foo
bar
test
I am trying to replace multiple newlines (i.e. empty lines) with a single empty line, but the regex I use for that does not work, as you can see in the example snippet.
What am I messing up?
Expected outcome is:
foo
bar
test
The reason it doesn't work is that -p tells perl to process the input line by line, and there's never more than one \n in a single line.
Better idea:
perl -00 -lpe 1
-00: Enable paragraph mode (input records are terminated by any sequence of 2+ newlines).
-l: Enable autochomp mode (the input record separators are trimmed automatically, so since we're in paragraph mode, all trailing newlines are removed, and output records get "\n\n" added).
-p: Enable automatic input/output (the main code is executed for each input record; anything left in $_ is printed automatically).
-e 1: Use a dummy main program that does nothing.
Taken all together this does nothing except normalize paragraph terminators to exactly two newlines.
You are executing the following program:
LINE: while (<>) {
s/\n+/\n/g;
}
continue {
die "-p destination: $!\n" unless print $_;
}
Since you are reading one line at a time, and since a line is a sequence of characters that aren't line feeds terminated by a line feed, your pattern will never match more than one newline.
The simple fix is to tell Perl to treat the entire file as one line. Also, you don't want to replace every line feed, but just those found in sequence of two or more, and you want to replace the sequence with two line feeds.
perl -0777pe's/\n\n\K\n+//g; s/^\n+//; s/\n\K\n\z//' some_random_text.txt
The second and third substitutions ensure there are no blank lines at the start and end of the file.
While reading the entire file into memory is easy, it's not necessary. The desired output can also be achieved by maintaining a flag that indicates whether the previous line was blank or not.
perl -ne'if (/\S/) { print "\n" if $f; print; $f=0 } else { $f=1 }' some_random_text.txt
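This solution also removes blank lines from the end of the file. A leading run of blank lines is reduced to a single blank line, because $f is already set when the first non-blank line is printed.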
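The same flag technique can be sketched in awk. This is an illustration under assumptions: the sample input is made up, and the extra seen variable (not in the Perl version) suppresses the blank line that a leading run would otherwise produce.

```shell
# Assumed sample: runs of blank lines between three words, plus a trailing one.
result=$(printf 'foo\n\n\n\nbar\n\n\ntest\n\n' | awk '
  NF {                                # line has at least one non-blank field
    if (pending && seen) print ""     # emit one blank line for the whole run
    print
    seen = 1
    pending = 0
    next
  }
  { pending = 1 }                     # blank line: remember it, print nothing yet
')
printf '%s\n' "$result"
```

Because a blank line is only emitted when a non-blank line follows it, trailing blank lines are dropped without reading the whole file into memory.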
Given:
$ echo "$txt"
foo
bar
test
You can use sed to reduce the runs of blank lines to a single \n:
$ echo "$txt" | sed '/^$/N;/^\n$/D'
foo
bar
test
Even easier, you can use cat -s:
$ echo "$txt" | cat -s # same output
In perl the idiomatic 1 liner is to use -00 for paragraph mode:
$ echo "$txt" | perl -00pe0 # same output
And in awk you have the flexibility of using paragraph mode by setting RS= and then set ORS= to what you want the replacement for runs of \n to be:
$ echo "$txt" | awk '1' RS= ORS="\n\n" # same output
ikegami correctly states that printf 'a\n\n' | ... will produce two trailing newlines (i.e. a trailing blank line) with these solutions. That may or may not be an issue.

SED: addressing two lines before match

Print the line that is two lines before the match (pattern).
I tried the following:
sed -n ': loop
/.*/h
:x
{n;n;/cen/p;}
s/./c/p
t x
s/n/c/p
t loop
{g;p;}
' datafile
The script:
sed -n "1N;2N;/XXX[^\n]*$/P;N;D"
works as follows:
Read the first three lines into the pattern space, 1N;2N
Search for the test string XXX anywhere in the last line, and if found print the first line of the pattern space, P
Append the next line input to pattern space, N
Delete first line from pattern space and restart cycle without any new read, D, noting that 1N;2N is no longer applicable
This might work for you (GNU sed):
sed -n ':a;$!{N;s/\n/&/2;Ta};/^PATTERN\'\''/MP;$!D' file
This will print the line 2 lines before the PATTERN throughout the file.
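Here is a simpler, more readable solution with grep (it needs one pipe). Note that it prints both lines preceding the first match; use sed -n '1p' instead to keep only the line two before it: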
grep -B2 'pattern' file_name | sed -n '1,2p'
If you can use awk try this:
awk '/pattern/ {print b} {b=a;a=$0}' file
This prints the line two lines before each match of the pattern.
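As a quick check of the sliding-window idea, here is a sketch run against a ten-line sample (one to ten). The NR > 2 guard and the variable names are additions for illustration; the guard avoids printing an empty line when the match occurs on the first or second line.

```shell
# Ten-line sample, one word per line; pat is the pattern to look for.
result=$(printf '%s\n' one two three four five six seven eight nine ten |
  awk -v pat='six' '
    NR > 2 && $0 ~ pat { print two_back }     # skip matches with no line two above
    { two_back = one_back; one_back = $0 }    # slide the two-line window forward
  ')
printf '%s\n' "$result"   # prints: four
```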
I've tested your sed command but the result is strange (and obviously wrong), and you didn't give any explanation. You will have to save three lines in a buffer (named hold space), do a pattern search with the newest line and print the oldest one if it matches:
sed -n '
## At the beginning read three lines.
1 { N; N }
## Append them to "hold space". In following iterations it will append
## only one line.
H
## Get content of "hold space" to "pattern space" and check if the
## pattern matches. If so, extract content of first line (until a
## newline) and exit.
g
/^.*\nsix$/ {
s/^\n//
P
q
}
## Remove the old of the three lines saved and append the new one.
s/^\n[^\n]*//
h
' infile
Assuming an input file (infile) with the following content:
one
two
three
four
five
six
seven
eight
nine
ten
It searches for six and yields:
four
Here are some other variants:
awk '{a[NR]=$0} /pattern/ {f=NR} END {print a[f-2]}' file
This stores all lines in an array a. When the pattern is found, it stores the line number.
At the end, it prints the line two before that line number.
PS: this may be slow with large files.
Here is another one:
awk 'FNR==NR && /pattern/ {f=NR;next} f-2==FNR' file{,}
This reads the file twice (file{,} is the same as file file).
On the first pass it finds the pattern and stores the line number in variable f.
On the second pass it prints the line two before the value in f.

Grepping out a block of text, regex

Given a large log file, what is the best way to grep a block of text?
text to be ignored
more text to be ignored
--- <---- start capture here
lots of
text with separators like "---"
---
spanning
multiple lines
--- <---- end capture here
text to be ignored
more text to be ignored
What is known?
Max number of characters in line (55 but may be less)
Number of lines in a block
Separator (which may repeat itself)
What regular expression would match this block? Desired output: list of blocks of text.
Please assume Linux command line environment
Several years ago I used this to split patches into hunks:
sed -e '$ {x;q}' -e '/##/ !{H;d}' -e '/##/ x' # note - i know sed better now
Replace /##/ with /---/.
To remove everything before first '---' and after last '---' add -e '1,/---/d' and remove the whole -e '$ {x;q}'.
Result would be like this:
sed -e '1,/---/d' -e '/---/ !{H;d}' -e x
Just tested it and it works with the given example.
Keep it simple:
$ awk 'NR==FNR {if (/^---/) { if (!start) start=NR; end=NR } next} FNR>=start && FNR<=end' file file
--- <---- start capture here
lots of
text with separators like "---"
---
spanning
multiple lines
--- <---- end capture here
$ awk 'NR==FNR {if (/^---/) { if (!start) start=NR; end=NR } next} FNR>start && FNR<end' file file
lots of
text with separators like "---"
---
spanning
multiple lines
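If the log arrives on a pipe and cannot be read twice, a single-pass variant is possible. The sketch below is an assumption-laden illustration, not part of the answer above: it holds lines back until the next separator confirms they are inside the block, and the names held and started are made up.

```shell
# Assumed sample: the log from the question without the arrow annotations.
result=$(printf '%s\n' \
  'text to be ignored' \
  '---' \
  'lots of' \
  'text with separators like "---"' \
  '---' \
  'spanning' \
  'multiple lines' \
  '---' \
  'more text to be ignored' | awk '
  /^---/ {                           # separator: held lines are inside the block
    printf "%s", held
    print
    held = ""
    started = 1
    next
  }
  started { held = held $0 "\n" }    # hold lines until a later separator confirms them
')
printf '%s\n' "$result"
```

Lines after the last separator stay in held and are never printed, so the output runs from the first separator to the last one inclusive. The memory used is bounded by the longest stretch between two separators.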
If you have enough memory, you can use the following line. Note, however, that it will read the whole logfile into memory!
perl -0777 -lnE 'm{ ^--- .+ ^--- }xms and say $&' logfile

How to use sed to remove only double empty lines?

I found this question and answer on how to remove triple empty lines. However, I need the same only for double empty lines. Ie. all double blank lines should be deleted completely, but single blank lines should be kept.
I know a bit of sed, but the proposed command for removing triple blank lines is over my head:
sed '1N;N;/^\n\n$/d;P;D'
This would be easier with cat:
cat -s
I've commented the sed command you don't understand:
sed '
## In first line: append second line with a newline character between them.
1N;
## Do the same with third line.
N;
## When found three consecutive blank lines, delete them.
## Here there are two newlines but you have to count one more deleted with last "D" command.
/^\n\n$/d;
## The combo "P+D+N" simulates a FIFO, "P+D" prints and deletes from one side while "N" appends
## a line from the other side.
P;
D
'
Remove 1N because we need only two lines in the 'stack' and the second N is enough, and change /^\n\n$/d; to /^\n$/d; to delete every pair of consecutive blank lines.
A test:
Content of infile:
1
2
3
4
5
6
7
Run the sed command:
sed '
N;
/^\n$/d;
P;
D
' infile
That yields:
1
2
3
4
5
6
7
sed '/^$/{N;/^\n$/d;}'
This deletes only pairs of consecutive blank lines. Try the expression on a file to understand it fully. When sed reaches a blank line (/^$/), it enters the braces.
Normally sed reads one line at a time. N appends the next line to the pattern space, separated from the first by a newline.
If that second line is also empty, the pattern space is exactly one newline, so /^\n$/ matches and d deletes the whole pattern space and starts the next cycle. Otherwise d does not run and both lines are printed.
This would be easier with awk:
awk -v RS='\n\n\n' 1
However, the solution above only deletes the first run of three consecutive blank lines.
To delete every run of three consecutive blank lines, use the command below:
sed '1N;N;/^\n\n$/ { N;s/^\n\n//;N;D; };P;D' filename
As far as I can tell none of the solutions here work. cat -s as suggested by #DerMike isn't POSIX compliant (and it's less convenient if you're already using sed for another transformation), and sed 'N;/^\n$/d;P;D' as suggested by #Birei sometimes deletes more newlines than it should.
Instead, sed ':L;N;s/^\n$//;t L' works. For POSIX compliance use sed -e :L -e N -e 's/^\n$//' -e 't L', since POSIX doesn't specify using ; to separate commands.
Example:
$ S='foo\nbar\n\nbaz\n\n\nqux\n\n\n\nquxx\n';\
> paste <(printf "$S")\
> <(printf "$S" | sed -e 'N;/^\n$/d;P;D')\
> <(printf "$S" | sed -e ':L;N;s/^\n$//;t L')
foo foo foo
bar bar bar
baz baz baz
qux
qux
qux quxx
quxx
quxx
$
Here we can see the original file, #Birei's solution, and my solution side-by-side. #Birei's solution deletes all blank lines separating baz and qux, while my solution removes all but one as intended.
Explanation:
:L Create a new label called L.
N Read the next line into the current pattern space,
separated by an "embedded newline."
s/^\n$// Replace the pattern space with the empty pattern space,
corresponding to a single non-embedded newline in the output,
if the current pattern space only contains a single embedded newline,
indicating that a blank line was read into the pattern space by `N`
after a blank line had already been read from the input.
t L Branch to label L if the previous `s` command successfully
substituted text in the pattern space.
In effect, this deletes one recurrent blank line at a time, reading each into the pattern space as an embedded newline with N and deleting them with s.
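For comparison, here is an awk sketch of the same collapse (every run of blank lines reduced to one). It tracks only whether the previous line was blank; the variable name prev_blank is illustrative, and the input is the sample string from the example above.

```shell
# Same sample as above: runs of one, two and three blank lines.
result=$(printf 'foo\nbar\n\nbaz\n\n\nqux\n\n\n\nquxx\n' | awk '
  $0 == "" {                     # blank line
    if (!prev_blank) print       # keep only the first blank line of a run
    prev_blank = 1
    next
  }
  { print; prev_blank = 0 }      # non-blank line: print and reset the flag
')
printf '%s\n' "$result"
```

Each of the three runs comes out as exactly one blank line.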
Just pipe it to the uniq command, and any run of empty lines, however long, will be shrunk to just one. Simpler is better.
Clarification: as Marlar stated, this is not a solution if you have other consecutive duplicated non-blank lines that you do not want to lose. It is a solution in other cases, such as cleaning up configuration files, which is what I was after when I found this question. I did solve my problem using just uniq.

sed join lines together

What would be the sed (or other tool) command to join lines in a file that do not end with the character '0'?
I'll have lines like this
412|n|Leader Building Material||||||||||d|d|20||0
which need to be left alone, and then I'll have lines like this for example (which is 3 lines, but only one ends w/ 0)
107|n|Knot Tying Tools|||||Knot Tying Tools
|||||d|d|0||0
which need to be joined/combined into one line
107|n|Knot Tying Tools|||||Knot Tying Tools|||||d|d|0||0
sed ':a;/0$/{N;s/\n//;ba}'
In a loop (branch ba to label :a), if the current line ends in 0 (/0$/) append next line (N) and remove inner newline (s/\n//).
awk:
awk '{while(/0$/) { getline a; $0=$0 a; sub(/\n/,_) }; print}'
Perl:
perl -pe '$_.=<>,s/\n// while /0$/'
bash:
while read line; do
if [ ${line: -1:1} != "0" ] ; then
echo $line
else echo -n $line
fi
done
awk can be short too (using an explicit "%s" format so that a % in the data is not interpreted by printf):
awk '!/0$/{printf "%s", $0}/0$/'
test:
kent$ cat t
#aasdfasdf
#asbbb0
#asf
#asdf0
#xxxxxx
#bar
kent$ awk '!/0$/{printf "%s", $0}/0$/' t
#aasdfasdf#asbbb0
#asf#asdf0
#xxxxxx#bar
The rating of this answer is surprising ;s (this surprised wink emoticon pun on sed substitution is intentional) given the OP specifications: sed join lines together.
This submission's last comment
"if that's the case check what #ninjalj submitted"
also suggests checking the same answer.
ie. Check using sed ':a;/0$/{N;s/\n//;ba}' verbatim
sed ':a;/0$/{N;s/\n//;ba}'
does
no one
ie. 0
people,
try
nothing,
ie. 0
things,
any more,
ie. 0
tests?
(^D aka eot 004 ctrl-D ␄ ... bash generate via: echo ^V^D)
which will not give (do the test ;):
does no one ie. 0
people, try nothing, ie. 0
things, any more, ie. 0
tests? (^D aka eot 004 ctrl-D ␄ ... bash generate via: echo ^V^D)
To get this use:
sed 'H;${z;x;s/\n//g;p;};/0$/!d;z;x;s/\n//g;'
or:
sed ':a;/0$/!{N;s/\n//;ba}'
not:
sed ':a;/0$/{N;s/\n//;ba}'
Notes:
sed 'H;${x;s/\n//g;p;};/0$/!d;z;x;s/\n//g;'
does not use branching and
is identical to:
sed '${H;z;x;s/\n//g;p;};/0$/!{H;d;};/0$/{H;z;x;s/\n//g;}'
H commences all sequences
d short circuits further script command execution on the current line and starts the next cycle so address selectors following /0$/! can only be /0$/!! so the address selector of
/0$/{H;z;x;s/\n//g;} is redundant and not needed.
if a line does not end with 0 save it in hold space
/0$/!{H;d;}
if a line does end with 0 save it too and then print flush (double entendre ie. purged and lines aligned)
/0$/{H;z;x;s/\n//g;}
NB ${H;z;x;s/\n//g;p;} uses /0$/ ... commands with an extra p to coerce the final print and with a now unnecessary z (to empty and reset pattern space like s/.*//)
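The accumulate-then-flush idea behind the hold-space version can also be sketched in awk, which may be easier to read. This is an illustration, not a replacement for the sed above: buf is a made-up name, and the sample uses the records from the question.

```shell
# Sample records from the question; the second record is split over two lines.
result=$(printf '%s\n' \
  '412|n|Leader Building Material||||||||||d|d|20||0' \
  '107|n|Knot Tying Tools|||||Knot Tying Tools' \
  '|||||d|d|0||0' | awk '
  { buf = buf $0 }                    # append the line without its newline
  /0$/ { print buf; buf = "" }        # a trailing 0 completes the record
  END { if (buf != "") print buf }    # flush a tail that never ended in 0
')
printf '%s\n' "$result"
```

Like the sed versions, this assumes that only the last fragment of a record ends in 0; a middle fragment that happens to end in 0 would be flushed early.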
A typically cryptic Perl one-liner:
perl -pe 'BEGIN{$/="0\n"}s/\n//g;$_.=$/'
This uses the sequence "0\n" as the record separator (by your question, I'm assuming that every line should end with a zero). Any record then should not have internal newlines, so those are removed, then print the line, appending the 0 and newline that were removed.
Another take to your question would be to ensure each line has 17 pipe-separated fields. This does not assume that the 17th field value must be zero.
awk -F \| '
NF == 17 {print; next}
prev {print prev $0; prev = ""; next}
{prev = $0}
'
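The question mentions records broken over three lines, so here is a sketch that generalises the field-count idea to any number of fragments. It assumes a record is complete as soon as the accumulated text has 17 pipe-separated fields; want, buf, n and parts are illustrative names, and the three-way split of the sample record is invented for the demonstration.

```shell
# One intact record and one record split into three fragments (assumed split).
result=$(printf '%s\n' \
  '412|n|Leader Building Material||||||||||d|d|20||0' \
  '107|n|Knot Tying Tools|||||Knot Tying Tools' \
  '|||||d' \
  '|d|0||0' | awk -v want=17 '
  { buf = buf $0; n = split(buf, parts, "|") }   # count fields accumulated so far
  n >= want { print buf; buf = "" }              # enough fields: emit the record
  END { if (buf != "") print buf }               # flush an incomplete tail
')
printf '%s\n' "$result"
```

If the line break can fall immediately before the final field's value, the count reaches 17 one fragment too early, so this sketch would need a different completion test for such data.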
If the line does not end with 0, append the next line, then remove the newline (this joins at most two lines):
sed '/0$/!N;s/\n//'