sed: display lines before a match - regex

I am using sed to print the last line that matches a pattern:
echo -e "aaa\nbbb\nccc\naaa\nccc\naaa\nbbb\nccc" | sed '/aaa/!d' | sed '$!d' # the order and number of aaa, bbb, ccc, ..., nnn lines is arbitrary
The example above works well. The second method:
echo -e "aaa\nbbb\nccc\naaa\nccc\naaa\nbbb\nccc" | sed -e '/aaa/!d' -e '$!d'
or:
echo -e "aaa\nbbb\nccc\naaa\nccc\naaa\nbbb\nccc" | sed -e '/aaa/!d;$!d'
The second method does not work for me. Someone wrote on Wikipedia that sed commands can be combined into a single script, but the combined version does not work. What am I doing wrong, and what should the correct command look like?

This might work for you (GNU sed):
sed '/aaa/h;$!d;x' file
To catch the last match you must store it in the hold space then retrieve it at the end of the file.
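A small sketch of the hold-space approach. It assumes a modified sample in which each aaa line carries a number (added here for illustration only, not part of the question) so that it is visible which match survives:

```shell
# Hypothetical input: the numbers after aaa only tell the matches apart.
last_match=$(printf 'aaa 1\nbbb\nccc\naaa 2\nccc\naaa 3\nbbb\nccc\n' |
  sed '/aaa/h;$!d;x')
# /aaa/h  copy each matching line into the hold space, overwriting the previous one
# $!d     delete every line except the last, so nothing is printed early
# x       on the last line, swap in the hold space, which is then auto-printed
printf '%s\n' "$last_match"   # prints: aaa 3
```

If no line matches at all, the hold space is still empty at the end, so this prints an empty line rather than nothing.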

What is the desired output? The first command gives a single line aaa. The second and third commands give no output. There's a solid reason for the discrepancy in the behaviour.
In the first command, you have:
sed '/aaa/!d' | sed '$!d'
The first sed here deletes each line that is not aaa. The output (3 lines containing aaa) is then filtered so that only the last line is printed.
In the second and third commands (which are equivalent), you have:
sed -e '/aaa/!d' -e '$!d'
The first operand deletes each line that is not aaa and starts the next cycle. The second operand deletes every remaining aaa because none of them is on the last line of input (the last line in the input is ccc, which has already been deleted by virtue of not being aaa). So the output you see is exactly what you should expect.
If you want just one aaa, consider using:
grep '^aaa$' | uniq
Though that's a long-winded way of writing:
echo aaa
Presumably, though, this is a simplified version of the real situation (which is a good thing).

$ in the address means the last line. It does not change even if the last line is not being printed because of a previous command. In the pipeline, though, only the printed lines get to the second invocation of sed, and $ again means the last line - now only from the lines printed by the previous sed invocation.
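The difference can be checked side by side. The following is a sketch using the sample from the question; the variable names and the extra two-line input are only illustrative:

```shell
input=$(printf 'aaa\nbbb\nccc\naaa\nccc\naaa\nbbb\nccc\n')

# Two processes: the second sed sees only the three aaa lines, so its $ is the last aaa.
piped=$(printf '%s\n' "$input" | sed '/aaa/!d' | sed '$!d')

# One process: $ is the final ccc of the original input, which /aaa/!d already deleted.
combined=$(printf '%s\n' "$input" | sed -e '/aaa/!d' -e '$!d')

# The combined script prints something only when the last input line itself matches.
last_is_match=$(printf 'bbb\naaa\n' | sed -e '/aaa/!d' -e '$!d')

printf 'piped=[%s] combined=[%s] last_is_match=[%s]\n' "$piped" "$combined" "$last_is_match"
# prints: piped=[aaa] combined=[] last_is_match=[aaa]
```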

Related

sed retrieve part of line

I have lines of code that look like this:
hi:12345:234 (second line)
How do I write a line of code using the sed command that only prints out the 2nd item in the second line?
My current command looks like this:
sed -n '2p' file gets the second line, but I don't know what regex to use to match only the 2nd item, 12345, or how to combine it with my current command.
Please try the following, written and tested with the shown samples in GNU sed.
sed -n '2s/\([^:]*\):\([^:]*\).*/\2/p' Input_file
Explanation: The -n option stops sed from printing every line automatically, so only lines for which p is given explicitly are printed. 2s restricts the substitution to the 2nd line. The regex uses sed's capture groups, \(...\), whose matches can be recalled later as \1, \2, and so on. The first group captures everything before the first :, and the second group captures everything between the first and the second :, as the OP requested. In the replacement, \2 replaces the whole line with the second captured value, and the p flag prints the result, for the 2nd line only.
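To see what each group holds, here is a sketch on a two-line sample. The first line, skip:0:0, is a made-up placeholder so that hi:12345:234 really is line 2:

```shell
# Hypothetical two-line input; only the second line comes from the question.
sample=$(printf 'skip:0:0\nhi:12345:234\n')

# Same regex twice, keeping a different capture group each time.
group1=$(printf '%s\n' "$sample" | sed -n '2s/\([^:]*\):\([^:]*\).*/\1/p')
group2=$(printf '%s\n' "$sample" | sed -n '2s/\([^:]*\):\([^:]*\).*/\2/p')

printf 'group1=%s group2=%s\n' "$group1" "$group2"   # prints: group1=hi group2=12345
```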
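A couple of solutions (the printf supplies a placeholder first line so that hi:12345:234 is the second line):
printf 'first:0:0\nhi:12345:234\n' | sed -n '2s/.*:\([0-9]*\):.*/\1/p'
printf 'first:0:0\nhi:12345:234\n' | sed -n '2{s/^[^:]*://; s/:.*//p; q}'
printf 'first:0:0\nhi:12345:234\n' | awk -F':' 'FNR==2{print $2}'
All display 12345.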
sed -n '2s/.*:\([0-9]*\):.*/\1/p' only displays the captured value thanks to -n and p option/flag. It matches a whole string capturing digits between two colons, \1 only keeps the capture.
sed -n '2{s/^[^:]*://;s/:.*//p;q}' removes everything from the start up to the first :, then everything from the next : to the end, and then quits (q), so a big file is processed faster because the remaining lines are never read.
awk -F':' 'FNR==2{print $2}' splits the second line with a colon and fetches the second item.

Make matching example from sed manual working

I found an example in info sed stating the following:
'^\(.*\)\n\1$'
This matches a string consisting of two equal substrings separated
by a newline.
Trying to apply it in the following ways did not return any matching lines:
echo -e "test\ntest" | sed -n '/^\(.*\)\n\1$/p'
echo -e "test\ntest" | sed -n 's/^\(.*\)\n\1$/\0/p'
sed version I use is 4.2.2.
Please suggest the way this example can be tested.
This might work for you (GNU sed and bash):
<<<$'test\ntest' sed -En 'N;s/^(.*)\n\1$/\1 == \1/p;s/^(.*)\n(.*)$/\1 != \2/p'
Append the second line of the input to the first and if the two lines are the same, replace them by line1 == line2 otherwise replace them by line1 != line2.
N.B. Both substitutions need a newline to match, and if the first substitution succeeds the second cannot, because the newline is gone. Likewise, if the first substitution does not succeed, the second must.
To make the example work, you have to use N, which reads one more line into the pattern space and so allows \n to be matched. By default sed reads one line at a time and strips its trailing newline, so \n can never match.
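A minimal sketch of the pattern from the manual, with N added so that the pattern space actually contains a newline. The second input, with two different lines, is an illustrative addition:

```shell
# N appends the next input line to the pattern space, separated by a newline.
same=$(printf 'test\ntest\n' | sed -n 'N;/^\(.*\)\n\1$/p')
different=$(printf 'test\nother\n' | sed -n 'N;/^\(.*\)\n\1$/p')

printf '%s\n' "$same"                     # prints the two identical lines: test, test
printf 'different=[%s]\n' "$different"    # prints: different=[]
```

Note that this only compares lines 1 and 2, 3 and 4, and so on, because each cycle consumes two lines.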

How can I print 2 lines if the second line contains the same match as the first line?

Let's say I have a file with several million lines, organized like this:
#1:N:0:ABC
XYZ
#1:N:0:ABC
ABC
I am trying to write a one-line grep/sed/awk matching function that returns both lines if the string after the last colon in the first line (ABC in this sample, something like NCCGGAGA in the real data) is found in the second line.
When I try to use grep -A1 -P and pipe the matches with a match like '(?<=:)[A-Z]{3}', I get stuck. I think my creativity is failing me here.
With awk
$ awk -F: 'NF==1 && $0 ~ s{print p ORS $0} {s=$NF; p=$0}' ip.txt
#1:N:0:ABC
ABC
-F: use : as delimiter, makes it easy to get last column
s=$NF; p=$0 save last column value and entire line for printing later
NF==1 if line doesn't contain :
$0 ~ s if line contains the last column data saved previously
if search data can contain regex meta characters, use index($0,s) instead to search literally
note that this code assumes input file having line containing : followed by line which doesn't have :
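The difference between the regex match and the literal index() search can be sketched as follows. The sample data, with A.C as the last column, is invented here to make the regex metacharacter matter:

```shell
# Hypothetical input: "A.C" as a regex also matches "ABC"; as a literal string it does not.
sample=$(printf '#1:N:0:A.C\nABC\n#1:N:0:A.C\nxA.Cx\n')

# Regex search, as in the answer above.
regex_hits=$(printf '%s\n' "$sample" |
  awk -F: 'NF==1 && $0 ~ s {print p ORS $0} {s=$NF; p=$0}')

# Literal search: index() returns the position of s in $0, or 0 if it is absent.
literal_hits=$(printf '%s\n' "$sample" |
  awk -F: 'NF==1 && index($0, s) {print p ORS $0} {s=$NF; p=$0}')

printf '%s\n' "$regex_hits"     # prints both pairs (4 lines)
printf '%s\n' "$literal_hits"   # prints only the second pair (2 lines)
```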
With GNU sed (might work with other versions too, syntax might differ though)
$ sed -nE '/:/{N; /.*:(.*)\n.*\1/p}' ip.txt
#1:N:0:ABC
ABC
/:/ if line contains :
N add next line to pattern space
/.*:(.*)\n.*\1/ capture string after last : and check if it is present in next line
Again, this assumes input like that shown in the question. It won't work for cases like:
#1:N:0:ABC
#1:N:0:XYZ
XYZ
This might work for you (GNU sed):
sed -n 'N;/.*:\(.*\)\n.*\1/p;D' file
Use grep-like option -n to explicitly print lines. Read two lines into the pattern space and print both if they meet the requirements. Always delete the first and repeat.
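A sketch of how the N...D sliding window behaves on the awkward three-line case (two header lines in a row), compared with a command that reads lines in fixed pairs. It assumes a sed that accepts -E with back-references, such as GNU sed:

```shell
tricky=$(printf '#1:N:0:ABC\n#1:N:0:XYZ\nXYZ\n')

# Fixed pairs: lines 1+2 are tested together, then line 3 alone has no colon, so nothing prints.
paired=$(printf '%s\n' "$tricky" | sed -nE '/:/{N; /.*:(.*)\n.*\1/p}')

# Sliding window: D drops only the first line and restarts, so lines 2+3 are also tested.
sliding=$(printf '%s\n' "$tricky" | sed -n 'N;/.*:\(.*\)\n.*\1/p;D')

printf 'paired=[%s]\n' "$paired"   # prints: paired=[]
printf '%s\n' "$sliding"           # prints the lines #1:N:0:XYZ and XYZ
```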
If your actual Input_file is the same as the shown example, then the following may help you too.
awk -v FS="[: \n]" -v RS="" '$(NF-1)==$NF' Input_file
EDIT: Adding one more solution, as per Sundeep's suggestion.
awk -v FS='[:\n]' -v RS= 'index($NF, $(NF-1))' Input_file

Regex Pattern Replace

So I wanted to replace the following
<duration>89</duration>
with
(expected result, or at least it should become this:)
\n<duration>89</duration>
So basically I want to replace every opening < with \n< using a regex. So I figured:
sed -e 's/<[^/]/\n</g'
The only problem is that it obviously outputs
\n<uration>89</duration>
Which brings me to my question. How can I tell the regex to match a character that follows < (and is not /) but stop it from replacing that character, so I can get my expected result?
Try this:
sed -e 's/<[^/]/\\n&/g' file
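or, to insert a real newline instead of the two characters \n (GNU sed):
sed -e 's/<[^/]/\n&/g' file
& refers to the portion of the pattern space that matched, so the character after < is kept.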
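A quick sketch comparing the question's replacement with the & version on the sample tag. Both use \\n, so the output contains the two literal characters \ and n; the variable names are only illustrative:

```shell
tag='<duration>89</duration>'

# Replacement without &: the character matched by [^/] is lost.
dropped=$(printf '%s\n' "$tag" | sed -e 's/<[^/]/\\n</g')

# Replacement with &: the whole match ("<d") is put back after the literal \n.
kept=$(printf '%s\n' "$tag" | sed -e 's/<[^/]/\\n&/g')

printf '%s\n' "$dropped"   # prints: \n<uration>89</duration>
printf '%s\n' "$kept"      # prints: \n<duration>89</duration>
```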
It can be nicely done with awk:
echo '<duration>89</duration>' | awk '1' RS='<' ORS='\n<'
RS='<' sets the input record separator to <
ORS='\n<' sets the output record separator to \n<
1 always evaluates to true. A true condition without a subsequent action tells awk to print the record.
echo "<duration>89</duration>" | sed -E 's/<([^\/])/\\n<\1/g'
should do it.
Sample Run
$ echo "<duration>89</duration>
> <tag>Some Stuff</tag>"| sed -E 's/<([^\/])/\\n<\1/g'
\n<duration>89</duration>
\n<tag>Some Stuff</tag>
Your statement is almost correct, with one small problem: sed replaces everything the pattern matched, including the character matched by [^/]. You need to preserve that character, so try either of the following two statements:
sed -e 's/<\([^/]\)/\n<\1/g' file
or, as pointed out by Cyrus:
sed -e 's/<[^/]/\n&/g' file
Cheers!
echo '<duration>89</duration>' | awk '{sub(/<dur/,"\\n<dur")}1'
\n<duration>89</duration>

What is the actual meaning of "-n" in sed?

According to http://linux.about.com/od/commands/l/blcmdl1_sed.htm
suppress automatic printing of pattern space
I've tested with and without -n, and sed produces the same result.
I don't understand what "space" means here.
Sed has two places to store text: pattern space and hold space. Pattern space is where each line is put to be processed by sed commands; hold space is an auxiliary place to put some text you may want to use later. You probably will use only pattern space.
Before sed processes a line, the line is put in the pattern space. Then, sed applies all commands (such as s///) to the pattern space and, by default, prints the resulting text from the pattern space. Let us suppose we have a file myfile with a line like:
The quick brown fox jumps over the lazy dog.
We run the following command:
sed 's/fox/coati/;s/dog/dingo/' myfile
Sed will apply s/fox/coati/ and then s/dog/dingo/ to each line of the file, in this case the only line, shown above. When sed reads the line, it puts it in the pattern space, which will have the following content:
The quick brown fox jumps over the lazy dog.
Then, sed will run the first command. After sed runs the command s/fox/coati/, the content of the pattern space will be:
The quick brown coati jumps over the lazy dog.
Then sed will apply the second command, s/dog/dingo/. After that, the content of the pattern space will be:
The quick brown coati jumps over the lazy dingo.
Note that this only happens in memory: nothing has been printed so far.
After all commands have been applied to the current line, sed by default takes the content of the pattern space and prints it to standard output. However, when you give -n as an option, you ask sed not to perform this last step unless printing is explicitly requested. So, if you run
sed -n 's/fox/coati/;s/dog/dingo/' myfile
nothing will be printed.
But how could you explicitly request sed to print the pattern space? Well, you can use the p command. When sed finds this command, it will print the content of the pattern space immediately. For example, in the command below we request sed to print the content of the pattern space just after the first command:
sed -n 's/fox/coati/;p;s/dog/dingo/' myfile
The result will be
$ sed -n 's/fox/coati/;p;s/dog/dingo/' myfile
The quick brown coati jumps over the lazy dog.
Note that only fox is replaced. It happens because the second command was not executed before the pattern space was printed. If we want to print the pattern space after both commands, we just put p after the second one:
sed -n 's/fox/coati/;s/dog/dingo/;p' myfile
Another option, if you are using the s/// command, is to pass the p flag to s///:
sed -n 's/fox/coati/;s/dog/dingo/p' myfile
In this case, the line will only be printed if the flagged replacement was executed. It may be very useful!
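A sketch of that last point, assuming a second input line (invented here) that contains neither fox nor dog:

```shell
text=$(printf 'The quick brown fox jumps over the lazy dog.\nNothing to replace here.\n')

# p as a command: every line is printed, whether or not anything was replaced.
p_command=$(printf '%s\n' "$text" | sed -n 's/fox/coati/;s/dog/dingo/;p')

# p as a flag on the second s///: only lines where dog was actually replaced are printed.
p_flag=$(printf '%s\n' "$text" | sed -n 's/fox/coati/;s/dog/dingo/p')

printf '%s\n' "$p_command"   # prints both lines, the first one with coati and dingo
printf '%s\n' "$p_flag"      # prints: The quick brown coati jumps over the lazy dingo.
```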
Just try a sed do-nothing:
sed '' file
and
sed -n '' file
The first prints the whole file, but the second prints nothing.
The -n option puts sed into quiet mode, where it suppresses all output except what is explicitly requested with a p command:
-n
--quiet
--silent
By default, sed will print out the pattern space at
the end of each cycle through the script. These
options disable this automatic printing, and sed
will only produce output when explicitly told to
via the p command.
An example of this would be if you wanted to use sed to simulate the actions of grep:
$ echo -e "a\nb\nc" | sed -n '/[ab]/ p'
a
b
Without the -n you would get an occurrence of c (and two occurrences each of a and b).
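The two cases can be compared directly. This is a small sketch that uses printf instead of echo -e for portability:

```shell
# With -n, only the explicit p produces output, so sed behaves like grep.
with_n=$(printf 'a\nb\nc\n' | sed -n '/[ab]/p')

# Without -n, matching lines are printed by p and again by the automatic print.
without_n=$(printf 'a\nb\nc\n' | sed '/[ab]/p')

printf '%s\n' "$with_n"      # prints: a, b (one per line)
printf '%s\n' "$without_n"   # prints: a, a, b, b, c (one per line)
```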
$ echo "a b c d" | sed "s/a/apple/"
apple b c d
The pattern space is printed implicitly.
$ echo "a b c d" | sed -n "s/a/apple/"
No output.
$ echo "a b c d" | sed -n "s/a/apple/p"
apple b c d
Explicitly print the pattern space.