How can I delete the last word in the current line, but only if a pattern occurs on the next line? - regex

The contents of the file are
some line DELETE_ME
some line this_is_the_pattern
If this_is_the_pattern occurs on the next line, then delete the last word (in this case DELETE_ME) from the current line.
How can I do this using sed or awk? My understanding is that sed is more appropriate for this task than awk is, because awk is suitable for operations on data stored in tabular format. If my understanding is incorrect, please let me know.

$ awk '/this_is_the_pattern/{sub(/[^[:space:]]+$/, "", last)} NR>1{print last} {last=$0} END{print last}' file
some line
some line this_is_the_pattern
How it works
This script uses a single variable called last which contains the previous line of the file. In summary: if the current line contains the pattern, the last word is removed from last before last is printed; otherwise last is printed as is.
In detail, taking each command in turn:
/this_is_the_pattern/{sub(/[^[:space:]]+$/, "", last)}
If this line has the pattern, remove the final word from the last line.
NR>1{print last}
For each line after the first line, print the last line.
last=$0
Save the current line in variable last.
END{print last}
Print the last line from the file.

awk 'NR>1 && /this_is_the_pattern/ {print t;}
NR>1 && !/this_is_the_pattern/ {print f;}
{f=$0;$NF="";t=$0}
END{print f}' input-file
Note that this will modify whitespace in any lines in which the last field is removed, squeezing runs of whitespace into a single space.
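You can see the squeezing for yourself; a minimal illustration (the input is made up):

```shell
# Assigning to any field makes awk rebuild $0, joining fields with OFS
# (a single space by default), so the tab and the space runs collapse:
printf 'a   b\tc   LAST\n' | awk '{$NF=""; print}'
# prints "a b c " (single spaces, plus a trailing OFS where LAST was)
```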
You could simplify this to:
awk 'NR>1 { print( /this_is_the_pattern/? t:f)}
{f=$0;$NF="";t=$0}
END{print f}' input-file
and you can resolve the squeezed whitespace issue with:
awk 'NR>1 { print( /this_is_the_pattern/? t:f)}
{f=$0;sub(" [^ ]*$","");t=$0}
END{print f}' input-file

You could use tac to cat the file backwards, so that you see the pattern first. Then set a flag and delete the last word on the next line you see. Then at the end, reverse the file through tac back to the original order.
tac file | awk '/this_is_the_pattern/{f=1;print;next} f==1{sub(/ [^ ]+$/, "");print;f=0}' | tac

Use a buffer to keep the previous line in memory:
sed -n 'H;1h;1!{x;/\nPAGE/ s/[^ ]*\(\n\)/\1/;P;s/.*\n//;h;$p;}' YourFile
Use a loop, same concept:
sed -n ':cycle
N;/\nPAGE/ s/[^ ]*\(\n\)/\1/;P;s/.*\n//;$p;b cycle' YourFile
In both cases, PAGE stands for the search pattern (as written, it is expected at the start of the next line), and the last word of the previous line is removed even when the search pattern occurs on two consecutive lines. The scripts work with the last two lines read: test whether the pattern is on the newer line, delete the last word of the older line if so, then print the older line, discard it, and cycle.
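A quick way to try the first version (the input and file name are illustrative; as noted, /\nPAGE/ expects PAGE at the start of the next line):

```shell
# DELETE_ME should vanish from the first line because the next line
# starts with the pattern PAGE; the PAGE line itself is untouched.
printf 'some line DELETE_ME\nPAGE and more\n' > /tmp/YourFile
sed -n 'H;1h;1!{x;/\nPAGE/ s/[^ ]*\(\n\)/\1/;P;s/.*\n//;h;$p;}' /tmp/YourFile
```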

The idiomatic awk solution is simply to keep a buffer of the previous line (or N lines in the general case) so you can test the current line and then modify and/or print the buffer accordingly:
$ awk '
NR>1 {
if (/this_is_the_pattern/) {
sub(/[^[:space:]]+$/,"",prev)
}
print prev
}
{ prev = $0 }
END { print prev }
' file
some line
some line this_is_the_pattern

Related

grep/pcregrep/sed/awk the data after the last match to the end of a file

I need to grab the content after the last match of ENTRY to the end of the file, and I can't seem to do it. It can be multiple lines and the data can include any character to the end of the file including (,\n, ).
I’ve tried:
tail -1 file # doesn’t work due to it not consistently being one line
grep "^(.*" # only grabs one line
pcregrep -M '\n(.*' file # I think a variation of this is the solution, but I’ve had no luck so far.
File that grows below:
TOP OF FILE
%
ENTRY
(S®s
√6ûíπ‹ôTìßÅDPˆ¬k·Ù"=ÓxF)*†‰ú˚ÃQ´¿J‘\˜©ŒG»‡∫QÆ’<πsµ-ù±ñ∞NäAOilWçk
N+P}V<ôÒ∏≠µW*`Hß”;–GØ»14∏åR"ºã
FD‘mÍõ?*ÊÎÉC)(S®s
√6ûíπ‹ôTìßÅDPˆ¬k·Ù"=ÓxF)*†‰ú˚ÃQ´¿J‘\˜©ŒG»‡∫QÆ’<πsµ-ù±ñ∞NäAOilWçk
N+P}V<ôÒ∏≠µW*`Hß”;–GØ»14∏åR"ºã
FD‘mÍõ?*ÊÎÉC)eq
{
DATA
}
ENTRY
(A® S\kÉflã1»Âbπ¯Ú∞⁄äπHZ#F◊§•Ã*‹¡‹…ÿPkJòÑíòú˛¶à˛¨¢v|u«Ùbó–Ö¶¢∂5ıÜ#¨•˘®#W´≥‡*`H∑”ı–Só¬<˙ìEçöf∞Gg±:œe™flflå)A® S\kÉflã1»Âbπ¯Ú∞⁄äπHZ#F◊§•Ã*‹¡‹…ÿPkJòÑíòú˛¶à˛¨¢v|u«Ùbó–Ö¶¢∂5ıÜ#¨•˘®#W´≥‡*`H∑”ı–Só¬<˙ìEçöf∞Gg±:œe™flflå)eq
{
DATA
}if
ENTRY
(ÌSYõ˛9°\K¬∞≈fl|”/í÷L
Ö˙h/ÜÇi"û£fi±€ÀNéÓ›bÏÿmâ[≈4J’XPü´Z
oÜlø∫…qìõ¢,ßü©cÓ{—˜e&ÚÀÓHÏÜ‚m(Œ∆⁄ˆQ˝òêpoÉÄÂ(S‘E ⁄ !ŸQ§ô6ÉH
$ awk '/^[(]/{s="";} {s=s"\n"$0;} END{print substr(s,2);}' file
(ÌSYõ˛9°\K¬∞≈fl|”/í÷L
Ö˙h/ÜÇi"û£fi±€ÀNéÓ›bÏÿmâ[≈4J’XPü´Z
oÜlø∫…qìõ¢,ßü©cÓ{—˜e&ÚÀÓHÏÜ‚m(Œ∆⁄ˆQ˝òêpoÉÄÂ(S‘E ⁄ !ŸQ§ô6ÉH
How it works
awk implicitly loops through files line-by-line. This script stores whatever we want to print in the variable s.
/^[(]/{s="";}
Every time that we find a line which starts with (, we set s to an empty string.
The purpose of this is to remove everything before the last occurrence of a line starting with (.
s=s"\n"$0
We add the current line onto the end of s.
END{print substr(s,2);}
After we reach the end of the file, we print s (omitting the first character which will be a surplus newline character).
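The same idea on a small synthetic file (the contents are made up): everything before the last line starting with ( is discarded.

```shell
# Two blocks start with "("; only the last one survives to the END block.
printf 'x\n(first\na\n(second\nb\nc\n' > /tmp/blocks
awk '/^[(]/{s="";} {s=s"\n"$0;} END{print substr(s,2);}' /tmp/blocks
# → (second
#   b
#   c
```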
Interesting problem. I think you can do it with just sed. When you find a match, zero the hold space and add the match line to the hold space. On the last line, print the hold space.
sed -n -e '/ENTRY/,$ { /ENTRY/ { h; n; }; H; $ { x; p; } }'
Don't print by default. From the first entry to the end of the file:
If it is an ENTRY line, copy the new line over the hold space and move on.
Otherwise append the line to the hold space.
If it is the last line, swap the hold space and pattern space, and print the pattern space (what was in the hold space).
You might worry about what happens if the last line in the file is an ENTRY line.
Given a data file:
TOP OF FILE
not wanted
ENTRY
could be wanted
ENTRY
but it wasn't
and this isn't
because
ENTRY
this is here
EOF
The output is:
ENTRY
this is here
EOF
If you don't want ENTRY to appear, modify the script slightly:
sed -n -e '/ENTRY/,$ { /ENTRY/ { s/.*//; h; n; }; H; $ { x; s/^\n//; p; } }'
Using tac you could do it:
tac <file> | sed -e '/ENTRY/,$d' | tac
This will print the file with the lines reversed, then use sed to remove everything from what is now the first occurrence of ENTRY to the now end of the file, then reverse the lines again to get the original order.
As Jonathan Leffler pointed out, there is a faster way to do this, though probably not much faster: tac still has a lot of work to do, and the pipeline has the overhead of requiring 3 processes instead of just one. Still, the sed can be made more efficient by simply quitting when it finds the ENTRY line, instead of processing the rest of the file to remove the lines:
tac <file> | sed -e '/ENTRY/q' | tac
though his answer is often going to be better still. That answer will include the ENTRY line. If you don't want that you could also do
tac <file> | sed -n '/ENTRY/q;p' | tac
to print no output by default, quit as soon as the ENTRY line is found, and use the p command to print the lines up to that point.
This should work too (at least with gawk):
awk -v RS="ENTRY" 'END{print $0}' file
This sets the record separator to your pattern and prints the last record.
Loading the whole file into memory:
sed -e 'H;$!d' -e 'x;s/.*ENTRY[[:blank:]]*\n//' YourFile

Regex to move second line to end of first line

I have several lines with certain values and I want to merge every second line, or every line beginning with <name>, onto the end of the line ending with
<id>rd://data1/8b</id>
<name>DM_test1</name>
<id>rd://data2/76f</id>
<name>DM_test_P</name>
so end up with something like
<id>rd://data1/8b</id><name>DM_test1</name>
The reason it came out like this is that I used two piped XPath queries.
Regex
Simply remove the newline at the end of a line ending in </id>. On Windows, replace (<\/id>)\r\n with \1 or $1 (the latter is Perl syntax). On Linux, search for (<\/id>)\n and replace it with the same thing.
awk
The ideal solution uses awk. The idea is simple: when the line number is odd, print the line without a newline; otherwise print it with one.
awk '{ if(NR % 2) { printf "%s", $0 } else { print $0 } }' file
(Using printf "%s", $0 rather than printf $0 avoids treating the data as a format string if it contains % characters.)
sed
Using sed, we place a line in the hold space when it contains <id> and append the next line to it when it is a <name> line. Then we remove the newline and print the hold buffer by exchanging it with the pattern space.
sed -n '/<id>.*<\/id>/{h}; /<name>.*<\/name>/{H;x;s/\n//;p}' file
pr
Using pr we can achieve a similar goal (adding -t so that pr's page header does not appear; -s joins the columns with a tab):
pr -t -s --columns 2 file

get the last word in body of text

Given a body of text than can span a varying number of lines, I need to use a grep, sed or awk solution to search through many files for the same pattern and get the last word in the body.
A file can include formats such as these where the word I want can be named anything
call function1(input1,
input2, #comment
input3) #comment
returning randomname1,
randomname2,
success3
call function1(input1,
input2,
input3)
returning randomname3,
randomname2,
randomname3
call function1(input1,
input2,
input3)
returning anothername3,
randomname2, anothername3
I need to print out results as
success3
randomname3
anothername3
I also need the filename and line number information for each match.
I've tried
pcregrep -M 'function1.*(\s*.*){6}(\w+)$' filename.txt
which is too greedy and I still need to print out just the specific grouped value and not the whole pattern. The words function1 and returning in my sample code will always be named as this and can be hard coded within my expression.
Last word of code blocks
Split the file into blocks using awk's record separator RS. A record will be defined as a block of text; records are separated by double newlines.
A record consists of fields; consecutive fields are separated by whitespace or a single newline.
Now all we have to do is print the last field of each record, resulting in the following code:
awk 'BEGIN{ FS="[\n\t ]"; RS="\n\n"} { print $NF }' file
Explanation:
FS is the field separator and is set to either a newline, a tab or a space: [\n\t ].
RS is the record separator and is set to a double newline: \n\n
print $NF prints the field with index NF; NF is a variable containing the number of fields, so this prints the last field.
Note: to capture all paragraphs the file should end in a double newline; this can easily be achieved by preprocessing the file using: $ echo -e '\n\n' >> file.
Alternate solution based on comments
A more elegant and simpler solution is as follows:
awk -v RS='' '{ print $NF }' file
How about the following awk solution:
awk 'NF == 0 {if(last) print last; last=""} NF > 0 {last=$NF} END {print last}' file
Here $NF gets the value of the last "word" on a line, where NF stands for the number of fields. The last variable always stores the last word of the most recent non-empty line, and it is printed when an empty line is encountered, marking the end of a paragraph.
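A small run, assuming two blank-line-separated blocks (the words are made up):

```shell
# last1 and last2 are the final words of the two paragraphs.
printf 'a b\nc last1\n\nd e\nf last2\n' |
awk 'NF == 0 {if(last) print last; last=""} NF > 0 {last=$NF} END {print last}'
# → last1
#   last2
```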
A new version that also requires the block to match function1:
awk 'NF == 0 {if(last && hasF) print last; last=hasF=""}
NF > 0 {last=$NF; if(/function1/)hasF=1}
END {if(hasF) print last}' filename.txt
This will produce the output you show from the input file you posted:
$ awk -v RS= '{print $NF}' file
success3
randomname3
anothername3
If you want to print FILENAME and line number like you mention then this may be what you want:
$ cat tst.awk
NF { nr=NR; last=$NF; next }
{ prt() }
END { prt() }
function prt() { if (nr) print FILENAME, nr, last; nr=0 }
$ awk -f tst.awk file
file 6 success3
file 13 randomname3
file 20 anothername3
If that doesn't do what you want, edit your question to provide clearer, more truly representative and accurate sample input and expected output.
This is the perl version of Shellfish's awk solution (plus the keywords):
perl -00 -nE '/function1/ and /returning/ and say ((split)[-1])' file
or, with one regex:
perl -00 -nE '/^(?=.*function1)(?=.*returning).*?(\S+)\s*$/s and say $1' file
But the key is the -00 option which reads the file a paragraph at a time.

Remove \n newline if string contains keyword

I'd like to know if I can remove a \n (newline) only if the current line has one or more keywords from a list; for instance, I want to remove the \n if the line contains the words hello or world.
Example:
this is an original
file with lines
containing words like hello
and world
this is the end of the file
And the result would be:
this is an original
file with lines
containing words like hello and world this is the end of the file
I'd like to use sed, or awk and, if needed, grep, wc or whatever commands work for this purpose. I want to be able to do this on a lot of files.
Using awk you can do:
awk '/hello|world/{printf "%s ", $0; next} 1' file
this is an original
file with lines
containing words like hello and world this is the end of the file
here is simple one using sed
sed -r ':a;$!{N;ba};s/((hello|world)[^\n]*)\n/\1 /g' file
Explanation
:a;$!{N;ba} reads the whole file into the pattern space, like this: this is an original\nfile with lines\ncontaining words like hello\nand world\nthis is the end of the file$
s/((hello|world)[^\n]*)\n/\1 /g searches for the keywords hello or world and removes the \n that follows each match; the g flag makes the substitution apply to all matches of the regexp, not just the first.
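Putting it together on a shortened version of the sample (GNU sed; -r enables extended regexps):

```shell
# Slurp the file into the pattern space, then join any line containing
# hello or world with the line that follows it.
printf 'file with lines\ncontaining words like hello\nand world\nthe end\n' |
sed -r ':a;$!{N;ba};s/((hello|world)[^\n]*)\n/\1 /g'
# → file with lines
#   containing words like hello and world the end
```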
A non-regex approach:
awk '
BEGIN {
# define the word list
w["hello"]
w["world"]
}
{
printf "%s", $0
for (i=1; i<=NF; i++)
if ($i in w) {
printf " "
next
}
print ""
}
'
or a perl one-liner
perl -pe 'BEGIN {@w = qw(hello world)} s/\n/ / if grep {$_ ~~ @w} split'
(Note that the smartmatch operator ~~ is experimental and deprecated in recent versions of Perl.)
To edit the file in-place, do:
awk '...' filename > tmpfile && mv tmpfile filename
perl -i -pe '...' filename
This might work for you (GNU sed):
sed -r ':a;/^.*(hello|world).*\'\''/M{$bb;N;ba};:b;s/\n/ /g' file
This checks whether the last line of a possibly multi-line pattern space contains the required string(s); if so, it reads another line, until end-of-file or until the last line no longer contains those strings. The newlines are then removed and the line is printed.
$ awk '{ORS=(/hello|world/?FS:RS)}1' file
this is an original
file with lines
containing words like hello and world this is the end of the file
sed -n '
:beg
/hello/ b keep
/world/ b keep
H;s/.*//;x;s/\n/ /g;p;b
: keep
H;s/.*//
$ b beg
' YourFile
This one is a bit harder, because the current line has to be checked even though it may already follow a hello or world line.
Principle:
On every pattern match, keep the line in the hold buffer.
Otherwise, load the hold buffer and remove the \n characters (using a swap and emptying the current line, due to the limited buffer operations available), then print the content.
Add a special case for a pattern on the last line (normally still held, so it would not be printed otherwise).

SED: addressing two lines before match

Print line, which is situated 2 lines before the match(pattern).
I tried next:
sed -n ': loop
/.*/h
:x
{n;n;/cen/p;}
s/./c/p
t x
s/n/c/p
t loop
{g;p;}
' datafile
The script:
sed -n "1N;2N;/XXX[^\n]*$/P;N;D"
works as follows:
Read the first three lines into the pattern space, 1N;2N
Search for the test string XXX anywhere in the last line, and if found print the first line of the pattern space, P
Append the next line input to pattern space, N
Delete first line from pattern space and restart cycle without any new read, D, noting that 1N;2N is no longer applicable
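A run on numbered lines, with XXX as the test string (the input is made up):

```shell
# XXX appears on lines 3 and 6, so the lines two before them
# (lines 1 and 4) are printed.
printf 'one\ntwo\nthree XXX\nfour\nfive\nsix XXX\nseven\n' |
sed -n '1N;2N;/XXX[^\n]*$/P;N;D'
# → one
#   four
```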
This might work for you (GNU sed):
sed -n ':a;$!{N;s/\n/&/2;Ta};/^PATTERN\'\''/MP;$!D' file
This will print the line 2 lines before the PATTERN throughout the file.
Here is one with grep, a bit simpler and easier to read (however it needs a pipe):
grep -B2 'pattern' file_name | sed -n '1,2p'
If you can use awk try this:
awk '/pattern/ {print b} {b=a;a=$0}' file
This prints the line two lines before the pattern.
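For instance (the sample lines are made up): b always holds the line before a, which holds the previous line, so on a match b is the line two back.

```shell
printf 'one\ntwo\nthree\nhas pattern\nfive\n' |
awk '/pattern/ {print b} {b=a;a=$0}'
# → two
```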
I've tested your sed command but the result is strange (and obviously wrong), and you didn't give any explanation. You will have to save three lines in a buffer (the hold space), do a pattern search with the newest line and print the oldest one if it matches:
sed -n '
## At the beginning read three lines.
1 { N; N }
## Append them to "hold space". In following iterations it will append
## only one line.
H
## Get content of "hold space" to "pattern space" and check if the
## pattern matches. If so, extract content of first line (until a
## newline) and exit.
g
/^.*\nsix$/ {
s/^\n//
P
q
}
## Remove the old of the three lines saved and append the new one.
s/^\n[^\n]*//
h
' infile
Assuming and input file (infile) with following content:
one
two
three
four
five
six
seven
eight
nine
ten
It will search six and as output will yield:
four
Here are some other variants:
awk '{a[NR]=$0} /pattern/ {f=NR} END {print a[f-2]}' file
This stores all lines in the array a. When the pattern is found, its line number is stored in f.
At the end, the line two before that number is printed from the array.
PS: this may be slow and memory-hungry with large files.
Here is another one:
awk 'FNR==NR && /pattern/ {f=NR;next} f-2==FNR' file{,}
This reads the file twice (file{,} is the same as file file).
On the first pass it finds the pattern and stores its line number in the variable f.
On the second pass it prints the line two before the value in f.
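A concrete run of the two-pass version (writing file file explicitly, since file{,} is a bash brace expansion; the sample lines are made up):

```shell
# First pass records the pattern's line number (4); second pass
# prints the line whose number is f-2 (line 2).
printf 'one\ntwo\nthree\nhas pattern\nfive\n' > /tmp/twopass
awk 'FNR==NR && /pattern/ {f=NR;next} f-2==FNR' /tmp/twopass /tmp/twopass
# → two
```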