sed replace last line matching pattern - regex

Given a file like this:
a
b
a
b
I'd like to be able to use sed to replace just the last line that contains an instance of "a" in the file. So if I wanted to replace it with "c", then the output should look like:
a
b
c
b
Note that I need this to work irrespective of how many matches it might encounter, or the details of exactly what the desired pattern or file contents might be. Thanks in advance.

Not quite sed only:
tac file | sed '/a/ {s//c/; :loop; n; b loop}' | tac
testing
% printf "%s\n" a b a b a b | tac | sed '/a/ {s//c/; :loop; n; b loop}' | tac
a
b
a
b
c
b
Reverse the file, then for the first match, make the substitution and then unconditionally slurp up the rest of the file. Then re-reverse the file.
Note, an empty regex (here as s//c/) means re-use the previous regex (/a/)
I'm not a huge sed fan, beyond very simple programs. I would use awk:
tac file | awk '/a/ && !seen {sub(/a/, "c"); seen=1} 1' | tac

Many good answers here; here's a conceptually simple two-pass sed solution assisted by tail that is POSIX-compliant and doesn't read the whole file into memory, similar to Eran Ben-Natan's approach:
sed "$(sed -n '/a/ =' file | tail -n 1)"' s/a/c/' file
sed -n '/a/=' file outputs the numbers of the lines (function =) matching regex a, and tail -n 1 extracts the output's last line, i.e. the number of the line in file file containing the last occurrence of the regex.
Placing the command substitution $(sed -n '/a/=' file | tail -n 1) directly before ' s/a/c/' results in an outer sed script such as 3 s/a/c/ (with the sample input), which performs the desired substitution only on the last line on which the regex occurred.
If the pattern is not found in the input file, the whole command is an effective no-op.
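A minimal sketch of the same two-pass idea wrapped in a shell function, with the no-match case handled explicitly. The name replace_last_match and its argument order are made up for illustration, and the sketch assumes that neither the regex nor the replacement contains a slash.

```shell
# replace_last_match FILE REGEX REPLACEMENT  (hypothetical helper, not a standard tool)
# Assumes REGEX and REPLACEMENT contain no "/" characters.
replace_last_match() {
    # Pass 1: number of the last line matching REGEX (empty if there is none).
    last=$(sed -n "/$2/=" "$1" | tail -n 1)
    if [ -n "$last" ]; then
        # Pass 2: substitute on that line only.
        sed "${last}s/$2/$3/" "$1"
    else
        # No match: pass the file through unchanged.
        cat "$1"
    fi
}
```

With the sample input, replace_last_match file a c should behave like the one-liner above.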

Another approach:
sed "`grep -n '^a$' a | cut -d \: -f 1 | tail -1`s/a/c/" a
The advantage of this approach is that it reads the file sequentially twice instead of loading it into memory. This can matter for large files.

This might work for you (GNU sed):
sed -r '/^PATTERN/!b;:a;$!{N;/^(.*)\n(PATTERN.*)/{h;s//\1/p;g;s//\2/};ba};s/^PATTERN/REPLACEMENT/' file
or another way:
sed '/^PATTERN/{x;/./p;x;h;$ba;d};x;/./{x;H;$ba;d};x;b;:a;x;/./{s/^PATTERN/REPLACEMENT/p;d};x' file
or if you like:
sed -r ':a;$!{N;ba};s/^(.*\n?)PATTERN/\1REPLACEMENT/' file
On reflection, this solution may replace the first two:
sed '/a/,$!b;/a/{x;/./p;x;h};/a/!H;$!d;x;s/^a$/c/M' file
If the regexp is nowhere to be found in the file, the file will pass through unchanged. Once the regex matches, all lines will be stored in the hold space and will be printed when one or both conditions are met. If a subsequent match is encountered, the contents of the hold space are printed and the latest matching line replaces them. At the end of the file, the first line of the hold space will hold the last matching line, and this can be replaced.

Another one:
tr '\n' ' ' | sed 's/\(.*\)a/\1c/' | tr ' ' '\n'
in action:
$ printf "%s\n" a b a b a b | tr '\n' ' ' | sed 's/\(.*\)a/\1c/' | tr ' ' '\n'
a
b
a
b
c
b

A two-pass solution for when buffering the entire input is intolerable:
sed "$(sed -n /a/= file | sed -n '$s/$/ s,a,c,/p' )" file
(An earlier version of this hit a history-expansion bug on a Red Hat bash-4.1 install; this version avoids a $!d that was being mistakenly expanded.)
A one-pass solution that buffers as little as possible:
sed '/a/!{1h;1!H};/a/{x;1!p};$!d;g;s/a/c/'
Simplest:
tac | sed '0,/a/ s/a/c/' | tac

Here it is, all done in a single awk:
awk 'FNR==NR {if ($0~/a/) f=NR;next} FNR==f {$0="c"} 1' file file
a
b
c
b
This reads the file twice. First run to find last a, second run to change it.
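If only the matched text should change rather than the whole line, a sketch along the same lines could call sub() on the remembered line instead. The function name and the awk variables re, rep, and last are illustrative; note that awk interprets backslash escapes in -v values and treats & in the replacement as "the matched text".

```shell
# replace_last_awk FILE REGEX REPLACEMENT  (hypothetical helper)
replace_last_awk() {
    awk -v re="$2" -v rep="$3" '
        FNR == NR { if ($0 ~ re) last = FNR; next }  # pass 1: remember the last matching line number
        FNR == last { sub(re, rep) }                 # pass 2: edit the first match on that line only
        1                                            # print every line of pass 2
    ' "$1" "$1"
}
```

If the regex never matches, last stays unset and the file is printed unchanged.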

tac infile.txt | sed "s/a/c/; ta ; b ; :a ; N ; ba" | tac
The first tac reverses the lines of infile.txt, the sed expression (see https://stackoverflow.com/a/9149155/2467140) replaces the first match of 'a' with 'c' and prints the remaining lines, and the last tac reverses the lines back to their original order.

Here is a way with only using awk:
awk '{a[NR]=$1}END{x=NR;cnt=1;while(x>0){a[x]=((a[x]=="a"&&--cnt==0)?"c <===":a[x]);x--};for(i=1;i<=NR;i++)print a[i]}' file
$ cat f
a
b
a
b
f
s
f
e
a
v
$ awk '{a[NR]=$1}END{x=NR;cnt=1;while(x>0){a[x]=((a[x]=="a"&&--cnt==0)?"c <===":a[x]);x--};for(i=1;i<=NR;i++)print a[i]}' f
a
b
a
b
f
s
f
e
c <===
v

It can also be done in perl:
perl -e '@a=reverse<>;END{for(@a){if(/a/){s/a/c/;last}}print reverse @a}' temp > your_new_file
Tested:
> cat temp
a
b
c
a
b
> perl -e '@a=reverse<>;END{for(@a){if(/a/){s/a/c/;last}}print reverse @a}' temp
a
b
c
c
b
>

Here's another option:
sed -e '$ a a' -e '$ d' file
The first command appends an a and the second deletes the last line. From the sed(1) man page:
$ Match the last line.
d Delete pattern space. Start next cycle.
a text Append text, which has each embedded newline preceded by a backslash.

Here's the command:
sed '$s/.*/a/' filename.txt
And here it is in action:
> echo "a
> b
> a
> b" > /tmp/file.txt
> sed '$s/.*/a/' /tmp/file.txt
a
b
a
a

awk-only solution:
awk '/a/{printf "%s", all; all=$0"\n"; next}{all=all $0"\n"} END {sub(/^[^\n]*/,"c",all); printf "%s", all}' file
Explanation:
When a line matches a, all lines between the previous a up to (not including) current a (i.e. the content stored in the variable all) is printed
When a line doesn't match a, it gets appended to the variable all.
The last line matching a would not be able to get its all content printed, so you manually print it out in the END block. Before that though, you can substitute the line matching a with whatever you desire.

Given:
$ cat file
a
b
a
b
You can use POSIX grep to count the matches:
$ grep -c '^a' file
2
Then feed that number into awk to print a replacement:
$ awk -v last=$(grep -c '^a' file) '/^a/ && ++cnt==last{ print "c"; next } 1' file
a
b
c
b

Related

Printing Both Matching and Non-Matching Patterns

I am trying to compare two files and return one of the files' columns upon a match. The code that I am using right now excludes non-matching patterns and only prints matching patterns. I need to print all results, both matching and non-matching, using grep.
File 1:
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
File 2:
F
A
B
Z
C
P
E
Current Result:
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
Expected Result:
F
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
P
E
Bash Code:
while IFS=',' read point lat lon; do
check=`grep "${point} /home/aaron/file2 | awk '{print $1}'`
echo "${check},${lat},${lon}"
done < /home/aaron/file1
In awk:
$ awk -F, 'NR==FNR{a[$1]=$0;next}{print ($1 in a?a[$1]:$1)}' file1 file2
F
A,42.4,-72.2
B,47.2,-75.9
Z,38.3,-70.7
C,41.7,-95.2
P
E
Explained:
$ awk -F, ' # field separator to ,
NR==FNR { # file1
a[$1]=$0 # hash record to a, use field 1 as key
next
}
{
print ($1 in a?a[$1]:$1) # print match if found, else nonmatch
}
' file1 file2
If you don't care about order, there's a join binary in GNU coreutils that does just what you need :
$ sort file1 > sortedFile1
$ sort file2 > sortedFile2
$ join -t, -a 2 sortedFile1 sortedFile2
A,42.4,-72.2
B,47.2,-75.9
C,41.7,-95.2
E
F
P
Z,38.3,-70.7
It relies on files being sorted and will not work otherwise.
Now will you please get out of my /home/ ?
another join based solution preserving the order
f() { nl -nln -s, -w1 "$1" | sort -t, -k2; }; join -t, -j2 -a2 <(f file1) <(f file2) |
sort -t, -k2 |
cut -d, -f2 --complement
F
A,42.4,-72.2,2
B,47.2,-75.9,3
Z,38.3,-70.7,4
C,41.7,-95.2,5
P
E
Cannot beat the awk solution but another alternative utilizing unix toolchain based on decorate-undecorate pattern.
Problems with your current solution:
1. You are missing a double-quote in grep "${point} /home/aaron/file2.
2. You should start with the other file for printing all lines in that file
while IFS=',' read point; do
echo "${point}$(grep "${point}" /home/aaron/file1 | sed 's/[^,]*,/,/')"
done < /home/aaron/file2
3. The grep can give more than one result. Which one do you want (head -1) ?
An improvement would be
while IFS=',' read point; do
echo "${point}$(grep "^${point}," /home/aaron/file1 | sed -n '1s/[^,]*,/,/p')"
done < /home/aaron/file2
4. Using while is the wrong approach.
For small files it will get the work done, but you will get stuck with larger files. The reason is that you call grep once for each line in file2, so file1 is read many times.
Better is using awk or some other solution.
Another solution is using sed with the output of another sed command:
sed -r 's#([^,]*),(.*)#s/^\1$/\1,\2/#' /home/aaron/file1
This will give commands for the second sed.
sed -f <(sed -r 's#([^,]*),(.*)#s/^\1$/\1,\2/#' /home/aaron/file1) /home/aaron/file2

Sed : print all lines after match

I got my research result after using sed :
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
But it only shows the part that I cut. How can I print all lines after a match ?
I'm using zcat so I cannot use awk.
Thanks.
Edited :
This is my log file :
[01/09/2015 00:00:47] INFO=54646486432154646 from=steve idfrom=55516654455457 to=jone idto=5552045646464 guid=100021623456461451463 num=6 text=hi my number is 0 811 22 1/12 status=new survstatus=new
My aim is to find all users that spam my site with their telephone numbers (using grep "pattern") then print all the lines to get all the information about each spam. The problem is there may be matches in INFO or id, so I use sed to get the text first.
Printing all lines after a match in sed:
$ sed -ne '/pattern/,$ p'
# alternatively, if you don't want to print the match:
$ sed -e '1,/pattern/ d'
Filtering lines when pattern matches between "text=" and "status=" can be done with a simple grep, no need for sed and cut:
$ grep 'text=.*pattern.* status='
You can use awk
awk '/pattern/,EOF'
n.b. don't be fooled: EOF is just an uninitialized variable, and by default 0 (false). So that condition cannot be satisfied until the end of file.
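To make that point concrete, here is a small sketch that spells the end of the range as a literal 0, which can never be true, so the range stays open to the end of the input. The function name print_from_match and the variable re are invented for this example.

```shell
# print_from_match REGEX: print the first matching line and everything after it.
# (hypothetical helper; reads standard input)
print_from_match() {
    # Range pattern: starts at the first line matching re, ends when 0 is true (never).
    awk -v re="$1" '$0 ~ re, 0'
}

printf '%s\n' one two three four | print_from_match two
# two
# three
# four
```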
Perhaps this could be combined with all the previous answers using awk as well.
Maybe this is what you actually want? Find lines matching "pattern" and extract the field after text= up through just before status=?
zcat file* | sed -e '/pattern/s/.*text=\(.*\)status=[^/]*/\1/'
You are not revealing what pattern actually is -- if it's a variable, you cannot use single quotes around it.
Notice that \(.*\)status=[^/]* would match up through survstatus=new in your example. That is probably not what you want? There doesn't seem to be a status= followed by a slash anywhere -- you really should explain in more detail what you are actually trying to accomplish.
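The greedy match can be seen on a shortened line modelled on the log entry in the question. The second command is only one possible fix, assuming the real status= field is always preceded by a space; the variable names are made up for the sketch.

```shell
# Shortened sample line modelled on the log entry in the question.
line='from=steve text=hi my number is 0 811 22 1/12 status=new survstatus=new'

# Greedy: the capture runs up to the LAST "status=", the one inside "survstatus=".
greedy=$(printf '%s\n' "$line" | sed 's/.*text=\(.*\)status=[^/]*/\1/')

# Anchored on the preceding space: the capture stops before " status=".
anchored=$(printf '%s\n' "$line" | sed 's/.*text=\(.*\) status=.*/\1/')

printf '%s\n' "$greedy" "$anchored"
# hi my number is 0 811 22 1/12 status=new surv
# hi my number is 0 811 22 1/12
```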
Your question title says "all line after a match" so perhaps you want everything after text=? Then that's simply
sed 's/.*text=//'
i.e. replace up through text= with nothing, and keep the rest. (I trust you can figure out how to change the surrounding script into zcat file* | sed '/pattern/s/.*text=//' ... oops, maybe my trust failed.)
The seldom used branch command will do this for you. Until you match, use n for next then branch to beginning. After match, use n to skip the matching line, then a loop copying the remaining lines.
cat file | sed -n -e ':start; /pattern/b match;n; b start; :match n; :copy; p; n ; b copy'
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | cut -f 1 - | grep "pattern"
Instead, change the last two segments of your pipeline (cut and grep) so that it reads:
zcat file* | sed -e 's/.*text=\(.*\)status=[^/]*/\1/' | awk '$1 ~ "pattern" {print $0}'

Bash replace '\n\n}' string in file

I've got files repeatedly containing the string \n\n} and I need to replace such string with \n} (removing one of the two newlines).
Since such files are dynamically generated through a bash script, I need to embed replacing code inside the script.
I tried with the following commands, but it doesn't work:
cat file.tex | sed -e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | perl -p00e 's/\n\n}/\n}/g' # it doesn't work!
cat file.tex | awk -v RS="" '{gsub (/\n\n}/, "\nb")}1' # it does work, but not for large files
You didn't provide any sample input and expected output so it's a guess but maybe this is what you're looking for:
$ cat file
a
b
c
}
d
$ awk '/^$/{f=1;next} f{if(!/^}/)print "";f=0} 1' file
a
b
c
}
d
a way with sed:
sed -i -n ':a;N;$!ba;s/\n\n}/\n}/g;p' file.tex
details:
:a # defines the label "a"
N # append the next line to the pattern space
$!ba # if it is not the last line, go to label a
s/\n\n}/\n}/g # replace all \n\n} with \n}
p # print
The -i option edits the file in place.
The -n option suppresses the automatic printing of each line.
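The loop above collects the whole file in the pattern space before substituting. For large files, a streaming variant that only ever holds two lines might look like the sketch below. It assumes, as the pattern \n\n} does, that the line to keep starts with }; the function name is invented for illustration.

```shell
# drop_blank_before_brace: remove an empty line that directly precedes a line
# starting with "}". (hypothetical helper; reads stdin, writes stdout)
drop_blank_before_brace() {
    # $!N         append the next line, unless this is already the last one
    # /^\n}/...   window is "empty line + line starting with }": drop the empty line
    # P;D         print the first line of the window, then slide the window forward
    sed '$!N;/^\n}/s/^\n//;P;D'
}

printf 'a\nb\nc\n\n}\nd\n' | drop_blank_before_brace
# a
# b
# c
# }
# d
```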
This Perl command will do as you ask
perl -i -0777 -pe's/\n(?=\n})//g' file.tex
This should work:
cat file.tex | sed -e 's/\\n\\n}/\\n}/g'
if \n\n} is written as raw string.
Or if it's new line:
cat file.tex | sed -e ':a;N;$!ba;s/\n\n}/\n}/g'
Another method:
if the first \n is any new line:
text=$(< file.tex)
text=${text//$'\n\n}'/$'\n}'}
printf "%s\n" "$text" #> file
If the first \n is an empty line:
text=$(< file.tex)
text=${text//$'\n\n\n}'/$'\n\n}'}
printf "%s\n" "$text" #> file
Unix-style line filters process the file line by line. Thus, you have to do something extra to process an expression which spans lines.
As mentioned by others, '\n\n' is simply an empty line and matches the regular expression /^$/. Perhaps the most efficient thing to do is to save each empty line until you know whether or not the next one will contain a closing brace at the beginning of the line.
cat file.tex | perl -ne 'if ( $b ) { print $b unless m/^\}/; undef $b; } if ( m/^$/ ) { $b=$_; } else { print; } END { print $b if $b; }'
And to clean it all up we add an END block, to process the case that the last line in the file is blank (and we want to keep it).
If you have access to node you can use rexreplace
npm install -g rexreplace
and then run
rexreplace '\n\n\}' '\n\}' myfile.txt
Or, if you have more files in a directory named data, you can do
rexreplace '\n\n\}' '\n\}' data/*.txt

Perl, sed, or awk one-liner to change the format of the file

I need advice on how to change the file formatted following way
file1:
A 504688
B jobnameA
A 504690
B jobnameB
A 504691
B jobnameC
...
into file2:
A B
504688 jobnameA
504690 jobnameB
504691 jobnameC
...
One solution I could think of is:
cat file1 | perl -0777 -p -e 's/\s+B/\t/' | awk '{print $2"\t"$3}'.
But I am wondering if there is a more efficient way or an already known practice that does this job.
perl -nawe 'print "@F[1 .. $#F]", $F[0] eq "A" ? "\t" : "\n"' < /tmp/ab
Look up the options in perlrun.
Another useful one to add is -l (append newline to print), but not in this case.
Assuming your input file is tab separated:
echo $'A\tB'
cut -f2 filename | paste - -
Should be pretty quick because this is exactly what cut and paste were written to do.
awk '/^A/{num=$2}/^B/{print num,$2}' file
Or, alternately,
awk '{num=$2;getline;print num,$2}' file
Here is an sed solution:
sed -e 'N' -e 's/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file
This version will also print the header at the top:
sed '1{h;s/.*/A\tB/p;g};N;s/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file
Or an alternative:
sed -n '/^A\s*/{s///;h};/^B\s*/{s///;H;g;s/\n/\t/p}' file
If your sed does not support semicolons as a command separator for the alternative:
sed -n '
/^A\s*/{ # if the line starts with "A"
s/// # remove the "A" and the whitespace
h # copy the remainder into the hold space
} # end if
/^B\s*/{ # if the line starts with "B"
s/// # remove the "B" and the whitespace
H # append pattern space to hold space
g # copy hold space to pattern space
s/\n/\t/p # replace newline with tab and print
}' file
This version will also print the header at the top:
sed -n '/^A\s*/{s///;h;1s/.*/A\tB/p};/^B\s*/{s///;H;g;s/\n/\t/p}' file
This will work with any header text, not just fixed A and B:
awk '{a=$1;b=$2;getline;if(c!=1){print a,$1;c=1};print b,$2}' file1 >file2
...and it will also print the header row.
If you need \t separator, then use:
awk '{a=$1;b=$2;getline;if(c!=1){print a"\t"$1;c=1};print b"\t"$2}' file1 >file2
This might work for you:
sed -e '1i\A\tB' -e 'N;s/A\s*\(\S*\).*\nB\s*\(\S*\).*/\1\t\2/' file

having a regex replacing across lines, retain the newlines?

I'd like to have a substitute or print style command with a regex working across lines. And lines retained.
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | tr -d '\n' | grep -or 'b.*f'
bcdef
or
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | tr -d '\n' | sed -r 's|b(.*)f|y\1z|'
aycdezg
i'd like to use grep or sed because i'd like to know what people would've done before awk or perl ..
would they not have? was .* not available? had they no other equivalent?
to possibly modify some input with a regex that spans across lines, and print it to stdout or output to a file, retaining the lines.
This should do what you're looking for:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;$s/b\([^f]*\)f/y\1z/;N;ba'
a
y
c
d
e
z
g
It accumulates all the lines, then does the replacement. It looks for the first "f". If you want it to look for the last "f", change [^f] to a dot (.).
Note that this may make use of features added to sed after AWK or Perl became available (AWK has been around a looong time).
Edit:
To do a multi-line grep requires only a little modification:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;$s/^[^b]*\(b[^f]*f\)[^f]*$/\1/;N;ba'
b
c
d
e
f
sed can match across newlines through the use of its N command. For example, the following sed command will replace bar followed by a newline followed by baz with ###:
$ echo -e "foo\nbar\nbaz\nqux" | sed 'N;s/bar\nbaz/###/;P;D'
foo
###
qux
The N command will append the next input line to the current pattern space separated by an embedded newline (\n)
The P command will print the current pattern space up to and including the first embedded newline.
The D command will delete up to and including the first embedded newline in the pattern space. It will also start next cycle but skip reading from the input if there is still data in the pattern space.
Through the use of these 3 commands, you can essentially do any sort of s command replacement looking across N-lines.
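One portability caveat, as far as I know: POSIX says N quits without printing when there is no next line, whereas GNU sed prints the pattern space first. A hedged sketch that avoids relying on either behaviour guards N with $! so it is never run on the last line. The function name is made up for the example.

```shell
# replace_bar_baz: same two-line sliding window as above, but N is skipped on
# the last line so the result does not depend on what N does at end of input.
# (hypothetical helper; reads stdin, writes stdout)
replace_bar_baz() {
    sed '$!N;s/bar\nbaz/###/;P;D'
}

printf '%s\n' foo bar baz qux | replace_bar_baz
# foo
# ###
# qux
```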
Edit
If your question is how can I remove the need for tr in the two examples above and just use sed then here you go:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;N;$!ba;s/\n//g;y/ag/yz/'
ybcdefz
Proven tools to the rescue.
echo -e "foo\nbar\nbaz\nqux" | perl -lpe 'BEGIN{$/=""}s/foo\nbar/###/'