Sed replace pattern with file contents - regex

I would like to use sed (I think?) to replace a pattern with file contents.
Example
File 1 (primary file)
Hello <<CODE>>
Goodbye.
File 2 (contents)
Anonymous person,
I am very nice.
File 3 (target)
Hello Anonymous person,
I am very nice.
Goodbye.
Right now, I am using this command:
sed "/<<CODE>>/{r file2
:a;n;ba}" file1 | \
sed "s/<<CODE>>//g" > \
file3
But this outputs:
Hello
Anonymous person,
I am very nice.
Goodbye.
(note the newline after Hello)
How can I do this without getting that newline?
(note that file2 may contain all sorts of things: brackets, newlines, quotes, ...)

Much simpler to use awk:
awk 'FNR==NR{s=(!s)?$0:s RS $0;next} /<<CODE>>/{sub(/<<CODE>>/, s)} 1' file2 file1
Hello Anonymous person,
I am very nice.
Goodbye.
Explanation:
FNR==NR - Execute this block for first file in input i.e. file2
s=(!s)?$0:s RS $0 - Concatenate whole file content in string s
next - Read next line until EOF on first file
/<<CODE>>/ - If a line with <<CODE>> is found execute that block
sub(/<<CODE>>/, s) - Replace <<CODE>> with string s (data of file2)
1 - print the output
EDIT: Non-regex way:
awk 'FNR==NR{s=(!s)?$0:s RS $0; next}
i=index($0, "<<CODE>>"){$0=substr($0, 1, i-1) s substr($0, i+8)} 1' file2 file1
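To see why the non-regex version matters, here is a small sketch. It assumes a scratch directory from mktemp and a made-up file2 that contains an &; neither comes from the question. In the replacement string of sub(), & stands for the matched text, so the regex version puts <<CODE>> back into the output, while the index()/substr() version copies file2 verbatim.

```shell
#!/bin/sh
# Scratch files for the demo; the names and contents are illustrative only.
tmp=$(mktemp -d)
printf 'Hello <<CODE>>\nGoodbye.\n' > "$tmp/file1"
printf 'Tom & Jerry,\nI am very nice.\n' > "$tmp/file2"

# Regex version: sub() expands the & inside s to the matched text.
regex_out=$(awk 'FNR==NR{s=(!s)?$0:s RS $0;next} /<<CODE>>/{sub(/<<CODE>>/, s)} 1' \
    "$tmp/file2" "$tmp/file1")

# Literal version: index() and substr() never interpret the contents of s.
literal_out=$(awk 'FNR==NR{s=(!s)?$0:s RS $0; next}
i=index($0, "<<CODE>>"){$0=substr($0, 1, i-1) s substr($0, i+8)} 1' \
    "$tmp/file2" "$tmp/file1")

printf 'regex:\n%s\n\nliteral:\n%s\n' "$regex_out" "$literal_out"
rm -rf "$tmp"
```

Only the literal version prints "Hello Tom & Jerry," on the first line.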

The awk version, though it might be harder to read, is probably the right way to go.
Just for comparison's sake, here's a Ruby version:
ruby -ne 'BEGIN{@body=File.open("file2").read}; puts gsub(/<<CODE>>/,@body);' < file1

not too bad in Perl:
cat file1 | perl -e "open FH, qq(file2); \$f2=join '', <FH>; chomp \$f2; map {s/<<CODE>>/\$f2/g; print \$_} <STDIN>" > file3
(maybe i am not the best perl coder)
it's straight-forward. read the whole file2 in, substitute it, then print.

You could do simply with:
awk '{gsub("<<CODE>>", filetwo)}1' filetwo="$(<file2)" file1

I am thinking about a solution where we use Bash to evaluate a cat process outside of the single quotes delimiting the sed script, but unfortunately the following doesn't work as soon as your file2 contains a newline:
sed 's/<<CODE>>/'"$(cat file2)"'/' file1
It accepts spaces in file2, but not newlines, as you can see in the following piece of code, which works as long as file2 contains no characters that are special in a sed replacement (such as /, & or \):
sed 's/<<CODE>>/'"$(cat file2 | tr -d '\n')"'/' file1
But, obviously, this modifies the content of the file before inclusion, which is simply bad. :-(
So if you want to play, you can first tr the newlines to some character that is unlikely to appear in the file, and translate those back after sed has worked on its script:
sed 's/<<CODE>>/'"$(cat file2 | tr '\n' '\3')"'/' file1 | tr '\3' '\n'
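Another way to stay with sed, without changing file2's contents, is to escape file2 before splicing it into the script. This is only a sketch under assumptions: the scratch directory, the sample file2 and the variable name escaped are made up for illustration. It escapes &, \ and / and ends every line except the last with a backslash, which is how sed accepts a newline inside a replacement.

```shell
#!/bin/sh
# Scratch files for the demo; the names and contents are illustrative only.
tmp=$(mktemp -d)
printf 'Hello <<CODE>>\nGoodbye.\n' > "$tmp/file1"
printf 'Dear A/B & Co,\nI am very nice.\n' > "$tmp/file2"

# Escape &, \ and / for use in an s/// replacement, and end every line
# except the last with a backslash so the embedded newlines are accepted.
escaped=$(sed -e 's,[&\\/],\\&,g' -e '$!s,$,\\,' "$tmp/file2")

# Splice the escaped text into the substitution.
sed_out=$(sed "s/<<CODE>>/$escaped/" "$tmp/file1")
printf '%s\n' "$sed_out"
rm -rf "$tmp"
```

The command substitution strips the trailing newline of file2, so the text after <<CODE>> continues on the same line as the last inserted line.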

Related

Replace string between two words with the content of another text file

I am a bit new to sed and regex.
I was trying to edit a text file, where I want to replace the contents between two keywords in the first file with the entire contents of another text file.
It should go from this:
keyword1 inbetweenstuff keyword2
to this
keyword1 textfromfile2 keyword2
I was trying this command, but no luck
sed -i 's/(keyword1).*(keyword2)/\1 contentsoffile2 \2/g' file1.txt
You're trying to use the wrong tool. sed is for simple substitutions on individual lines (s/old/new/), that is all. For anything more interesting you should be using awk.
With GNU awk for multi-char RS, gensub(), and the 3rd arg for match():
$ cat file1
keyword1 IN BETWEEN
STUFF ON
ONE OR MORE
LINES keyword2
$ cat file2
NOW IS
THE WINTER OF
OUR DISCONTENT
$ cat tst.awk
BEGIN { RS="^$"; ORS="" }
NR==FNR { new = gensub(/\n$/,"",""); next }
match($0,/(.*keyword1 ).*( keyword2.*)/,a) { print a[1] new a[2] }
$ awk -f tst.awk file2 file1
keyword1 NOW IS
THE WINTER OF
OUR DISCONTENT keyword2
Note that the above treats the contents of file2 as a literal string so the contents of "file2" can be anything. Try any of the sed solutions if "file2" contains an &, for example (or \1 or / or ...). It also doesn't care how many lines are in file2 or how many lines are between the keywords in file1.
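If GNU awk is not available, roughly the same literal replacement can be sketched in POSIX awk with index() and substr(). The variable names k1, k2, new and text and the scratch directory are assumptions made for this example. Unlike the greedy regex above, this sketch stops at the first keyword2 after keyword1, and it always puts a single space on each side of the inserted text.

```shell
#!/bin/sh
# Scratch copies of the sample files shown above.
tmp=$(mktemp -d)
printf 'keyword1 IN BETWEEN\nSTUFF ON\nONE OR MORE\nLINES keyword2\n' > "$tmp/file1"
printf 'NOW IS\nTHE WINTER OF\nOUR DISCONTENT\n' > "$tmp/file2"

between_out=$(awk -v k1='keyword1' -v k2='keyword2' '
NR == FNR { new = (FNR == 1 ? "" : new "\n") $0; next }  # slurp file2
          { text = (FNR == 1 ? "" : text "\n") $0 }      # slurp file1
END {
    start = index(text, k1)                   # first occurrence of k1
    rest  = substr(text, start + length(k1))  # everything after k1
    stop  = index(rest, k2)                   # first k2 after k1
    if (start && stop)
        text = substr(text, 1, start + length(k1) - 1) " " new " " substr(rest, stop)
    print text
}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$between_out"
rm -rf "$tmp"
```

Because nothing here is a regex or a sed replacement, characters such as &, / or \1 in file2 are copied as they are.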
Okay, here is a ready-to-use solution:
$ sed -i "s/\(keyword1\).*\(keyword2\)/\1 `cat file2` \2/g" file1
It reads from file2 and replaces text between two keywords inside file1 (works only if contents in file2 is not multiline).
This might work for you (GNU sed):
sed -e '/keyword1\s*/{:a;/\s*keyword2/!{N;ba};s/\n//g;s/keyword1\s*/&\n/;s/\s*keyword2/\n&/;P;e cat inserted_file' -e 's/.*\n//}' file
This looks for keyword1 and keeps that line, and possibly subsequent lines up to keyword2, in the pattern space. Then all newlines are deleted, and newlines are inserted after keyword1 and before keyword2. The part of the line up to keyword1 is then printed, followed by the inserted_file, and lastly the text from keyword2 to the end of its line.
This will surround the inserted_file with newlines. If these are not required then post process that file with:
sed -r 'N;s/(keyword1\s*)\n/\1/;s/\n(\s*keyword2)/\1/;P;D' new_file

Pipe awk's results to sed (deletion)

I am using an awk command (someawkcommand) that prints these lines (awkoutput):
>Genome1
ATGCAAAAG
CAATAA
and then, I want to use this output (awkoutput) as the input of a sed command. Something like that:
someawkcommand | sed 's/awkoutput//g' file1.txt > results.txt
file1.txt:
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
The final objective is to delete all lines in a file (file1.txt) containing the exact pattern found previously by awk.
The file results.txt contains (output of sed):
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
How should I write the sed command? Is there any simple way that sed will recognize the output of awk as its input?
Using GNU awk for multi-char RS:
$ cat file1
>Genome1
ATGCAAAAG
CAATAA
$ cat file2
>Genome1
ATGCAAAAG
CAATAA
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
$ gawk -v RS='^$' -v ORS= 'NR==FNR{rmv=$0;next} {sub(rmv,"")} 1' file1 file2
>Genome2
ATGAAAAA
AAAAAAAA
CAA
>Genome3
ACCC
The stuff that might be non-obvious to newcomers but are very common awk idioms:
-v RS='^$' tells awk to read the whole file as one string (instead of its default one line at a time).
-v ORS= sets the Output Record Separator to the null string (instead of its default newline) so that when the file is printed as a string awk doesn't add a newline after it.
NR==FNR is a condition that is only true for the first input file.
1 is a true condition invoking the default action of printing the current record.
Here is a possible sed solution:
someawkcommand | sed -n 's_.*_/&/d;_;H;${x;s_\n__g p}' | sed -f - file1.txt
First sed command turns output from someawkcommand into a sed expression.
Concretely, it turns
>Genome1
ATGCAAAAG
CAATAA
into:
/>Genome1/d;/ATGCAAAAG/d;/CAATAA/d;
(in sed language: delete lines containing those patterns; mind that you will have to escape /,[,],*,^,$ in your awk output if there are some, with another substitution for instance).
Second sed command reads it as input expression (-f - reads sed commands from file -, i.e. gets it from pipe) and applies to file file1.txt.
Remark for other readers:
OP wants to use sed, but as noted in the comments, it may not be the easiest way to solve this question. Deleting lines with awk could be simpler. Another (easy) solution could be to use grep with the -v (invert match) and -f (read patterns from file) options, in this way:
someawkcommand | grep -v -f - file1.txt
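A hedged variation on the grep approach: adding -F (fixed strings) and -x (whole-line match) makes grep treat each line of the awk output literally, so regex characters in it need no escaping and a pattern cannot match in the middle of a longer line. In this sketch someawkcommand is a stand-in shell function and the scratch directory is made up. Note that matching is still line by line, so a sequence line that also occurs in another stanza would be removed there too.

```shell
#!/bin/sh
# Scratch copy of file1.txt from the question.
tmp=$(mktemp -d)
printf '%s\n' '>Genome1' ATGCAAAAG CAATAA '>Genome2' ATGAAAAA AAAAAAAA CAA \
    '>Genome3' ACCC > "$tmp/file1.txt"

# Stand-in for the real awk command; it prints the stanza to delete.
someawkcommand() { printf '%s\n' '>Genome1' ATGCAAAAG CAATAA; }

# -v invert match, -x whole line, -F literal strings, -f - patterns from stdin.
grep_out=$(someawkcommand | grep -v -x -F -f - "$tmp/file1.txt")
printf '%s\n' "$grep_out"
rm -rf "$tmp"
```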
Edit: Following @rici's comments, here is a new command that takes output from awk as a single multiline pattern.
Disclaimer: It gets dirty. Kids, don't do it at home. Grown-ups are strongly encouraged to consider avoiding sed for that.
someawkcommand | \
sed -n 'H;${x;s_\n__;s_\n_\\n_g;s_.*_H;${x;s/\\n//;s/&//g p}_ p}' | \
sed -n -f - file1.txt
Output from inner sed is:
H;${x;s/\n//;s/>Genome1\nATGCAAAAG\nCAATAA//g p}
Additional drawback: it will add an empty line instead of removed pattern. Can't fix it easily (problems if pattern is at beginning/end of file). Add a substitution to remove it if you really feel like it.
This can more easily be done in awk, but the usual "eliminate duplicates" code is not correct. As I understand the question, the goal is to remove entire stanzas from the file.
Here's a possible solution which assumes that the first awk script outputs a single stanza:
awk 'NR == FNR {stanza[nstanza++] = $0; next}
$0 == stanza[i] {++i; next}
/^>/ && i == nstanza {i=0; next}
i {for (j=0; j<i; ++j) print stanza[j]; i=0}
{print $0;}
' <(someawkcommand) file1.txt
This might work for you (GNU sed):
sed '1{h;s/.*/:a;$!{N;ba}/p;d};/^>/!{H;$!d};x;s/\n/\\n/g;s|.*|s/&\\n*//g|p;$s|.*|s/\\n*$//|p;x;h;d' file1 |
sed -f - file2
This builds a script from file1 and then runs it against file2.
The script slurps in file2 and then does global substitution(s) using the contents of file1. Finally, it removes any blank lines at the end of the file caused by the deletion.
To see the script produced from file1, remove the pipe and the second sed command.
An alternative way would be to use diff and sed:
diff -e file2 file1 | sed 's/d/p/g' | sed -nf - file2

AWK replace $0 of second file when match few columns

How do I merge two files when the first two columns match in both files, replacing the first file's values with the second file's columns? For example:
Same number of columns:
FILE 1:
121212,0100,1.1,1.2,
121212,0200,2.1,2.2,
FILE 2:
121212,0100,3.1,3.2,3.3,
121212,0130,4.1,4.2,4.3,
121212,0200,5.1,5.2,5.3,
121212,0230,6.1,6.2,6.3,
OUTPUT:
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
In other words, I need to print $0 of the second file when $1 and $2 match in both files. I understand the logic, but I can't implement it using arrays, which apparently should be used.
Please take a moment to explain any code.
Use awk to print the first 2 fields in the pattern file and pipe to grep to do the match:
$ awk 'BEGIN{OFS=FS=","}{print $1,$2}' file1 | grep -f - file2
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
The -f option tells grep to take the pattern from a file but using - instead of a filename makes grep take the patterns from stdin.
So the first awk script produces the patterns from file1 which we pipe to match against in file2 using grep:
$ awk 'BEGIN{OFS=FS=","}{print $1,$2}' file1
121212,0100
121212,0200
You probably want to anchor the match to the beginning of the line using ^:
$ awk 'BEGIN{OFS=FS=","}{print "^"$1,$2}' file1
^121212,0100
^121212,0200
$ awk 'BEGIN{OFS=FS=","}{print "^"$1,$2}' file1 | grep -f - file2
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
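One more thing to watch with the grep approach: a pattern such as ^121212,0100 also matches a longer second field that merely starts with 0100. A minimal sketch of closing the pattern with the trailing field separator follows; the 01005 row is invented here purely to show the difference, and the scratch directory is an assumption.

```shell
#!/bin/sh
# Scratch files; the 01005 row in file2 is a made-up near miss.
tmp=$(mktemp -d)
printf '%s\n' '121212,0100,1.1,1.2,' '121212,0200,2.1,2.2,' > "$tmp/file1"
printf '%s\n' '121212,0100,3.1,3.2,3.3,' '121212,01005,9.1,9.2,9.3,' \
    '121212,0200,5.1,5.2,5.3,' > "$tmp/file2"

# Anchored at the start only: the 01005 row slips through.
loose_out=$(awk -F, '{print "^" $1 "," $2}' "$tmp/file1" | grep -f - "$tmp/file2")

# Anchored at the start and closed with a comma: only exact field matches.
strict_out=$(awk -F, '{print "^" $1 "," $2 ","}' "$tmp/file1" | grep -f - "$tmp/file2")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose_out" "$strict_out"
rm -rf "$tmp"
```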
Here's one way using awk:
awk -F, 'FNR==NR { a[$1,$2]; next } ($1,$2) in a' file1 file2
Results:
121212,0100,3.1,3.2,3.3,
121212,0200,5.1,5.2,5.3,
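Since the question asks for an explanation, here is the same awk one-liner spread out with comments. This is only a sketch: the array is renamed from a to seen for readability, and the scratch directory is an assumption of the example.

```shell
#!/bin/sh
# Scratch copies of FILE 1 and FILE 2 from the question.
tmp=$(mktemp -d)
printf '%s\n' '121212,0100,1.1,1.2,' '121212,0200,2.1,2.2,' > "$tmp/file1"
printf '%s\n' '121212,0100,3.1,3.2,3.3,' '121212,0130,4.1,4.2,4.3,' \
    '121212,0200,5.1,5.2,5.3,' '121212,0230,6.1,6.2,6.3,' > "$tmp/file2"

merge_out=$(awk -F, '
# First file (FNR == NR): record each (column 1, column 2) pair as an array key.
FNR == NR { seen[$1, $2]; next }
# Second file: a pattern with no action prints the line when its pair was recorded.
($1, $2) in seen
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$merge_out"
rm -rf "$tmp"
```

FNR restarts at 1 for every input file while NR keeps counting, so FNR == NR holds only while awk is reading the first file.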

Perl, sed, or awk one-liner to change the format of the file

I need advice on how to change a file formatted in the following way
file1:
A 504688
B jobnameA
A 504690
B jobnameB
A 504691
B jobnameC
...
into file2:
A B
504688 jobnameA
504690 jobnameB
504691 jobnameC
...
One solution I could think of is:
cat file1 | perl -0777 -p -e 's/\s+B/\t/' | awk '{print $2"\t"$3}'
But I am wondering if there is a more efficient way or an already known practice that does this job.
perl -nawe 'print "@F[1 .. $#F]", $F[0] eq "A" ? "\t" : "\n"' < /tmp/ab
Look up the options in perlrun.
Another useful one to add is -l (append newline to print), but not in this case.
Assuming your input file is tab separated:
echo $'A\tB'
cut -f2 filename | paste - -
Should be pretty quick because this is exactly what cut and paste were written to do.
awk '/^A/{num=$2}/^B/{print num,$2}' file
Or, alternately,
awk '{num=$2;getline;print num,$2}' file
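Neither awk one-liner prints the A B header or uses tabs. A small sketch that adds both, assuming the fixed labels A and B from the question and a scratch copy of file1:

```shell
#!/bin/sh
# Scratch copy of file1 from the question (without the trailing "...").
tmp=$(mktemp -d)
printf '%s\n' 'A 504688' 'B jobnameA' 'A 504690' 'B jobnameB' \
    'A 504691' 'B jobnameC' > "$tmp/file1"

table_out=$(awk 'BEGIN { OFS = "\t"; print "A", "B" }  # header row, tab separated
$1 == "A" { num = $2 }                                  # remember the job number
$1 == "B" { print num, $2 }                             # emit number and job name
' "$tmp/file1")

printf '%s\n' "$table_out"
rm -rf "$tmp"
```

Comparing $1 with == instead of matching /^A/ avoids picking up lines that merely begin with the letter A.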
Here is an sed solution:
sed -e 'N' -e 's/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file
This version will also print the header at the top:
sed '1{h;s/.*/A\tB/p;g};N;s/A\s*\(.*\)\nB\s*\(.*\)/\1\t\2/' file
Or an alternative:
sed -n '/^A\s*/{s///;h};/^B\s*/{s///;H;g;s/\n/\t/p}' file
If your sed does not support semicolons as a command separator for the alternative:
sed -n '
/^A\s*/{        # if the line starts with "A"
    s///        # remove the "A" and the whitespace
    h           # copy the remainder into the hold space
}               # end if
/^B\s*/{        # if the line starts with "B"
    s///        # remove the "B" and the whitespace
    H           # append pattern space to hold space
    g           # copy hold space to pattern space
    s/\n/\t/p   # replace newline with tab and print
}' file
This version will also print the header at the top:
sed -n '/^A\s*/{s///;h;1s/.*/A\tB/p};/^B\s*/{s///;H;g;s/\n/\t/p}' file
This will work with any header text, not just a fixed A and B:
awk '{a=$1;b=$2;getline;if(c!=1){print a,$1;c=1};print b,$2}' file1 >file2
It also prints the header row.
If you need \t separator, then use:
awk '{a=$1;b=$2;getline;if(c!=1){print a"\t"$1;c=1};print b"\t"$2}' file1 >file2
This might work for you:
sed -e '1i\A\tB' -e 'N;s/A\s*\(\S*\).*\nB\s*\(\S*\).*/\1\t\2/' file

sed one-liner to remove all single newlines?

So for example,
A paragraph's newlines would be removed let's say
it contained only single
newlines.
Then the things I would want to skip out:
However.
Our previous pair of newlines wouldn't.
It’s not a sed solution — although you can always run any sed through s2p of course — but a very easy solution using perl is:
% perl -i.orig -ne 'print unless /^$/' file1 file2 file3
That has the advantage of being extensible to any whitespace on the otherwise blank lines, like spaces and tabs:
% perl -i.orig -ne 'print unless /^\s*$/' file1 file2 file3
In the event that you have files with various line endings, like CR or CRLF, you could also do this, assuming you are running perl 5.10 or better:
% perl -0777 -i.orig -pe 's/\R+/\n/g' file1 file2 file3
which will normalize all sequences of one or more Unicode line separators into single newlines.
If you have UTF‑8 files that might have (for example) U+00A0 NON-BREAK SPACE in them on otherwise empty lines, you can handle them by telling perl that they are UTF‑8 using the ‑CSD command-line switch:
% perl -CSD -i.orig -ne 'print unless /^\s*$/' file1 file2 file3
UPDATE
I’m really unclear what you mean by removing a paragraph. I think you just mean joining up lines in a paragraph.
If so — if what you want to do is squeeze newlines from paragraphs, then you want to do this:
% perl -i.orig -00 -ple 's/\s*\n\s*/ /g' file1 file2 file3
It may not look like it works, but it does: try it.
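For comparison, awk has a paragraph mode as well: setting RS to the empty string makes every blank-line-separated block one record. The sketch below joins the lines inside each paragraph and keeps one blank line between paragraphs. The scratch file name input.txt is made up, and the sample text assumes the question's paragraphs were separated by blank lines.

```shell
#!/bin/sh
# Scratch input: three paragraphs separated by blank lines.
tmp=$(mktemp -d)
cat > "$tmp/input.txt" <<'EOF'
A paragraph's newlines would be removed let's say
it contained only single
newlines.

However.

Our previous pair of newlines wouldn't.
EOF

# RS="" reads one paragraph per record; ORS puts a blank line after each one.
joined_out=$(awk 'BEGIN { RS = ""; ORS = "\n\n" }
{ gsub(/[ \t]*\n[ \t]*/, " "); print }' "$tmp/input.txt")

printf '%s\n' "$joined_out"
rm -rf "$tmp"
```

awk itself writes a blank line after the last paragraph too; it disappears here only because the command substitution strips trailing newlines.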
Here's a sed solution.
$ sed -n -e '1{${p;b};h;b};/^$/!{H;$!b};x;s/\(.\)\n/\1 /g;p' 5751270.txt
A paragraph would be removed let's say it contained only single newlines.
However.
Our previous pair of newlines wouldn't.
You can try this bash script
#!/bin/bash
exec 8<"file"
while read -r line <&8
do
    if (( ${#line} > 0 )); then
        read -r next <&8
        if (( ${#next} > 0 )); then
            continue
        else
            echo "$line"
            echo "$next"
        fi
    fi
done
exec 8<&-