Replace entire paragraph with another from linux command line - c++

The problem I have is pretty straightforward (or so it seems). All I want to do is replace a paragraph of text (it's a header comment) with another paragraph. This will need to happen across a diverse number of files in a directory hierarchy (source code tree).
The paragraph to be replaced must be matched in its entirety, as there are similar text blocks in existence.
e.g.
To Replace
// ----------
// header
// comment
// to be replaced
// ----------
With
// **********
// some replacement
// text
// that could have any
// format
// **********
I have looked at using sed, and as far as I can tell the most lines it can work on at once is 2 (with the N command).
My question is: what is the way to do this from the linux command line?
EDIT:
Solution obtained: The best solution was Ikegami's: fully command line and the best fit for what I wanted to do.
My final solution required some tweaking; the input data contained a lot of special characters, as did the replacement data. To deal with this, the data needs to be preprocessed to insert the appropriate \n's and escape characters. The end product is a shell script that takes 3 arguments: a file containing the text to search for, a file containing the text to replace it with, and a folder to search recursively for files with .cc and .h extensions. It's fairly easy to customise from here.
SCRIPT:
#!/bin/bash
if [ -z "$1" ]; then
echo 'First parameter is a path to a file that contains the excerpt to be replaced, this must be supplied'
exit 1
fi
if [ -z "$2" ]; then
echo 'Second parameter is a path to a file containing the text to replace with, this must be supplied'
exit 1
fi
if [ -z "$3" ]; then
echo 'Third parameter is the path to the folder to recursively parse and replace in'
exit 1
fi
sed 's!\([]()|\*\$\/&[]\)!\\\1!g' "$1" > temp.out
sed ':a;N;$!ba;s/\n/\\n/g' temp.out > final.out
searchString=$(cat final.out)
sed 's!\([]|\[]\)!\\\1!g' "$2" > replace.out
replaceString=$(cat replace.out)
find "$3" -regex ".*\.\(cc\|h\)" -execdir perl -i -0777pe "s{$searchString}{$replaceString}" {} +

find -name '*.pm' -exec perl -i~ -0777pe'
s{// ----------\n// header\n// comment\n// to be replaced\n// ----------\n}
{// **********\n// some replacement\n// text\n// that could have any\n// format\n// **********\n};
' {} +

Using perl:
#!/usr/bin/env perl
# script.pl
use strict;
use warnings;
use Inline::Files;
my $lines = join '', <STDIN>; # read stdin
my $repl = join '', <REPL>; # read replacement
my $src = join '', <SRC>; # read source
chomp $repl; # remove trailing \n from $repl
chomp $src; # id. for $src
$lines =~ s#\Q$src\E#$repl#g; # global replace; \Q...\E makes $src match literally
print $lines; # print output
__SRC__
// ----------
// header
// comment
// to be replaced
// ----------
__REPL__
// **********
// some replacement
// text
// that could have any
// format
// **********
Usage: ./script.pl < yourfile.cpp > output.cpp
Requirements: Inline::Files (install from cpan)
Tested on: perl v5.12.4, Linux _ 3.0.0-12-generic #20-Ubuntu SMP Fri Oct 7 14:56:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

This might work:
# cat <<! | sed ':a;N;s/this\nand\nthis\n/something\nelse\n/;ba'
> a
> b
> c
> this
> and
> this
> d
> e
> this
> not
> this
> f
> g
> !
a
b
c
something
else
d
e
this
not
this
f
g
The trick is to slurp everything into the pattern space using the N and the loop :a;...;ba
This is probably more efficient:
sed '1{h;d};H;$!d;x;s/this\nand\nthis\n/something\nelse\n/g;p;d'
A more general purpose solution may use files for match and substitute data like so:
match=$(sed ':a;N;${s/\n/\\n/g};ba;' match_file)
substitute=$(sed ':a;N;${s/\n/\\n/g};ba;' substitute_file)
sed '1{h;d};H;$!d;x;s/'"$match"'/'"$substitute"'/g;p;d' source_file
Another way (probably less efficient) but cleaner looking:
sed -s '$s/$/\n###/' match_file substitute_file |
sed -r '1{h;d};H;${x;:a;s/^((.*)###\n(.*)###\n(.*))\2/\1\3/;ta;s/(.*###\n){2}//;p};d' - source_file
The last uses the GNU sed --separate option to treat each file as a separate entity. The second sed command uses a loop for the substitute to obviate .* greediness.

As long as the header comments are delimited uniquely (i.e., no other header comment starts with // ----------), and the replacement text is constant, the following awk script should do what you need:
BEGIN { normal = 1 }
/\/\/ ----------/ {
if (normal) {
normal = 0;
print "// **********";
print "// some replacement";
print "// text";
print "// that could have any";
print "// format";
print "// **********";
} else {
normal = 1;
next;
}
}
{
if (normal) print;
}
This prints everything it sees until it runs into the paragraph delimiter. When it sees the first one, it prints out the replacement paragraph. Until it sees the 2nd paragraph delimiter, it will print nothing. When it sees the 2nd paragraph delimiter, it will start printing lines normally again with the next line.
While you can technically do this from the command line, you may run into tricky shell quoting issues, especially if the replacement text has any single quotes. It may be easier to put the script in a file. Just put #!/usr/bin/awk -f (or whatever path which awk returns) at the top.
EDIT
To match multiple lines in awk, you'll need to use getline. Perhaps something like this:
/\/\/ ----------/ {
lines[0] = "// header";
lines[1] = "// comment";
lines[2] = "// to be replaced";
lines[3] = "// ----------";
linesRead = $0;
for (i = 0; i < 4; i++) {
getline line;
linesRead = linesRead "\n" line; # keep the lines read so far separated by newlines
if (line != lines[i]) {
print linesRead; # print partial matches
next;
}
}
# print the replacement paragraph here
next;
}
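A variant without getline in the main loop is to keep a sliding window of as many lines as the paragraph has and compare the whole window each time. This is a rough sketch, not a tested drop-in: replace_paragraph is a hypothetical function name, and it assumes the paragraph to match and its replacement are stored one line per line in two files.

```shell
# Hypothetical helper: replace_paragraph OLD_FILE NEW_FILE [INPUT...]
# Prints the input with every exact, whole-paragraph match of OLD_FILE
# replaced by the contents of NEW_FILE.
replace_paragraph() {
    local old_file=$1 new_file=$2
    shift 2
    awk -v oldfile="$old_file" -v newfile="$new_file" '
        BEGIN {
            while ((getline line < oldfile) > 0) old[++n] = line
            while ((getline line < newfile) > 0) repl = repl line "\n"
        }
        n == 0 { print; next }            # nothing to match: pass through
        {
            buf[++m] = $0                 # append to the window
            if (m < n) next               # window not full yet
            same = 1
            for (i = 1; i <= n; i++)      # force string comparison with ""
                if ((buf[i] "") != (old[i] "")) { same = 0; break }
            if (same) {
                printf "%s", repl         # whole paragraph matched
                m = 0
            } else {
                print buf[1]              # oldest line cannot start a match
                for (i = 1; i < n; i++) buf[i] = buf[i + 1]
                m = n - 1
            }
        }
        END { for (i = 1; i <= m; i++) print buf[i] }   # flush leftovers
    ' "$@"
}
```

Partial matches (for example a block that starts with // ---------- but has different lines inside) are printed unchanged, because a line only leaves the window once it is known not to start a full match.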

Related

How can I delete the lines starting with "//" (e.g., file header) which are at the beginning of a file?

I want to delete the header from all the files, and the header has the lines starting with //.
If I wanted to delete all the lines that start with //, I could do the following:
sed '/^\/\//d'
But that is not what I need. I just need to delete the lines at the beginning of the file that start with //.
Sample file:
// This is the header
// This should be deleted
print "Hi"
// This should not be deleted
print "Hello"
Expected output:
print "Hi"
// This should not be deleted
print "Hello"
Update:
If there is a new line in the beginning or in-between, it doesn't work. Is there any way to take care of that scenario?
Sample file:
< new empty line >
// This is the header
< new empty line >
// This should be deleted
print "Hi"
// This should not be deleted
print "Hello"
Expected output:
print "Hi"
// This should not be deleted
print "Hello"
Can someone suggest a way to do this? Thanks in advance!
Update: The accepted answer works well for white space in the beginning or in-between.
Could you please try the following. It also takes care of the new-line scenario; written and tested at https://ideone.com/IKN3QR
awk '
(NF == 0 || /^[[:blank:]]*\/\//) && !found{
next
}
NF{
found=1
}
1
' Input_file
Explanation: If a line is blank or starts with // (optionally preceded by blanks) and the variable found is not yet set, the line is skipped. The first non-blank line that does not start with // sets found, so that line and every line after it, up to the end of Input_file, is printed.
With sed:
sed -n '1{:a; /^[[:space:]]*\/\/\|^$/ {n; ba}};p' file
print "Hi"
// This should not be deleted
print "Hello"
Slightly shorter version with GNU sed:
sed -nE '1{:a; /^\s*\/\/|^$/ {n; ba}};p' file
Explanation:
1 { # execute this block on the first line only
:a; # this is a label
/^\s*\/\/|^$/ { n; # on lines matching `^\s*\/\/` or `^$`, do: read the next line
ba } # and go to label :a
}; # end block
p # print line unchanged:
# we only get here after the header or when it's not found
sed -n makes sed not print any lines without the p command.
Edit: updated the pattern to also skip empty lines.
It sounds like you just want to start printing from the first line that's neither blank nor just a comment:
$ awk 'NF && ($1 !~ "^//"){f=1} f' file
print "Hi"
// This should not be deleted
print "Hello"
The above simply sets a flag f when it finds such a line and prints every line from then on. It will work using any awk in any shell on every UNIX box.
Note that, unlike some of the potential solutions posted, it doesn't store more than 1 line at a time in memory and so will work no matter how large your input file is.
It was tested against this input:
$ cat file
// This is the header
// This should be deleted
print "Hi"
// This should not be deleted
print "Hello"
To run the above on many files at once, modifying each file in place, use this with GNU awk:
awk -i inplace 'NF && ($1 !~ "^//"){f=1} f' *
and this with any awk:
ip_awk() { local f t=$(mktemp) && for f in "${@:2}"; do awk "$1" "$f" > "$t" && mv -- "$t" "$f"; done; }
ip_awk 'NF && ($1 !~ "^//"){f=1} f' *
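Since the question asks about all the files, the same awk program could also be driven by find to cover a directory tree. This is a sketch under assumptions: strip_headers_in_tree is a hypothetical name, AWK_PROG is an environment variable invented here to pass the program to the inner shell, and the directory and name pattern are whatever fits your project.

```shell
# Hypothetical helper: strip_headers_in_tree DIR NAME_PATTERN
# Removes leading blank and // lines from every matching file under DIR.
strip_headers_in_tree() {
    AWK_PROG='NF && ($1 !~ "^//") { f = 1 } f' \
    find "$1" -type f -name "$2" -exec sh -c '
        for file do
            tmp=$(mktemp) || exit 1
            # write through a temp file, then copy back so the original
            # file keeps its permissions
            awk "$AWK_PROG" "$file" > "$tmp" && cat "$tmp" > "$file"
            rm -f "$tmp"
        done
    ' sh {} +
}
```

For example, strip_headers_in_tree src '*.cc' would only touch files ending in .cc below src. A file that consists of nothing but header lines ends up empty, so running it on a copy first seems prudent.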
In case perl is available then this may also work in slurp mode:
perl -0777 -pe 's~\A(?:\h*(?://.*)?\R+)+~~' file
\A will only match start of the file and (?:\h*(?://.*)?\R+)+ will match 1 or more lines that are blank or have // with optional leading spaces.
With GNU sed:
sed -i -Ez 's/^((\/\/[^\n]*|\s*)\n)+//' file
The ^((\/\/[^\n]*|\s*)\n)+ expression will match one or more lines starting with //, also matching blank lines, only at the start of the file.
Using ed (the file editor that the stream editor sed is based on),
printf '1,/^[^/]/ g|^\(//.*\)\{0,1\}$| d\nw\n' | ed tmp.txt
Some explanations are probably in order.
ed takes the name of the file to edit as an argument, and reads commands from standard input. Each command is terminated by a newline. (You could also read commands from a here document, rather than from printf via a pipe.)
1,/^[^/]/ addresses the first lines in the file, up to and including the first one that does not start with /. (All the lines you want to delete will be included in this set.)
g|^\(//.*\)\{0,1\}$|d deletes all the addressed lines that are either empty or do start with //.
w saves the changes.
The regular expression in the g command is a bit ugly; unfortunately, ed does not support regular expression operators you may take for granted, like ? or |. Breaking the regular expression down a bit:
^ matches the start of the line.
//.* matches // followed by zero or more characters.
\(//.*\)\{0,1\} matches the preceding regular expression 0 or 1 times (i.e., optionally)
$ matches the end of the line.

Replace a block of text

I have a file in this pattern:
Some text
---
## [Unreleased]
More text here
I need to replace the text between '---' and '## [Unreleased]' with something else in a shell script.
How can it be achieved using sed or awk?
Perl to the rescue!
perl -lne 'my @replacement = ("First line", "Second line");
if ($p = (/^---$/ .. /^## \[Unreleased\]/)) {
print $replacement[$p-1];
} else { print }'
The flip-flop operator .. tells you whether you're between the two strings; it also returns the line number relative to the start of the range.
This might work for you (GNU sed):
sed '/^---/,/^## \[Unreleased\]/c\something else' file
Change the lines between two regexp to the required string.
This example may help you.
$ cat f
Some text
---
## [Unreleased]
More text here
$ seq 1 5 >mydata.txt
$ cat mydata.txt
1
2
3
4
5
$ awk '/^---/{f=1; while(getline < c)print;close(c);next}/^## \[Unreleased\]/{f=0;next}!f' c="mydata.txt" f
Some text
1
2
3
4
5
More text here
awk -v RS="\0" 'gsub(/---\n\n## \[Unreleased\]\n/,"something")+1' file
give this line a try.
An awk solution that:
is portable (POSIX-compliant).
can deal with any number of lines between the start line and the end line of the block, and potentially with multiple blocks (although they'd all be replaced with the same text).
reads the file line by line (as opposed to reading the entire file at once).
awk -v new='something else' '
/^---$/ { f=1; next } # Block start: set flag, skip line
f && /^## \[Unreleased\]$/ { f=0; print new; next } # Block end: unset flag, print new txt
! f # Print line, if before or after block
' file

Remove matching and previous line

I need to remove a line containing "not a dynamic executable" and the line before it from a stream, using grep, awk, sed or something else. My current working solution is to tr the entire stream to strip off newlines, replace the newline preceding my match with something else using sed, use tr to add the newlines back in, and then use grep -v. I'm somewhat wary of artifacts with this approach, but I don't see how else I can do it at the moment:
tr '\n' '|' | sed 's/|\tnot a dynamic executable/__MY_REMOVE/g' | tr '|' '\n'
EDIT:
Input is a list of mixed files piped to xargs ldd. Basically, I want to ignore all output about non-library files, since that has nothing to do with what I'm doing next. I didn't want to use a lib*.so mask since that could conceivably be different.
Most simply with pcregrep in multi-line mode:
pcregrep -vM '\n\tnot a dynamic executable' filename
If pcregrep is not available to you, then awk or sed can also do this by reading one line ahead and skipping the printing of previous lines when a marker line appears.
You could be boring (and sensible) with awk:
awk '/^\tnot a dynamic executable/ { flag = 1; next } !flag && NR > 1 { print lastline; } { flag = 0; lastline = $0 } END { if(!flag) print }' filename
That is:
/^\tnot a dynamic executable/ { # in lines that start with the marker
flag = 1 # set a flag
next # and do nothing (do not print the last line)
}
!flag && NR > 1 { # if the last line was not flagged and
# is not the first line
print lastline # print it
}
{ # and if you got this far,
flag = 0 # unset the flag
lastline = $0 # and remember the line to be possibly
# printed.
}
END { # in the end
if(!flag) print # print the last line if it was not flagged
}
But sed is fun:
sed ':a; $! { N; /\n\tnot a dynamic executable/ d; P; s/.*\n//; ba }' filename
Explanation:
:a # jump label
$! { # unless we reached the end of the input:
N # fetch the next line, append it
/\n\tnot a dynamic executable/ d # if the result contains a newline followed
# by "\tnot a dynamic executable", discard
# the pattern space and start at the top
# with the next line. This effectively
# removes the matching line and the one
# before it from the output.
# Otherwise:
P # print the pattern space up to the newline
s/.*\n// # remove the stuff we just printed from
# the pattern space, so that only the
# second line is in it
ba # and go to a
}
# and at the end, drop off here to print
# the last line (unless it was discarded).
Or, if the file is small enough to be completely stored in memory:
sed ':a $!{N;ba}; s/[^\n]*\n\tnot a dynamic executable[^\n]*\n//g' filename
Where
:a $!{ N; ba } # read the whole file into
# the pattern space
s/[^\n]*\n\tnot a dynamic executable[^\n]*\n//g # and cut out the offending bit.
This might work for you (GNU sed):
sed 'N;/\n.*not a dynamic executable/d;P;D' file
This keeps a moving window of 2 lines and deletes them both if the desired string is found in the second. If not the first line is printed and then deleted and then next line appended and the process repeated.
Always keep in mind that while grep and sed are line-oriented awk is record-oriented and so can easily handle problems that span multiple lines.
It's a guess given you didn't post any sample input and expected output but it sounds like all you need is (using GNU awk for multi-char RS):
awk -v RS='^$' -v ORS= '{gsub(/[^\n]+\n\tnot a dynamic executable/,"")}1' file
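As no sample input was posted, here is a line-by-line sketch with made-up ldd-style input to try the idea on. drop_marker_and_previous is a hypothetical name, and it assumes the marker line starts with a tab, as in the question's own tr/sed attempt.

```shell
# Hypothetical helper: holds back one line and drops it together with
# a following "not a dynamic executable" marker line.
drop_marker_and_previous() {
    awk '
        /^\tnot a dynamic executable/ { have = 0; next }  # forget held line, skip marker
        have { print prev }                               # held line is safe to print
        { prev = $0; have = 1 }                           # hold the current line
        END { if (have) print prev }                      # flush the last held line
    ' "$@"
}

# Example call on assumed input:
#   printf "script.sh:\n\tnot a dynamic executable\n/lib/libfoo.so:\n\tlibc.so.6 => /lib/libc.so.6\n" |
#       drop_marker_and_previous
```

Only one line is kept in memory at a time, so this should also be fine for very long ldd output.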

Print several lines between patterns (first pattern not unique)

Need help with sed/awk/grep/whatever could solve my task.
I have a large file and I need to extract multiple sequential lines from it.
I have start pattern: <DN>
and end pattern: </GR>
and several lines in between, like this:
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
I've tried this:
sed -n '/\<DN\>/,/\<\/GR\>/p'
and several other ones (using awk and sed).
It works okay, but the problem is that the source file may contain a group of lines that starts with <DN> but has no </GR> at the end, followed by another group that starts with <DN> and ends normally with </GR>:
<DN>234</DN> - unneded DN
<AB>sdfsd</AB>
<DC>456456</DC>
<EF>6575675 sdfsd</EF>
....really large piece of unwanted text here....
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
How can I extract only needed lines and ignore garbage pieces of log, containing <DN> without ending </GR>?
And next, I need to convert the multiline pieces from <DN> to </GR> into a file of single lines, each starting with <DN> and ending with </GR>.
Any help would be appreciated. I'm stuck
This might work for you (GNU sed):
sed -n '/<DN>/{h;b};x;/./G;x;/<\/GR/{x;/./p;z;x}' file
Use the hold space to store lines between <DN> and </GR>.
awk '
# Lines that start with <DN> start our matching.
/^<DN>/ {
# If we saw a start without a matching end, throw away everything saved so far.
if (dn) {
d=""
}
# Mark being in a <DN> element.
dn=1
# Save the current line.
d=$0
next
}
# Lines that end with </GR> end our matching (but only if we are currently in a match).
dn && /<\/GR>$/ {
# We are not in a <DN> element anymore.
dn=0
# Print out the saved lines and the current line.
printf "%s%s%s\n", d, OFS, $0
# Reset our saved contents.
d=""
next
}
# If we are in a <DN> element and have saved contents append the current line to the contents (separated by OFS).
dn && d {
d=d OFS $0
}
' file
awk '
/^<DN>/ { n = 1 }
n { lines[n++] = $0 }
n && /<\/GR>$/ {
for (i=1; i<n; i++) printf "%s", lines[i]
print ""
n = 0
}
' file
with bash:
fun ()
{
local line output;
while IFS= read -r line; do
if [[ $line =~ ^'<DN>' ]]; then
output=$line;
else
if [[ -n $output ]]; then
output=$output$'\n'$line;
if [[ $line =~ '</GR>'$ ]]; then
echo "$output";
output=;
fi;
fi;
fi;
done
}
fun <file
You could use pcregrep tool for this.
$ pcregrep -o -M '(?s)(?<=^|\s)<DN>(?:(?!<DN>).)*?</GR>(?=\n|$)' file
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>

sed/awk replace in all matches

I want to invert all the color values in a bunch of files. The colors are all in the hex format #ff3300 so the inversion could be done characterwise with the sed command
y/0123456789abcdef/fedcba9876543210/
How can I loop through all the color matches and do the char translation in sed or awk?
EDIT:
sample input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
asdfghj
desired output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
asdfghj
EDIT: I changed my response as per your edit.
OK, this may be difficult to do in sed. awk could do the trick more or less easily, but I find perl much easier for this task:
$ perl -pe 's/#[0-9a-f]+/$&=~tr%0123456789abcdef%fedcba9876543210%r/ge' <infile >outfile
Basically you find the pattern, then execute the right-hand side, which executes the tr on the match, and substitutes the value there.
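For comparison, an awk version might look roughly like the following. It is a sketch, not a polished tool: invert_colors is a hypothetical name, and it assumes lowercase hex digits and treats the first six hex digits after each # as the color.

```shell
# Hypothetical helper: inverts every #rrggbb color (lowercase hex) in the input.
invert_colors() {
    awk '
        BEGIN { from = "0123456789abcdef"; to = "fedcba9876543210" }
        {
            out = ""
            while (match($0, /#[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]/)) {
                out = out substr($0, 1, RSTART)       # text up to and including "#"
                hex = substr($0, RSTART + 1, 6)
                for (i = 1; i <= 6; i++)              # translate digit by digit
                    out = out substr(to, index(from, substr(hex, i, 1)), 1)
                $0 = substr($0, RSTART + 7)           # continue after the color
            }
            print out $0
        }
    ' "$@"
}
```

Usage would be invert_colors infile > outfile, in the same way as the perl one-liner.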
The inversion is really a subtraction. To invert a hex, you just subtract it from ffffff.
With this in mind, you can build a simple script to process each line, extract hexes, invert them, and inject them back to the line.
This is using Bash (see arrays, printf -v, += etc) only (no external tools there):
#!/usr/bin/env bash
[[ -f $1 ]] || { printf "error: cannot find file: %s\n" "$1" >&2; exit 1; }
while read -r; do
# split line with '#' as separator
IFS='#' toks=( $REPLY )
for tok in "${toks[@]}"; do
# extract hex
read -n6 hex <<< "$tok"
# is it really a hex ?
if [[ $hex =~ [0-9a-fA-F]{6} ]]; then
# compute inversion
inv="$((16#ffffff - 16#$hex))"
# zero pad the result
printf -v inv "%06x" "$inv"
# replace hex with inv
tok="${tok/$hex/$inv}"
fi
# build the modified line
line+="#$tok"
done
# print the modified line and clean it for reuse
printf "%s\n" "${line#\#}"
unset line
done < "$1"
use it like:
$ ./invhex infile > outfile
test case input:
random text... #ffffff_random_text_#000000__
asdf#00ff00
bdf#cvb_foo
asdfghj
#bdfg
processed output:
random text... #000000_random_text_#ffffff__
asdf#ff00ff
bdf#cvb_foo
asdfghj
#bdfg
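The subtraction at the heart of the script above can be checked on a single value. A minimal sketch, with invert_hex as a hypothetical name; it uses the 0x prefix instead of Bash's 16# notation so that it also runs in a plain POSIX sh, and it assumes the argument is exactly six hex digits.

```shell
# Hypothetical helper: invert_hex RRGGBB
# Prints ffffff minus the given value, zero-padded to six digits.
invert_hex() {
    printf '%06x\n' "$(( 0xffffff - 0x$1 ))"
}
```

For instance, invert_hex 00ff00 prints ff00ff, which matches the asdf#00ff00 line of the test case.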
This might work for you (GNU sed):
sed '/#[a-f0-9]\{6\}\>/!b
s//\n&/g
h
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g
y/0123456789abcdef/fedcba9876543210/
H
g
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta
s/\n//' file
Explanation:
/#[a-f0-9]\{6\}\>/!b bail out on lines not containing the required pattern
s//\n&/g prepend every pattern with a newline
h copy this to the hold space
s/[^\n]*\(\n.\{7\}\)[^\n]*/\1/g delete everything but the required pattern(s)
y/0123456789abcdef/fedcba9876543210/ transform the pattern(s)
H append the new pattern(s) to the hold space
g overwrite the pattern space with the contents of the hold space
:a;s/\n.\{7\}\(.*\n\)\n\(.\{7\}\)/\2\1/;ta replace the old pattern(s) with the new.
s/\n// remove the newline artifact from the H command.
This works...
cat test.txt |sed -e 's/\#\([0123456789abcdef]\{6\}\)/\n\#\1\n/g' |sed -e ' /^#.*/ y/0123456789abcdef/fedcba9876543210/' | awk '{lastType=type;type= substr($0,1,1)=="#";} type==lastType && length(line)>0 {print line;line=$0} type!=lastType {line=line$0} length(line)==0 {line=$0} END {print line}'
The first sed command inserts line breaks around the hex codes, which makes it possible to apply the substitution to all lines starting with a hash. There is probably a more elegant way to merge the lines back together, but the awk command does the job. The only assumption is that no two hex codes follow each other directly; if they do, this step has to be revised.