Replace a block of text - regex

I have a file in this pattern:
Some text
---
## [Unreleased]
More text here
I need to replace the text between '---' and '## [Unreleased]' with something else in a shell script.
How can it be achieved using sed or awk?

Perl to the rescue!
perl -lne 'my @replacement = ("First line", "Second line");
if ($p = (/^---$/ .. /^## \[Unreleased\]/)) {
print $replacement[$p-1];
} else { print }'
The flip-flop operator .. tells you whether you're between the two strings. Moreover, it returns the line number relative to the start of the range.
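To see the sequence numbers that the range operator returns, here is a small sketch on made-up input (the END marker and the sample lines are assumptions for illustration only). Perl appends "E0" to the number on the last line of the range; the value still behaves as a number in arithmetic such as $p-1.

```shell
# Print the value returned by the range operator in front of each line in the range.
result=$(printf '%s\n' before --- inside END after |
    perl -ne 'print "${p}:$_" if $p = (/^---$/ .. /^END$/)')
printf '%s\n' "$result"
```

Lines outside the range give an empty (false) value, so they are not printed.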

This might work for you (GNU sed):
sed '/^---/,/^## \[Unreleased\]/c\something else' file
This changes the lines between the two regexps, including the matching lines themselves, to the required string.
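The one-line c\text form is a GNU extension. A sketch of the same command in the more portable two-line form, run on the sample from the question (the temporary file exists only for the demonstration):

```shell
tmp=$(mktemp)
printf '%s\n' 'Some text' '---' '## [Unreleased]' 'More text here' > "$tmp"

# The replacement text follows "c\" on its own line; for an address
# range it is printed once, when the end of the range is reached.
result=$(sed '/^---$/,/^## \[Unreleased\]$/c\
something else' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

The patterns are anchored with $ here so that lines merely beginning with --- do not open the range.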

This example may help you.
$ cat f
Some text
---
## [Unreleased]
More text here
$ seq 1 5 >mydata.txt
$ cat mydata.txt
1
2
3
4
5
$ awk '/^---/{f=1; while((getline line < c) > 0) print line; close(c); next} /^## \[Unreleased\]/{f=0; next} !f' c="mydata.txt" f
Some text
1
2
3
4
5
More text here

awk -v RS="\0" 'gsub(/---\n\n## \[Unreleased\]\n/,"something")+1' file
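Give this line a try. It treats the whole file as a single record (in awks that accept "\0" as RS) and expects an empty line between '---' and '## [Unreleased]'.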

An awk solution that:
is portable (POSIX-compliant).
can deal with any number of lines between the start line and the end line of the block, and potentially with multiple blocks (although they'd all be replaced with the same text).
reads the file line by line (as opposed to reading the entire file at once).
awk -v new='something else' '
/^---$/ { f=1; next } # Block start: set flag, skip line
f && /^## \[Unreleased\]$/ { f=0; print new; next } # Block end: unset flag, print new txt
! f # Print line, if before or after block
' file
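A quick check of the claims above, as a sketch: the same awk program run on made-up input with two blocks, one of which contains extra lines (the sample content and the temporary file are assumptions for illustration).

```shell
tmp=$(mktemp)
printf '%s\n' 'Some text' '---' 'old line 1' 'old line 2' '## [Unreleased]' \
    'Middle' '---' '## [Unreleased]' 'End' > "$tmp"

result=$(awk -v new='something else' '
  /^---$/                    { f=1; next }             # block start: set flag, skip line
  f && /^## \[Unreleased\]$/ { f=0; print new; next }  # block end: print the new text
  !f                                                   # print lines outside a block
' "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Both blocks, including their delimiter lines, are replaced by the same text. Note that awk -v interprets backslash escape sequences in the value, so replacement text containing backslashes is safer passed through the environment and read via ENVIRON.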

Related

How can I delete the lines starting with "//" (e.g., file header) which are at the beginning of a file?

I want to delete the header from all the files, and the header has the lines starting with //.
If I wanted to delete all the lines that start with //, I could do the following:
sed '/^\/\//d'
But that is not what I need. I just need to delete the lines at the beginning of the file that start with //.
Sample file:
// This is the header
// This should be deleted
print "Hi"
// This should not be deleted
print "Hello"
Expected output:
print "Hi"
// This should not be deleted
print "Hello"
Update:
If there is an empty line at the beginning or in between, it doesn't work. Is there any way to take care of that scenario?
Sample file:
< new empty line >
// This is the header
< new empty line >
// This should be deleted
print "Hi"
// This should not be deleted
print "Hello"
Expected output:
print "Hi"
// This should not be deleted
print "Hello"
Can someone suggest a way to do this? Thanks in advance!
Update: The accepted answer works well for white space in the beginning or in-between.
Could you please try the following? It also handles the empty-line scenario; written and tested at https://ideone.com/IKN3QR
awk '
(NF == 0 || /^[[:blank:]]*\/\//) && !found{
next
}
NF{
found=1
}
1
' Input_file
Explanation: if a line is either empty or starts with // and the variable found is not yet set, skip it. Once a non-empty line that does not start with // is seen, found is set, and every line from that one to the end of Input_file is printed.
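A sketch of that logic run against the second sample from the question, the one with empty lines. It uses [ \t] in place of [[:blank:]] as an assumption, for awks that lack POSIX character classes; the behavior on this input is the same.

```shell
# The value starts with a newline, so the first input line is empty.
input='
// This is the header

// This should be deleted
print "Hi"
// This should not be deleted
print "Hello"'

result=$(printf '%s\n' "$input" | awk '
  (NF == 0 || /^[ \t]*\/\//) && !found { next }  # still in the header: skip the line
  NF { found = 1 }                               # first real line has been seen
  1                                              # print every line that gets here
')
printf '%s\n' "$result"
```

The comment line after print "Hi" survives because found is already set by then.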
With sed:
sed -n '1{:a; /^[[:space:]]*\/\/\|^$/ {n; ba}};p' file
print "Hi"
// This should not be deleted
print "Hello"
Slightly shorter version with GNU sed:
sed -nE '1{:a; /^\s*\/\/|^$/ {n; ba}};p' file
Explanation:
1 { # execute this block on the first line only
:a; # this is a label
/^\s*\/\/|^$/ { n; # on lines matching `^\s*\/\/` or `^$`, do: read the next line
ba } # and go to label :a
}; # end block
p # print line unchanged:
# we only get here after the header or when it's not found
sed -n makes sed not print any lines without the p command.
Edit: updated the pattern to also skip empty lines.
It sounds like you just want to start printing from the first line that's neither blank nor just a comment:
$ awk 'NF && ($1 !~ "^//"){f=1} f' file
print "Hi"
// This should not be deleted
print "Hello"
The above simply sets a flag f when it finds such a line and prints every line from then on. It will work using any awk in any shell on every UNIX box.
Note that, unlike some of the potential solutions posted, it doesn't store more than 1 line at a time in memory and so will work no matter how large your input file is.
It was tested against this input:
$ cat file
// This is the header
// This should be deleted
print "Hi"
// This should not be deleted
print "Hello"
To run the above on many files at once and modify each file in place, use this with GNU awk:
awk -i inplace 'NF && ($1 !~ "^//"){f=1} f' *
and this with any awk:
ip_awk() { local f t=$(mktemp) && for f in "${@:2}"; do awk "$1" "$f" > "$t" && mv -- "$t" "$f"; done; }
ip_awk 'NF && ($1 !~ "^//"){f=1} f' *
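For shells without local or the ${@:2} slice, the same idea can be sketched in plain POSIX sh. The helper name inplace_awk and the demo file are hypothetical; the variables are global because POSIX sh has no local.

```shell
# inplace_awk PROGRAM FILE...   (hypothetical helper name)
inplace_awk() {
    prog=$1
    shift
    for file in "$@"; do
        tmpfile=$(mktemp) || return 1
        # Overwrite the original only if awk succeeded.
        if awk "$prog" "$file" > "$tmpfile"; then
            cat "$tmpfile" > "$file"
        fi
        rm -f "$tmpfile"
    done
}

# Demonstration on a throwaway file.
demo=$(mktemp)
printf '%s\n' '// header' 'print "Hi"' '// kept' > "$demo"
inplace_awk 'NF && ($1 !~ "^//"){f=1} f' "$demo"
result=$(cat "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

Writing back with cat instead of mv keeps the original file's permissions and hard links, at the cost of the update not being atomic.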
In case perl is available then this may also work in slurp mode:
perl -0777 -pe 's~\A(?:\h*(?://.*)?\R+)+~~' file
\A will only match start of the file and (?:\h*(?://.*)?\R+)+ will match 1 or more lines that are blank or have // with optional leading spaces.
With GNU sed:
sed -i -Ez 's/^((\/\/[^\n]*|\s*)\n)+//' file
The ^((\/\/[^\n]*|\s*)\n)+ expression will match one or more lines starting with //, also matching blank lines, only at the start of the file.
Using ed (the file editor that the stream editor sed is based on),
printf '1,/^[^/]/ g|^\(//.*\)\{0,1\}$| d\nw\n' | ed tmp.txt
Some explanations are probably in order.
ed takes the name of the file to edit as an argument, and reads commands from standard input. Each command is terminated by a newline. (You could also read commands from a here document, rather than from printf via a pipe.)
1. 1,/^[^/]/ addresses the first lines in the file, up to and including the first one that does not start with /. (All the lines you want to delete will be included in this set.)
2. g|^\(//.*\)\{0,1\}$|d deletes all the addressed lines that are either empty or do start with //.
3. w saves the changes.
Step 2 is a bit ugly; unfortunately, ed does not support regular expression operators you may take for granted, like ? or |. Breaking the regular expression down a bit:
^ matches the start of the line.
//.* matches // followed by zero or more characters.
\(//.*\)\{0,1\} matches the preceding regular expression 0 or 1 times (i.e., optionally)
$ matches the end of the line.

awk concatenate strings till contain substring

I have an awk script from this example:
awk '/START/{if (x) print x; x="";}{x=(!x)?$0:x","$0;}END{print x;}' file
Here's a sample file with lines:
$ cat file
START
1
2
3
4
5
end
6
7
START
1
2
3
end
5
6
7
I need to stop concatenating once the string contains the word end, so the desired output is:
START,1,2,3,4,5,end
START,1,2,3,end
Short Awk solution (though it will check for /end/ pattern twice):
awk '/START/,/end/{ printf "%s%s",$0,(/^end/? ORS:",") }' file
The output:
START,1,2,3,4,5,end
START,1,2,3,end
/START/,/end/ - range pattern
A range pattern is made of two patterns separated by a comma, in the
form ‘begpat, endpat’. It is used to match ranges of consecutive
input records. The first pattern, begpat, controls where the range
begins, while endpat controls where the pattern ends.
/^end/? ORS:"," - set delimiter for the current item within a range
Here is another awk:
$ awk '/START/{ORS=","} /end/ && ORS=RS; ORS!=RS' file
START,1,2,3,4,5,end
START,1,2,3,end
Note that /end/ && ORS=RS; is shortened form of /end/{ORS=RS; print}
You can use this awk:
awk '/START/{p=1; x=""} p{x = x (x=="" ? "" : ",") $0} /end/{if (x) print x; p=0}' file
START,1,2,3,4,5,end
START,1,2,3,end
Another way, similar to answers in How to select lines between two patterns?
$ awk '/START/{ORS=","; f=1} /end/{ORS=RS; print; f=0} f' ip.txt
START,1,2,3,4,5,end
START,1,2,3,end
This doesn't need a buffer, but it doesn't check whether START had a corresponding end.
/START/{ORS=","; f=1} set ORS as , and set a flag (which controls what lines to print)
/end/{ORS=RS; print; f=0} set ORS to newline on ending condition. Print the line and clear the flag
f print input record as long as this flag is set
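The ORS-based version starts printing as soon as START is seen, even if no end ever follows. A minimal sketch of a buffering variant that prints a block only once its end line arrives (the names buf and inblock and the sample input are assumptions for illustration):

```shell
input='START
1
2
START
3
end
4'

result=$(printf '%s\n' "$input" | awk '
  /START/          { buf = $0; inblock = 1; next }  # a new START drops any unfinished block
  inblock          { buf = buf "," $0 }             # append the line to the buffer
  inblock && /end/ { print buf; inblock = 0 }       # block complete: print and reset
')
printf '%s\n' "$result"
```

The first START has no matching end before the next START, so only the second block is printed.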
Since we seem to have gone down the rabbit hole with ways to do this, here's a fairly reasonable approach with GNU awk for multi-char RS, RT, and gensub():
$ awk -v RS='end' -v OFS=',' 'RT{$0=gensub(/.*(START)/,"\\1",1); $NF=$NF OFS RT; print}' file
START,1,2,3,4,5,end
START,1,2,3,end

Print several lines between patterns (first pattern not unique)

Need help with sed/awk/grep/whatever could solve my task.
I have a large file and I need to extract multiple sequential lines from it.
I have start pattern: <DN>
and end pattern: </GR>
and several lines in between, like this:
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
I've tried this:
sed -n '/\<DN\>/,/\<\/GR\>/p'
and several other ones (using awk and sed).
It works okay, but the problem is that the source file may contain lines starting with <DN> that are not followed by a closing </GR>, and after them comes another block that starts with <DN> and ends normally with </GR>:
<DN>234</DN> - unneded DN
<AB>sdfsd</AB>
<DC>456456</DC>
<EF>6575675 sdfsd</EF>
....really large piece of unwanted text here....
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>
How can I extract only the needed lines and ignore garbage pieces of the log that contain <DN> without an ending </GR>?
Next, I need to convert each multiline piece from <DN> to </GR> into a single line, starting with <DN> and ending with </GR>.
Any help would be appreciated. I'm stuck
This might work for you (GNU sed):
sed -n '/<DN>/{h;b};x;/./G;x;/<\/GR/{x;/./p;z;x}' file
Use the hold space to store lines between <DN> and </GR>.
awk '
# Lines that start with <DN> start our matching.
# (No apostrophes in these comments: one would end the shell quoting.)
/^<DN>/ {
    # If we saw a start without a matching end, throw away everything saved so far.
    if (dn) {
        d=""
    }
    # Mark being in a <DN> element.
    dn=1
    # Save the current line.
    d=$0
    next
}
# Lines that end with </GR> end our matching (but only if we are currently in a match).
dn && /<\/GR>$/ {
    # We are no longer in a <DN> element.
    dn=0
    # Print out the saved lines and the current line.
    printf "%s%s%s\n", d, OFS, $0
    # Reset our saved contents.
    d=""
    next
}
# If we are in a <DN> element and have saved contents, append the current line to the contents (separated by OFS).
dn && d {
    d=d OFS $0
}
' file
awk '
/^<DN>/ { n = 1 }
n { lines[n++] = $0 }
n && /<\/GR>$/ {
for (i=1; i<n; i++) printf "%s", lines[i]
print ""
n = 0
}
' file
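To check how this kind of buffering treats the garbage case from the question, here is a sketch on a shortened sample (the input and the names buf and inblock are assumptions for illustration). A new <DN> line discards whatever was collected since the previous one.

```shell
input='<DN>1</DN> - unneeded DN
<AB>x</AB>
<DN>234</DN>
<DD>sdfsd</DD>
<GR>sdfsdfsFFFDd</GR>
<RAC>456464</RAC>
<GR>other</GR>'

result=$(printf '%s\n' "$input" | awk '
  /^<DN>/              { buf = ""; inblock = 1 }     # new <DN>: drop any unfinished block
  inblock              { buf = buf $0 }              # join lines with no separator
  inblock && /<\/GR>$/ { print buf; inblock = 0 }    # block complete: print as one line
')
printf '%s\n' "$result"
```

Lines after a completed block are ignored until the next <DN> line appears.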
With bash:
fun ()
{
    local line output;
    while IFS= read -r line; do
        if [[ $line =~ ^'<DN>' ]]; then
            output=$line;
        else
            if [[ -n $output ]]; then
                output=$output$'\n'$line;
                if [[ $line =~ '</GR>'$ ]]; then
                    echo "$output";
                    output=;
                fi;
            fi;
        fi;
    done
}
fun <file
You could use pcregrep tool for this.
$ pcregrep -o -M '(?s)(?<=^|\s)<DN>(?:(?!<DN>).)*?</GR>(?=\n|$)' file
<DN>234</DN>
<DD>sdfsd</DD>
<BR>456456</BR>
<COL>6575675 sdfsd</COL>
<RAC>456464</RAC>
<GR>sdfsdfsFFFDd</GR>

Replace previous when match regular expression

I need to delete the "end of line" of the previous line when the current line does not start with a number (^[^0-9]); basically, if it matches, append it to the line before. I'm a sed & awk n00b, and really like them btw. Thanks.
edit:
$ cat file
1;1;1;text,1
2;4;;8;some;1;1;1;more
100;tex
t
broke
4564;1;1;"also
";12,2121;546465
$ "script" file
1;1;1;text,1
2;4;;8;some;1;1;1;more
100;text broke
4564;1;1;"also";12,2121;546465
You didn't post any sample input or expected output so this is a guess but it sounds like what you're asking for:
$ cat file
a
b
3
4
c
d
$ awk '{printf "%s%s",(NR>1 && /^[[:digit:]]/ ? ORS : ""),$0} END{print ""}' file
ab
3
4cd
On the OP's newly posted input:
$ awk '{printf "%s%s",(NR>1 && /^[[:digit:]]/ ? ORS : ""),$0} END{print ""}' file
1;1;1;text,1
2;4;;8;some;1;1;1;more
100;textbroke
4564;1;1;"also";12,2121;546465
This might work for you (GNU sed):
sed -r ':a;$!N;s/\n([^0-9]|$)/\1/;ta;P;D' file
Keep two lines in the pattern space and, if the second line is empty or does not start with a digit, remove the newline between them.
If you have Ruby on your system:
array = File.readlines("file")
array.each_with_index do |val, ind|
  # chomp off the previous item's \n; skip ind 0, where ind-1 would wrap around to the last line
  array[ind-1].chomp! if ind > 0 && val !~ /^\d/
end
puts array.join
output
# ruby test.rb
1;1;1;text,1
2;4;;8;some;1;1;1;more
100;textbroke
4564;1;1;"also";12,2121;546465

AWK end of line sign in regular expressions

I have a simple awk script named "script.awk" that contains:
/\/some_simple_string/ { print $0;}
I'm using it to parse some file that contains:
(by using: cat file | awk -f script.awk)
14 catcat one_two/some_thing
15 catcat one_three/one_more_some_simple_string
16 dogdog one_two/some_simple_string_again
17 dogdog one_four/some_simple_string
18 qweqwe firefire/ppp
I want the script to print only the line that fully matches "/some_simple_string[END_OF_LINE]", and not the second or third line.
Is there any simple way to do it?
I think the most appropriate way is to add an end-of-line sign to the regular expression.
So it will match only lines that have "/some.." and a newline at the end of "..string[END_OF_LINE]".
Desired output:
17 dogdog one_four/some_simple_string
Sorry for confusion, I was asking for END OF LINE sign in regular expressions.
The correct answer is:
/\/some_simple_string$/ { print $0;}
You can always use:
/\/some_simple_string$/ { print $0 }
I.e. match not only "some_simple_string" but match "/some_simple_string" followed by the end of the line ($ is end of line in regex)
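A small sketch comparing the original pattern with the anchored one on the sample data from the question (the variable names are assumptions for illustration):

```shell
input='14 catcat one_two/some_thing
15 catcat one_three/one_more_some_simple_string
16 dogdog one_two/some_simple_string_again
17 dogdog one_four/some_simple_string
18 qweqwe firefire/ppp'

# Without the anchor, any line containing "/some_simple_string" matches.
unanchored=$(printf '%s\n' "$input" | awk '/\/some_simple_string/')
# With "$", the string must be the last thing on the line.
anchored=$(printf '%s\n' "$input" | awk '/\/some_simple_string$/')

printf 'unanchored:\n%s\n' "$unanchored"
printf 'anchored:\n%s\n' "$anchored"
```

The unanchored pattern also picks up line 16, where the string is followed by _again; the anchored one leaves only line 17.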
grep '/some_simple_string$' file | tail -n 1 should do the trick.
Or if you really want to use awk, do awk '/\/some_simple_string/{x = $0} END{print x}' file
To return just the last of a group of matches, ...
Store the line in a variable and print it in the END block.
/some_simple_string/ { x = $0 }
END{ print x }
To print all the matches that end with the string /some_simple_string using a regular expression, you need to anchor to the end of the line using $. The most suitable tool for this job is grep:
$ grep '/some_simple_string$' file
In awk the command is much the same:
$ awk '/[/]some_simple_string$/' file
To print the line that follows each match you would do:
$ awk 'print_flag{print; print_flag=0} /[/]some_simple_string$/{print_flag=1}' file
Or just combine grep and tail if it makes it clearer using context option -A to print the following lines:
$ grep -A1 '/some_simple_string$' file | tail -n 1
I sometimes find that the input records can have a trailing carriage return (\r).
Yes, I deal with both Windows and Linux text files.
So I add the following 'pre-processor' to my awk scripts:
1 == 1 { # preprocess all records
res = gsub("\r", "") # remove unwanted trailing char
if(res>0 && NR<100) { print "(removed stuff)" > "/dev/stderr" } # optional
}
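Why this matters for the $ anchor: with DOS line endings the record ends in \r, so a pattern anchored with $ no longer matches. A minimal sketch, assuming an awk that does not strip carriage returns itself (the variable names are made up for the demonstration):

```shell
line='17 dogdog one_four/some_simple_string'

# The input line ends in "\r\n"; awk removes only the "\n".
without_fix=$(printf '%s\r\n' "$line" | awk '/\/some_simple_string$/')

# Strip one trailing "\r" from each record before matching.
with_fix=$(printf '%s\r\n' "$line" | awk '{ sub(/\r$/, "") } /\/some_simple_string$/')

printf 'without fix: [%s]\n' "$without_fix"
printf 'with fix:    [%s]\n' "$with_fix"
```

Using sub(/\r$/, "") rather than gsub("\r", "") removes only a trailing carriage return and leaves any \r inside the record alone.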
More optimally, let FS do the work instead of having it perform unnecessary and unrelated field splitting (adding the \r bit for Windows/DOS completeness):
mawk '!_<NF' FS='[/]some_simple_string[\r]?$'
17 dogdog one_four/some_simple_string