replace multi line text with multiline text in sed and perl - regex

Here is my file:
start
exit 0
status
exit 0
stop
exit 0
The result file should be:
start
exit 0
status
exit 152
stop
exit 0
Any help is appreciated.
How can I do it using sed and perl?

Using Awk
Input
$ cat f
start
exit 0
status
exit 0
stop
exit 0
Output
$ awk '/exit/ && p {sub($NF,"152")}{p=/status/}1' f
start
exit 0
status
exit 152
stop
exit 0
Explanation
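p=/status/: on every line, set p to 1 if the line contains status and to 0 otherwise, so that while the next line is processed, p tells whether the previous line contained status
/exit/ && p: when the current record contains exit and p is true (the previous line contained status), then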
sub(regexp, replacement [, target])
Search target, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp. Modify the entire string by replacing the matched text with replacement. The modified string becomes the new value of target. Return the number of substitutions made (zero or one). If 3rd argument is omitted, then the default is to use and alter $0.
sub($NF,"152")
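Replace the text matched by the value of the last field ($NF, here 0) with 152. Since the third argument is omitted, $0 is altered. Note that $NF is used as a regex, so its leftmost match in the whole record is what gets replaced; for exit 0 that is the last field itself.
} 1
1 always evaluates to true, so awk performs the default action {print $0}.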
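Because sub() treats the value of $NF as a regex, it can in principle hit an earlier match on the line. A minimal variant, sketched here on the sample input from the question (recreated with printf), assigns the last field directly so only that field changes:

```shell
# Recreate the sample input from the question.
printf '%s\n' start 'exit 0' status 'exit 0' stop 'exit 0' > f

# Assign the last field instead of calling sub(): only that field changes.
# Assigning to a field makes awk rebuild $0, joining fields with OFS (a blank).
awk '/exit/ && p { $NF = "152" } { p = /status/ } 1' f
```

For this input the output is the same as with sub(); the difference only shows up when the last field's text also occurs earlier on the line.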

perl -pe's/exit \K\d+/152/ if $f; $f = /status/'
Usage:
perl -i~ -pe'...' file # Edit in place with backup
perl -i -pe'...' file # Edit in place without backup
perl -pe'...' file.in >file.out # Read from file
perl -pe'...' <file.in >file.out # Read from stdin

With sed one can use:
$ sed '/status/{n;s/0/152/}' input
start
exit 0
status
exit 152
stop
exit 0
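Here n prints the status line and reads the following line into the pattern space, and s/0/152/ then replaces the first 0 on it. That is fine for exit 0, but a hypothetical exit 10 would become exit 1152. A slightly stricter sketch replaces the whole exit code:

```shell
# Hypothetical input where the code after status is 10, not 0.
printf '%s\n' start 'exit 0' status 'exit 10' stop 'exit 0' > input

# n: print the current (status) line and read the next one.
# The anchored regex replaces the whole exit code, so 10 becomes 152, not 1152.
sed '/status/{n;s/^exit [0-9][0-9]*$/exit 152/;}' input
```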

Related

Replace a block of text

I have a file in this pattern:
Some text
---
## [Unreleased]
More text here
I need to replace the text between '---' and '## [Unreleased]' with something else in a shell script.
How can it be achieved using sed or awk?
Perl to the rescue!
perl -lne 'my @replacement = ("First line", "Second line");
if ($p = (/^---$/ .. /^## \[Unreleased\]/)) {
print $replacement[$p-1];
} else { print }' file
The flip-flop operator .. tells you whether you're between the two patterns; moreover, it returns the line number relative to the start of the range (1 for the --- line).
This might work for you (GNU sed):
sed '/^---/,/^## \[Unreleased\]/c\something else' file
This changes the lines between the two regexps, including both matching lines, to the required string.
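The one-line c\something else form is a GNU extension. With other seds, a sketch of the POSIX spelling puts the text on the line after c\; on a two-address range the text is printed once, at the end of the range:

```shell
printf '%s\n' 'Some text' '---' '## [Unreleased]' 'More text here' > file

# POSIX form of c: the replacement text starts on the line after "c\".
# Every line of the range (both delimiters included) is deleted.
sed '/^---$/,/^## \[Unreleased\]$/c\
something else' file
```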
This example may help you.
$ cat f
Some text
---
## [Unreleased]
More text here
$ seq 1 5 >mydata.txt
$ cat mydata.txt
1
2
3
4
5
$ awk '/^---/{f=1; while ((getline < c) > 0) print; close(c); next}/^## \[Unreleased\]/{f=0;next}!f' c="mydata.txt" f
Some text
1
2
3
4
5
More text here
awk -v RS="\0" 'gsub(/---\n\n## \[Unreleased\]\n/,"something")+1' file
Give this line a try.
An awk solution that:
is portable (POSIX-compliant).
can deal with any number of lines between the start line and the end line of the block, and potentially with multiple blocks (although they'd all be replaced with the same text).
reads the file line by line (as opposed to reading the entire file at once).
awk -v new='something else' '
/^---$/ { f=1; next } # Block start: set flag, skip line
f && /^## \[Unreleased\]$/ { f=0; print new; next } # Block end: unset flag, print new txt
! f # Print line, if before or after block
' file
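The replacement does not have to be a single line: awk processes escape sequences in -v values, so \n in new becomes a newline. A sketch of the same script with a two-line replacement (the replacement text is made up):

```shell
printf '%s\n' 'Some text' '---' '## [Unreleased]' 'More text here' > file

# \n inside the -v value is turned into a real newline by awk.
awk -v new='first replacement line\nsecond replacement line' '
/^---$/                    { f = 1; next }             # block start
f && /^## \[Unreleased\]$/ { f = 0; print new; next }  # block end
!f                                                     # outside the block
' file
```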

loop inside sed script doesn't terminate

Say I have a file like this:
...
{
...
cout<< "---at time:\t\t"<< sc_core::sc_time_stamp()<<"\n\n" << endl;
...
cout<< "---at time:\t\t"<< sc_core::sc_time_stamp()<<"\n\n" << endl;
...
}
...
{
...
if(strcmp(argv[1], "--io_freq1_Mhz") != 0)
if(strcmp(argv[2], "--io_freq2_Mhz") != 0)
...
if(strcmp(argv[11], "--bytes_per_word") != 0)
...
if(strcmp(argv[23], "--mem_size_bytes") != 0)
...
}
What I want to do first is use sed to load all lines containing a pattern like
"--foo"
into the pattern space and print just the quoted part inside the parentheses, so I use the command:
sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1/p' file
which does exactly what I want, so I get as an output:
--io_freq1_Mhz
--io_freq2_Mhz
...
--bytes_per_word
...
--mem_size_bytes
Next I want to merge all the lines into one, separating the items with a blank. I can solve this using command substitution:
echo `sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1/p' file`
This gives me:
--io_freq1_Mhz --io_freq2_Mhz --... --bytes_per_word --... --mem_size_bytes
Next I want to insert a number after each parameter, for example 1, so the final result should look like:
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --... 1 --bytes_per_word 1 --... 1 --mem_size_bytes 1
I can almost solve this. I'm using the command:
echo `sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1/p' file` | sed -n -e ':start { s/\(--[^\ ]*\) -/\1 1 -/p; b start }' | sed -n -e 's/\(--.*[^\ ]\)/\1 1/p'
but I have two minor problems. First, before jumping back to my start label, the output gets piped into the last sed command, which means I get as output:
--io_freq1_Mhz 1 --io_freq2_Mhz --... --bytes_per_word --... --mem_size_bytes 1
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --... --bytes_per_word --... --mem_size_bytes 1
and so on. So my first question is: how can I avoid piping the output into my last sed command every time? Can I achieve this using different sed options/flags?
The second problem is that the command doesn't terminate. The iteration ends with
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --... --bytes_per_word 1 --... --third_last_item 1 --second_last_item mem_size_bytes 1
As can be seen, no '1' is appended after the second-last item, and the whole command doesn't terminate; I have to stop it with Ctrl-C.
Use a minor modification of your first command:
sed -n -e 's/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/\1 1/p' file | tr '\n' ' '
Having extracted the name, append the digit after it. The tr command converts newlines into blanks. You could do it all in sed; it would be fiddly, that's all.
Actually, it isn't all that much more fiddly, but it requires a different way of looking at the process. Specifically, you need to save the matching patterns in the hold space, and then process them all at the end of the input:
sed -n \
-e '/.*[^-]\(-\{2\}[^-].*\)"\(.*\)/{ s//\1 1/; H; }' \
-e '$ { x; s/\n/ /g; p; }' file
The semicolons before the } characters are necessary with BSD (macOS) sed, but not with GNU sed. The first -e option finds lines that match your pattern, applies a substitute command (the empty regex // reuses the last regex) to keep just the --name part plus a digit 1, and appends the result to the hold space after a newline. The second -e option works on the last line: it exchanges the pattern and hold spaces, replaces every newline with a blank, and prints the result followed by a newline (in the tr version, that final newline becomes a trailing blank instead).
Output (note the leading blank):
--io_freq1_Mhz 1 --io_freq2_Mhz 1 --bytes_per_word 1 --mem_size_bytes 1
If you don't want the leading blank, remove it before printing (add s/^ //; before the p).
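As for why the original loop never ends: b start branches unconditionally, so sed keeps jumping back even after the substitution stops matching, and the p flag under -n prints an intermediate line on every pass, which is what reaches the next sed. A sketch that uses t instead (t branches only if a substitution succeeded since the last input line or the last t), with a hypothetical helper name add_ones:

```shell
# Hypothetical helper: append " 1" after every --name on a single line.
add_ones() {
    # Loop while "--name -" (a name followed by another name) is still present;
    # t stops branching once the s/// fails. No -n and no p flag, so each
    # line is printed once, at the end of the cycle. The last s/// handles the final name.
    sed -e ':start' -e 's/\(--[^ ]*\) -/\1 1 -/' -e 't start' -e 's/$/ 1/'
}

printf '%s\n' '--io_freq1_Mhz --io_freq2_Mhz --bytes_per_word --mem_size_bytes' | add_ones
```

This prints --io_freq1_Mhz 1 --io_freq2_Mhz 1 --bytes_per_word 1 --mem_size_bytes 1, so the final sed in the original pipeline is no longer needed.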

pulling text between two patterns with awk script

Input text file:
This is a simple test file.
#BEGIN
These lines should be extracted by our script.
Everything here will be copied.
#END
That should be all.
#BEGIN
Nothing from here.
#END
Desired output:
These lines should be extracted by our script.
Everything here will be copied.
My awk script is:
#!/usr/bin/awk -f
$1 ~ /#BEGIN/{a=1;next};a;$1 ~ /#END/ {exit}
and my current output is:
These lines should be extracted by our script.
Everything here will be copied.
#END
The only problem I'm having is that I'm still printing the "#END" line. I've been trying for a long time to eliminate that, but I'm not sure exactly how to do it.
This becomes obvious, IMO, if we comment each command in the script. The script can be written like this:
#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
a=1 # Set a flag to one
next # skip to the next line
}
a != 0 { # if the flag is not zero
print $0 # print the current line
}
$1 ~ /#END/ { # if we match the END line
exit # exit the process
}
Note that I expanded a to the equivalent form a!=0{print $0}, to make the point clearer.
So the script starts printing each line when the flag is set, and when it reaches the END line, it has already printed the line before it exits. Since you don't want the END line to be printed, you should exit before you print the line. So the script should become:
#!/usr/bin/awk -f
$1 ~ /#BEGIN/ { # If we match the BEGIN line
a=1 # Set a flag to one
next # skip to the next line
}
$1 ~ /#END/ { # if we match the END line
exit # exit the process
}
a != 0 { # if the flag is not zero
print $0 # print the current line
}
In this case, we exit before the line is printed. In a condensed form, it can be written as:
awk '$1~/#BEGIN/{a=1;next}$1~/#END/{exit}a' file
or a bit shorter
awk '$1~/#END/{exit}a;$1~/#BEGIN/{a=1}' file
Regarding the additional constraints raised in the comments: to avoid skipping any #BEGIN lines inside the block that is to be printed, we should remove the next statement and reorder the rules as in the example right above. In expanded form it looks like this:
#!/usr/bin/awk -f
$1 ~ /#END/ { # if we match the END line
exit # exit the process
}
a != 0 { # if the flag is not zero
print $0 # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
a=1 # Set a flag to one
}
To also avoid exiting if an END line is found before the block to be printed, we can check if the flag is set before exiting:
#!/usr/bin/awk -f
$1 ~ /#END/ && a != 0 { # if we match the END line and the flag is set
exit # exit the process
}
a != 0 { # if the flag is not zero
print $0 # print the current line
}
$1 ~ /#BEGIN/ { # If we match the BEGIN line
a=1 # Set a flag to one
}
or in a condensed form:
awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' file
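A quick check of the condensed form against both constraints, using a made-up input with a stray #END before the block and a nested #BEGIN inside it:

```shell
# Hypothetical test input for the two constraints from the comments.
printf '%s\n' '#END' intro '#BEGIN' a '#BEGIN' b '#END' c > kk.txt

# The stray #END is ignored (a is still 0), and the nested #BEGIN is printed.
awk '$1~/#END/&&a{exit}a;$1~/#BEGIN/{a=1}' kk.txt
```

Expected output: a, #BEGIN and b, one per line.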
Try the sed command below to get the desired output:
vipin@kali:~$ sed '/#BEGIN/,/#END/!d;/END/q' kk.txt|sed '1d;$d'
These lines should be extracted by our script.
Everything here will be copied.
vipin@kali:~$
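Explanation:
/#BEGIN/,/#END/!d deletes every line outside the range, so only the lines from #BEGIN to #END are printed; /END/q then quits after printing the first line containing END, so later blocks are ignored.
1d;$d in the second sed deletes the first and last lines, in our case #BEGIN and #END.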
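The same result is possible with a single sed process. A sketch, assuming the input file from the question:

```shell
printf '%s\n' 'This is a simple test file.' '#BEGIN' \
  'These lines should be extracted by our script.' 'Everything here will be copied.' \
  '#END' 'That should be all.' '#BEGIN' 'Nothing from here.' '#END' > kk.txt

# -n: print only what p prints. Inside the first range, drop the #BEGIN line,
# quit at #END (under -n, q prints nothing), and print everything else.
sed -n '/#BEGIN/,/#END/{/#BEGIN/d;/#END/q;p;}' kk.txt
```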

Regular expression Bash issue

I have to match a string composed only of lowercase characters repeated twice, for example ballball or printprint. The word ball, for example, is not accepted because it is not repeated twice.
For this I have the following code:
read input
expr='^(([a-z]*){2})$'
if [[ $input =~ $expr ]]; then
echo "OK string"
exit 0
fi
exit 10
but it doesn't work: for example, if I enter ball, the script prints "OK string".
What am I doing wrong?
Not all Bash versions support backreferences in regexes natively. If yours doesn't, you can use an external tool such as grep:
read input
re='^\([a-z]\+\)\1$'
if grep -q "$re" <<< "$input"; then
echo "OK string"
exit 0
fi
exit 1
grep -q is silent and exits successfully if there was a match. Notice how (, + and ) have to be escaped in grep's basic regex syntax (\+ is a GNU extension). grep -E would understand () without escaping.
Also, I've replaced your * with + so we don't match the empty string.
Alternatively: your requirement means that a matching string has two identical halves, so we can check for just that, without any regexes:
read input
half=$(( ${#input} / 2 ))
if (( half > 0 )) && [[ ${input:0:$half} = "${input:$half}" ]]; then
echo "OK string"
fi
This uses Substring Expansion; the first check is to make sure that the empty string doesn't match.
Your requirement is to match strings made of two repeated words. This is easy to do by just checking if the first half of your string is equal to the remaining part. No need to use regexps...
$ var="byebye" && len=$((${#var}/2))
$ test "${var:0:$len}" = "${var:$len}" && { echo ok ; } || echo no
ok
$ var="abcdef" && len=$((${#var}/2))
$ test "${var:0:$len}" = "${var:$len}" && { echo ok ; } || echo no
no
The regex [a-z]* matches any string of lowercase letters, including the empty string.
([a-z]*){2} matches any two such strings in a row, which is again just any string of lowercase letters.
Ergo, ^(([a-z]*){2})$ matches any string consisting of zero or more lowercase letters, such as ball.
Using the suggestion from @hwnd (a backreference \1 instead of {2}, as in ^([a-z]+)\1$) enforces a match on two identical strings.
N.B: You will need a fairly recent version of bash. Tested in bash 4.3.11.
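The substring checks above do not verify that the input consists only of lowercase letters. A sketch in Bash combining both requirements (the function name is_doubled_word is hypothetical):

```shell
# Hypothetical helper. Succeeds only for a non-empty string of lowercase
# letters whose first half equals its second half.
is_doubled_word() {
    local s=$1
    [[ $s =~ ^[a-z]+$ ]] || return 1    # lowercase letters only, not empty
    (( ${#s} % 2 == 0 )) || return 1    # two equal halves need an even length
    local half=$(( ${#s} / 2 ))
    # Quote the right-hand side so [[ ]] compares literally, not as a glob pattern.
    [[ ${s:0:half} == "${s:half}" ]]
}

for w in ballball printprint ball abcabd; do
    if is_doubled_word "$w"; then echo "$w: OK"; else echo "$w: no"; fi
done
```

In the question's script, the if line would become if is_doubled_word "$input"; then.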

sed join lines together

What would be the sed (or other tool) command to join together the lines in a file that do not end with the character '0'?
I'll have lines like this:
412|n|Leader Building Material||||||||||d|d|20||0
which need to be left alone, and then I'll have lines like this, for example (which is 3 lines, but only one ends with 0):
107|n|Knot Tying Tools|||||Knot Tying Tools
|||||d|d|0||0
which need to be joined into one line:
107|n|Knot Tying Tools|||||Knot Tying Tools|||||d|d|0||0
sed ':a;/0$/{N;s/\n//;ba}'
In a loop (branch ba to label :a), if the current line ends in 0 (/0$/) append next line (N) and remove inner newline (s/\n//).
awk:
awk '{while(/0$/) { getline a; $0=$0 a; sub(/\n/,_) }; print}'
Perl:
perl -pe '$_.=<>,s/\n// while /0$/'
bash:
while read line; do
if [ ${line: -1:1} != "0" ] ; then
echo $line
else echo -n $line
fi
done
awk could be short too:
awk '!/0$/{printf "%s", $0}/0$/'
test:
kent$ cat t
#aasdfasdf
#asbbb0
#asf
#asdf0
#xxxxxx
#bar
kent$ awk '!/0$/{printf "%s", $0}/0$/' t
#aasdfasdf#asbbb0
#asf#asdf0
#xxxxxx#bar
The rating of this answer is surprising ;s (this surprised wink emoticon pun on sed substitution is intentional) given the OP specifications: sed join lines together.
This submission's last comment,
"if that's the case check what @ninjalj submitted",
also suggests checking the same answer,
i.e. check using sed ':a;/0$/{N;s/\n//;ba}' verbatim:
sed ':a;/0$/{N;s/\n//;ba}'
does
no one
ie. 0
people,
try
nothing,
ie. 0
things,
any more,
ie. 0
tests?
(^D aka eot 004 ctrl-D ␄ ... bash generate via: echo ^V^D)
which will not give (do the test ;):
does no one ie. 0
people, try nothing, ie. 0
things, any more, ie. 0
tests? (^D aka eot 004 ctrl-D ␄ ... bash generate via: echo ^V^D)
To get this use:
sed 'H;${z;x;s/\n//g;p;};/0$/!d;z;x;s/\n//g;'
or:
sed ':a;/0$/!{N;s/\n//;ba}'
not:
sed ':a;/0$/{N;s/\n//;ba}'
Notes:
sed 'H;${x;s/\n//g;p;};/0$/!d;z;x;s/\n//g;'
does not use branching and
is identical to:
sed '${H;z;x;s/\n//g;p;};/0$/!{H;d;};/0$/{H;z;x;s/\n//g;}'
H commences all sequences
d ends processing of the current line and starts the next cycle, so any command after /0$/!{H;d;} is reached only by lines that do end in 0; hence the address selector of
/0$/{H;z;x;s/\n//g;} is redundant and not needed.
if a line does not end with 0 save it in hold space
/0$/!{H;d;}
if a line does end with 0 save it too and then print flush (double entendre ie. purged and lines aligned)
/0$/{H;z;x;s/\n//g;}
NB ${H;z;x;s/\n//g;p;} uses /0$/ ... commands with an extra p to coerce the final print and with a now unnecessary z (to empty and reset pattern space like s/.*//)
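A sketch checking the /0$/! form on the data from the question. The script is split into -e pieces because BSD/macOS sed reads a label up to the end of the line, so ba} or :a;... would be taken as part of the label name:

```shell
printf '%s\n' '412|n|Leader Building Material||||||||||d|d|20||0' \
  '107|n|Knot Tying Tools|||||Knot Tying Tools' '|||||d|d|0||0' > data.txt

# While the line does not end in 0: append the next line, drop the newline, retry.
sed -e ':a' -e '/0$/!{N;s/\n//;ba' -e '}' data.txt
```

Expected output: the 412 line unchanged, and 107|n|Knot Tying Tools|||||Knot Tying Tools|||||d|d|0||0 on one line.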
A typically cryptic Perl one-liner:
perl -pe 'BEGIN{$/="0\n"}chomp;s/\n//g;$_.=$/'
This uses the sequence "0\n" as the record separator (from your question, I'm assuming that every line should end with a zero). chomp removes that separator from the end of each record, the internal newlines are removed, and then the 0 and newline are appended back before the record is printed.
Another take to your question would be to ensure each line has 17 pipe-separated fields. This does not assume that the 17th field value must be zero.
awk -F \| '
NF == 17 {print; next}
prev {print prev $0; prev = ""; next}
{prev = $0}
'
If a line doesn't end with 0, append the next line (N) and remove the newline between them. This joins at most two lines.
sed '/0$/!N;s/\n//'