Regexp replace for multi-line - regex

I need to replace a string between two lines. For example:
"aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" ==> "aaa\nfoo\nfoo\naaa\nright\nbbb\nfoo\nbbb"
I tried to do this with perl in the following form, but it failed:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | perl -pe "code here"
Is there a good way to do this?
Either perl or awk is fine.

Perl:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | perl -p00e 's/aaa\nfoo\nbbb/aaa\nright\nbbb/'
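If you need a pattern to match across lines, you must change the input record separator so that perl reads more than one line at a time. Here -00 selects paragraph mode (records are separated by blank lines), so the whole sample arrives as one record; -0777 would slurp the entire input regardless of blank lines. The m and s flags can be useful too.
See also: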
perl --help # -0
perldoc perlvar # $/
perldoc perlre # /Modifiers
perl -MO=Deparse -p00e 's/aaa\nfoo\nbbb/aaa\nright\nbbb/'
Awk:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | awk 'BEGIN{RS=""}{sub(/aaa\nfoo\nbbb/,"aaa\nright\nbbb",$0);print}'

Related

Why \d\+ or \d+ is not equal to \d* here?

Bash on Debian.
I want to match the port number at the end of the log line.
s="2017-04-17 08:16:14 INFO connecting lh3.googleusercontent.com:443 from 111.111.111.111:26215"
echo $s | sed 's/\(.*\):\(\d*\)/\2/'
26215
Let's match it with \d\+ or \d+ in sed.
echo $s | sed 's/\(.*\):\(\d\+\)/\2/'
echo $s | sed 's/\(.*\):\(\d+\)/\2/'
Both of them print the whole string:
2017-04-17 08:16:14 INFO connecting lh3.googleusercontent.com:443 from 111.111.111.111:26215
Neither matches the port number at the end. Why?
There is an easier sed pattern to use:
$ echo "$s" | sed -nE 's/.*:([^:]*)/\1/p'
26215
As stated in comments, regular sed does not support Perl metacharacters such as \d. You need to use the POSIX character class [[:digit:]] instead.
Explanation:
sed -nE 's/.*:([^:]*)/\1/p'
  -n      suppress automatic printing, so only matches are printed
  -E      use ERE, so the parens need no escaping
  .*:     consume everything up to the rightmost :
  ( )     capture group (unescaped because of -E)
  [^:]*   all characters except :
  p       print only if there is a match
Or, if you want to be more specific you want only digits:
$ echo "$s" | sed -nE 's/.*:([[:digit:]]+$)/\1/p'
26215
Note + to make sure there is at least one digit and $ to match only at the end of the line.
There is a summary of different regex flavors HERE. With -E, sed uses ERE, the same as egrep.
\d is a PCRE extension not present in BRE or ERE syntax (as used by standard UNIX tools).
In this particular case, there's no need to use any tools not built into bash for this purpose at all:
s="2017-04-17 08:16:14 INFO connecting lh3.googleusercontent.com:443 from 111.111.111.111:26215"
echo "Port is ${s##*:}"
This is a parameter expansion; when dealing with small amounts of data, such built-in capabilities are much more efficient than running external tools.
There's also native ERE support built into the shell, as follows:
re=':([[:digit:]]+)$'
[[ $s =~ $re ]] && echo "Port is ${BASH_REMATCH[1]}"
BashFAQ #100 also goes into detail on bash string manipulation.
All you need is this:
echo ${s##*:}
Learn your shell string operators.
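For reference, a short sketch of the four prefix/suffix removal operators applied to the same string. The variable names on the left are illustrative only.

```shell
s="2017-04-17 08:16:14 INFO connecting lh3.googleusercontent.com:443 from 111.111.111.111:26215"

port=${s##*:}          # remove the longest prefix matching "*:"  -> text after the last colon
after_first=${s#*:}    # remove the shortest prefix matching "*:" -> text after the first colon
before_last=${s%:*}    # remove the shortest suffix matching ":*" -> text before the last colon
before_first=${s%%:*}  # remove the longest suffix matching ":*"  -> text before the first colon

printf '%s\n' "$port" "$before_first"
```

These expansions are POSIX, so they work in sh as well as bash, and no external process is started.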
s="2017-04-17 08:16:14 INFO connecting lh3.googleusercontent.com:443 from 111.111.111.111:26215"
1. grep
echo $s | grep -Po '\d+$'
2. ack
echo $s | ack -o '\d+$'
3. sed
echo $s | sed 's/.*\://'
4. awk
echo $s | awk -F: '{print $NF}'
Self-answer by OP moved from question to community wiki answer, per consensus on meta:
sed has no \d expression for digits.
With awk, simply split on : and print the last field:
echo $s |awk -F: '{print $NF}'
26215

search and replace regexp gives two different outputs if grouping metacharacters are used

I get different outputs from a search-and-replace regexp in Perl depending on whether I use in-place replacement (as a sed alternative) or a regular search and replace, and also depending on whether I use \1 or $1:
──> cat test1.txt
orig.avg.10
──> cat test2.txt
orig.avg.10
# EXPECTED
──> cat test1.txt | perl -lne '$_ =~ s/(avg\.[0-9]+)/$1\.vec/; print $_'
orig.avg.10.vec
# EXPECTED
──> cat test1.txt | perl -lne '$_ =~ s/(avg\.[0-9]+)/\1\.vec/; print $_'
orig.avg.10.vec
# EXPECTED
──> perl -p -i.bak -e "s/(avg\.[0-9]+)/\1\.vec/" test2.txt
──> cat test2.txt
orig.avg.10.vec
# UNEXPECTED
──> perl -p -i.bak -e "s/(avg\.[0-9]+)/$1\.vec/" test1.txt
──> cat test1.txt
orig..vec
Why does this happen?
You are using " to wrap your perl code, but doing so means the shell can and will interpolate $1.
Use ' instead and everything will work as expected.
The problem is that in the # UNEXPECTED case I used double quotes, which caused the shell to expand $1. Sometimes one needs to write the question down before realizing the cause.

Perl equivalent of grep -Eo

In shell scripting, grep -Eo {regex} {file} returns the matched part of the regex. For example:
$ echo \
'For support, visit <http://www.example1.com/support>
You can also visit <http://www.example2.com/products> for information.' \
| grep -Eo 'http://[a-z0-9_.-]+/'
http://www.example1.com/
http://www.example2.com/
How would I do this with Perl?
Here are two ways:
In Perl, the special variable $& contains the matched part of the regular expression.
perl -ne 'print "$&\n" if m#http://[a-z0-9_.-]+/#' < input
If your regular expression contains capture groups, the patterns matched by those groups are assigned to the variables $1, $2, ...
perl -ne 'print "$1\n" if m#(http://[a-z0-9_.-]+/)#' < input
To get -o functionality I suggest the following:
echo abcdabcd | perl -lne 'while ($_ =~ s/(bc)//){print $1}'
bc
bc
echo abcdabcd | grep -Eo 'bc'
bc
bc
But for your example I suggest perl -pe 's|(http://[\w.-]+/).*|$1|':
echo 'http://www.example1.com/support' | perl -pe 's|(http://[\w.-]+/).*|$1|'
http://www.example1.com/

How to use variable in regex?

How do I replace regex with $var in this command?
echo "$many_lines" | sed -n '/regex/{g;1!p;};h'
$var could look like fs2#auto-17.
The sed command will output the line immediately before a regex, but not the line containing the regex.
If all this can be done easier with a Perl one-liner, then it is fine with me.
It is not beautiful, but this gives me the line before $var, which is what I wanted.
echo "$many_lines" | grep -B1 "$var" | grep -v "$var"
In Perl regexes, you can interpolate variable contents into regexes like /$foo/. However, the contents will be interpreted as a pattern. If you want to match the literal content of $foo, you have to escape the metacharacters: $safe = quotemeta $foo; /$safe/. This can be shortened to /\Q$foo\E/, which is what you usually want. A trailing \E is optional.
I don't know if the sed regex engine has a similar feature.
A Perl one-liner: perl -ne'$var = "..."; print $p if /\Q$var/; $p=$_'
Use double quotes instead of single quotes to allow variable expansion:
echo "$many_lines" | sed -n "/$var/"'{g;1!p;};h'
Since you are looking for the line before the match, a single one-liner will not be trivial or pretty, but here is how I would do it (using Perl only). Note that $var is a shell variable, so it has to be exported and read through %ENV inside the single-quoted Perl code:
export var
echo "$many_lines" | perl -nle 'print $. if /\Q$ENV{var}\E/' | while read -r line_no; do
export line_no
echo "$many_lines" | perl -nle 'print if $. == $ENV{line_no} - 1'
done
or, if you want to do it in one line, slurp the input and use a lookahead for the matching line:
echo "$many_lines" | var="$var" perl -0777 -ne 'while (/^(.*)\n(?=.*\Q$ENV{var}\E)/mg) { print "$1\n" }'
Or this one, which keeps the lines in an array and should definitely match:
echo "$many_lines" | var="$var" perl -nle 'push @contents, $_; if ($contents[-1] =~ /\Q$ENV{var}\E/) { print $contents[-2] if defined $contents[-2] }'
Or you can use here-documents too, if you don't want to escape the double quotes!
In Perl it looks like this:
$heredoc = <<HEREDOC;
here is your text and $var is your parameter
HEREDOC
It is important to end the heredoc with the same string you began it with; in my example that is "HEREDOC" on a line of its own.

Print all matches of a regular expression from the command line?

What's the simplest way to print all matches (either one line per match or one line per line of input) to a regular expression on a unix command line? Note that there may be 0 or more than 1 match per line of input.
I assume there must be some way to do this with sed, awk, grep, and/or perl, and I'm hoping for a simple command line solution so it will show up in my bash history when needed in the future.
EDIT: To clarify, I do not want to print all matching lines, only the matches to the regular expression. For example, a line might have 1000 characters, but there are only two 10-character matches to the regular expression. I'm only interested in those two 10-character matches.
Assuming you only use non-capturing parentheses,
perl -wnE'say /yourregex/g'
or
perl -wnE'say for /yourregex/g'
Sample use:
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say for /fo*d/g'
fod
food
fooooood
$ echo -ne 'fod,food,fad\nbar\nfooooood\n' | perl -wnE'say /fo*d/g'
fodfood
fooooood
Unless I misunderstand your question, the following will do the trick
grep -o 'fo.*d' input.txt
For more details see:
GNU grep (most platforms)
Solaris grep
AIX grep
HP-UX grep
Going off the comment, and assuming you're passed the input from a pipe or otherwise on STDIN:
perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}'
Usage:
cat SOME_TEXT_FILE | perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'YOUR_REGEX'
or I would just stuff that whole mess into a bash function...
bggrep ()
{
if [ "x$1" != "x" ]; then
perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' "$1";
else
echo "Usage: bggrep <regex>";
fi
}
Usage is the same, just cleaner-looking:
cat SOME_TEXT_FILE | bggrep 'YOUR_REGEX'
(or just type the command itself and enter the text to match line-by-line, but that didn't seem a likely use case :).
Example (from your comment):
bash$ cat garbage
fod,food,fad
bar
fooooooood
bash$ cat garbage | perl -e 'my $re=shift;$re=qr{$re};while(<STDIN>){if(/($re)/g){print"$1\n"}while(m/\G.*?($re)/g){print"$1\n"}}' 'fo*d'
fod
food
fooooooood
or...
bash$ cat garbage | bggrep 'fo*d'
fod
food
fooooooood
perl -MSmart::Comments -ne '@a=m/(...)/g;print;' -e '### @a'