Perl equivalent of grep -Eo - regex

In shell scripting, grep -Eo {regex} {file} returns the matched part of the regex. For example:
$ echo \
'For support, visit <http://www.example1.com/support>
You can also visit <http://www.example2.com/products> for information.'
| grep -Eo 'http://[a-z0-9_.-]+/'
http://www.example1.com/
http://www.example2.com/
How would I do this with Perl?

Here are two ways:
In Perl, the special variable $& contains the matched part of the regular expression.
perl -ne 'print "$&\n" if m#http://[a-z0-9_.-]+/#' < input
If your regular expression contains capture groups, the patterns matched by those groups are assigned to the variables $1, $2, ...
perl -ne 'print "$1\n" if m#(http://[a-z0-9_.-]+/)#' < input

To get -o functionality I suggest the following:
echo abcdabcd | perl -lne 'while ($_ =~ s/(bc)//){print $1}'
bc
bc
echo abcdabcd | grep -Eo 'bc'
bc
bc
But for you example I suggest perl -pe 's|(http://[\w-\.]+/).*|$1|g':
echo 'http://www.example1.com/support' | perl -pe 's|(http://[\w-\.]+/).*|$1|g'
http://www.example1.com/

Related

One-liner to print all lines between two patterns

Using one line of Perl code, what is the shortest way possible to print all the lines between two patterns not including the lines with the patterns?
If this is file.txt:
aaa
START
bbb
ccc
ddd
END
eee
fff
I want to print this:
bbb
ccc
ddd
I can get most of the way there using something like this:
perl -ne 'print if (/^START/../^END/);'
That includes the START and END lines, though.
I can get the job done like this:
perl -ne 'if (/^START/../^END/) { print unless (/^(START)|(END)/); };' file.txt
But that seems redundant.
What I'd really like to do is use lookbehind and lookahead assertions like this:
perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt
But that doesn't work and I think I've got something just a little bit wrong in my regex.
These are just some of the variations I've tried that produce no output:
perl -ne 'print if (/^(?<=START)/../^.*$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=^END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/s);' file.txt
Read the whole file, match, and print.
perl -0777 -e 'print <> =~ /START.*?\n(.*?)END.*?/gs;' file.txt
May drop .*? after START|END if alone on line.
Then drop \n for a blank line between segments.
Read file, split line by START|END, print every odd of #F
perl -0777 -F"START|END" -ane 'print #F[ grep { $_ & 1 } (0..$#F) ]' file.txt
Use END { } block for extra processing. Uses }{ for END { }.
perl -ne 'push #r, $_ if (/^START/../^END/); }{ print "#r[1..$#r-1]"' file.txt
Works as it stands only for a single such segment in the file.
It seems kind of arbitrary to place a single-line restriction on this, but here's one way to do it:
$ perl -wne 'last if /^END/; print if $p; $p = 1 if /^START/;' file.txt
perl -e 'print split(/.*START.|END.*/s, join("", <>))' file.txt
perl -ne 'print if /START/../END/' file.txt | perl -ne 'print unless $.==1 or eof'
perl -ne 'print if /START/../END/' file.txt | sed -e '$d' -n -e '1\!p'
I don't see why you are so insistent on using lookarounds, but here are a couple of ways to do it.
perl -ne 'print if /^(?=START)/../^(?=END)/'
This finds the terminators without actually matching them. A zero-length match which satisfies the lookahead is matched.
Your lookbehind wasn't working because it was trying to find beginning of line ^ with START before it on the same line, which can obviously never match. Factor the ^ into the zero-width assertion and it will work:
perl -ne 'print if /(?<=^START)/../(?<=^END)/'
As suggested in comments by #ThisSuitIsBlackNot you can use the sequence number to omit the START and END tokens.
perl -ne '$s = /^START/../^END/; print if ($s>1 && $s !~ /E0/)'
The lookarounds don't contribute anything useful so I did not develop those examples fully. You can adapt this to one of the lookaround examples above if you care more about using lookarounds than about code maintainability and speed of execution.

How to display part of matched pattern in grep?

I wanted to extract 12 from a text like "abc_12_1". I am trying like this
echo "abc_12_1" | grep -Eo '[a-zA-Z]+_[0-9]+_1'
abc_12_1
But I am not able to select the digit after first _ in string, the output of above command is whole string. I am looking for some alternative in grep which I have in following Perl pattern matching.
perl -e '"abc_55_1" =~ m/[a-zA-Z]+_([0-9]+)_1/ ; print $1'
55
Is it possible with grep?
Using perl:
$ echo "abc_12_1" | perl -lne 'print /_(\d+)_/'
12
or grep:
$ echo "abc_12_1" | grep -oP '(?<=_)\d+(?=_)'
12
You could use cut:
cut -d_ -f2 <<< "abc_12_1"
Using grep:
grep -oP '(?<=_).*?(?=_)' <<< "abc_12_1"
Both would yield 12.
One way is to use awk
echo "abc_12_1" | awk -F_ '{print $2}'
12
Or grep
echo "abc_12_1" | grep -o "[0-9][0-9]"
12
Using grep with extended regex
grep -oE "[0-9]{2}" # Get only hits with two digits
grep -oE "[0-9]{2,}" # Get hits with two or more digits

How do I form the correct regular expression to capture everything before parentheses?

current I have a set strings that are of the format
customName(path/to/the/relevant/directory|file.ext#FileRefrence_12345)
From this I could like to extract customName, the characters before the first parentheses, using sed.
My best guesses so far are:
echo $s | sed 's/([^(])+\(.*\)/\1/g'
echo $s | sed 's/([^\(])+\(.*\)/\1/g'
However, using these I get the error:
sed: -e expression #1, char 21: Unmatched ( or \(
So how do I form the correct regular expression? and why is it relevant that I do not have a matched \( is it is just an escaped character for my expression, not a character used for formatting?
you could substitute everything after the opening parenthesis, like this (note that parentheses by default do not need to be escaped in sed)
echo 'customName(path/to/the/relevant/directory|file.ext#FileRefrence_12345)' |
sed -e 's/(.*//'
grep
kent$ echo "customName(blah)"|grep -o '^[^(]*'
customName
sed
kent$ echo "customName(blah)"|sed 's/(.*//'
customName
note I changed the stuff between the brackets.
Different options:
$ echo $s | sed 's/(.*//' #sed (everything before "(")
customName
$ echo $s | cut -d"(" -f1 #cut (delimiter is "(", print 1st block)
customName
$ echo $s | awk -F"(" '{print $1}' #awk (field separator is "(", print 1st)
customName
$ echo ${s%(*} #bash command substitution
customName

How to use variable in regex?

How do I replace regex with $var in this command?
echo "$many_lines" | sed -n '/regex/{g;1!p;};h'
$var could look like fs2#auto-17.
The sed command will output the line immediately before a regex, but not the line containing the regex.
If all this can be done easier with a Perl one-liner, then it is fine with me.
It is not beautiful, but this gives me the previous line to $var which is want I wanted.
echo "$many_lines" | grep -B1 "$var" | grep -v "$var"
In Perl regexes, you can interpolate variable contents into regexes like /$foo/. However, the contents will be interpreted as a pattern. If you want to match the literal content of $foo, you have to escape the metacharacters: $safe = quotemeta $foo; /$safe/. This can be shortended to /\Q$foo\E/, which is what you usually want. A trailing \E is optional.
I don't know if the sed regex engine has a similar feature.
A Perl one-liner: perl -ne'$var = "..."; print $p if /\Q$var/; $p=$_'
Use double quotes instead of single quotes to allow variable expansion:
echo $many_lines | sed -n "/$var/"'{g;1!p;};h'
Since you are looking for a line before the regex, with a single one liner it will not be that trivial and beautiful, but here is how I will do it (Using Perl only):
echo "$many_lines" | perl -nle 'print $. if /\Q$var/' | while read line_no; do
export line_no
echo $many_lines | perl -nle 'print if $. - 1 == $ENV{line_no}'
done
or if you want to do in one line
echo "$many_lines" | perl -nle 'BEGIN {my $content = ""; } $content .= $_; END { while ($content =~ m#([^\n]+)\n[^\n]+\Q$var#gosm) { print $1 }}'
Or this one, should definitely match:
echo "$many_lines" | perl -nle 'BEGIN {my #contents; } push #contents, $_; if ($contents[-1] =~ m#\Q$var#o)') { print $contents[-2] if defined $contents[-2]; }
Or you can use here-documents too, if you don't want to escape the double quotes!
In Perl it looks like this:
$heredoc = <<HEREDOC;
here is your text and $var is your parameter
HEREDOC
Its important to end the heredoc with the same string you began, in my example its "HEREDOC" in a new line!

Regexp replace for mutil-line

I need to replace a string between two lines. For example:
"aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" ==> "aaa\nfoo\nfoo\naaa\nright\nbbb\nfoo\nbbb"
I want to use perl like following format but failed:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | perl -pe "code here"
So is there a good way to deal with it?
Both perl and awk is ok.
Perl:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | perl -p00e 's/aaa\nfoo\nbbb/aaa\nright\nbbb/'
if you need to match a pattern multiline, you must change the record separator. the flags m and s can be useful too.
see also
perl --help # -0
perldoc perlvar # $/
perldoc perlre # /Modifiers
perl -MO=Deparse -p00e 's/aaa\nfoo\nbbb/aaa\nright\nbbb/'
Awk:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | awk 'BEGIN{RS=""}{sub(/aaa\nfoo\nbbb/,"aaa\nright\nbbb",$0);print}'