Can these two regex expressions ever give a different result?

perl -pe 's/.*c//s'
perl -0777 -pe 's/.*c//'
Here the .*c could be replaced with anything.
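In the .*c case the result is the same (in both runs the c line is emptied rather than removed, so an empty line is printed between b and d):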
$ echo -e 'a\nb\nc\nd' | perl -pe 's/.*c//s'
a
b
d
$ echo -e 'a\nb\nc\nd' | perl -0777 -pe 's/.*c//'
a
b
d
My question is about the general case, where the echoed input can be replaced with anything too.
Are -0777 and /s interchangeable?
And is it pointless to use -0777 together with /s?

They mean completely different things and are not interchangeable, even though in some cases they can have the same result.
/s makes . match all characters (including linebreaks); without it . usually means [^\n]
-0777 means read the whole file at once; without it the file is read line by line
/s doesn't change how the input is parsed, -0 does.
-0777 is usually only useful if you are matching across several lines (in which case /s can be helpful). If you are matching line by line, /s only matters when . could match the newline at the end of the line.
For example (using your example), if you would like to remove everything up to the last c, including all the preceding lines, you could use:
echo -e 'a\nb\nc\nd' | perl -0777 -pe 's/.*c//s'
Output (preceded by an empty line, because the newline after c is not part of the match):
d

Qtax gives a good answer; I'm just going to include some examples to demonstrate that they're not the same, or even effectively the same.
These two
$ echo -e 'a\nb\nc\nd' | perl -pe 's/./o/s'
o
o
o
o
$ echo -e 'a\nb\nc\nd' | perl -0777 -pe 's/./o/'
o
b
c
d
These two
$ echo -e 'aa\nbb\ncc\ndd' | perl -0777 -pe 's/./o/'
oa
bb
cc
dd
$ echo -e 'aa\nbb\ncc\ndd' | perl -pe 's/./o/s'
oa
ob
oc
od
These two
$ echo -e 'aa\nbb\ncc\ndd' | perl -pe 's/./o/sg'
oooooooooooo
$ echo -e 'aa\nbb\ncc\ndd' | perl -0777 -pe 's/./o/g'
oo
oo
oo
oo
Those all demonstrate that -0777 and /s are not the same.

Related

Can we do multiple substitutions with a single Perl command?

Is there a way to make the following into one perl -pe instead of piping it in sequence?
cat text.txt | perl -pe "s/PATTERN1/$PATTERN1/g" | perl -pe "s/PATTERN2/$PATTERN2/g"
The answer in the comments is perfect, but here's a goofy way to do it just for fun:
perl -pe '$_ = s/PATTERN1/$PATTERN1/gr =~ s/PATTERN2/$PATTERN2/gr' text.txt
Either way, you don't need pipes or cat at all: just add the file name as the last argument. Note that the /r modifier used above requires Perl 5.14 or later.
For reference, here is the best answer, which was given above in the comments:
perl -pe 's/PATTERN1/$PATTERN1/g; s/PATTERN2/$PATTERN2/g' text.txt

One-liner to print all lines between two patterns

Using one line of Perl code, what is the shortest way possible to print all the lines between two patterns not including the lines with the patterns?
If this is file.txt:
aaa
START
bbb
ccc
ddd
END
eee
fff
I want to print this:
bbb
ccc
ddd
I can get most of the way there using something like this:
perl -ne 'print if (/^START/../^END/);'
That includes the START and END lines, though.
I can get the job done like this:
perl -ne 'if (/^START/../^END/) { print unless (/^(START)|(END)/); };' file.txt
But that seems redundant.
What I'd really like to do is use lookbehind and lookahead assertions like this:
perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt
But that doesn't work and I think I've got something just a little bit wrong in my regex.
These are just some of the variations I've tried that produce no output:
perl -ne 'print if (/^(?<=START)/../^.*$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^.*(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../$(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../^(?=END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../(?=^END)/);' file.txt
perl -ne 'print if (/^(?<=START)/../.*(?=END)/s);' file.txt
Read the whole file, match, and print.
perl -0777 -e 'print <> =~ /START.*?\n(.*?)END.*?/gs;' file.txt
You may drop the .*? after START and END if they stand alone on their lines.
You can then also drop the \n to get a blank line between segments.
Read the file, split it on START|END, and print every odd-indexed element of @F.
perl -0777 -F"START|END" -ane 'print @F[ grep { $_ & 1 } (0..$#F) ]' file.txt
Collect the lines and use an END { } block for extra processing. This uses the }{ shorthand for END { }.
perl -ne 'push @r, $_ if (/^START/../^END/); }{ print @r[1..$#r-1]' file.txt
Works as it stands only for a single such segment in the file.
It seems kind of arbitrary to place a single-line restriction on this, but here's one way to do it:
$ perl -wne 'last if /^END/; print if $p; $p = 1 if /^START/;' file.txt
perl -e 'print split(/.*START.|END.*/s, join("", <>))' file.txt
perl -ne 'print if /START/../END/' file.txt | perl -ne 'print unless $.==1 or eof'
perl -ne 'print if /START/../END/' file.txt | sed -e '$d' -n -e '1!p'
I don't see why you are so insistent on using lookarounds, but here are a couple of ways to do it.
perl -ne 'print if /^(?=START)/../^(?=END)/'
This finds the terminators without consuming them: the pattern succeeds with a zero-length match at any position where the lookahead is satisfied.
Your lookbehind wasn't working because it was trying to find beginning of line ^ with START before it on the same line, which can obviously never match. Factor the ^ into the zero-width assertion and it will work:
perl -ne 'print if /(?<=^START)/../(?<=^END)/'
As suggested in comments by @ThisSuitIsBlackNot, you can use the sequence number to omit the START and END tokens.
perl -ne '$s = /^START/../^END/; print if ($s>1 && $s !~ /E0/)'
The lookarounds don't contribute anything useful so I did not develop those examples fully. You can adapt this to one of the lookaround examples above if you care more about using lookarounds than about code maintainability and speed of execution.

Perl equivalent of grep -Eo

In shell scripting, grep -Eo {regex} {file} returns the matched part of the regex. For example:
$ echo \
'For support, visit <http://www.example1.com/support>
You can also visit <http://www.example2.com/products> for information.' \
| grep -Eo 'http://[a-z0-9_.-]+/'
http://www.example1.com/
http://www.example2.com/
How would I do this with Perl?
Here are two ways:
In Perl, the special variable $& contains the matched part of the regular expression.
perl -ne 'print "$&\n" if m#http://[a-z0-9_.-]+/#' < input
If your regular expression contains capture groups, the patterns matched by those groups are assigned to the variables $1, $2, ...
perl -ne 'print "$1\n" if m#(http://[a-z0-9_.-]+/)#' < input
To get -o functionality I suggest the following:
echo abcdabcd | perl -lne 'while ($_ =~ s/(bc)//){print $1}'
bc
bc
echo abcdabcd | grep -Eo 'bc'
bc
bc
But for your example I suggest perl -pe 's|(http://[\w.-]+/).*|$1|g':
echo 'http://www.example1.com/support' | perl -pe 's|(http://[\w.-]+/).*|$1|g'
http://www.example1.com/

Regexp replace for multi-line

I need to replace a string between two lines. For example:
"aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" ==> "aaa\nfoo\nfoo\naaa\nright\nbbb\nfoo\nbbb"
I want to use Perl in the following form, but my attempts failed:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | perl -pe "code here"
Is there a good way to do this?
Either Perl or awk is fine.
Perl:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | perl -p00e 's/aaa\nfoo\nbbb/aaa\nright\nbbb/'
If you need to match a pattern across multiple lines, you must change the input record separator (-00 selects paragraph mode, where records end at blank lines). The m and s flags can be useful too.
See also:
perl --help # -0
perldoc perlvar # $/
perldoc perlre # /Modifiers
perl -MO=Deparse -p00e 's/aaa\nfoo\nbbb/aaa\nright\nbbb/'
Awk:
echo -e "aaa\nfoo\nfoo\naaa\nfoo\nbbb\nfoo\nbbb" | awk 'BEGIN{RS=""}{sub(/aaa\nfoo\nbbb/,"aaa\nright\nbbb",$0);print}'

having a regex replacing across lines, retain the newlines?

I'd like to have a substitute or print style command with a regex working across lines. And lines retained.
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | tr -d '\n' | grep -or 'b.*f'
bcdef
or
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | tr -d '\n' | sed -r 's|b(.*)f|y\1z|'
aycdezg
I'd like to use grep or sed because I'd like to know what people would have done before awk or Perl.
Would they not have done this? Was .* not available? Did they have no other equivalent?
The goal is to modify some input with a regex that spans lines, and print it to stdout or write it to a file, retaining the line breaks.
This should do what you're looking for:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;$s/b\([^f]*\)f/y\1z/;N;ba'
a
y
c
d
e
z
g
It accumulates all the lines, then does the replacement. It looks for the first "f". If you want it to look for the last "f", change [^f] to a single dot (.).
Note that this may make use of features added to sed after AWK or Perl became available (AWK has been around a looong time).
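The accumulate-then-replace idea can also be written as a multi-line sed script with comments. This is a sketch of the same technique using the $!ba loop form, not the exact one-liner above, and it assumes a sed that accepts # comments and \n handling as GNU sed does; the variable name result is arbitrary.

```shell
result=$(printf 'a\nb\nc\nd\ne\nf\ng\n' | sed '
# Label a: start of the accumulation loop.
:a
# On every line except the last, append the next line and jump back.
$!{
N
ba
}
# Reached only on the last line: the pattern space now holds the whole input.
s/b\([^f]*\)f/y\1z/
')

printf '%s\n' "$result"
```

Because [^f] also matches the embedded newlines, the replacement keeps every line break between b and f.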
Edit:
To do a multi-line grep requires only a little modification:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;$s/^[^b]*\(b[^f]*f\)[^f]*$/\1/;N;ba'
b
c
d
e
f
sed can match across newlines through the use of its N command. For example, the following sed command will replace bar followed by a newline followed by baz with ###:
$ echo -e "foo\nbar\nbaz\nqux" | sed 'N;s/bar\nbaz/###/;P;D'
foo
###
qux
The N command will append the next input line to the current pattern space, separated by an embedded newline (\n).
The P command will print the current pattern space up to and including the first embedded newline.
The D command will delete up to and including the first embedded newline in the pattern space. It will also start the next cycle, but skip reading from the input if there is still data in the pattern space.
Using these three commands, you can do essentially any sort of s command replacement that looks across N lines.
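Why P and D are needed becomes clearer when the same substitution is run with N alone. The sketch below reuses the input from the example above and assumes GNU sed behaviour when N runs out of input; the helper sample and the variable names are made up.

```shell
# Hypothetical helper: emits the input used in the example above.
sample() { printf 'foo\nbar\nbaz\nqux\n'; }

# N alone pairs the lines as (foo,bar) and (baz,qux),
# so bar and baz are never in the pattern space together.
pairs_only=$(sample | sed 'N;s/bar\nbaz/###/')

# With P and D the two-line window slides by one line: (foo,bar), (bar,baz), ...
sliding=$(sample | sed 'N;s/bar\nbaz/###/;P;D')

printf '%s\n' "--- N only ---" "$pairs_only" "--- N;P;D ---" "$sliding"
```

With N alone the input comes back unchanged, because the match straddles two fixed pairs; the sliding window lets every adjacent pair of lines be tested.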
Edit
If your question is how to remove the need for tr in the two examples above and use only sed, here you go:
$ echo -e 'a\nb\nc\nd\ne\nf\ng' | sed ':a;N;$!ba;s/\n//g;y/ag/yz/'
ybcdefz
Proven tools to the rescue.
echo -e "foo\nbar\nbaz\nqux" | perl -lpe 'BEGIN{$/=""}s/foo\nbar/###/'