Regex performance - regex

I am benchmarking different approaches to RegEx and seeing something I really don't understand. I am specifically comparing using the -match operator vs using the [regex]::Matches() accelerator.
I started with
(Measure-Command {
foreach ($i in 1..10000) {
$path -match $testPattern
}
}).TotalSeconds
(Measure-Command {
foreach ($i in 1..10000) {
[regex]::Matches($path, $testPattern)
}
}).TotalSeconds
and -match is always very slightly faster. But it's also not apples to apples because I need to assign the [Regex] results to a variable to use it. So I added that
(Measure-Command {
foreach ($i in 1..10000) {
$path -match $testPattern
}
}).TotalSeconds
(Measure-Command {
foreach ($i in 1..10000) {
$test = [regex]::Matches($path, $testPattern)
}
}).TotalSeconds
And now [Regex] is consistently slightly faster, which makes no sense because I added to the workload with the variable assignment. The performance difference is ignorable, 1/100th of a second when doing 10,000 matches, but I wonder what is going on under the hood to make [Regex] faster when there is a variable assignment involved?
For what it's worth, without the variable assignment -match is faster, .05 seconds vs .03 seconds. With variable assignment [Regex] is faster by .03 seconds vs .02 seconds. So while it IS all negligible, adding the variable cuts [Regex] processing time more than in half, which is a (relatively) huge delta.

The outputs of the two tests are different: the accelerator produces a lot more text.
Even though neither output is displayed when wrapped in the Measure-Command cmdlet, producing that output is still part of the measured work.
Output of $path -match $testPattern
$true
Output of [regex]::Matches($path, $testPattern)
Groups : {0}
Success : True
Name : 0
Captures : {0}
Index : 0
Length : 0
Value :
Writing stuff is slow.
In your second example, you take care of the accelerator output by assigning it to a variable. That's why it is significantly faster.
You can see the difference without assignment by voiding the outputs. If you do that, you'll see the accelerator is consistently slightly faster:
(Measure-Command {
foreach ($i in 1..10000) {
[void]($path -match $testPattern)
}
}).TotalSeconds
(Measure-Command {
foreach ($i in 1..10000) {
[void]([regex]::Matches($path, $testPattern))
}
}).TotalSeconds
Additional note
[void] is always more efficient than Command | Out-Null.
The pipeline approach is slower, but more memory-friendly: [void](...) builds the whole result before discarding it, while Out-Null discards objects as they stream through the pipeline.
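If you want to verify that claim on your own machine, a minimal sketch reusing the question's $path and $testPattern might look like this:
(Measure-Command {
    foreach ($i in 1..10000) { [void]($path -match $testPattern) }
}).TotalSeconds
(Measure-Command {
    foreach ($i in 1..10000) { ($path -match $testPattern) | Out-Null }
}).TotalSeconds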

This isn't an answer to the direct question asked, but it's an expansion on the performance of pre-compiled regexes that I mentioned in comments...
First, here's my local performance benchmark for the original code in the question for comparison (with some borrowed text and patterns):
$text = "foo" * 1e6;
$pattern = "f?(o)";
$count = 1000000;
# example 1
(Measure-Command {
foreach ($i in 1..$count) {
$text -match $pattern
}
}).TotalSeconds
# 8.010825
# example 2
(Measure-Command {
foreach ($i in 1..$count) {
$result = [regex]::Matches($text, $pattern)
}
}).TotalSeconds
# 6.8186813
And then using a pre-compiled regex, which according to Compilation and Reuse in Regular Expressions emits a native assembly to process the regex rather than the default "sequence of internal instructions" - whatever that actually means :-).
$text = "foo" * 1e6;
$pattern = "f?(o)";
$count = 1000000;
# example 3
$regex = [regex]::new($pattern, "Compiled");
(Measure-Command {
foreach ($i in 1..$count) {
$result = $regex.Matches($text)
}
}).TotalSeconds
# 5.8794981
# example 4
(Measure-Command {
$regex = [regex]::new($pattern, "Compiled");
foreach ($i in 1..$count) {
$result = $regex.Matches($text)
}
}).TotalSeconds
# 3.6616832
# example 5
# see https://github.com/PowerShell/PowerShell/issues/8976
(Measure-Command {
& {
$regex = [regex]::new($pattern, "Compiled");
foreach ($i in 1..$count) {
$result = $regex.Matches($text);
}
}
}).TotalSeconds
# 1.5474028
Note that Example 3 has a performance overhead of finding / resolving the $regex variable from inside each iteration because it's defined outside the Measure-Command's -Expression scriptblock - see https://github.com/PowerShell/PowerShell/issues/8976 for details.
Example 5 defines the variable inside a nested scriptblock and so is a lot faster. I'm not sure why Example 4 sits in between the two in performance, but it's useful to note there's a definite difference :-)
Also, as an aside, in my comments above, my original version of Example 5 didn't have the &, which meant I was timing the effort required to define the scriptblock, not execute it, so my numbers were way off. In practice, the performance increase is a lot less than my comment suggested, but it's still a decent improvement if you're executing millions of matches in a tight loop...
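To make that last point concrete: without the call operator, Measure-Command only times the creation of the scriptblock, not its execution. A contrived illustration:
# The inner block is only defined, never invoked - this reports roughly zero
(Measure-Command { { Start-Sleep -Seconds 2 } }).TotalSeconds
# With & the block actually runs - this reports roughly two seconds
(Measure-Command { & { Start-Sleep -Seconds 2 } }).TotalSeconds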

Related

perl: refactor s/.../.../g -> while {}?

I've got a monstrous eval-substitution; here's a simplified version
$ perl -wpe 's#(for )(\w+)#$1 . "user " . qx/id $2/#ge'
which replaces e.g.
Stats for root are bad
with
Stats for user uid=0(root) gid=0(root) groups=0(root)
are bad
Is there an idiom to turn the s/.../.../g into a loop? Something like
while (m#(for )(\w+)#) {
# explicitly replace match with expression computed over several LOCs
}
Or maybe somehow use map()?
The idiom is to use s///eg. It's undeniably better than the alternative you are seeking.
s{pat}{ repl }eg;
is equivalent to
my $out = '';
my $last_pos = 0;
while (m{pat}g) {
$out .= substr($_, $last_pos, $-[0] - $last_pos) . do { repl };
$last_pos = $+[0];
}
$_ = $out . substr($_, $last_pos);
Because you hinted that there would be more than one statement to be executed in the replacement expression, I'd write your code as follows:
s{for \K(\w+)}{
...
...
}eg;
The advantage of curlies is that they can be nested.
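For example, plugging the id lookup from the question into that block might look roughly like this (untested sketch; the statements inside the block are illustrative):
s{for \K(\w+)}{
    my $name = $1;              # the word captured after "for "
    my $id   = qx/id $name/;    # run the external `id` command, as in the original one-liner
    chomp $id;                  # drop the trailing newline from qx
    "user $id";                 # the block's final expression becomes the replacement text
}eg;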

Use Powershell to comment out a 'codeblock' in a text file?

I'm trying to comment out some code in a massive amount of files
The files all contain something along the lines of:
stage('inrichting'){
steps{
build job: 'SOMENAME', parameters: param
build job: 'SOMEOTHERNAME', parameters: param
echo 'TEXT'
}
}
The content within the steps{ } is variable, but always consists of 0..N 'echo' and 0..N 'build job' lines.
I need an output like:
//stage('inrichting'){
// steps{
// build job: 'SOMENAME', parameters: param
// build job: 'SOMEOTHERNAME', parameters: param
// echo 'TEXT'
// }
//}
Is there any good way to do this with PowerShell? I tried some stuff with pattern.replace but didn't get very far.
$list = Get-ChildItem -Path 'C:\Program Files (x86)\Jenkins\jobs' -Filter config.xml -Recurse -ErrorAction SilentlyContinue -Force | % { $_.fullname };
foreach ($item in $list) {
...
}
This is a bit tricky, as you're trying to find that whole section, and then add comment markers to all lines in it. I'd probably write an ad-hoc parser with switch -regex if your structure allows for it (counting braces may make things more robust, but is also a bit harder to get right for all cases). If the code is regular enough you can perhaps reduce it to the following:
stage('inrichting'){
steps{
... some amount of lines that don't contain braces
}
}
and we can then check for occurrence of the two fixed lines at the start and eventually two lines with closing braces:
foreach ($file in $list) {
# lines of the file
$lines = Get-Content $file
# line numbers to comment out
$linesToComment = @()
# line number of the current block to comment
$currentStart = -1
# the number of closing braces on single lines we've encountered for the current block
$closingBraces = 0
for ($l = 0; $l -le $lines.Count; $l++) {
switch -regex ($lines[$l]) {
'^\s*stage\(''inrichting''\)\{' {
# found the first line we're looking for
$currentStart = $l
}
'^\s*steps\{' {
# found the second line, it may not belong to the same block, so reset if needed
if ($l -ne $currentStart + 1) { $currentStart = -1 }
}
'^\s*}' {
# only count braces if we're at the correct point
if ($currentStart -ne -1) { $closingBraces++ }
if ($closingBraces -eq 2) {
# we've reached the end, add the range to the lines to comment out
$linesToComment += $currentStart..$l
$currentStart = -1
$closingBraces = 0
}
}
}
}
0..($lines.Count-1) | % {
if ($linesToComment -contains $_) {
'//' + $lines[$_]
} else {
$lines[$_]
}
} | Set-Content $file
}
Untested, but the general idea might work.
Update: fixed and tested

Regex simple replace document from dictionary hash (Perl)

I need to find and replace keywords from a hash in large documents as fast as possible.
I tried the two methods below; one is faster by 320%, but I am sure I am doing this the wrong way and that there is a better way to do it.
The idea is that I want to replace only the keywords that exist in the dictionary hash and keep those that do not, so I know they are not in the dictionary.
As far as I can tell, both methods below scan twice to find and replace. I am sure a regex feature like lookahead or lookbehind could make this much faster.
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
my %dictionary = (
pollack => "pollard",
polynya => "polyoma",
pomaces => "pomaded",
pomades => "pomatum",
practic => "praetor",
prairie => "praised",
praiser => "praises",
prajnas => "praline",
quakily => "quaking",
qualify => "quality",
quamash => "quangos",
quantal => "quanted",
quantic => "quantum",
);
my $content =qq{
Start this is the text that contains the words to replace. {quantal} A computer {pollack} is a general {pomaces} purpose device {practic} that
can be {quakily} programmed to carry out a set {quantic} of arithmetic or logical operations automatically {quamash}.
Since a {prajnas} sequence of operations can {praiser} be readily changed, the computer {pomades} can solve more than {prairie}
one kind of problem {qualify} {doesNotExist} end.
};
# just duplicate content many times
$content .= $content;
cmpthese(100000, {
replacer_1 => sub {my $text = replacer1($content)},
replacer_2 => sub {my $text = replacer2($content)},
});
print replacer1($content) , "\n--------------------------\n";
print replacer2($content) , "\n--------------------------\n";
exit;
sub replacer1 {
my ($content) = shift;
$content =~ s/\{(.+?)\}/exists $dictionary{$1} ? "[$dictionary{$1}]": "\{$1\}"/gex;
return $content;
}
sub replacer2 {
my ($content) = shift;
my @names = $content =~ /\{(.+?)\}/g;
foreach my $name (@names) {
if (exists $dictionary{$name}) {
$content =~ s/\{$name\}/\[$dictionary{$name}\]/;
}
}
return $content;
}
Here is the benchmark result:
Rate replacer_2 replacer_1
replacer_2 5565/s -- -76%
replacer_1 23397/s 320% --
Here's a way that's a little faster and more compact:
sub replacer3 {
my ($content) = shift;
$content =~ s#\{(.+?)\}#"[".($dictionary{$1} // $1)."]"#ge;
return $content;
}
On Perl 5.8, which lacks the // operator, it is OK to use || instead, as long as none of your dictionary values are "false".
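For instance, a pre-5.10 variant of the same substitution might look like this (the sub name is made up; safe only if no dictionary value is an empty string or 0):
sub replacer3_pre510 {
    my ($content) = shift;
    # || falls back to the key itself when the lookup yields a false value
    $content =~ s#\{(.+?)\}#"[" . ($dictionary{$1} || $1) . "]"#ge;
    return $content;
}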
There's also a little to be gained by using a dictionary that already contains the braces and brackets:
sub replacer5 {
my ($content) = shift;
our %dict2;
if (!%dict2) {
%dict2 = map { "{".$_."}" => "[".$dictionary{$_}."]" } keys %dictionary
}
$content =~ s#(\{.+?\})#$dict2{$1} || $1#ge;
return $content;
}
Benchmark results:
Rate replacer_2 replacer_1 replacer_3 replacer_5
replacer_2 2908/s -- -79% -83% -84%
replacer_1 14059/s 383% -- -20% -25%
replacer_3 17513/s 502% 25% -- -7%
replacer_5 18741/s 544% 33% 7% --
It helps to build a regex that will match any of the hash keys beforehand. Like this
my $pattern = join '|', sort {length $b <=> length $a } keys %dictionary;
$pattern = qr/$pattern/;
sub replacer4 {
my ($string) = @_;
$string =~ s# \{ ($pattern) \} #"[$dictionary{$1}]"#gex;
$string;
}
with these results
Rate replacer_2 replacer_1 replacer_3 replacer_4
replacer_2 4883/s -- -80% -84% -85%
replacer_1 24877/s 409% -- -18% -22%
replacer_3 30385/s 522% 22% -- -4%
replacer_4 31792/s 551% 28% 5% --
It would also be an improvement if you could put the braces and brackets in the hash, instead of having to add them each time.
I'd recommend using meaningful names for your benchmarking subroutines, it'll make the output and intent more clear.
The following reproduces a bit of what Borodin and mob have tried out, and then combines them as well.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'state';
use Benchmark qw(:all);
# Data separated by paragraph mode.
my %dictionary = split ' ', do {local $/ = ''; <DATA>};
my $content = do {local $/; <DATA>};
# Quadruple Content
$content = $content x 4;
cmpthese(100_000, {
original => sub { my $text = original($content) },
build_list => sub { my $text = build_list($content) },
xor_regex => sub { my $text = xor_regex($content) },
list_and_xor => sub { my $text = list_and_xor($content) },
});
exit;
sub original {
my $content = shift;
$content =~ s/\{(.+?)\}/exists $dictionary{$1} ? "[$dictionary{$1}]": "\{$1\}"/gex;
return $content;
}
sub build_list {
my $content = shift;
state $list = join '|', map quotemeta, keys %dictionary;
$content =~ s/\{($list)\}/[$dictionary{$1}]/gx;
return $content;
}
sub xor_regex {
my $content = shift;
state $with_brackets = {
map {("{$_}" => "[$dictionary{$_}]")} keys %dictionary
};
$content =~ s{(\{.+?\})}{$with_brackets->{$1} // $1}gex;
return $content;
}
sub list_and_xor {
my $content = shift;
state $list = join '|', map quotemeta, keys %dictionary;
state $with_brackets = {
map {("{$_}" => "[$dictionary{$_}]")} keys %dictionary
};
$content =~ s{(\{(?:$list)\})}{$with_brackets->{$1} // $1}gex;
return $content;
}
__DATA__
pollack pollard
polynya polyoma
pomaces pomaded
pomades pomatum
practic praetor
prairie praised
praiser praises
prajnas praline
quakily quaking
qualify quality
quamash quangos
quantal quanted
quantic quantum
Start this is the text that contains the words to replace. {quantal} A computer {pollack} is a general {pomaces} purpose device {practic} that
can be {quakily} programmed to carry out a set {quantic} of arithmetic or logical operations automatically {quamash}.
Since a {prajnas} sequence of operations can {praiser} be readily changed, the computer {pomades} can solve more than {prairie}
one kind of problem {qualify} {doesNotExist} end.
Outputs:
Rate original xor_regex build_list list_and_xor
original 19120/s -- -23% -24% -29%
xor_regex 24938/s 30% -- -1% -8%
build_list 25253/s 32% 1% -- -7%
list_and_xor 27027/s 41% 8% 7% --
My solutions make heavy use of state variables to avoid reinitializing static data structures. However, one could also use closures or our $var; $var ||= VAL.
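As an illustration, the build_list variant could drop the state feature in favour of a closure along these lines (untested sketch; the sub name is made up):
{
    my $list;   # initialized on the first call, then shared by every later call
    sub build_list_closure {
        my $content = shift;
        $list ||= join '|', map quotemeta, keys %dictionary;
        $content =~ s/\{($list)\}/[$dictionary{$1}]/gx;
        return $content;
    }
}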
Addendum about enhancing the LHS of the regex
Actually, editing the LHS to use an explicit list is about improving the regular expression. And this change showed a 30% improvement in speed.
There isn't likely to be any magic solution to this. You have a list of values that you want to replace; there is no mysterious way to simplify the language of this goal.
You could perhaps use a code block in the LHS to (*SKIP)(*FAIL) when the word does not exist in the dictionary hash. However, the following shows that this is actually 36% slower than your original method:
sub skip_fail {
my $content = shift;
$content =~ s{\{(.+?)\}(?(?{! $dictionary{$1}})(*SKIP)(*FAIL))}{[$dictionary{$1}]}gx;
return $content;
}
Outputs:
Rate skip_fail original xor_regex build_list list_and_xor
skip_fail 6769/s -- -36% -46% -49% -53%
original 10562/s 56% -- -16% -21% -27%
xor_regex 12544/s 85% 19% -- -6% -14%
build_list 13355/s 97% 26% 6% -- -8%
list_and_xor 14537/s 115% 38% 16% 9% --

Unexpected speed behaviour when benchmarking Perl regexs

Whilst discussing the relative merits of using index() in Perl to search for substrings, I decided to write a micro benchmark to prove what I had seen before: that index is faster than regular expressions when looking for a substring. Here is the benchmarking code:
use strict;
use warnings;
use Benchmark qw(:all);
my @random_data;
for (1..100000) {
push(@random_data, int(rand(1000)));
}
my $warn_about_counts = 0;
my $count = 100;
my $search = '99';
cmpthese($count, {
'Using regex' => sub {
my $instances = 0;
my $regex = qr/$search/;
foreach my $i (@random_data) {
$instances++ if $i =~ $regex;
}
warn $instances if $warn_about_counts;
return;
},
'Uncompiled regex with scalar' => sub {
my $instances = 0;
foreach my $i (@random_data) {
$instances++ if $i =~ /$search/;
}
warn $instances if $warn_about_counts;
return;
},
'Uncompiled regex with literal' => sub {
my $instances = 0;
foreach my $i (@random_data) {
$instances++ if $i =~ /99/;
}
warn $instances if $warn_about_counts;
return;
},
'Using index' => sub {
my $instances = 0;
foreach my $i (@random_data) {
$instances++ if index($i, $search) > -1;
}
warn $instances if $warn_about_counts;
return;
},
});
What I was surprised at was how these performed (using Perl 5.10.0 on a recent MacBook Pro). In descending order of speed:
Uncompiled regex with literal (69.0 ops/sec)
Using index (61.0 ops/sec)
Uncompiled regex with scalar (56.8 ops/sec)
Using regex (17.0 ops/sec)
Can anyone offer an explanation as to what voodoo Perl is using to get the speed of the two uncompiled regular expressions to perform as well as the index operation? Is it an issue in the data I've used to generate the benchmark (looking for the occurrence of 99 in 100,000 random integers) or is Perl able to do a runtime optimisation?
Wholesale revision
In light of @Ven'Tatsu's comment, I changed the benchmark a bit:
use strict; use warnings;
use Benchmark qw(cmpthese);
use Data::Random qw( rand_words );
use Data::Random::WordList;
my $wl = Data::Random::WordList->new;
my @data_1 = (rand_words( size => 10000 )) x 10;
my @data_2 = @data_1;
my $pat = 'a(?=b)';
my $re = qr/^$pat/;
cmpthese(1, {
'qr/$search/' => sub {
my $instances = grep /$re/, @data_1;
return;
},
'm/$search/' => sub {
my $search = 'a(?=b)';
my $instances = grep /^$search/, @data_2;
return;
},
});
On Windows XP with ActiveState perl 5.10.1:
Rate qr/$search/ m/$search/
qr/$search/ 5.40/s -- -73%
m/$search/ 20.1/s 272% --
On Windows XP with Strawberry perl 5.12.1:
Rate qr/$search/ m/$search/
qr/$search/ 6.42/s -- -66%
m/$search/ 18.6/s 190% --
On ArchLinux with bleadperl:
Rate qr/$search/ m/$search/
qr/$search/ 9.25/s -- -38%
m/$search/ 14.8/s 60% --
Well, your case "Using regex" is so slow because you are compiling it each time. Try moving it out of the subroutine.
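Something along these lines, based on the question's code (sketch):
# Compiled once, before the benchmark starts
my $regex = qr/$search/;
cmpthese($count, {
    'Using precompiled regex' => sub {
        my $instances = 0;
        foreach my $i (@random_data) {
            $instances++ if $i =~ $regex;
        }
        warn $instances if $warn_about_counts;
        return;
    },
    # ... the other cases unchanged ...
});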
Perl optimizes a lot of things. Your pattern with no special regex features and literal characters allows perl's regex engine to simplify many things. Using use re 'debug' can show you what's actually happening behind the scenes.
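For example, a one-liner like this (the input string is illustrative) dumps how the pattern is compiled and which shortcuts the optimizer takes to STDERR:
perl -Mre=debug -e '"742 99 137" =~ /99/'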

Does the 'o' modifier for Perl regular expressions still provide any benefit?

It used to be considered beneficial to include the 'o' modifier at the end of Perl regular expressions. The current Perl documentation does not even seem to list it, certainly not at the modifiers section of perlre.
Does it provide any benefit now?
It is still accepted, for reasons of backwards compatibility if nothing else.
As noted by J A Faucett and brian d foy, the 'o' modifier is still documented, if you find the right places to look (one of which is not the perlre documentation). It is mentioned in the perlop pages. It is also found in the perlreref pages.
As noted by Alan M in the accepted answer, the better modern technique is usually to use the qr// (quoted regex) operator.
/o is deprecated. The simplest way to make sure a regex is compiled only once is to use a regex object, like so:
my $reg = qr/foo$bar/;
The interpolation of $bar is done when the variable $reg is initialized, and the cached, compiled regex will be used from then on within the enclosing scope. But sometimes you want the regex to be recompiled, because you want it to use the variable's new value. Here's the example Friedl used in The Book:
sub CheckLogfileForToday()
{
my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
my $today_regex = qr/^$today:/i; # compiles once per function call
while (<LOGFILE>) {
if ($_ =~ $today_regex) {
...
}
}
}
Within the scope of the function, the value of $today_regex stays the same. But the next time the function is called, the regex will be recompiled with the new value of $today. If he had just used:
if ($_ =~ m/^$today:/io)
...the regex would never be updated. So, with the object form you have the efficiency of /o without sacrificing flexibility.
The /o modifier is in the perlop documentation instead of the perlre documentation since it is a quote-like modifier rather than a regex modifier. That has always seemed odd to me, but that's how it is. Since Perl 5.20, it's now listed in perlre simply to note that you probably shouldn't use it.
Before Perl 5.6, Perl would recompile the regex even if the variable had not changed. You don't need to do that anymore. You could use /o to compile the regex once despite further changes to the variable, but as the other answers noted, qr// is better for that.
In the Perl 5 version 20.0 documentation
http://perldoc.perl.org/perlre.html
it states
Modifiers
Other Modifiers
…
o - pretend to optimize your code, but actually introduce bugs
which may be a humorous way of saying it was supposed to perform some kind of optimisation, but the implementation is broken.
Thus the option might be best avoided.
This is an optimization in the case that the regex includes a variable reference. It indicates that the regex does not change even though it has a variable within it. This allows for optimizations that would not be possible otherwise.
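A small sketch of that behaviour (illustrative values): the pattern is interpolated and compiled only the first time the match executes, so later changes to the variable are ignored.
my $word = 'foo';
for my $target ('foo', 'bar') {
    print "$target matches\n" if $target =~ /$word/o;   # frozen as /foo/ on first execution
    $word = 'bar';                                       # ignored by the /o match above
}
# prints only "foo matches"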
Here are timings for different ways to call matching.
$ perl -v | grep version
This is perl 5, version 20, subversion 1 (v5.20.1) built for x86_64-linux-gnu-thread-multi
$ perl const-in-re-once.pl | sort
0.200 =~ CONST
0.200 =~ m/$VAR/o
0.204 =~ m/literal-wo-vars/
0.252 =~ m,@{[ CONST ]},o
0.260 =~ $VAR
0.276 =~ m/$VAR/
0.336 =~ m,@{[ CONST ]},
My code:
#! /usr/bin/env perl
use strict;
use warnings;
use Time::HiRes qw/ tv_interval clock_gettime gettimeofday /;
use BSD::Resource qw/ getrusage RUSAGE_SELF /;
use constant RE =>
qr{
https?://
(?:[^.]+-d-[^.]+\.)?
(?:(?: (?:dev-)? nind[^.]* | mr02 )\.)?
(?:(?:pda|m)\.)?
(?:(?:news|haber)\.)
(?:.+\.)?
yandex\.
.+
}x;
use constant FINAL_RE => qr,^@{[ RE ]}(/|$),;
my $RE = RE;
use constant ITER_COUNT => 1e5;
use constant URL => 'http://news.trofimenkov.nerpa.yandex.ru/yandsearch?cl4url=www.forbes.ru%2Fnews%2F276745-visa-otklyuchila-rossiiskie-banki-v-krymu&lr=213&lang=ru';
timeit(
'=~ m/literal-wo-vars/',
ITER_COUNT,
sub {
for (my $i = 0; $i < ITER_COUNT; ++$i) {
URL =~ m{
^https?://
(?:[^.]+-d-[^.]+\.)?
(?:(?: (?:dev-)? nind[^.]* | mr02 )\.)?
(?:(?:pda|m)\.)?
(?:(?:news|haber)\.)
(?:.+\.)?
yandex\.
.+
(/|$)
}x
}
}
);
timeit(
'=~ m/$VAR/',
ITER_COUNT,
sub {
for (my $i = 0; $i < ITER_COUNT; ++$i) {
URL =~ m,^$RE(/|$),
}
}
);
timeit(
'=~ $VAR',
ITER_COUNT,
sub {
my $r = qr,^$RE(/|$),o;
for (my $i = 0; $i < ITER_COUNT; ++$i) {
URL =~ $r
}
}
);
timeit(
'=~ m/$VAR/o',
ITER_COUNT,
sub {
for (my $i = 0; $i < ITER_COUNT; ++$i) {
URL =~ m,^$RE(/|$),o
}
}
);
timeit(
'=~ m,@{[ CONST ]},',
ITER_COUNT,
sub {
for (my $i = 0; $i < ITER_COUNT; ++$i) {
URL =~ m,^@{[ RE ]}(/|$),
}
}
);
timeit(
'=~ m,@{[ CONST ]},o',
ITER_COUNT,
sub {
for (my $i = 0; $i < ITER_COUNT; ++$i) {
URL =~ m,^@{[ RE ]}(/|$),o
}
}
);
timeit(
'=~ CONST',
ITER_COUNT,
sub {
my $r = qr,^$RE(/|$),o;
for (my $i = 0; $i < ITER_COUNT; ++$i) {
URL =~ FINAL_RE
}
}
);
sub timeit {
my ($name, $iters, $code) = @_;
#my $t0 = [gettimeofday];
my $t0 = (getrusage RUSAGE_SELF)[0];
$code->();
#my $el = tv_interval($t0);
my $el = (getrusage RUSAGE_SELF)[0] - $t0;
printf "%.3f\t%-17s\t%.9f\n", $el, $name, $el / $iters
}
Yep and Nope
I ran a simple comparison using the follow script:
perl -MBenchmark=cmpthese -E 'my @n = 1..10000; cmpthese(10000, {string => sub{"a1b" =~ /a\d+c/ for @n}, o_flag => sub{"a1b" =~ /a\d+c/o for @n}, qr => sub{my $qr = qr/a\d+c/; "a1b" =~ /$qr/ for @n } })'
Here are the results:
Rate qr string o_flag
qr 760/s -- -72% -73%
string 2703/s 256% -- -5%
o_flag 2833/s 273% 5% --
So, clearly the /o flag is much faster than using qr.
But apparently the /o flag may cause bugs:
Perl regex /o optimization or bug?
One thing it, mystifyingly, does not do is allow a ONCE block, at least as of 5.8.8.
perl -le 'for (1..3){
print;
m/${\(print( "between 1 and 2 only"), 3)}/o and print "matched"
}'