Puppet regular expression (grab path in var between quotation marks) - regex

I have the following yaml:
role::project::conf::files:
entry_01:
- 'file "/var/project/some_file.txt" 5333;'
- 'echo no;'
entry_02:
- 'file "/var/project/some_other_file.txt" 5334;'
- 'echo yes;'
entry_03:
- 'file "/var/project/extra_file.txt" 5335;'
- 'echo yes;'
then I've used the following regular expression https://regex101.com/r/pVzseA/1 in order to grab the value between those quotation marks (the regex works in regex101.com) but doesn't work in Puppet:
each($files) | $files_itm | {
if $files_itm[1] =~ /"([^"]*)"/ {
#how to get only the path here in var
}
}
Update, the class:
class role::project::conf (
$files = [],
){
each($files) | $files_itm | {
if $files_itm[1] =~ /"([^"]*)"/ {
notify { "file ${1} seen": }
}
}
}

I've used the following regular expression
https://regex101.com/r/pVzseA/1 in order to grab the value between
those quotation marks (the regex works in regex101.com) but doesn't
work in Puppet:
each($files) | $files_itm | {
if $files_itm[1] =~ /"([^"]*)"/ {
#how to get only the path here in var
}
}
As I noted in comments, Puppet uses Ruby's flavor of regular expression. This is not among those explicitly supported by Regex101, so it is not out of the question that a regex that worked there would not work with Puppet. However, the Ruby-flavor online regular expression tester at Rubular shows that your particular regular expression works just fine in Ruby, including the the matching group.
You don't show how you tried to obtain the matched group in your Puppet code, but the appropriate procedure is given in Puppet's regular expression docs. In particular:
Within conditional statements and node definitions, substrings withing [sic]
parentheses () in a regular expression are available as numbered
variables inside the associated code section. The first is $1, the
second is $2, and so on. The entire match is available as $0. These
are not normal variables, and have some special behaviors: [...]
This is also reflected in the documentation for conditional statements.
You have only one capture group, so inside the body of your conditional statement, you can refer to the captured content as $1. For example,
each($files) | $files_itm | {
if $files_itm[1] =~ /"([^"]*)"/ {
notify { "file ${1} seen": }
}
}
Note well, however, that the capture group variables are only accessible within the body of the conditional statement, and outside any nested conditionals that have a regular expression match in their condition. Also, it does not seem to be documented what the implications are if there are two or more regex-match expressions in the same conditional statement's condition.
Update
But the update to the question and the error messages you presented in comments show that the problem you're struggling with is something altogether different. The value associated with Hiera key role::project::conf::files is a hash, with string keys and array values, whereas you seem to expect an array according to the parameter default encoded into the class. When you iterate over a hash and capture the entries in a single variable, that variable will take two-element array values, where the first (element 0) is the entry's key and the second (element 1) is corresponding value.
Thus, when the Hiera data presented in the question is bound to your $files parameter, your expression $files_itm[1] evaluates to array values such as ['file "/var/project/some_file.txt" 5333;', 'echo no;']. Those are not suitable left-hand operands for the =~ operator, which is what the error message is telling you.
It's difficult to tell what you really want here, but this ought to at least avoid error (in conjunction with your YAML data):
each($files) | $itm_key, $itm_value | {
if $itm_value[0] =~ /"([^"]*)"/ {
notify { "file ${1} seen": }
}
}

Related

Extract regex substring from command output and pass into another command [duplicate]

I have the following code:
$DatabaseSettings = #();
$NewDatabaseSetting = "" | select DatabaseName, DataFile, LogFile, LiveBackupPath;
$NewDatabaseSetting.DatabaseName = "LiveEmployees_PD";
$NewDatabaseSetting.DataFile = "LiveEmployees_PD_Data";
$NewDatabaseSetting.LogFile = "LiveEmployees_PD_Log";
$NewDatabaseSetting.LiveBackupPath = '\\LiveServer\LiveEmployeesBackups';
$DatabaseSettings += $NewDatabaseSetting;
When I try to use one of the properties in a string execute command:
& "$SQlBackupExePath\SQLBackupC.exe" -I $InstanceName -SQL `
"RESTORE DATABASE $DatabaseSettings[0].DatabaseName FROM DISK = '$tempPath\$LatestFullBackupFile' WITH NORECOVERY, REPLACE, MOVE '$DataFileName' TO '$DataFilegroupFolder\$DataFileName.mdf', MOVE '$LogFileName' TO '$LogFilegroupFolder\$LogFileName.ldf'"
It tries to just use the value of $DatabaseSettings rather than the value of $DatabaseSettings[0].DatabaseName, which is not valid.
My workaround is to have it copied into a new variable.
How can I access the object's property directly in a double-quoted string?
When you enclose a variable name in a double-quoted string it will be replaced by that variable's value:
$foo = 2
"$foo"
becomes
"2"
If you don't want that you have to use single quotes:
$foo = 2
'$foo'
However, if you want to access properties, or use indexes on variables in a double-quoted string, you have to enclose that subexpression in $():
$foo = 1,2,3
"$foo[1]" # yields "1 2 3[1]"
"$($foo[1])" # yields "2"
$bar = "abc"
"$bar.Length" # yields "abc.Length"
"$($bar.Length)" # yields "3"
PowerShell only expands variables in those cases, nothing more. To force evaluation of more complex expressions, including indexes, properties or even complete calculations, you have to enclose those in the subexpression operator $( ) which causes the expression inside to be evaluated and embedded in the string.
#Joey has the correct answer, but just to add a bit more as to why you need to force the evaluation with $():
Your example code contains an ambiguity that points to why the makers of PowerShell may have chosen to limit expansion to mere variable references and not support access to properties as well (as an aside: string expansion is done by calling the ToString() method on the object, which can explain some "odd" results).
Your example contained at the very end of the command line:
...\$LogFileName.ldf
If properties of objects were expanded by default, the above would resolve to
...\
since the object referenced by $LogFileName would not have a property called ldf, $null (or an empty string) would be substituted for the variable.
Documentation note: Get-Help about_Quoting_Rules covers string interpolation, but, as of PSv5, not in-depth.
To complement Joey's helpful answer with a pragmatic summary of PowerShell's string expansion (string interpolation in double-quoted strings ("...", a.k.a. expandable strings), including in double-quoted here-strings):
Only references such as $foo, $global:foo (or $script:foo, ...) and $env:PATH (environment variables) can directly be embedded in a "..." string - that is, only the variable reference itself, as a whole is expanded, irrespective of what follows.
E.g., "$HOME.foo" expands to something like C:\Users\jdoe.foo, because the .foo part was interpreted literally - not as a property access.
To disambiguate a variable name from subsequent characters in the string, enclose it in { and }; e.g., ${foo}.
This is especially important if the variable name is followed by a :, as PowerShell would otherwise consider everything between the $ and the : a scope specifier, typically causing the interpolation to fail; e.g., "$HOME: where the heart is." breaks, but "${HOME}: where the heart is." works as intended.
(Alternatively, `-escape the :: "$HOME`: where the heart is.", but that only works if the character following the variable name wouldn't then accidentally form an escape sequence with a preceding `, such as `b - see the conceptual about_Special_Characters help topic).
To treat a $ or a " as a literal, prefix it with escape char. ` (a backtick); e.g.:
"`$HOME's value: $HOME"
For anything else, including using array subscripts and accessing an object variable's properties, you must enclose the expression in $(...), the subexpression operator (e.g., "PS version: $($PSVersionTable.PSVersion)" or "1st el.: $($someArray[0])")
Using $(...) even allows you to embed the output from entire commands in double-quoted strings (e.g., "Today is $((Get-Date).ToString('d')).").
Interpolation results don't necessarily look the same as the default output format (what you'd see if you printed the variable / subexpression directly to the console, for instance, which involves the default formatter; see Get-Help about_format.ps1xml):
Collections, including arrays, are converted to strings by placing a single space between the string representations of the elements (by default; a different separator can be specified by setting preference variable $OFS, though that is rarely seen in practice) E.g., "array: $(#(1, 2, 3))" yields array: 1 2 3
Instances of any other type (including elements of collections that aren't themselves collections) are stringified by either calling the IFormattable.ToString() method with the invariant culture, if the instance's type supports the IFormattable interface[1], or by calling .psobject.ToString(), which in most cases simply invokes the underlying .NET type's .ToString() method[2], which may or may not give a meaningful representation: unless a (non-primitive) type has specifically overridden the .ToString() method, all you'll get is the full type name (e.g., "hashtable: $(#{ key = 'value' })" yields hashtable: System.Collections.Hashtable).
To get the same output as in the console, use a subexpression in which you pipe to Out-String and apply .Trim() to remove any leading and trailing empty lines, if desired; e.g.,
"hashtable:`n$((#{ key = 'value' } | Out-String).Trim())" yields:
hashtable:
Name Value
---- -----
key value
[1] This perhaps surprising behavior means that, for types that support culture-sensitive representations, $obj.ToString() yields a current-culture-appropriate representation, whereas "$obj" (string interpolation) always results in a culture-invariant representation - see this answer.
[2] Notable overrides:
• The previously discussed stringification of collections (space-separated list of elements rather than something like System.Object[]).
• The hashtable-like representation of [pscustomobject] instances (explained here) rather than the empty string.
#Joey has a good answer. There is another way with a more .NET look with a String.Format equivalent, I prefer it when accessing properties on objects:
Things about a car:
$properties = #{ 'color'='red'; 'type'='sedan'; 'package'='fully loaded'; }
Create an object:
$car = New-Object -typename psobject -Property $properties
Interpolate a string:
"The {0} car is a nice {1} that is {2}" -f $car.color, $car.type, $car.package
Outputs:
# The red car is a nice sedan that is fully loaded
If you want to use properties within quotes follow as below. You have to use $ outside of the bracket to print property.
$($variable.property)
Example:
$uninstall= Get-WmiObject -ClassName Win32_Product |
Where-Object {$_.Name -like "Google Chrome"
Output:
IdentifyingNumber : {57CF5E58-9311-303D-9241-8CB73E340963}
Name : Google Chrome
Vendor : Google LLC
Version : 95.0.4638.54
Caption : Google Chrome
If you want only name property then do as below:
"$($uninstall.name) Found and triggered uninstall"
Output:
Google Chrome Found and triggered uninstall

Why does the match operator's "match-only-once" optimization only apply with the "?" delimiter?

From the docs (perldoc -f m)
If ? is the delimiter, then a match-only-once rule applies, described in m?*PATTERN*? below.
The "match-only-once rule" doesn't' seem to be defined anywhere, but it seems to be a real optimization,
use Benchmark qw(:all) ;
use constant HAYSTACK => "this is a test string";
my $needle = "test";
cmpthese(-1, {
'questionmark' => sub { if ( HAYSTACK =~ m?$needle?n ) { 1 } },
'backslash' => sub { if ( HAYSTACK =~ m/$needle/n ) { 1 } },
});
With the results,
Rate backslash questionmark
backslash 9267717/s -- -57%
questionmark 21588328/s 133% --
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior? Let's take for example the output
perl -E'say "FOOOOOO" =~ m/O/' # returns 1
If it's not even counting the O what does it do after the first match such that it's twice as slow?
The "match-only-once rule" doesn't' seem to be defined anywhere, […]
"A match-only-once rule" is a description of the rule — it's a rule saying that m?PATTERN? matches only once — not an official name that you can use to search. The text that you quote is pulled from the perlop manpage, so when it says "described in m?*PATTERN*? below", it's referring to this part of that manpage:
m?PATTERN?msixpodualngc
This is just like the m/PATTERN/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you want to see only the first occurrence of something in each file of a set of files, for instance. Only m?? patterns local to the current package are reset.
while (<>) {
if (m?^$?) {
# blank line between header and body
}
} continue {
reset if eof; # clear m?? status for next file
}
Another example switched the first "latin1" encoding it finds to "utf8" in a pod file:
s//utf8/ if m? ^ =encoding \h+ \K latin1 ?x;
This makes me wonder why is the behavior in m// in scalar context such that it even needs this behavior?
Even in scalar context, m// or m?? may be called many times between resets, and if so then the two behave differently. (You can see this in the first snippet above. It's also the reason that your benchmarks give different performance results: the version with m?$needle?n only does a regex match the first time the function is called — it just returns 'no match' on all subsequent calls — whereas the version with m/$needle/n does a regex match every time.)
The confusion here is that "once" in "match-only-once" is in reference to the calling context of the m?? not in reference to matching once the needle inside the haystack, and ignoring subsequent matches of the needle inside the haystack. So if m?? is called many times without reset, only the first one that matches will return the match.
sub foo { return "foo" =~ m?o? };
say foo(); # 1
say foo(); # undef
reset();
say foo(); # 1

Capture and execute multiline code and incorporate result in raku

This is a markdown document example.md I have:
## New language
Raku is a new language different from Perl.
## what does it offer
+ Object-oriented programming including generics, roles and multiple dispatch
+ Functional programming primitives, lazy and eager list evaluation, junctions, autothreading and hyperoperators (vector operators)
+ Parallelism, concurrency, and asynchrony including multi-core support
+ Definable grammars for pattern matching and generalized string processing
+ Optional and gradual typing
This code will be evaluated.
```{raku evaluate=TRUE}
4/5
```
Rakudo is a compiler for raku programming language. Install it and you're all set to run raku programs!
This code will be evaluated.
```{raku evaluate=TRUE}
say "this is promising";
say $*CWD;
```
This code will **not** be evaluated.
```{raku evaluate=FALSE}
say "Hello world";
```
which I want to convert into example.md as shown below with the code and output within it.
## New language
Raku is a new language different from Perl.
## what does it offer
+ Object-oriented programming including generics, roles and multiple dispatch
+ Functional programming primitives, lazy and eager list evaluation, junctions, autothreading and hyperoperators (vector operators)
+ Parallelism, concurrency, and asynchrony including multi-core support
+ Definable grammars for pattern matching and generalized string processing
+ Optional and gradual typing
This code will be evaluated.
Code:
```{raku evaluate=TRUE}
4/5
```
Output:
```
0.8
```
Rakudo is a compiler for raku programming language. Install it and you're all set to run raku programs!
This code will be evaluated.
Code:
```{raku evaluate=TRUE}
say "this is promising";
say $*CWD;
```
Output:
```
this is promising
"C:\Users\suman".IO
```
This code will **not** be evaluated.
Code:
```{raku evaluate=FALSE}
say "Hello world";
```
What I want to accomplish is:
capture the code between backticks{raku evaluate} and backticks
execute the code if evaluate is TRUE
insert the code and output back into the document
What I tried to do:
Capture multiline code and evaluate expression
my $array= 'example.md'.IO.slurp;
#multiline capture code chunk and evaluate separately
if $array~~/\`\`\`\{raku (.*)\}(.*)\`\`\`/ {
#the first capture $0 will be evaluate
if $0~~"TRUE"{
#execute second capture which is code chunk which is captured in $1
}else {
# don't execute code
};
};
create a temp.p6 file and write code chunk $1 from above into it
my $fh="temp.p6".IO.spurt: $1;
execute the chunk if $0 is TRUE
my $output= q:x/raku temp.p6/ if $0==TRUE
integrate all this into final example.md while we create intermediate example_new.md
my $fh-out = open "example_new.md", :w; # Create a new file
# Print out next file, line by line
for "$file.tex".IO.lines -> $line {
# write output of code to example_new.md
}
$fh-out.close;
# copy
my $io = IO::Path.new("example_new.md");
$io.copy("example.md");
# clean up
unlink("example.md");
# move
$io.rename("example.md");
I am stuck in the first step. Any help?
There are two ways to execute the code and capture the output:
You can write it to a tempfile and use my $result = qqx{perl6 $filename} to spawn a separate process
You can execute the code in the same interpreter using EVAL, and use IO::Capture::Simple to capture STDOUT:
my $re = regex {
^^ # logical newline
'```{perl6 evaluate=' (TRUE|FALSE) '}'
$<code>=(.*?)
'```'
}
for $input.match(:global, $re) -> $match {
if $match[0] eq 'TRUE' {
use IO::Capture::Simple;
my $result = capture_stdout {
use MONKEY-SEE-NO-EVAL;
EVAL $match<code>;
}
# use $result now
}
}
Now you just need to switch from match to subst and return the value from that block that you want to substitute in, and then you're done.
I hope this gives you some idea how to proceed.
Code that accomplishes "What I want to accomplish"
You can run this code against your data with glot.io.
use v6;
constant $ticks = '```';
my regex Search {
$ticks '{raku evaluate=' $<evaluate>=(TRUE|FALSE) '}'
$<code>=[<!before $ticks> .]*
$ticks
}
sub Replace ($/) {
"Code:\n" ~ $ticks ~ $<code> ~ $ticks ~
($<evaluate> eq 'TRUE'
?? "\n\n" ~ 'Output:' ~ "\n" ~ $ticks ~ "\n" ~ Evaluate($<code>) ~ $ticks
!! '');
}
sub Evaluate ($code) {
my $out; my $*OUT = $*OUT but role { method print (*#args) { $out ~= #args } }
use MONKEY; my $eval-result = EVAL $code;
$out // $eval-result ~ "\n"
}
spurt
'example_new.md',
slurp('example.md')
.subst: &Search, &Replace, :g;
Explanation
Starting at the bottom and then working upwards:
The .subst method substitutes parts of its invocant string that need to be replaced and returns the revised string. .subst's first argument is a matcher; it can be a string, or, as here, a regex -- &Search1. .subst's second argument is a replacement; this can also be a string, or, as here, a Callable -- &Replace. If it's a Callable then .subst passes the match from the matcher as a match object2 as the first argument to the Callable. The :g adverb directs .subst to do the search/replace repeatedly for as many matches as there are in the invocant string.
slurp generates a string in one go from a file. No need for open, using handles, close, etc. Its result in this case becomes the invocant of the .subst explained above.
spurt does the opposite, generating a file in one go from a string, in this case the results of the slurp(...).subst... operation.
The Evaluate routine generates a string that's the output from evaluating the string of code passed to it. To capture the result of evaluation it temporarily modifies Raku's STDOUT variable $*OUT, redirecting prints (and thus also says etc.) to the internal variable $out before EVALing the code. If the EVAL results in anything being printd to $out then that is returned; if not, then the result of the EVAL is returned (coerced to a string by the ~). (A newline is appended in this second scenario but not the first because that is what's needed to get the correctly displayed result given how you've "specified" things by your example.)
The Replace routine is passed a match object from a call of the Code regex. It reconstructs the code section (without the evaluate bit) using the $<code> capture. If the $<evaluate> capture is 'TRUE' then it also appends a fresh Output: section using the Evaluate routine explained above to produce the code's output.
The Code regex matches a code section. It captures the TRUE or FALSE setting from the evaluate directive into a capture named $<evaluate> and the code into a capture named $<code>.
Footnotes
1 To pass a routine (a regex is a routine) rather than call it, it must be written with a sigil (&foo), not without (foo).
2 It does this even if the matcher was merely a string!
You can try this regex:
```{perl6 evaluate=(?<evaluate>[^}]+)}\s+(?<code>[^`]+)
You will get three results from your sample text, each result contains two named groups, the first is evaluate, containing the flag and the second one codeis the code.
Have a look at the regex-demo:
https://regex101.com/r/EsERkJ/1
Since I don't know perl, i can't help you with the implementation :(

Perl switch/case Fails on Literal Regex String Containing Non-Capturing Group '?'

I have text files containing lines like:
2/17/2018 400000098627 =2,000.0 $2.0994 $4,387.75
3/7/2018 1)0000006043 2,000.0 $2.0731 $4,332.78
3/26/2018 4 )0000034242 2,000.0 $2.1729 $4,541.36
4/17/2018 2)0000008516 2,000.0 $2.219 $4,637.71
I am matching them with /^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+([0-9|.|,]+)\s+\$/ But I also have some files with lines in a completely different format, which I match with a different regex. When I open a file I determine which format and assign $pat = '<regex-string>'; in a switch/case block:
$pat = '/^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+([0-9|.|,]+)\s+\$/'
But the ? character that introduces the non-capturing group I use to match repeats after the date and before the first currency amount causes the Perl interpreter to fail to compile the script, reporting on abort:
syntax error at ./report-dates-amounts line 28, near "}continue "
If I delete the ? character, or replace ? with \? escaped character, or first assign $q = '?' then replace ? with $q inside a " string assignment (ie. $pat = "/^\s*(\S+)\s+($q:[0-9|\)| ]+)+\s+([0-9|.|,]+)\s+\$/"; ) the script compiles and runs. If I assign the regex string outside the switch/case block that also works OK. Perl v5.26.1 .
My code also doesn't have any }continue in it, which as reported in the compilation failure is probably some kind of transformation of the switch/case code by Switch.pm into something native the compiler chokes on. Is this some kind of bug in Switch.pm? It fails even when I use given/when in exactly the same way.
#!/usr/local/bin/perl
use Switch;
# Edited for demo
switch($format)
{
# Format A eg:
# 2/17/2018 400000098627 =2,000.0 $2.0994 $4,387.75
# 3/7/2018 1)0000006043 2,000.0 $2.0731 $4,332.78
# 3/26/2018 4 )0000034242 2,000.0 $2.1729 $4,541.36
# 4/17/2018 2)0000008516 2,000.0 $2.219 $4,637.71
#
case /^(?:april|snow)$/i
{ # This is where the ? character breaks compilation:
$pat = '^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+\D?(\S+)\s+\$';
# WORKS:
# $pat = '^\s*(\S+)\s+(' .$q. ':[0-9|\)| ]+)+\s+\D' .$q. '(\S+)\s+\$';
}
# Format B
case /^(?:umberto|petro)$/i
{
$pat = '^(\S+)\s+.*Think 1\s+(\S+)\s+';
}
}
Don't use Switch. As mentionned by #choroba in the comments, Switch uses a source filter, which leads to mysterious and hard to debug errors, as you constated.
The module's documentation itself says:
In general, use given/when instead. It were introduced in perl 5.10.0. Perl 5.10.0 was released in 2007.
However, given/when is not necessarily a good option as it is experimental and likely to change in the future (it seems that this feature was almost removed from Perl v5.28; so you definitely don't want to start using it now if you can avoid it). A good alternative is to use for:
for ($format) {
if (/^(?:april|snow)$/i) {
...
}
elsif (/^(?:umberto|petro)$/i) {
...
}
}
It might look weird a first, but once you get used to it, it's actually reasonable in my opinion. Or, of course, you can use none of this options and just do:
sub pattern_from_format {
my $format = shift;
if ($format =~ /^(?:april|snow)$/i) {
return qr/^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+\D?(\S+)\s+\$/;
}
elsif ($format =~ /^(?:umberto|petro)$/i) {
return qr/^(\S+)\s+.*Think 1\s+(\S+)\s+/;
}
# Some error handling here maybe
}
If, for some reason, you still want to use Switch: use m/.../ instead of /.../.
I have no idea why this bug is happening, however, the documentation says:
Also, the presence of regexes specified with raw ?...? delimiters may cause mysterious errors. The workaround is to use m?...? instead.
Which I misread at first, and therefore tried to use m/../ instead of /../, which fixed the issue.
Another option instead of an if/elsif chain would be to loop over a hash which maps your regular expressions to the values which should be assigned to $pat:
#!/usr/local/bin/perl
my %switch = (
'^(?:april|snow)$' => '^\s*(\S+)\s+(?:[0-9|\)| ]+)+\s+\D?(\S+)\s+\$',
'^(?:umberto|petro)$' => '^(\S+)\s+.*Think 1\s+(\S+)\s+',
);
for my $re (keys %switch) {
if ($format =~ /$re/i) {
$pat = $switch{$re};
last;
}
}
For a more general case (i.e., if you're doing more than just assigning a string to a scalar) you could use the same general technique, but use coderefs as the values of your hash, thus allowing it to execute an arbitrary sub based on the match.
This approach can cover a pretty wide range of the functionality usually associated with switch/case constructs, but note that, because the conditions are pulled from the keys of a hash, they'll be evaluated in a random order. If you have data which could match more than one condition, you'll need to take extra precautions to handle that, such as having a parallel array with the conditions in the proper order or using Tie::IxHash instead of a regular hash.

What's the best way to clear regex matching variables?

What's the best way to clear/reset all regex matching variables?
Example how $1 isn't reset between regex operations and uses the most recent match:
$_="this is the man that made the new year rumble";
/ (is) /;
/ (isnt) /;
say $1; # outputs "is"
Example how this may be problematic when working with loops:
foreach (...){
/($some_value)/;
&doSomething($1) if $1;
}
Update: I didn't think I'd need to do this, but Example-2 is only an example. This question is about resetting matching variables, not the best way to implement them.
Regardless, originally my coding style was more inline with being explicit and using if-blocks. After coming back to this (Example2) now, it is much more concise in reading many lines of code, I'd find this syntax faster to comprehend.
You should use the return from the match, not the state of the group vars.
foreach (...) {
doSomething($1) if /($some_value)/;
}
$1, etc. are only guaranteed to reflect the most recent match if the match succeeds. You shouldn't be looking at them other than right after a successful match.
Regex captures* are reset by a successful match. To reset regex captures, one would use a trivial match operation that's guaranteed to match.
"a" =~ /a/; # Reset captures to undef.
Yeah, it looks weird, but you asked to do some thing weird.
If you fix your code, you don't need weird-looking workarounds. Fixing your code even reveals a bug!
Fixes:
$_ = "this is the man that made the new year rumble";
if (/ (is) / || / (isnt) /) {
say $1;
} else{
... # You're currently printing something random.
}
and
for (...) {
if (/($some_pattern)/) {
do_something($1);
}
}
* — Backrefs are regex patterns that match previously captured text. e.g. \1, \k<foo>. You're actually talking about "regex capture buffers".
You should test whether the match succeeded. For example:
foreach (...){
/($some_value)/ or next;
doSomething($1) if $1;
}
foreach (...){
doSomething($1) if /($some_value)/ and $1;
}
foreach (...){
if (/($some_value)/) {
doSomething($1) if $1;
}
}
Depending on what $some_value is, and how you want to handle matching the empty string and/or 0, you may or may not need to test $1 at all.
To complement the existing, helpful answers (and the sensible recommendation to normally test the result of a matching operation in a Boolean context and take action only if the test succeeds notwithstanding):
Depending on your scenario, you can approach the problem differently:
Disclaimer: I'm not an experienced Perl programmer; do let me know if there are problems with this approach.
Enclose the matching operation in a do { ... } block scopes all regex-related special variables ($&, $1, ...) to that block.
Thus, you can use a do { ... } to prevent these special variables from getting set in the first place (although the ones from a previous regex operation outside the block will obviously remain in effect); for instance:
$_="this is the man that made the new year rumble";
# Match in current scope; -> $&, $1, ... *are* set.
/ (is) /;
# Match inside a `do` block; the *new* $&, $1, ... values
# are set only *inside* the block;
# `&& $1` passes out the block's version of `$1`.
$do1 = do { / (made) / && $1 };
print "\$1 == '$1'; \$do1 == '$do1'\n"; # -> $1 == 'is'; $do1 == 'made'
The advantage of this approach is that none of the current scope's special regex variables are set or altered; the accepted answer, by contrast, alters variables such as $&, and $'.
The disadvantage is that you must explicitly pass out variables of interest; you do get the result of the matching operation by default, however, and if you're only interested in the contents of capture buffers, that will suffice.
You shoud do it this way:
foreach (...) {
someFnc($1) if /.../;
}
But if you want to stick with your style, then check this as an idea:
$_ = "this is the man that made the new year rumble";
$m = /(is)/ ? $1 : undef;
$m = /(isnt)/ ? $1 : undef;
print $m, "\n" if defined $m;
Assigning captures to a list behave closer to what it sounds like you want.
for ("match", "fail") {
my ($fake_1) = /(m.+)/;
doSomething($fake_1) if $fake_1;
}