Quote - capture - question - regex

Could someone explain, why I can use $1 two times and get different results?
perl -wle '"ok" =~ /(.*)/; sub { "huh?" =~ /(.*)/; print for @_ }->( "$1", $1 )'
(Found in: How to exclude submatches in Perl?)

The @_ argument array doesn't behave the way you think it does. The values in @_ in a subroutine are actually aliases for the real arguments:
The array @_ is a local array, but its elements are aliases for the actual scalar parameters.
When you say this:
sub s {
"huh?" =~ /(.*)/;
print for @_;
}
"ok" =~ /(.*)/;
&s("$1", $1); # the & is needed: a bare s(...) would parse as the substitution operator
The $1 in the first argument to s is evaluated immediately by the string interpolation, but the second argument is not evaluated; Perl only notes that the second element of the sub's @_ is $1 (the actual variable $1, not its value). Then, inside s, the value of $1 is changed by your regular expression. Now your @_ holds an alias for the string "ok" followed by an alias for $1, and these aliases are resolved by the print in your loop.
If you change the function to this:
sub s {
my @a = @_;
"huh?" =~ /(.*)/;
print for @a;
}
or even this:
sub s {
local $1;
"huh?" =~ /(.*)/;
print for @_;
}
Then you'll get the two lines of "ok" that you're expecting. The funny thing (funny peculiar, not funny ha-ha) is that those two versions of s produce your expected result for different reasons. The my @a = @_; version extracts the current values of the aliases in @_ before the regular expression gets its hands on $1; the local $1; version localizes the $1 variable to the sub, leaving the alias in @_ referencing the version of $1 from outside the sub:
A local modifies the listed variables to be local to the enclosing block, file, or eval.
Oddities like this are why you should always copy the values of the numbered regex capture variables to variables of your own as soon as possible, and why you want to unpack @_ right at the beginning of your functions (unless you know why you don't want to do that).
Hopefully I haven't butchered the terminology too much; this is one of those weird corners of Perl that I've always stayed away from because I don't like juggling razor blades.
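A minimal sketch of that advice, assuming a hypothetical sub named describe (it is not part of the question). It unpacks @_ before running its own match and copies each capture as soon as it is made, so the inner match cannot change what the caller passed in:

```perl
use strict;
use warnings;

# Hypothetical helper: copies its arguments before running its own match.
sub describe {
    my ($first, $second) = @_;   # unpack @_ first: copies the current values
    "huh?" =~ /(.*)/;            # this match changes $1 ...
    my $inner = $1;              # ... so copy the capture right away
    return "$first $second $inner";
}

"ok" =~ /(.*)/;
my $outer  = $1;                 # a copy of the capture, safe from later matches
my $result = describe("$1", $1);
print "$result\n";               # prints "ok ok huh?"
```

Both arguments arrive as "ok" because the copy out of @_ happens before the sub's own match runs.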

The sample code makes use of two facts:
The elements of the @_ array are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (and vice versa).
$1 is a global variable (albeit dynamically scoped to the current BLOCK), which automatically contains the subpattern from () from the last successful pattern match.
The first argument to the subroutine is an ordinary string ("ok"). The second argument is the global variable $1. But it is changed by the successful pattern match inside the subroutine, before the arguments are printed.
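Both facts can be seen in isolation with a short sketch. The sub name shout and the sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Fact 1: elements of @_ are aliases, so assigning to $_[0]
# changes the caller's variable. (shout is a hypothetical sub.)
sub shout {
    $_[0] = uc $_[0];
    return;
}

my $word = "ok";
shout($word);
print "$word\n";            # prints "OK"

# Fact 2: $1 always reflects the most recent successful match in scope.
"first" =~ /(.*)/;
my $before = $1;
"second" =~ /(.*)/;
my $after = $1;
print "$before $after\n";   # prints "first second"
```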

That happens because perl passes parameters by reference.
What you are doing is similar to:
my $a = 'ok';
sub foo {
$a = 'huh?';
print for @_;
}
my $b = $a;
foo($b, $a);
When the sub foo is called, $_[1] is actually an alias for $a and so its value gets modified when $a is modified.

Related

perl regex error: Modification of a read-only value attempted

I have this perl script:
use strict;
use warnings;
foreach my $line (" ^?[?12;12A", " ^?[A") {
print "$line\n";
$line =~ s/\s?[[:cntrl:]]\[(\?)?([0-9]{1,2}(;[0-9]{1,2})?)?[a-zA-Z]//g;
print "$line\n";
}
Those are two strings that start with a space, then a control character, then some regular ASCII characters. It results in this error:
$ perl foo.pl
[?12;12A
Modification of a read-only value attempted at foo.pl line 6.
$
What am I doing wrong?
In a foreach loop the loop variable ("topicalizer") is but an alias for the currently processed list element; by changing it we really change the element.
If any element of LIST is an lvalue, you can modify it by modifying VAR inside the loop. Conversely, if any element of LIST is NOT an lvalue, any attempt to modify that element will fail. In other words, the foreach loop index variable is an implicit alias for each item in the list that you're looping over.
The loop in the question iterates over a list of string literals, and those are read-only. Attempting to change one is a fatal error (see perldiag, where this foreach case is given as one example).
Some ways around this are shown in Hameed's answer, to store them in an array, or to assign the string literal to a variable first.
Or, use the "non-destructive" modifier on the substitution operator, s///r, which doesn't change the original but returns the changed value (or the original if it didn't change)
my $new_line = $line =~ s/.../.../r;
In your case $line is a read-only value.
You can fix this in two ways:
Work with an actual array like my @testarray = (" ^?[?12;12A", " ^?[A");
Assign the value of $line to another variable and modify that:
my $tmp = $line;
$tmp =~ s/\s?[[:cntrl:]]\[(\?)?([0-9]{1,2}(;[0-9]{1,2})?)?[a-zA-Z]//g;
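As a sketch of the s///r route mentioned above, the loop can keep iterating over literals as long as it never writes to $line. The sample strings are an assumption: "\e" (ESC) stands in for the control character, and trailing text is added so that something is left to print.

```perl
use strict;
use warnings;
use 5.014;   # s///r needs Perl 5.14 or later

# Same pattern as in the question, compiled once.
my $escape = qr/\s?[[:cntrl:]]\[(\?)?([0-9]{1,2}(;[0-9]{1,2})?)?[a-zA-Z]/;

# Assumed sample input: "\e" (ESC) stands in for the control character.
my @cleaned;
for my $line (" \e[?12;12Aleft", " \e[Aup") {
    # /r returns the modified copy; the read-only literal is never written to
    push @cleaned, $line =~ s/$escape//gr;
}
print join(",", @cleaned), "\n";   # prints "left,up"
```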

extract first word from a sentence and store it

i am extracting first word from a line using regex in Perl
for my $source_line (@lines) {
$source_line =~ /^(.*?)\s/
}
But I want to store the first word into a variable
when I print the below code, I get correct output
print($source_line =~ /^(.*?)\s/)
when I want to store in $i and print it, I get output as 1.
my $i = ($source_line =~ /^(.*?)\s/);
print $i;
How do I store the first word into a temporary variable?
You need to evaluate the match in list context.
my ($i) = $source_line =~ /^(.*?)\s/;
my ($i) is the same as (my $i), which "looks like a list", so it causes = to be the list assignment operator, and the list assignment operator evaluates its RHS in list context.
By the way, the following version works even if there's only one word and when there's leading whitespace:
my ($i) = $source_line =~ /(\S+)/;
It all comes down to context. This expression:
$source_line =~ /^(.*?)\s/
returns the list of captures when it is evaluated in list context.
In scalar context, a match instead returns true (1) or false to say whether it matched, and that 1 is what ends up in $i.
So changing your lhs expression to be in list context:
my ($i) = $source_line =~ /^(.*?)\s/;
captures the word correctly.
There were recently a few articles on Perl Weekly related to context, here is one of them that was particularly good: http://perlhacks.com/2013/12/misunderstanding-context/
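The difference between the two contexts can be checked with a small sketch. The sample strings and variable names are made up for illustration:

```perl
use strict;
use warnings;

my $source_line = "hello brave world";

my $flag   = ($source_line =~ /^(.*?)\s/);   # scalar context: 1 (the match succeeded)
my ($word) = $source_line =~ /^(.*?)\s/;     # list context: "hello"

# /(\S+)/ copes with leading whitespace and with single-word lines
my ($lone) = "  single" =~ /(\S+)/;          # "single"
my ($miss) = "single" =~ /^(.*?)\s/;         # no whitespace, no match: undef

print "$flag $word $lone\n";                 # prints "1 hello single"
```

Note that parentheses on the right-hand side do not create list context; only the parentheses around my ($word) on the left do.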

Perl, Assign regex match to scalar

There's an example snippet in Mail::POP3Client in which theres a piece of syntax that I don't understand why or how it's working:
foreach ( $pop->Head( $i ) ) {
/^(From|Subject):\s+/i and print $_, "\n";
}
The regex bit in particular. $_ remains the same after that line but only the match is printed.
An additional question; How could I assign the match of that regex to a scalar of my own so I can use that instead of just print it?
This is actually pretty tricky. It uses Perl's short-circuiting to make a conditional statement. It is the same as saying this:
if (/^(From|Subject):\s+/i) {
print $_, "\n";
}
It works because Perl stops evaluating an and expression as soon as its left side evaluates to false. Unless otherwise specified, a regex written as /regex/ rather than $somevar =~ /regex/ is applied to the default variable, $_.
You can store it like this:
my $var;
if (/^(From|Subject):\s+/i) {
$var = $_;
}
Or you could use a capture group:
/^((?:From|Subject):\s+)/i
which will store the matched header name, colon, and trailing whitespace in $1.
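To keep the header name and its value in variables of your own, the match can be evaluated in list context. This is only a sketch: the @head list is an assumed stand-in for whatever $pop->Head($i) returns, and the second capture group is an addition to the original pattern.

```perl
use strict;
use warnings;

# Assumed stand-in for the lines returned by $pop->Head($i).
my @head = (
    'From: alice@example.com',
    'To: bob@example.com',
    'Subject: Lunch?',
);

my %header;
for (@head) {
    # The list assignment is true only when the match returned captures
    if (my ($name, $value) = /^(From|Subject):\s+(.*)/i) {
        $header{lc $name} = $value;
    }
}
print "$header{from} / $header{subject}\n";   # prints "alice@example.com / Lunch?"
```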

How does =~ behave in Perl?

I have the following perl script:
$myVariable = "some value";
# ... some code ...
$myVariable =~ s/\+/ /g;
$myVariable =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/seg;
# ... some more code ...
Reading from perl documentation, I think that =~ operator returns a boolean value, namely true or false.
My question is: Does the above operations where $myVariable is involved affect its value, or not.
N.B. The $myVariable's value is set in my script to a string that is a result from the $_ variable's split. I think that should not affect the behavior of that operation.
P.S. If you need more code from my script just let me know.
$myVariable is changed, but because you are doing substitutions (s///), not because the result of a match or substitution is a boolean; =~ is not like =, it is like ==. If you want the boolean result of the action, you need to assign it.
$result = $myVariable =~ s/\+/ /g;
We are talking about
leftvalue =~ rightvalue
rightvalue must be one of these things:
m/regexp/
s/regexp/replacement/
tr/searchlist/replacementlist/
leftvalue can be anything that is a left value.
The expression leftvalue =~ rightvalue evaluates to a value you can treat as a boolean, but this value is not assigned to leftvalue! It is the value of the expression itself, so you can use it directly in an if-clause:
if (leftvalue =~ rightvalue) {
# do something
}
m/regexp/ will never change anything. It just tests, if regexp matches on leftvalue.
s/regexp/replacement/ also tests if regexp matches on leftvalue, and if so, it replaces the matching part with replacement. If regexp did match, leftvalue =~ rightvalue is true, otherwise it is false.
tr/searchlist/replacementlist/ works analogously to s///, but translates individual characters instead of replacing a matched pattern.
So this will work fine:
my @a = ('acbc123', 'aubu123');
foreach (@a) {
if ($_ =~ s/c(\d)/x$1/g) {
$_ .= 'MATCHED!';
}
}
The results will be:
$a[0] = 'acbx123MATCHED!'
The 'c' followed by a digit matched the regular expression, so it was replaced by 'x' and that digit. Because it matched, the if-condition is true and 'MATCHED!' is appended to the string.
$a[1] = 'aubu123'
The regular expression did not match. Nothing was replaced and the if-condition was false.
The binding operator just "binds" a target variable to one of the operators. It doesn't affect the value by itself. The substitution operator, s///, normally changes the target value and returns the number of substitutions it made.
my $count = $target =~ s/.../.../;
my $count = ( $target =~ s/.../.../ ); # same thing, clarified with ()
Starting with Perl v5.14, there's a /r flag for the substitution operator that leaves alone the target value, and, instead of returning a count, returns the modified value:
my $modified = $target =~ s/.../.../r;
=~ doesn't quite mean anything by itself; it needs something on its right to do to the variable on its left.
To see if a variable matches a pattern, you use m// on the right. You'll probably want to use this as a boolean, but you can also use it in other senses. This does not alter $foo:
$foo =~ m/pattern/
To substitute a replacement for a pattern, you use s/// on the right. This alters $foo:
$foo =~ s/pattern/replacement/;
To translate single characters within $foo, you use tr/// on the right. This alters $foo:
$foo =~ tr/abc/def/;
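The return values described in these answers can be compared side by side in a short sketch. The variable names and sample strings are made up for illustration:

```perl
use strict;
use warnings;
use 5.014;   # for the /r flag

my $target = "a+b+c";
my $count  = ($target =~ s/\+/ /g);    # 2 substitutions; $target is now "a b c"

my $orig = "x+y";
my $copy = $orig =~ s/\+/ /gr;         # "x y"; $orig is still "x+y"

# With an empty replacement list, tr/// only counts and changes nothing.
my $letters = ($target =~ tr/a-z//);   # 3

print join("|", $count, $target, $copy, $orig, $letters), "\n";
# prints "2|a b c|x y|x+y|3"
```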

Perl - Why does shift lose its value after being used?

This code works - It takes an array of full txt file paths and strips them so that when $exam_nums[$x] is called, it returns the file name
for (0..$#exam_nums)
{
$exam_nums[$_] =~ s/\.txt$//; #remove extension
$exam_nums[$_] =~ s/$dir//g; #remove path
}
When I try to do this for a single variable, it doesn't work. I'm calling a subroutine and sending it a parameter, but the variable is empty at the end. (It is getting into the if statement block, because the other lines in there run fine.) Here's the code:
Call to the sub:
notify($_);
The $_ is from a foreach(@files) loop that works
The sub:
sub notify
{
if(shift)
{
$ex_num = shift;
$ex_num =~ s/\.txt$//; #remove extension
$ex_num =~ s/$dir//g; #remove path
print $ex_num;
print "\nanything";
}
}
I tried taking out the $ in the "remove extension" portion of the regex, but that didn't help.
You're shifting TWICE. The first shift in the if statement removes the value, so the second shift gets nothing. shift has the side effect of modifying @_: in addition to returning the first element, it permanently removes that element from @_.
EDIT: from man perlfunc
shift ARRAY
shift Shifts the first value of the array off and returns it,
shortening the array by 1 and moving everything down. If there
are no elements in the array, returns the undefined value. If
ARRAY is omitted, shifts the @_ array within the lexical scope
of subroutines and formats, ...
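A small sketch makes the side effect visible by comparing $_[0] with shift inside one sub. The sub name peek_then_take and its arguments are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sub contrasting $_[0] with shift.
sub peek_then_take {
    my $before = @_;       # number of arguments on entry
    my $peeked = $_[0];    # looks at the first argument; @_ is unchanged
    my $taken  = shift;    # removes the first argument from @_
    my $after  = @_;       # one fewer now
    return ($before, $peeked, $taken, $after);
}

my @report = peek_then_take("exam1.txt", "extra");
print join(",", @report), "\n";   # prints "2,exam1.txt,exam1.txt,1"
```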
You are attempting to extract your ex_num argument from @_ (the argument list) twice: shift (which alters @_) is not the same as $_[0] (which just looks at the first element of @_ but does not alter it). See perldoc -f shift.
Also, your function is closing over $dir, which may or may not be your intent. (See perldoc perlfaq7 for more information about closures.) I've taken that out and added it as an additional function parameter:
sub notify
{
my ($ex_num, $dir) = @_;
return unless $ex_num;
$ex_num =~ s/\.txt$//; # remove extension
$ex_num =~ s/$dir//g; # remove path
print $ex_num . "\n";
}
I'd use File::Basename instead of rolling my own. It allows you to parse file paths into their directory, filename and suffix.
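For instance, a minimal sketch with fileparse from File::Basename. The path is an assumed example, and the result shown is for Unix-like systems:

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Assumed example path; the suffix pattern says what counts as an extension.
my ($name, $dirs, $suffix) = fileparse("/exams/2011/exam42.txt", qr/\.txt/);
print join("|", $name, $dirs, $suffix), "\n";
# on Unix-like systems prints "exam42|/exams/2011/|.txt"
```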
As per Jim Garrison's info, I pulled a switch to fix the problem:
sub notify
{
$ex_num = shift;
if($ex_num)
{
$ex_num =~ s/\.txt$//; #remove extension
$ex_num =~ s/$dir//g; #remove path
}
}
Uses a core module, local variables and Perl 5.10.
use 5.010;
use File::Basename;
sub notify {
my $ex_num = shift;
my $name = basename($ex_num, '.txt');
say $name;
}