uninitialized eq warning when there is a regex match - regex

I get an "uninitialized value in string eq at *** line xxx" warning in my code, which would be easy to fix if there actually were an eq at that line.
But there is a regular expression match on a value inside a hashref.
if ($hashref->{parameters}->{type} =~ m/:/) {
some lines before this I even have this:
$hashref->{parameters} = defined($hashref->{parameters}) ? $hashref->{parameters} : '';
$hashref->{parameters}->{type} = defined($hashref->{parameters}->{type}) ? $hashref->{parameters}->{type} : '';
so the value should be at least initialized.
I'm asking myself and you: why do I still get the warning that the value is uninitialized, and moreover, why does it say eq instead of pattern match?
Edit:
The parameters subhash contains all variables given via URL input (POST and/or GET).
The type value is one of those variables which could be in the URL.
Whether or not the type value is in the URL, and whether or not it contains a value, I always get an uninitialized value in string eq warning, even if I check the value of type by printing it with warn just before the buggy line.
2. edit:
As @ikegami supposed, there is indeed an elsif which caused the warning.
The whole if - elsif statement looks something like:
if ($hashref->{parameters}->{type} =~ m/:/) {
…
}
elsif ($hashref->{parameters}->{type} eq $somevalue) {
…
}
and it was $somevalue that was uninitialized.

You only showed half of the statement on that line. The full statement actually looks something like
456: if ($foo =~ /bar/) {
457: ...
458: }
459: elsif ($baz eq 'qux') {
460: ...
461: }
Run-time warnings for a statement normally use the line number at which the statement started, so if the regex doesn't match and $baz is undefined, you'll get a warning listing line 456 for the eq on line 459.
It's the same idea here:
$ perl -wE' my $x; # 1
say # 2
4 # 3
+ # 4
$x # 5
+ # 6
5; # 7 '
Use of uninitialized value $x in addition (+) at -e line 2.
9
Recently*, Perl was changed so that elsif conditions are considered a different statement to avoid this kind of problem. You must have an older version of Perl.
* Actually, not so recently. Both 5.10 and 5.12 have been end-of-lifed, yet you appear to be using an even older version than that. If you're going to ask questions about a far-obsolete version of Perl, please mention it.
$ perlbrew use 5.10.1
$ perl -we'
my ($x,$y,$z);
if ($x) {
} elsif ($y eq $z) {
}'
Use of uninitialized value $y in string eq at -e line 4.
Use of uninitialized value $z in string eq at -e line 4.
$ perlbrew use 5.8.9
$ perl -we'
my ($x,$y,$z);
if ($x) {
} elsif ($y eq $z) {
}'
Use of uninitialized value in string eq at -e line 3.
Use of uninitialized value in string eq at -e line 3.
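Whatever the Perl version, the warning itself goes away once the eq only runs on defined operands. A minimal sketch of that idea follows; safe_eq is a hypothetical helper name, not something from the question, and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the question): string equality that
# treats an undefined operand as "not equal" instead of warning.
sub safe_eq {
    my ($left, $right) = @_;
    return 0 unless defined $left && defined $right;
    return $left eq $right ? 1 : 0;
}

my $type = 'plain';
my $somevalue;    # deliberately left undefined, as in the question

if ($type =~ /:/) {
    print "type contains a colon\n";
}
elsif (safe_eq($type, $somevalue)) {
    print "type equals somevalue\n";
}
else {
    print "no branch matched, and no warning was raised\n";
}
```

Whether an undefined value should count as "not equal" or be reported as an error is a design choice for the surrounding program.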

Related

During Perl substitution, increase the output with spaces in order it is of the same length as the input

(Disclaimer: I don't understand much of Perl!)
A (nice!) answer to this question of mine advised me to use (more or less) the following perl trick for a tricky substitution:
perl -pe 's#μ(.+?)>(.+?)(?:\&(.+?))?¢¢# sprintf(":%s:`%s`", $1, ($3 eq "" or $2 eq $3) ? $2 : "$3 <$2>")#ge'
And, indeed, that works nicely:
echo "μctanpkg>a4&a4¢¢" | perl -pe 's#μ(.+?)>(.+?)(?:\&(.+?))?¢¢# sprintf(":%s:`%s`", $1, ($3 eq "" or $2 eq $3) ? $2 : "$3 <$2>")#ge'
returns:
:ctanpkg:`a4`
Now, I need to add as many spaces as needed at the end of this substitution so that the output string has the same length as the input one.
How could I achieve such a result?
I take it that the outstanding issue, drawing a "bounty," is the length of the replacement string.
The problem there is the presence of Unicode characters, which are taken as bytes -- print the length of the input string and it's 19, while there are 16 (sixteen) characters in it; that's the μ and the two ¢, each needing two bytes.
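The byte versus character count can be checked directly. This is a small sketch using the core Encode module; byte_and_char_length is a hypothetical helper name, and the input is written as explicit UTF-8 bytes so that it does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Returns (byte length, character length) for a UTF-8 encoded byte string.
sub byte_and_char_length {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return (length $bytes, length $chars);
}

# The sample input written as raw UTF-8 bytes:
# \xCE\xBC is the mu, each \xC2\xA2 is a cent sign.
my $input = "\xCE\xBCctanpkg>a4&a4\xC2\xA2\xC2\xA2";
my ($n_bytes, $n_chars) = byte_and_char_length($input);
print "bytes: $n_bytes, characters: $n_chars\n";   # bytes: 19, characters: 16
```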
So add flags for Unicode handling, to decode those octets into logical characters. Also add the utf8 pragma (via -Mutf8 below) so that our literal characters in the source are interpreted correctly:
echo "μctanpkg>a4&a4¢¢" | perl -CSA -Mutf8 -wlnE'
say;
s{μ(.+?)>(.+?)(?:&(.+?))?¢¢}{
$r = sprintf(":%-s:`%s`", $1, ($3 eq "" or $2 eq $3) ? $2 : "$3 <$2>");
"=" x (length($_) - length($r)) . $r }ge;
say'
Can be copy-pasted as is, with multiple lines (in bash and some other shells) or used on one line. This is with -n in order to print at will; remove the prints and change to -p as needed.
I used = for alignment so that it is easy to see; change as needed. If the replacement winds up longer, then the repetition operator (x) prints a warning and returns an empty string; we can do without that warning, of course, but there is a question of design.
In an example in a comment the replacement string indeed ends up longer, so we need to pad the original instead, for alignment. But in the question only the replacement is printed, presumably the original being printed elsewhere -- from a program that runs this one-liner? So we need a clarification for the desired behavior of this program in such a case.
For one, then one surely won't add to the replacement string, so the statement
"=" x (length($_) - length($r)) . $r
should rather turn into something to the effect of
$shrank_by = length($_) - length($r);
( $shrank_by > 0 ? "=" x $shrank_by : "" ) . $r
You can use a second sprintf to do that, with a pattern that uses left-justify. The length can be passed in as an argument. The pattern for that is as follows.
printf "<%-*s>", 6, "a"; # prints "<a     >"
$& holds the string matched by the last successful match, as documented in perlvar, so length($&) gives the length of the matched text.
perl -pe 's#μ(.+?)>(.+?)(?:\&(.+?))?¢¢# sprintf("%-*s", length($&), sprintf(":%s:`%s`", $1, ($3 eq "" or $2 eq $3) ? $2 : "$3 <$2>"))#ge'
With your example input, this isn't visible as it's using spaces. But if we add extra <> to the pattern, you can see that it's working.
# V V
$ echo "μctanpkg>a4&a4¢¢" | perl -pe 's#μ(.+?)>(.+?)(?:\&(.+?))?¢¢# sprintf("<%-*s>", length($&), sprintf(":%s:`%s`", $1, ($3 eq "" or $2 eq $3) ? $2 : "$3 <$2>"))#ge'
<:ctanpkg:`a4`      >
This will also work for lines with multiple matches.
echo "μctanpkg>a4&a4¢¢ μctanpkg>a4&a4¢¢" | perl -pe 's#μ(.+?)>(.+?)(?:\&(.+?))?¢¢# sprintf("<%-*s>", length($&), sprintf(":%s:`%s`", $1, ($3 eq "" or $2 eq $3) ? $2 : "$3 <$2>"))#ge'
<:ctanpkg:`a4`      > <:ctanpkg:`a4`      >
Please note that $& used to incur a performance penalty in older Perls. This was fixed in 5.20. Even if you are on a version below that, your use-case probably isn't impacted very much.
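For use inside a script rather than a one-liner, the two answers can be combined along these lines. This is only a sketch under some assumptions: convert_padded is a hypothetical name, the input is expected to be already decoded into characters, and the whole match is captured in $1 so that $& is not needed. Padding goes on the right with spaces, and a replacement that is longer than the match is left as it is.

```perl
use strict;
use warnings;
use utf8;   # the pattern below contains literal non-ASCII characters

# Hypothetical helper: rewrite each match and right-pad it with spaces
# so the replacement is never shorter than the text it replaces.
sub convert_padded {
    my ($line) = @_;    # expected to be a decoded (character) string
    $line =~ s{(μ(.+?)>(.+?)(?:&(.+?))?¢¢)}{
        my ($whole, $role, $target, $label) = ($1, $2, $3, $4);
        my $text = (!defined($label) || $label eq $target)
                 ? $target
                 : "$label <$target>";
        sprintf '%-*s', length($whole), sprintf(':%s:`%s`', $role, $text);
    }ge;
    return $line;
}

binmode STDOUT, ':encoding(UTF-8)';
# Brackets make the three padding spaces visible.
print '[', convert_padded('μctanpkg>a4&a4¢¢'), "]\n";   # [:ctanpkg:`a4`   ]
```

Because the string is decoded, length counts characters, so the padded result has the same 16 characters as the input.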

Regex to match a line in a multi-lined string in Perl

I have the following code:
use Capture::Tiny qw(capture);
my $cmd = $SOME_CMD;
my ($stdout, $stderr, $exit_status) = capture { system($cmd); };
unless ($exit_status && $stdout =~ /^Repository:\s+(.*)/) {
my $name = $1;
}
It runs $cmd and tries to parse the output. The output looks like:
Information for package perl-base:
Repository: #System
Name: perl-base
Version: 5.10.0-64.81.13.1
For some reason $name is empty, probably because the group could not capture anything due to the multi-line string. I also tried /^Repository:\s+(.*)/s and /^Repository:\s+(.*)$/ but those didn't work either.
I want the $name to have #System. How can I do it?
I believe you want the multiline m flag:
use strict;
use warnings;
my $s = 'Information for package perl-base:
Repository: #System
Name: perl-base
Version: 5.10.0-64.81.13.1';
$s =~ /^Repository:\s+(.*)/m;
print $1; # => #System
You can make your regex more accurate with $ to anchor the end of line and + instead of \s+: /^Repository: +(.*)$/m.
$name is empty because it is declared inside a block, which means it is out of scope outside that block. You would know this if you had used use strict, which does not allow you to access undeclared variables.
What you need to do is to declare the variable outside the block:
my $name; # declared outside block
unless ($exit_status && $stdout =~ /^Repository:\s+(.*)/m) {
$name = $1;
}
print "Name is: $name\n"; # accessible outside the block
Also, you need to remove the beginning of line anchor ^, or add the /m modifier.
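Putting both points together (the /m modifier and a variable that outlives the block), here is a minimal sketch. repository_name is a hypothetical helper, and the sample output is the one from the question. It uses [ \t]+ instead of \s+ so that the separator cannot run over a newline, which is an adjustment on my part.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of the "Repository:" line,
# or undef when the output has no such line.
sub repository_name {
    my ($output) = @_;
    # In list context the match returns its captures, so no $1 is needed.
    my ($name) = $output =~ /^Repository:[ \t]+(.*)$/m;
    return $name;
}

my $stdout = join "\n",
    'Information for package perl-base:',
    'Repository: #System',
    'Name: perl-base',
    'Version: 5.10.0-64.81.13.1';

my $name = repository_name($stdout);
print defined($name) ? "Name is: $name\n" : "No repository line found\n";
```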
First, the logic of that unless statement is broken, as it short-circuits on success:
unless ($exit_status && $stdout =~ /^Repository:\s+(.*)/) { ... }
is just a syntactic "convenience" for
if (not ($exit_status && $stdout =~ /^Repository:\s+(.*)/) ) { ... }
So if the command ran successfully and $exit_status is falsey (0 for success) then the &&-ed condition is false right there, and so it short-circuits since it is already decided.† Thus the regex never runs and $1 stays undef.
But it gets worse: if $exit_status were a positive number and the regex matches (quite possible), then the &&-ed condition is true and with not the whole if is false so you don't get its block to run! While there was valid output from the command (since regex matched).
So I'd suggest to disentangle those double-negatives, for something like
if ( $exit_status==0 and $stdout =~ /.../m ) { ... } # but see text
Then there must be an elsif ($exit_status) to interrogate further. But a command may return an exit code as it pleases, and some return non-zero merely to communicate specifics even when they ran successfully! So better break that up, to get to see everything, like
if ($exit_status) { ... } # interrogate
if ($stdout =~ /.../m) { ... } # may still have run fine even with exit>0
The moral here, if I may emphasize, is about dangers of convoluted code, combined logical negatives, meaningful evaluations inside composite conditions, and all that.
Next, as mentioned, the regex attempts to match a pattern in a multiline string while it uses the anchor ^ -- which anchors the pattern to the beginning of the whole string, not to a line within, as clearly intended; so it would not match the shown text.
With the modifier /m added the behavior of the anchor ^ is changed so to match the beginning of lines within a string.
† If this gets one's head spinning consider the equivalent
if ( (not $exit_status) or (not $stdout =~ /^Repository:\s+(.*)/) ) { ...
With falsey $exit_status the first (not $exit_status) is true so the whole if is true right there and the second expression need not be evaluated and so it isn't (in Perl).
Try it with a one-liner
perl -wE'if ( 0 and do { say "hi" } ) { say "bye" }'
This doesn't print anything; no hi nor bye. With 0 the whole condition is certainly false so the do block isn't evaluated, and the if's block isn't either.
If we change and to or though (or 0 to 1), then the first condition (0) doesn't decide yet and the second condition is evaluated, so hi is printed. That condition is true (printing statements normally return 1) and so bye prints, too.
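The same short-circuit can be observed from inside a script by recording which operands actually ran. This is just an illustrative sketch: evaluated_operands is a hypothetical helper, and the second operand stands in for a regex that would succeed.

```perl
use strict;
use warnings;

# Hypothetical helper: records which operands of && were actually evaluated
# for a given exit status, assuming the regex operand would succeed.
sub evaluated_operands {
    my ($exit_status) = @_;
    my @seen;
    my $probe = sub {
        my ($label, $value) = @_;
        push @seen, $label;
        return $value;
    };
    my $both = $probe->('exit_status', $exit_status) && $probe->('regex', 1);
    return @seen;
}

print join(', ', evaluated_operands(0)), "\n";   # exit_status
print join(', ', evaluated_operands(1)), "\n";   # exit_status, regex
```

With an exit status of 0 the stand-in for the regex never runs, which is why $1 stays undefined in the question's code.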

Perl do substitution in substitution itself

I was doing some regex substitution operation with the html snippet using Perl.
This is how I match the wanted part: (class="p_hw"><a href=")(http://[^<>"]*?xxxx\.com\/[^<>"]*[=/])([^<>"]*)(">(?:<b>)?)(.*?)(?=<)
I need to replace the http:// with entry:// followed by certain parameter value of the http url($3 for that matter) if that value exists in a hash(%hw_f), or else the first word(or phrase) from $5 will be used when it exists in %hw_f. If all conditions are not matched, the snippet will stay unchanged.
I have tried the following:
s#(class="p_hw"><a href=")(http://[^<>"]*?xxxx\.com\/[^<>"]*[=/])([^<>"]*)(">(?:<b>)?)(.*?)(?=<)#
my @n = split(/\,|;/, $5);
my @m = map {s,^\s+|\s+$,,mgr} @n;
my $new = $3 =~ s/^\s+|\s+$//mgr;
my $new2 = $new =~ s/\+/ /mgr;
exists $hw_f{$new2} ? "$1entry://$new2$4$5" : (exists $hw_f{$m[0]} ? "$1entry://$m[0]$4$5" : "$1$2$3$4$5") #eg;
%hw_f is where all conditions will be matched against.
It gives the following error:
Use of uninitialized value $1 in concatenation (.) or string
I need to obtain a new value based on $3 within the substitution, continue with that new value. How could I do that?
I'm not going to try to really fix the logic of what you're trying to accomplish because it's rather ill advised. What I will do is offer some semantic and coding advice.
1: Use Regexp::Common and URI to deal with URLs. It is almost never worth it to write your own regexes. Parsing HTML with regex requires that you seriously know what you're doing. https://metacpan.org/search?q=regexp%3A%3Acommon
2: Always only use {} and // to wrap regex. (A 99% rule)
3: Always immediately copy the numbered variables into meaningfully named my() variables unless the expression is trivial.
4: Modify arrays inplace with postfix foreach.
5: Spread out the code formatting to make it visually appealing.
6: Use sprintf for complicated variable recombinations. It makes it a lot easier to see what variable is used where and for what.
HTH
# 1 2 3 4 5
s{(class="p_hw"><a href=\")(http://[^<>"]*?xxxx\.com/[^<>"]*[=/])([^<>\"]*)(\">(?:<b>)?)(.*?)(?=<)}{
my ($m1, $m2, $m3, $m4, $m5) = ($1, $2, $3, $4, $5);
my @n = split /[,;]/, $m5;
s/^\s+|\s+$//mg foreach @n;
(my $new = $m3) =~ s/^\s+|\s+$//mg;
(my $new2 = $new) =~ s/\+/ /g;
exists $hw_f{$new2} ?
sprintf "%sentry://%s%s%s", $m1, $new2, $m4, $m5 :
exists $hw_f{$n[0]} ?
sprintf "%sentry://%s%s%s", $m1, $n[0], $m4, $m5 :
"$m1$m2$m3$m4$m5";
}ige;
Update:
while (<DICT>) {
s#(class="p_hw"><a href=")(http://[^<>"]*?wordinfo\.info\/[^<>"]*[=/])([^<>"]*)(">(?:<b>)?)(.*?)(?=<)#
my $one = $1;
my $two = $2;
my $three = $3;
my $four = $4;
my $five = $5;
my @n = split(/\,|;/, $five);
my @m = map {s,^\s+|\s+$,,mgr} @n;
my $new = $three =~ s/^\s+|\s+$//mgr;
my $new2 = $new =~ s/\+/ /mgr;
exists $hw_f{$new2} ? $one."entry://$new2$four$five" : (exists $hw_f{$m[0]} ? $one."entry://$m[0]$four$five" : "$one$two$three$four$five") #eg;
print $FH $_;
}
Assigning all the capture variables to named variables before any further regex engine invocation, as @DavidO mentioned in the comments, finally works. Thanks.
From your post it is not obvious what you are trying to achieve. If you described the problem in the following format, it would be easier to understand:
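Why the copies matter can be seen in a much smaller case: any later successful match in the same scope resets $1, $2 and so on. A minimal sketch follows; trim_and_report and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: shows that a later successful match resets $1,
# which is why the captures are copied right after the first match.
sub trim_and_report {
    my ($text) = @_;
    return unless $text =~ /(\w+)=(.*)/;
    my ($key, $raw) = ($1, $2);              # copy immediately
    (my $value = $raw) =~ s/^\s+|\s+$//g;    # successful match: $1, $2 are reset
    my $still_set = defined $1 ? 1 : 0;      # the trim has no groups, so $1 is undef now
    return ($key, $value, $still_set);
}

my ($key, $value, $still_set) = trim_and_report('key= value ');
print "key=$key value=[$value] \$1 still set: $still_set\n";
# key=key value=[value] $1 still set: 0
```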
--- Example -----------------------
I extract from web page a snippet with <a href="http:\\....... which I would like to convert/transform into following format <a href="http:\\........
At least in this way we know what is INPUT and what OUTPUT expected.
--- End of the example ------------
When you apply a regex with capture groups, it is easier to store the captured values in an array or, better, a hash:
use strict;
use warnings;
use Data::Dumper;
my %href;
my $data = shift;
if( $data =~ /<a href="(\w+):\\\\([\w\d\.]+)\\([\w\d\.]+)\\(.+)">([^<]+)</ ) {
@href{qw(protocol dns dir rest desc)} = ($1,$2,$3,$4,$5);
print Dumper(\%href);
} else {
print "No match found\n";
}
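On Perl 5.10 or later, named captures fill such a hash without counting groups. This is a sketch of that variant, not the code above: parse_link and the example URL are hypothetical, and it assumes ordinary forward-slash URLs rather than the backslash form shown in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference of the named captures,
# or nothing when the link does not match.
sub parse_link {
    my ($html) = @_;
    return unless $html =~ m{
        <a \s+ href="(?<protocol>\w+)://(?<host>[^/"]+)(?<path>[^"]*)">
        (?<desc>[^<]*)<
    }x;
    my %parts = %+;    # copy right away; %+ changes with the next match
    return \%parts;
}

my $link = parse_link('<a href="http://example.com/dir/page">Example</a>');
print "$_ => $link->{$_}\n" for sort keys %$link;
```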

Perl regex strange behaviour

Method 1:
$C_HOME = "$ENV{EO_HOME}\\common\\";
print $C_HOME;
gives C:\work\System11R1\common\
ie The environment variable is getting expanded.
Method 2:
Parse properties file having
C_HOME = $ENV{EO_HOME}\common\
while(<IN>) {
if(m/(.*)\s+=\s+(.*)/)
{
$o{$1}=$2;
}
}
$C_HOME = $o{"C_HOME"};
print $C_HOME;
This gives a output of $ENV{EO_HOME}\common\
ie The environment variable is not getting expanded.
How do I make sure that the environment variable gets expanded in the second case also.
The problem is in the line:
$o{$1}=$2;
Of course perl will not evaluate $2 automatically as it reads it.
If you want, you can evaluate it manually:
$o{$1}=eval($2);
But you must be sure that it is ok from security point of view.
The value of $o{C_HOME} contains the literal string $ENV{EO_HOME}\common\. To get the $ENV value evaluated, use eval...
$C_HOME = eval $o{"C_HOME"};
I leave it to you to find out why that will fail, however...
Expression must be evaluated:
$C_HOME = eval($o{"C_HOME"});
Perl expands variables in double-quote-like code strings, not in data.
You have to eval a string to explicitly interpolate variables inside it, but doing so without checking what you are passing to eval is dangerous.
Instead, look for everything you may want to interpolate inside the string and eval those using a regex substitution with the /ee modifier.
This program looks for all references to elements of the %ENV hash in the config value and replaces them. You may want to add support for whitespace wherever Perl allows it ($ ENV { EO_HOME } compiles just fine). It also assigns test values for %ENV which you will need to remove.
use strict;
use warnings;
my %data;
%ENV = ( EO_HOME => 'C:\work\System11R1' );
while (<DATA>) {
if ( my ($key, $val) = m/ (.*) \s+ = \s* (.*) /x ) {
$val =~ s/ ( \$ENV \{ \w+ \} ) / $1 /gxee;
$data{$key} = $val;
}
}
print $data{C_HOME};
__DATA__
C_HOME = $ENV{EO_HOME}\common\
output
C:\work\System11R1\common\
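If only %ENV lookups are ever needed, the eval can be dropped altogether: capture the variable name and look it up in a hash with a single /e. This swaps in a different technique from the /ee version above. It is a sketch only; expand_env is a hypothetical name, and leaving unknown names untouched is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $ENV{NAME} in a config value with the
# entry from the given hash; unknown names are left as they are.
sub expand_env {
    my ($value, $env) = @_;
    $value =~ s/(\$ENV\{(\w+)\})/exists $env->{$2} ? $env->{$2} : $1/ge;
    return $value;
}

# Test data instead of the real %ENV, as in the program above.
my %test_env = ( EO_HOME => 'C:\work\System11R1' );
print expand_env('$ENV{EO_HOME}\common\\', \%test_env), "\n";
# C:\work\System11R1\common\
```

Pass \%ENV instead of the test hash to use the real environment.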

Regex range operator

I have a string '11 15 '. With a regex I then compare the values within that string, in this case 11 and 15 (could be any number of digits but I'll keep it simple with 2 2-digit numbers).
For each of those numbers, I then see if it matches any of the numbers I want; in this case I want to see if the number is '12', '13', or '14'. If it is, then I change the value of '$m':
my $string = '11 15 ';
while ( $string =~ /([0-9]{1,})\s+/ig ) {
my $m = $1;
print $m . ".....";
$m = 'change value' if $m =~ /[12...14]{2,}/g;
print $m . "\n";
}
Produces:
11.....change value
15.....15
'15' stays the same, as it should. But '11' changes. What am I doing wrong?
[12...14] matches against "1", "2", ".", and "4". "11" matches that; "15" doesn't. If you're just matching against numbers, you shouldn't be using regular expressions. Change your line to the following:
$m = 'change value' if $m ~~ [12..14];
Or, if unable to guarantee perl >= v5.10:
$m = 'change value' if grep { $m == $_ } 12..14;
You've misunderstood the regular expression. Where you've written [12...14]{2,}, this means "match 2 or more of the characters 1 or 2 or dot or dot or dot or 1 or 4".
Try something like:
$m='change value' if $m=~/(\d{2,})/ and $1 >= 12 and $1 <= 14;
In a substitution operation, this could be written as:
$m =~ s/(\d{2,})/ $1 >= 12 && $1 <= 14 ? 'change value' : $1/ge;
That is, capture 2 or more digits and then test what you have captured to see if they're what you want to change by using perl code in the replacement section of the substitution. The e modifier indicates that Perl should evaluate the replacement as Perl code.
Let's rewrite your code a bit:
my $string = '11 15 ';
while ( $string =~ /(\d+)/g ) {
I've changed your while statement's regular expression. You can use \d+ to represent one or more digits, and that's easier to understand than [0-9]{1,}. You also (since a space won't match \d) don't need the last space on the end of your string.
Let's look at the rest of the code:
my $string = '11 15';
while ( $string =~ /(\d+)/g ) {
my $match = $1;
print "$match.....";
if ($match >= 12 and $match <= 14) { #if ($match ~~ [12..14]) for Perl >= 5.10
print "change value\n";
}
else {
print "$match\n";
}
}
You can't use a regular expression the way you are to test for range.
Instead, use the regular range test of
if ($match >= 12 and $match <= 14)
or the newer group test:
if ($match ~~ [12..14]) #Note only two dots and not three!
That last one only works in newer versions of Perl (like the 5.12 I have on my Mac and the 5.14 I have on my Linux box, but not the Perl 5.8 I have on my Solaris box).
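The plain numeric comparison works on every Perl version and is easy to wrap up for reuse. A minimal sketch follows; in_range and label_numbers are hypothetical names, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $number lies in [$low, $high].
sub in_range {
    my ($number, $low, $high) = @_;
    return ($number >= $low && $number <= $high) ? 1 : 0;
}

# Hypothetical helper: replace every number in the range by 'change value'.
sub label_numbers {
    my ($string, $low, $high) = @_;
    my @labels;
    for my $number ($string =~ /(\d+)/g) {
        push @labels, in_range($number, $low, $high) ? 'change value' : $number;
    }
    return @labels;
}

print join(' | ', label_numbers('11 13 15', 12, 14)), "\n";
# 11 | change value | 15
```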
A few tips:
Use indents and spaces. It makes your code more readable.
Use descriptive names for variables. Instead of $m, I used $match.
Don't use the appended if statements. The appended if is harder to spot, so you might miss something important, and it makes your code harder to update. It can be used if the statement itself is clear and simple, and it improves readability. The last is a bit subjective, but you'll commonly see appended if statements in things like return if not -f $file;.
Keep variables single purpose. In this case, instead of changing the value of $match, I used an if/else statement. Imagine if your code was a bit more complex, and someone had to add in a new feature. They see the $match variable and think this is what they need. Unfortunately, you changed what $match is. It's now a value to be printed out and not the string match. It might take the person who changed your program quite a while to figure out what happened to the value of $match and why it has been mysteriously set to change value.
In the print statement, you can include variables inside of double quotes. This is very different from almost all other languages. This is because Perl variables use sigils to mark variable names. It usually makes it easier to read if you combine variables and other strings in a single string.
For example:
print "The range of possible values are $low to $high\n";
vs.
print "The range of possible values are " . $low . " to " . $high . "\n";
Notice how in the second example, I had to be careful of spaces inside the quotes while in the first example, the required spaces came rather naturally. Imagine having to change that statement in a later version of the program. Which would be easier to maintain?