Breaking out of loop causes different results [duplicate] - regex

This question already has answers here:
Why doesn't Perl's each() iterate through the entire hash the second time?
(2 answers)
Closed 7 years ago.
I have this code:
#!/usr/bin/perl
use strict;
use warnings;
my $judge_exes = {
"^A" => "foo",
"^B" => "bar",
"^C" => "baz",
};
sub get_judge {
my ($test_id) = #_;
my $exe = undef;
while (my ($regex, $judge) = each %$judge_exes) {
if ($test_id =~ /$regex/) {
$exe = $judge;
last;
}
}
if ($exe) {
return $exe;
} else {
return "Undefined!";
}
}
print get_judge("A1");
print get_judge("B2");
print get_judge("C3");
(ideone: http://ideone.com/slxebG)
I expect to get the output foobarbaz, but I end up getting fooUndefined!baz. However, when I comment out the last statement, I get the correct behavior:
#!/usr/bin/perl
use strict;
use warnings;
my $judge_exes = {
"^A" => "foo",
"^B" => "bar",
"^C" => "baz",
};
sub get_judge {
my ($test_id) = #_;
my $exe = undef;
while (my ($regex, $judge) = each %$judge_exes) {
if ($test_id =~ /$regex/) {
$exe = $judge;
# last;
}
}
if ($exe) {
return $exe;
} else {
return "Undefined!";
}
}
print get_judge("A1");
print get_judge("B2");
print get_judge("C3");
(ideone: http://ideone.com/QJpxbK)
Why is this happening? (I'm on Perl 5.16.2, but the issue is also present on 5.10.1 and whatever ideone is using.)
As far as I understand, last just breaks out of the while loop, which is what I want.
$exe doesn't seem to be an alternative falsey value causing me to hit the wrong if-branch. (I could return early, which would be better, but I still wouldn't understand the cause of this.)
I think I'm dereferencing the hash reference correctly.
I didn't think that regex matching would have any side effects relating to loop termination.
The regexes seem to be matching correctly, since I can actually get them to match appropriately under some circumstances, so I don't think it's an interpolation issue.
I'm not modifying the container that I'm iterating over.
Am I just making some silly non-Perl related mistake?

You have become confused. I am not sure where the confusion lies, but you must be aware that the each operator maintains its state between calls. That means your last inside the while loop won't terminate the iteration. Instead it will continue where it last left off when it is next encountered.
You should also be sparing with double-quotes, as they interpolate any scalar or array variables, or backslashed control characters.
Here is how I suggest that you write your algorithm
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my $judge_exes = {
'^A' => 'foo',
'^B' => 'bar',
'^C' => 'baz',
};
say get_judge('A1', $judge_exes);
say get_judge('B2', $judge_exes);
say get_judge('C3', $judge_exes);
say get_judge('D4', $judge_exes);
say get_judge('E5', $judge_exes);
sub get_judge {
my ($test_id, $judges) = #_;
for my $re ( keys %$judges ) {
return 1 if $test_id =~ /$re/;
}
'Undefined!';
}
output
1
1
1
Undefined!
Undefined!

Oh, it looks like the behavior is explained here:
Using keys %hash in scalar context returns the number of keys in the hash and resets the iterator associated with the hash. You may need to do this if you use last to exit a loop early so that when you re-enter it, the hash iterator has been reset.
Wasn't expecting that — I had assumed that each would reset the iterator.

Related

How can I match these function calls, and extract the nmae of the function and the first argument?

I am trying to parse an array of elements. Those who match a pattern like the following:
Jim("jjanson", Customer.SALES);
I want to create a hash table like Jim => "jjanson"
How can I do this?
I can not match the lines using:
if($line =~ /\s*[A-Za-z]*"(.*),Customer.*\s*/)
You're not matching either the '(' after the name, nor the ' ' after the comma, before "Customer.".
I can get 'jjanson"' using this expression:
/\s*[A-Za-z]\(*"(.*), Customer.*\s*/
But I assume you don't want jjanson", so we need to modify it like so. (I tend to include the negative character class when I'm looking for simply-delimited stuff. So, in this case I'll make the expression "[^"]*"
/\s*[A-Za-z]\(*"([^"]+)", Customer.*\s*/
Also, I try not to depend upon whitespace, presence or number, I'm going to replace the space with \s*. That you didn't notice that you skipped the whitespace is a good illustration of the need to say "ignore a bunch of whitespace".
/\s*[A-Za-z]\(*"([^"]+)",\s*Customer.*\s*/
Now it's only looking for the sequence ',' + 'Customer' in the significant characters. Functionally, the same, if more flexible.
But since you only do one capture, I can't see what you'd map to what. So I'll do my own mapping:
my %records;
while ( my $line = $source->()) { # simply feed for a source of lines.
my ( $first, $user, $tag )
= $line = m/\s*(\p{Alpha}+)\s*\(\s*"([^"]+)",\s*Customer\.(\S+?)\)\/
;
$records{ $user }
= { first => $first
, username => $user
, tag => $tag
};
}
This is much more than you would tend to need in a one-off, quick solution. But I like to store as much of my input as seems significant.
Note that Jim("jjanson", Customer.SALES); matches the syntax of a function call with two arguments. You can thus abuse string eval:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML::XS;
my $info = extract_first_arg(q{ Jim("jjanson", Customer.SALES);} );
print Dump $info;
sub extract_first_arg {
my $call = shift;
my ($name) = ($call =~ m{ \A \s* (\w+) }x);
unless ($name) {
warn "Failed to find function name in '$call'";
return;
}
my $username = eval sprintf(q{
package My::DangerZone;
no strict;
local *{ %s } = sub { $_[0] };
%s
}, $name, $call);
return { $name => $username };
}
Output:
---
Jim: jjanson
Or, you can abuse autoloading:
our $AUTOLOAD;
print Dump eval 'no strict;' . q{ Jim("jjanson", Customer.SALES); };
sub AUTOLOAD {
my ($fn) = ($AUTOLOAD =~ /::(\w+)\z/);
return { $fn => $_[0] };
}
I would not necessarily recommend using these methods, especially on input that is not in your control, and in a situation where this script has access to sensitive facilities.
On the other hand, I have, in the right circumstances, utilized this kind of thing in transforming one given set of information into something that can be used elsewhere.
Try this:
$line = 'Jim("jjanson", Customer.SALES)';
my %hashStore = (); #Jim("jjanson"
if($line=~m/^\s*([^\(\)]*)\(\"([^\"]*)\"/g) { $hashStore{$1} = $2; }
use Data::Dumper;
print Dumper \%hashStore;
Output:
$VAR1 = {
'Jim' => 'jjanson'
};

Regex value comparison

I want to compare two numbers isolated from this sample data:
'gi|112807938|emb|CU075707.1|_Xenopus_tropicalis_finished_cDNA,_clone_TNeu129d01 C1:TCONS_00039972(XLOC_025068),_12.9045:32.0354,_Change:1.3118,_p:0.00025,_q:0.50752 C2:TCONS_00045925(XLOC_029835),_10.3694:43.8379,_Change:2.07985,_p:0.0004,_q:0.333824',
'gi|115528274|gb|BC124894.1|_Xenopus_laevis_islet-1,_mRNA_(cDNA_clone_MGC:154537_IMAGE:8320777),_complete_cds C1:TCONS_00080221(XLOC_049570),_17.9027:40.8136,_Change:1.18887,_p:0.00535,_q:0.998852 C2:TCONS_00092192(XLOC_059015),_17.8995:35.5534,_Change:0.990066,_p:0.0355,_q:0.998513',
'gi|118404233|ref|NM_001078963.1|_Xenopus_(Silurana)_tropicalis_pancreatic_lipase-related_protein_2_(pnliprp2),_mRNA C1:TCONS_00031955(XLOC_019851),_0.944706:5.88717,_Change:2.63964,_p:0.01915,_q:0.998852 C2:TCONS_00036655(XLOC_023660),_2.31819:11.556,_Change:2.31757,_p:0.0358,_q:0.998513',
using the following regex:
#!/usr/bin/perl -w
use strict;
use File::Slurp;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
my (#log_change, #largest_change);
foreach (#intersect) {
chomp;
my #condition1_match = ($_ =~ /C1:.*?Change:(-?\d+\.\d+)|C1:.*?Change:(-?inf)/); # Sometimes the value is 'inf' or '-inf'. This allows either a numerical or inf value to be captured.
my #condition2_match = ($_ =~ /C2:.*?Change:(-?\d+\.\d+)|C2:.*?Change:(-?inf)/);
push #log_change, "#condition1_match\t#condition2_match";
}
print Dumper (\#log_change);
Which gives this output:
'1.3118 2.07985 ',
'1.18887 0.990066 ',
'2.63964 2.31757 ',
Ideally, within the same loop I now want to make a comparison between the values held in #condition1_match and #condition2_match such that the larger value is pushed onto a new array, unless comparing against a non numerical 'inf' in which case push the numerical value.
Something like this:
my (#log_change, #largest_change);
foreach (#intersect) {
chomp;
my #condition1_match = ($_ =~ /C1:.*?Change:(-?\d+\.\d+)|C1:.*?Change:(-?inf)/);
my #condition2_match = ($_ =~ /C2:.*?Change:(-?\d+\.\d+)|C2:.*?Change:(-?inf)/);
push #log_change, "#condition1_match\t#condition2_match";
unless ($_ =~ /Change:-?inf/) {
if (#condition1_match > #condition2_match) {
push #largest_change, #condition1_match;
}
else {
push #largest_change, #condition2_match;
}
}
}
print Dumper (\#largest_change);
Which gives:
'2.07985',
undef,
'0.990066',
undef,
'2.31757',
undef,
as well as a lot of this error message:
Use of uninitialized value $condition2_match[1] in join or string at intersect.11.8.pl line 114.
I'm unsure as to what exactly the error message means, as well as why I'm getting undef values in my #largest_change
As you've written your code, #condition_match1 and #condition_match2 will be created with 2 elements -- corresponding to the 2 capture groups in your regular expression -- each time there is a match. But one of these elements will always necessarily be undef, leading to the uninitialized ... warnings.
In this case, you can repair this program by putting the | inside the capture group:
my ($condition1_match) = ($_ =~ /C1:.*?Change:(-?\d+\.\d+|-?inf)/);
my ($condition2_match) = ($_ =~ /C2:.*?Change:(-?\d+\.\d+|-?inf)/);
so that there is a single capture group and the matching operation produces a list with a single, defined element.
In addition, the comparison
if (#condition1_match > #condition2_match) {
is probably not doing what you think it is doing. In Perl, a numerical comparison between two arrays is a comparison of array lengths. What you apparently mean to do is to compare the defined value in each of those arrays, so you would need to do something more cumbersome like:
my $condition1_match = $condition1_match[0] // $condition1_match[1];
my $condition2_match = $condition2_match[0] // $condition2_match[1];
if ($condition1_match > $condition2_match) {
push #largest_change, $condition1_match;
} else {
push #largest_change, $condition2_match;
}

Perl Inserting string into a url at specific places

I have the following piece of code:
#!/usr/bin/perl
use strict;
use warnings;
#use diagnostics;
use URI qw( );
my #insert_words = qw(HELLO GOODBYE);
while (<DATA>) {
chomp;
my $url = URI->new($_);
my $query = $url->query;
foreach (#insert_words) {
# Use package vars to communicate with /(?{})/ blocks.
local our $insert_word = $_;
local our #queries;
if (defined $query) {
$query =~ m{
^(.*[/=])([^/=&]*)((?:[/=&].*)?)\z
(?{
if (length $2) {
push #queries, "$1$insert_word$2$3";
push #queries, "$1$insert_word$3";
push #queries, "$1$2$insert_word$3";
}
})
(?!)
}x;
}
if (#queries) {
for (#queries) {
$url->query($_);
print $url, "\n";
}
}
else {
print $url, "\n";
}
}
}
__DATA__
http://www.example.com/index.php?route=9&other=7
The above piece of code works correctly and produces the following output:
http://www.example.com/index.php?route=9&other=HELLO7
http://www.example.com/index.php?route=9&other=HELLO
http://www.example.com/index.php?route=9&other=7HELLO
http://www.example.com/index.php?route=HELLO9&other=7
http://www.example.com/index.php?route=HELLO&other=7
http://www.example.com/index.php?route=9HELLO&other=7
http://www.example.com/index.php?route=9&other=GOODBYE7
http://www.example.com/index.php?route=9&other=GOODBYE
http://www.example.com/index.php?route=9&other=7GOODBYE
http://www.example.com/index.php?route=GOODBYE9&other=7
http://www.example.com/index.php?route=GOODBYE&other=7
http://www.example.com/index.php?route=9GOODBYE&other=7
As you can see it inserts the words in the array at specific places in the url.
What I am now having problems with:
I would now like to add the functionality to do all the possible combinations of HELLO and GOODBYE (or whatever is in the #insert_words) as well, for example it should also add the following url's to the output I already get:
http://www.example.com/index.php?route=HELLO&other=GOODBYE
http://www.example.com/index.php?route=HELLO&other=HELLO
http://www.example.com/index.php?route=GOODBYE&other=HELLO
http://www.example.com/index.php?route=GOODBYE&other=GOODBYE
But I do not know how to go about this in the best way?
Your help with this will be much appreciated, many thanks
Please don't use fancy regexes like that - they are an experimental feature of Perl and are far from simple to comprehend.
If I understand you then you need to do this recursively.
I think you want all variations of the URL with each query parameter as it is, or preceded, succeeded, or replaced by every value in #insert_words.
This seems to do what you ask. It uses URI::QueryParam to split up the query portion of the URL properly instead of using your nasty regex. It does produce substantially more combinations than you show in your question but I can see no other way of interpreting your requirement.
The number of possible variations is 49. Each parameter can have its original value, or be preceded, succeeded or replaced by either of two values. That is seven possible values for each parameter and so 7² or 49 different variations for two parameters.
use strict;
use warnings;
use URI;
use URI::QueryParam;
my #insert_words = qw/ HELLO GOODBYE /;
my #urls;
sub mod_param {
my ($url, $paridx, #insertions) = #_;
my #params = $url->query_param;
return if $paridx > $#params;
my $key = $params[$paridx];
my $oldval = $url->query_param($key);
my #variations = ($oldval);
push #variations, ($oldval.$_, $_.$oldval, $_) for #insertions;
for my $val (#variations) {
$url->query_param($key, $val);
if ($paridx == $#params) {
push #urls, "$url";
}
else {
mod_param($url, $paridx + 1, #insertions);
}
}
$url->query_param($key, $oldval);
}
while (<DATA>) {
chomp;
my $url = URI->new($_);
#urls = ();
mod_param($url, 0, #insert_words);
print $_, "\n" for #urls;
}
__DATA__
http://www.example.com/index.php?route=9&other=7
output
http://www.example.com/index.php?route=9&other=7
http://www.example.com/index.php?route=9&other=7HELLO
http://www.example.com/index.php?route=9&other=HELLO7
http://www.example.com/index.php?route=9&other=HELLO
http://www.example.com/index.php?route=9&other=7GOODBYE
http://www.example.com/index.php?route=9&other=GOODBYE7
http://www.example.com/index.php?route=9&other=GOODBYE
http://www.example.com/index.php?route=9HELLO&other=7
http://www.example.com/index.php?route=9HELLO&other=7HELLO
http://www.example.com/index.php?route=9HELLO&other=HELLO7
http://www.example.com/index.php?route=9HELLO&other=HELLO
http://www.example.com/index.php?route=9HELLO&other=7GOODBYE
http://www.example.com/index.php?route=9HELLO&other=GOODBYE7
http://www.example.com/index.php?route=9HELLO&other=GOODBYE
http://www.example.com/index.php?route=HELLO9&other=7
http://www.example.com/index.php?route=HELLO9&other=7HELLO
http://www.example.com/index.php?route=HELLO9&other=HELLO7
http://www.example.com/index.php?route=HELLO9&other=HELLO
http://www.example.com/index.php?route=HELLO9&other=7GOODBYE
http://www.example.com/index.php?route=HELLO9&other=GOODBYE7
http://www.example.com/index.php?route=HELLO9&other=GOODBYE
http://www.example.com/index.php?route=HELLO&other=7
http://www.example.com/index.php?route=HELLO&other=7HELLO
http://www.example.com/index.php?route=HELLO&other=HELLO7
http://www.example.com/index.php?route=HELLO&other=HELLO
http://www.example.com/index.php?route=HELLO&other=7GOODBYE
http://www.example.com/index.php?route=HELLO&other=GOODBYE7
http://www.example.com/index.php?route=HELLO&other=GOODBYE
http://www.example.com/index.php?route=9GOODBYE&other=7
http://www.example.com/index.php?route=9GOODBYE&other=7HELLO
http://www.example.com/index.php?route=9GOODBYE&other=HELLO7
http://www.example.com/index.php?route=9GOODBYE&other=HELLO
http://www.example.com/index.php?route=9GOODBYE&other=7GOODBYE
http://www.example.com/index.php?route=9GOODBYE&other=GOODBYE7
http://www.example.com/index.php?route=9GOODBYE&other=GOODBYE
http://www.example.com/index.php?route=GOODBYE9&other=7
http://www.example.com/index.php?route=GOODBYE9&other=7HELLO
http://www.example.com/index.php?route=GOODBYE9&other=HELLO7
http://www.example.com/index.php?route=GOODBYE9&other=HELLO
http://www.example.com/index.php?route=GOODBYE9&other=7GOODBYE
http://www.example.com/index.php?route=GOODBYE9&other=GOODBYE7
http://www.example.com/index.php?route=GOODBYE9&other=GOODBYE
http://www.example.com/index.php?route=GOODBYE&other=7
http://www.example.com/index.php?route=GOODBYE&other=7HELLO
http://www.example.com/index.php?route=GOODBYE&other=HELLO7
http://www.example.com/index.php?route=GOODBYE&other=HELLO
http://www.example.com/index.php?route=GOODBYE&other=7GOODBYE
http://www.example.com/index.php?route=GOODBYE&other=GOODBYE7
http://www.example.com/index.php?route=GOODBYE&other=GOODBYE

Perl regexp use of uninitialized pattern patch

My script loads some stuff from some files in some arrays, you enter a text from the keyboard, the script searches the relevant part of the text in those arrays, if it finds it, it does something, if not, well, another thing, at least in theory.
I get the following errors:
Use of uninitialized value in pattern match (m//) at emo_full_dynamic.pl line 120, <STDIN> chunk 2.
Modification of a read-only value attempted at emo_full_dynamic.pl line 121, <STDIN> chunk 2.
line 120 = $plm3 =~ /arr_(\w+.txt)/;
My problem, I think, is at $plm3 =~ /arr_(\w+.txt)/;. I used it so that I can store the name of an array in $1.
Here's my code:
#!/usr/bin/perl
use warnings;
$idx = 0;
$oldsep = $/;
opendir(DIR, 'c:/downloads/text_files/arrs/');
#files = readdir(DIR);
while ($idx <= $#files )
{
$value = $files[$idx];
if ( $value !~ m/^arr/i)
{
splice #files, $idx, 1;
}
else
{
$idx++;
}
}
foreach $plm (#files)
{
if($plm =~ m/txt$/)
{
open(ARR, "C:/downloads/text_files/arrs/$plm") or die $!;
while(<ARR>)
{ {
chomp($_);
$plm =~ m/arr_(\w+).txt/;
push(#{$1}, $_);
}
close ARR;
}
}
$plm = 0;
$idx = 0;
$stare = <STDIN>;
chomp($stare);
while($stare)
{
foreach $plm2 (#files)
{
if($plm2 =~ m/txt$/)
{
$plm2 =~ m/arr_(\w+).txt/;
if(grep $stare =~ m/$_/i, #{$1})
{
$flag = 1;
}
else
{
$flag = 0;
}
}
}
if($flag == 1)
{
$/ = "%\n";
$plm3 =~ /arr_(\w+.txt)/;
open SUPARARE, "C:/downloads/text_files/replies/$1" or die $!;
etc etc....
First of all, it's always a good idea to use strict pragma -- unless you have a valid reason to avoid it --.
Second, I don't see $plm3 initialized anywhere in your code. You have probably forgot to initialize it.
I think you are assigning something to variable $1 on line 121
Apparently there are some copy/paste issues which negates my initial answer.
Other mistakes, great and small:
You don't use strict. (fatal flaw)
Your opendir is used once, then never closed.
You use global filehandles, instead of lexical (e.g. open my $fh, ...)
Using a complext loop + splice instead of grep (#files=grep /^arr/i, #files)
Using chomp($_) when chomp per default chomps the $_ variable
I don't even know what this line means:
if(grep $stare =~ m/$_/i, #{$1}) {
You seem to be using a pattern match, where $_ is the pattern (which in this case is.. what? Nothing? Anything?), whose return value is used as a grep pattern for an array reference, that may or may not be initialized. A very horrible statement. If it indeed works as intended, the readability is very low.
Redeclaring $/ seems like a frivolous thing to do in this context, but I can't really tell, as the script ends there.

Perl regex replacement string special variable

I'm aware of the match, prematch, and postmatch predefined variables. I'm wondering if there is something similar for the evaluated replacement part of the s/// operator.
This would be particularly useful in dynamic expressions so they don't have to be evaluated a 2nd time.
For example, I currently have %regexs which is a hash of various search and replace strings.
Here's a snippet:
while (<>) {
foreach my $key (keys %regexs) {
while (s/$regexs{$key}{'search'}/$regexs{$key}{'replace'}/ee) {
# Here I want to do something with just the replaced part
# without reevaluating.
}
}
print;
}
Is there a convenient way to do it? Perl seems to have so many convenient shortcuts, and it seems like a waste to have to evaluate twice (which appears to be the alternative).
EDIT: I just wanted to give an example: $regexs{$key}{'replace'} might be the string '"$2$1"' thus swapping the positions of some text in the string $regexs{$key}{'search'} which might be '(foo)(bar)' - thus resulting in "barfoo". The second evaluation that I'm trying to avoid is the output of $regexs{$key}{'replace'}.
Instead of using string eval (which I assume is what's going on with s///ee), you could define code references to do the work. Those code references can then return the value of the replacement text. For example:
use strict;
use warnings;
my %regex = (
digits => sub {
my $r;
return unless $_[0] =~ s/(\d)(\d)_/$r = $2.$1/e;
return $r;
},
);
while (<DATA>){
for my $k (keys %regex){
while ( my $replacement_text = $regex{$k}->($_) ){
print $replacement_text, "\n";
}
}
print;
}
__END__
12_ab_78_gh_
34_cd_78_yz_
I'm pretty sure there isn't any direct way to do what you're asking, but that doesn't mean it's impossible. How about this?
{
my $capture;
sub capture {
$capture = $_[0] if #_;
$capture;
}
}
while (s<$regexes{$key}{search}>
<"capture('" . $regexes{$key}{replace}) . "')">eeg) {
my $replacement = capture();
#...
}
Well, except to do it really properly you'd have to shoehorn a little more code in there to make the value in the hash safe inside a singlequotish string (backslash singlequotes and backslashes).
If you do the second eval manually you can store the result yourself.
my $store;
s{$search}{ $store = eval $replace }e;
why not assign to local vars before:
my $replace = $regexs{$key}{'replace'};
now your evaluating once.