Perl regex match ignoring order - regex

I am trying to compare one variable against another that is in regex form. If the contents of the variables are exactly the same, the match works fine, but I also want the match to succeed when the values are the same and only their order differs.
Example:
#!/usr/bin/perl
my $output = "test0 test1";
my $expected_output = "test1 test0";
my $expected_regex = qr/^$expected_output\s*$/;
print "Expected_regex :: $expected_regex\n";
if ($output =~ $expected_regex) {
print "pass\n";
}
In my example, what can I do to make $output =~ $expected_regex succeed when both contain the same values but not in the same order?

Assuming your inputs are really "that simple", i.e. words separated by spaces, you can do something like this:
#! /usr/bin/perl -w
use strict;
use warnings;
my $output = "test0 test1";
my $expected_output = "test1 test0";
# Store the sorted pieces of each string in a list
my @o = sort(split(/ /, $output));
my @e = sort(split(/ /, $expected_output));
# Compare both arrays for equality of each member
print "pass\n" if (@o ~~ @e);
See smart matching in detail for the funny ~~ operator.
If your inputs are not that simple, the / / in the splits could possibly be elaborated, or a similar technique could be derived.
If not, at least keep the use strict; and use warnings; lines from this and put them in all your non-trivial scripts. That's sure to help you.
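Smart matching has been marked experimental since Perl 5.18 and was deprecated in 5.38, so a version that avoids ~~ may age better. A minimal sketch, assuming the inputs are whitespace-separated words; the helper name same_words is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true if both strings contain the same
# whitespace-separated words, regardless of order.
sub same_words {
    my ($left, $right) = @_;
    # split ' ' splits on runs of whitespace and skips leading whitespace
    my @l = sort(split(' ', $left));
    my @r = sort(split(' ', $right));
    # Different word counts can never be equal
    return 0 unless @l == @r;
    # Compare the sorted lists element by element
    for my $i (0 .. $#l) {
        return 0 if $l[$i] ne $r[$i];
    }
    return 1;
}

print "pass\n" if same_words("test0 test1", "test1 test0");
```

Because the lists are sorted before comparison, duplicates still count: "a a b" and "a b" are treated as different.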

Related

How do I know if there is a number in my list?

My code looks like this:
#!/usr/bin/perl
$counter=0;
@list=<STDIN>;
chomp(@list);
if (@list==~ /^[+-]?\d+$/ )
{
$counter++;
}
print $counter;
So I enter data like: a b c d 1 2
And then it should print 2
because of the 1 and 2.
But no matter what data I enter into the list, I get back 0.
So what is the problem with my if?
Always use strict; use warnings;. If your goal is to count the number of integer elements in your list, you can use grep to filter the list elements, then apply scalar to get the length of the result (or assign the grep result directly to a scalar such as $length instead of storing the filtered list in @matches):
#!/usr/bin/perl
use strict;
use warnings;
my @list = <STDIN>;
chomp(@list);
my @matches = grep /^[+-]?\d+$/, @list;
print scalar(@matches), "\n";
Sample run:
$ ./count.pl
-62
a
b
4
c
+91023
d
3
Honestly, it looks like you just guessed at this syntax. That's not really a great way to write a program :-)
Your main problems are on this line:
if (@list==~ /^[+-]?\d+$/ )
There are two pretty big problems here. First, the @list. The match operator (=~) works on a single string at a time, so you need to use it on a scalar value. If you give it an array (as you have done here), Perl will silently evaluate the array in scalar context, which means that rather than getting the contents of the array, you get the number of elements in it. That is an integer, so your regex will always match.
But, you say, it doesn't match. Yes, I realise that. And that's down to your second error: you have the match operator wrong. The operator is =~ and you're using ==~ (note the extra =). You would hope that an error like that would cause a syntax error, but you've accidentally used a version that is syntactically valid but just doesn't do what you want. Perl interprets your code as:
if (@list = =~ /^[+-]?\d+$/ )
Note the spaces I've added. The one between the two = characters is important. This is saying "perform the match operation and assign the results to @list". But what is the match operator matching against? Well, it hasn't been given an explicit variable to match against, and in that case it matches against the default variable, $_. You haven't put anything into $_, so the match fails.
At this point, I should point out that if you had use warnings in your code, then you would be getting all sorts of useful warnings about what you're doing wrong. All Perl programmers (including the most experienced ones) should always have use strict and use warnings in their code.
There's another point of confusion here. It's the way you read your input.
@list = <STDIN>;
You haven't made it clear, but I suspect that you type in your list all in one line. In that case, you don't want to store your input in an array; you should store it in a scalar.
chomp($list = <STDIN>);
You can then convert it to a list (to be stored in an array) using split().
@list = split /\s+/, $list;
You can then get the count of numbers in your array using grep.
my $count = grep { /^[-+]?\d+$/ } @list;
When evaluated as a scalar, grep returns the number of times the block of code was true.
Putting those together (and adding strict and warnings) we get the following:
#!/usr/bin/perl
use strict;
use warnings;
chomp(my $list = <STDIN>);
my $count = grep { /^[-+]?\d+$/ } split /\s+/, $list;
print "$count\n";
Which, to my mind at least, looks simpler than your version.
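The scalar-context behaviour of grep is the key step here, so it may help to see it wrapped up on its own. A small sketch, assuming the input arrives as a single whitespace-separated line; the sub name count_integers is invented for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: count the whitespace-separated tokens in $line
# that look like optionally signed integers.
sub count_integers {
    my ($line) = @_;
    # Assigning grep's result to a scalar puts grep in scalar context,
    # so it yields the number of matching elements, not the elements.
    my $count = grep { /^[+-]?\d+$/ } split ' ', $line;
    return $count;
}

my $n = count_integers("a b c d 1 2");
print "$n\n";   # prints 2
```

Assigning the same grep to an array instead would keep the matching tokens themselves.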

Search two perl regex and store in one declared variable

So this is the scenario:
I perform a perl regex search on a string and store it in a variable
I then want to retrieve a substring from that variable and save it under the same variable
When I try with two separate variables it works, but I would like to minimise my variable declarations. What currently works:
my ($temp1) = ($buildLog =~ /(build\_version.*\d*)/);
my ($buildVersion) = ($temp1 =~ /(\d.*)/);
print "$temp1\n"; #$temp1 contains: build_version = 1411450178
print "$buildVersion\n"; #$buildVersion contains: 1411450178
But when I try to do it with one variable, it only prints out 1, i.e. that it found the match, but I would like the actual value. See below:
my ($temp2) = ($buildLog =~ /(build\_version.*\d*)/);
$temp2 = ($temp2 =~ /(\d.*)/);
print "$temp2\n"; #$temp2 just prints out 1
Could anybody please provide a quick explanation of the behaviour, and say whether it is indeed possible to use only one variable to get the content of the search?
Thanks,
CJ
Only one regex is needed:
use strict;
use warnings;
my $buildLog = 'foobar build_version = 1411450178 bazbiz';
my ($buildVersion) = $buildLog =~ /build_version\D*(\d+)/;
print "$buildVersion\n";
Outputs:
1411450178
This answer has the correct solution for what you are trying to do. I just wanted to provide a quick explanation of why your code is not working as you expect.
You are confusing scalar and list modes. Your code
$temp2 = ($temp2 =~ /(\d.*)/);
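assigns the result of the match to a scalar, so the match is evaluated in scalar context (the parentheses on the right-hand side do not create list context). In scalar context a successful match returns 1 rather than the captured text, which is why $temp2 ends up as 1.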
You could also have used
$temp2 = ($temp2 =~ /(\d.*)/)[0];
to pick up the first match result.
@jm666's answer works because it assigns the list of match results to a list of variables.
The following
my ($temp2) = ($buildLog =~ /(build\_version.*\d*)/);
($temp2) = ($temp2 =~ /(\d.*)/);
will print
1411450178
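The difference between the two contexts can be seen in isolation. A small sketch; the sub names are invented for illustration and the sample string is taken from the question:

```perl
use strict;
use warnings;

# Scalar on the left: the match runs in scalar context and
# returns 1 on success (an empty string on failure).
sub match_in_scalar_context {
    my ($text) = @_;
    my $result = ($text =~ /(\d+)/);
    return $result;
}

# Parenthesised list on the left: the match runs in list context
# and returns the captured groups.
sub match_in_list_context {
    my ($text) = @_;
    my ($result) = ($text =~ /(\d+)/);
    return $result;
}

my $log = 'build_version = 1411450178';
my $flag  = match_in_scalar_context($log);
my $value = match_in_list_context($log);
print "$flag\n";    # prints 1
print "$value\n";   # prints 1411450178
```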

Whole word matching with unexpected insertion in data

I have a string; consider
my $string = 'String need to be evaluated';
In $string I'm searching for evaluated or any other word.
The problem is that there may be tags inserted into the string,
e.g. Str<data>ing need to be eval<data>ua<data>ted, which is unexpected.
In this case, how could I search for the words?
here is the code I tried:
my $string = 'Text to be evaluated';
my $string2 = "Te<data>xt need to be eval<data2>ua<data>ted";
# patten to match
$pattern = "evaluated";
@b = split('',$pattern);
for my $i(@b){
$i="$i"."\(?:<data>\)?";
print "$i#\n";
}
$pattern = join('',@b);
print "\n$pattern\n";
if ($string2 =~ /$pattern/){
print "$pattern found\n";
}
Can you suggest any other method or module to make this easier? I don't know what kind of data will get inserted.
Not sure if that is what you need but how about
@b = split('',$pattern);
for my $i(@b){
$i=$i.".*";
print "$i \n";
}
$pattern = join('',@b);
That should match any string that had the pattern before it got random insertions as long as the characters of the pattern are still there and in the correct order.
It does find evaluated in the string esouhgvw8vwrg355#*asrgl/\u[\w]atet(45)<data>efdvd, which is about as noisy as it gets. But of course, if it is impossible to distinguish between an insertion and the original string, you will get false positives. For example, if the string used to be evaluted and it becomes something like evalu<hereisyourmissinga>ted, you will get a positive. If you know that insertions are always inside tags while the text is not, an approach that only allows for tags is much safer.
As long as you single quote your input string, characters like [\w] (45) and whatnot should not hurt either. I cannot see why they would be interpolated at any point.
Of course, you could use regexp to do the job:
foreach my $s ($string,$string2){
my $cs= $s;
### canonize
$cs =~ s!<[^>]*>!!gs;
### match
if ($cs =~ m!$pattern!i){
print "Found $pattern in $s!\n";
}
}
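If the original string has to stay untouched, the tags can instead be tolerated inside the pattern itself. This is only a sketch, assuming every insertion has the form <...> with no nested angle brackets; tag_tolerant_regex is a made-up name, and the pattern does not enforce word boundaries:

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex that matches $word even when
# arbitrary <...> tags have been inserted between its characters.
sub tag_tolerant_regex {
    my ($word) = @_;
    my $gap = '(?:<[^>]*>)*';    # zero or more tags of any name
    # quotemeta keeps regex metacharacters in $word literal
    my $body = join $gap, map { quotemeta($_) } split //, $word;
    return qr/$body/i;
}

my $re = tag_tolerant_regex('evaluated');
for my $s ('Text to be evaluated',
           'Te<data>xt need to be eval<data2>ua<data>ted') {
    print "found in: $s\n" if $s =~ $re;
}
```

Unlike the .* variant, this does not match evalu<hereisyourmissinga>ted, because the missing letter only appears inside a tag.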

perl negative look behind with groupings

I have a problem trying to get a certain match to work with negative look behind
example
@list = qw( apple banana cherry);
$comb_tlist = join ("|", @list);
$string1 = "include $(dir)/apple";
$string2 = "#include $(dir)/apple";
if( $string1 =~ /^(?<!#).*($comb_tlist)/) #matching regex I tried, works kinda
The array holds a set of variables that is matched against the string.
I need the regex to match $string1, but not $string2.
It matches $string1, but it ALSO matches $string2. Can anyone tell me what I am attempting wrong here. Thanks!
The problem is that negative lookbehind and the beginning-of-line anchor ^ are both zero-width assertions. So when you say
"start at the beginning of the string"
and then say
"check that the character before it is not #"
...you actually check the character before the start of the string. Which is of course not #, because it is nothing.
Use a lookahead instead. This works:
use strict;
use warnings;
use feature 'say';
my @list = qw( apple banana cherry);
my $comb_tlist = join ("|", @list);
my $string1 = 'include $(dir)/apple';
my $string2 = '#include $(dir)/apple';
if( $string1 =~ /^(?!#).*($comb_tlist)/) { say "String1"; }
if( $string2 =~ /^(?!#).*($comb_tlist)/) { say "String2"; }
Note that you have made four critical mistakes in your sample code. First off, you use string1 which is a bareword, which will be interpreted as a string. Second, you declare @list but then use @tlist. Third, you don't (seem to) use
use strict;
use warnings;
These pragmas could have informed you of your error, and without them, it is fairly likely that you would not have been warned about your first two critical errors. There is no good reason not to use them, so do that in the future.
Fourth, the declaration
$string1 = "include $(dir)/apple";
Means that you try to interpolate the variable $( in your string. $ is a meta character in double quoted strings, so you should use single quotes:
my $string1 = 'include $(dir)/apple';
Some problems:
Always use use strict; use warnings;.
Fix the use of string1 where you meant $string1.
Fix the scoping errors detected by the above by using my where appropriate.
Fix the typo in the variable names (@list vs @tlist).
I'm sure you didn't mean to interpolate the $( variable.
You'll never find a # before the first character of the string, so /^(?<!#).* .../ makes no sense. It simply means /^.* .../. You probably wanted /^[^#].* .../
You don't need negative lookbehind, just match a first character that is not #:
use strict;
use warnings;
my #list = qw( apple banana cherry);
my $comb_tlist = join ("|", @list);
my $string1 = "include dir/apple";
my $string2 = "#include dir/apple";
for ($string1, $string2) {
print "match:$_\n" if( /^[^#].*($comb_tlist)/);
}
Also, if you mean to match a literal $(dir), then you need to escape the $ sign with a backslash, otherwise it denotes a scalar variable. If this is the case, "$(dir)" should be \$(dir) in Perl code.
Sometimes complex regexes become trivial if you just split them in two or three.
Filter out the commented strings in a first step.
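That two-step idea might look like the following. A sketch only, assuming a "commented" line is one that starts with #; the helper name wanted_line is hypothetical:

```perl
use strict;
use warnings;

my @list = qw( apple banana cherry );
# quotemeta guards against regex metacharacters in the list entries
my $alternation = join '|', map { quotemeta($_) } @list;

# Hypothetical helper: true if $line is not a comment and mentions
# one of the names in @list.
sub wanted_line {
    my ($line) = @_;
    return 0 if $line =~ /^#/;                     # step 1: drop commented lines
    return $line =~ /(?:$alternation)/ ? 1 : 0;    # step 2: plain match
}

for my $line ('include $(dir)/apple', '#include $(dir)/apple') {
    print "match: $line\n" if wanted_line($line);
}
```

Each regex now does one job, so neither needs a lookaround.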

Manipulating backreferences for substitution in perl

As a part of an attempt to replace scientific numbers with decimal numbers I want to save a backreference into a string variable, but it doesn't work.
My input file is:
,8E-6,
,-11.78E-16,
,-17e+7,
I then run the following:
open FILE, "+<C:/Perl/input.txt" or die $!;
open(OUTPUT, "+>C:/Perl/output.txt") or die;
while (my $lines = <FILE>){
$find = "(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)";
$noofzeroesbeforecomma = eval("$7-length($4)");
$replace = '"foo $noofzeroesbeforecomma bar"';
$lines =~ s/$find/$replace/eeg;
print (OUTPUT $lines);
}
close(FILE);
I get
foo bar
foo bar
foo bar
where I would have expected
foo 6 bar
foo 14 bar
foo 7 bar
$noofzeroesbeforecomma seems to be empty or nonexistent.
Even with the following adjustment I get an empty result
$noofzeroesbeforecomma = $2;
Only inserting $2 directly in the replace string gives me something (which is then, unfortunately, not what I want).
Can anyone help?
I'm running Strawberry Perl (5.16.1.1-64bit) on a 64-bit Windows 7 machine, and quite inexperienced with Perl
Your main problem is not using
use strict;
use warnings;
warnings would have told you
Use of uninitialized value $7 in concatenation (.) or string at ...
Use of uninitialized value $4 in concatenation (.) or string at ...
I would recommend you try and find a module that can handle scientific notation, rather than trying to hack your own.
Your code, in working order, might look something like this. As you can see, I have put a q() around your eval string to avoid it being evaluated before $7 and $4 exist. I also removed the eval itself, since a double /ee on top of an eval is somewhat excessive.
use strict;
use warnings;
while (my $lines = <DATA>) {
my $find = qr/(?:,)(-?)(0|[1-9][0-9]*)(\.)?([0-9]*)?([eE])([+\-]?)([0-9]+)(?:,)/; # qr// keeps \. as a literal dot
my $noof = q|$7-length($4)|;
$lines =~ s/$find/$noof/eeg;
print $lines;
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
6
14
7
As a side note, not using strict is asking for trouble. Doing it while using a variable name such as $noofzeroesbeforecomma is asking for twice the trouble, as it is rather easy to make typos.
This is not about backreferences but the original problem, transforming numbers from scientific notation. I'm sure there are some cases in which this fails:
#!/usr/bin/env perl
use strict;
use warnings;
use bignum;
for (<DATA>) {
next unless /([+-]?\d+(?:\.\d+)?)[Ee]([+-]\d+)/;
print $1 * 10 ** $2 . "\n";
}
__DATA__
,8E-6,
,-11.78E-16,
,-17e+7,
Output:
0.000008
-0.000000000000001178
-170000000
I suggest you use the Regexp::Common::number plugin for the Regexp::Common module which will find all real numbers for you and allow you to replace those that have an exponent marker
This code shows the idea. Using the -keep option makes the module put each component into one of the $N variables. The exponent marker (e or E) is in $7, so the number can be transformed depending on whether it was present.
use strict;
use warnings;
use Regexp::Common;
my $real_re = $RE{num}{real}{-keep};
while (<>) {
s/$real_re/ $7 ? sprintf '%.20f', $1 : $1 /eg;
print;
}
output
Given your example input, this code produces the following. The values can be tidied up further using additional code in the substitution
,0.00000800000000000000,
,-0.00000000000000117800,
,-170000000.00000000000000000000,
The thing is that Perl can already handle all of those number formats. And since the standard item of data in Perl is the string, you only need to capture the expression to use it. So, use this expression:
/(-?\d+(?:\.\d+)?[Ee][+-]?\d+)/
to extract it from the surrounding text, and use sprintf to format it, as Borodin showed.
However, if it helps you to see a cleaner version of what you tried to do, this works better
my ( $whole, $frac, $expon )
= $line =~ m/(?:,)-?(0|[1-9]\d*)(?:\.(\d*))?[eE]([+\-]?\d+)(?:,)/
;
# $frac is undefined when the number has no fractional part
my $num = $expon - length( $frac // '' );
Why not capture the sign with the exponent anyway, if you're going to do arithmetic with it?
It's better to name your captures and eschew eval when it's not necessary.
The substitution--as is--doesn't make much sense.
Really, since neither the symbols nor the digits are case sensitive, just put a (?i) at the beginning and avoid the E "character class" [Ee]:
/((?i)-?\d+(?:\.\d+)?e[+-]?\d+)/
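Putting the named-capture and sprintf suggestions together could look roughly like this. It is a sketch rather than a complete converter: sci_to_decimal is a hypothetical name, the input is assumed to be a bare number with no surrounding commas, and the result is limited by ordinary floating-point precision:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a number in scientific notation as a
# plain decimal string; anything else is returned unchanged.
sub sci_to_decimal {
    my ($text) = @_;
    return $text
        unless $text =~ /^[+-]?\d+(?:\.(?<frac>\d*))?[eE](?<exp>[+-]?\d+)$/;
    # The frac group is absent when there is no decimal point
    my $frac = $+{frac} // '';
    # Digits needed after the decimal point: the fractional digits,
    # shifted by the exponent, and never fewer than zero.
    my $places = length($frac) - $+{exp};
    $places = 0 if $places < 0;
    return sprintf "%.${places}f", $text;
}

for my $n ('8E-6', '-11.78E-16', '-17e+7') {
    my $decimal = sci_to_decimal($n);
    print "$n => $decimal\n";
}
```

For numbers with more significant digits than a double can hold, a bignum-based approach like the one shown earlier would be the safer choice.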