Why do strings not match? - regex

I have a Wordpad file from which I extract two strings and compare them. In this case they are both equal, but I cannot use the =~ expression to evaluate them.
if($pin_list =~ /$lvl_list/){ do something}
What I have tried in debug mode:
Both strings are equal as evaluated by eq
Both strings are equal as evaluated by ==
Manually set another variable to same string and then perform if statement with new variable; if($pin_list =~ /$x/){do something}. This attempt was successful.
Performed chomp(var) on both string vars several times and then ran code. FAILED
Removed carriage return via $tst_pins =~ s/\n//g on both vars. FAILED
Length of both vars is the same.
Manually printed both vars and visually verified that both strings are the same.
Anyone got any ideas? I suspect it is something that has to do with WordPad and perhaps a hidden char, but don't know how to track it down.
tchrist -> Good question. In this case the strings are equal, but that will not always be the case. Under normal conditions, I am simply looking for the one string to be a subset of another.
For those who may be interested. Problem solved.
I had a string that I 'joined' with '+'. So the string looked like the following:
"1+2+3+4+a+b+etc"
The '+' ended up being the problem. At the suggestion of a colleague I performed a substr and whittled away one of the strings down to the offending point. It occurred just after it captured the '+'. I then joined using a blank space instead of the '+', and everything works.
Using characters other than letters and digits clearly has an effect, but I am still at a loss to explain why, when every other check said the strings were equal.
Bret

The match operator (m// aka //) checks if the provided string is matched by the provided regex pattern, not if it is character for character equal to the provided regex pattern. If you want to build a regex pattern that will match a string exactly, use quotemeta.
This checks if $str1 is equal to $str2:
my $pat = quotemeta($str1);
$str2 =~ /^$pat\z/
quotemeta can also be called via \Q..\E.
$str1 =~ /^\Q$str2\E\z/
Of course, you could just use eq.
$str1 eq $str2
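To make the failure from the question concrete, here is a minimal sketch. The two values are made up to resemble the '+'-joined string described above; only the variable names come from the question.

```perl
use strict;
use warnings;

# Hypothetical values modelled on the '+'-joined string in the question.
my $pin_list = '1+2+3+4+a+b';
my $lvl_list = '1+2+3+4+a+b';

# Unescaped: each + is a quantifier ("one or more of the previous
# character"), so the pattern no longer describes the literal text.
my $raw_match = $pin_list =~ /$lvl_list/ ? 1 : 0;

# Escaped with \Q..\E: every + is matched literally.
my $quoted_match = $pin_list =~ /\Q$lvl_list\E/ ? 1 : 0;

# Plain string equality, no regex involved.
my $equal = $pin_list eq $lvl_list ? 1 : 0;

print "raw=$raw_match quoted=$quoted_match eq=$equal\n";   # raw=0 quoted=1 eq=1
```

The unescaped pattern asks for one or more "1" followed directly by one or more "2", which never occurs in the string, so the match fails even though eq reports the strings as equal.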

+ and other characters have special meanings inside regular expressions, so just using $expression =~ /$some_arbitrary_string/ can get you into trouble.
If the question is whether one string is literally contained in another string, you can use index and not worry about all the rules for specifying regular expressions:
if (index($pin_list, $lvl_list) >= 0) {
    do_something;
}

Related

Perl is returning hash when I am trying to find the characters after a searched-for character

I want to search for a given character in a string and return the character after it.
Based on a post here, I tried writing
my $string = 'v' . '2';
my $char = $string =~ 'v'.{0,1};
print $char;
but this returns 1 and a hash (last time I ran it, the exact output was 1HASH(0x11823a498)). Does anyone know why it returns a hash instead of the character?
Return a character after a specific pattern (a character here)
my $string = 'example';
my $pattern = qr(e);
my ($ret) = $string =~ /$pattern(.)/; #--> 'x'
This matches the first occurrence of $pattern in the $string, and captures and returns the next character, x. (The example doesn't handle the case when there may not be a character following, like for the other e; it would simply fail to match so $ret would stay undef.)
I use qr operator to form a pattern but a normal string would do just as well here.
The regex match operator returns different things in scalar and list contexts: in the scalar context it is true/false for whether it matched, while in the list context it returns matches. See perlretut
So you need that matching to be in the list context, and a common way to provide that is to put the variable that is being assigned to in parentheses.
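The difference between the two contexts can be sketched as follows, reusing the 'example' string from above (the variable names $matched and $next are only for illustration):

```perl
use strict;
use warnings;

my $string = 'example';

# Scalar context: the match reports only success (true) or failure.
my $matched = $string =~ /e(.)/;

# List context (note the parentheses around the variable): the match
# returns the captured groups, here the character after the first "e".
my ($next) = $string =~ /e(.)/;

print "scalar: $matched, list: $next\n";   # scalar: 1, list: x
```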
The first problem with the example in the question is that the =~ operator binds more tightly than the . operator, so the example is effectively
my $char = ( ($string =~ 'v') . {0,1} );
So there's first the regex match, which succeeds and returns 1 (since it is in the scalar context, imposed by the . operator) and then there is a hash-reference {0,1} which is concatenated to that 1. So $char gets assigned the 1 concatenated with a stringification for a hashref, which is a string HASH(0x...) (in the parens is a hex stringification of an address).
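A small sketch that reproduces this, with the grouping written out explicitly. The address inside HASH(0x...) differs from run to run, so no exact output is shown.

```perl
use strict;
use warnings;

my $string = 'v2';

# Same grouping that Perl applies to: $string =~ 'v' . {0,1}
my $char = ( ($string =~ 'v') . {0,1} );

# $char is "1" (the successful match) followed by the stringified hash
# reference, something like 1HASH(0x...).
print "$char\n";
```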
Next, the needed . in the pattern isn't there. Got confused with the concatenation . operator?
Then, the capturing parentheses are absent, while they are needed for the intended subpattern.
Finally, the matching is in scalar context, as mentioned, which would only yield true/false.
Altogether, that would need to be
my ($char) = $string =~ ( q{v} . q{(.)} );
But I'd like to add: while Perl has very fluid semantics I'd recommend to not build regex patterns on the fly like that. I'd also recommend to actually use delimiters in the match operator, for clarity (even though you indeed mostly don't have to).

Using the length of the matched group inside regex

Assume this
char=l
string="Hello, World!"
Now, I want to replace every run of consecutive occurrences of char in string with the length of that run (a kind of run-length encoding), reading both values from STDIN.
I tried this:
$c=<>;$_=<>;print s/($c)\1*/length($&)/grse;
When the input is given as
l
Hello, World!
It returns Hello, World!. But when I ran this
$c=<>;$_=<>;print s/(l)\1*/length($&)/grse;
it returned He2o, Wor1d!.
So, since the input is given in separate lines, $c contained \n (checked with $c=~/\n/)
So, I tried
$c=<>.chomp;$_=<>;print s/($c)\1*/length($&)/grse;
and
$c=<>;$_=<>;print s/($c.chomp)\1*/length($&)/grse;
Neither worked. Could anyone please say why?
In Perl, . is used to concatenate strings, and not to call methods (unlike in some other languages; Ruby for instance). Have a look at the documentation of chomp to see how it should be used. You should be doing
chomp($c=<>)
Rather than
$c=<>.chomp
Your full code should thus simply be:
chomp($c=<>);$_=<>;print s/($c)\1*/length($&)/grse;
If $c is always a single character, then the regex can be simplified to s/$c+/length($&)/grse. Also, if $c can be a regex meta-character (e.g. +, *, (, [, etc.), then you should escape it (and it makes sense to escape it just in case). To do so, you can use \Q..\E (or quotemeta, although it is more verbose and thus maybe less suited to a one-liner):
s/\Q$c\E+/length($&)/grse
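As a script rather than a one-liner, the escaped version might look like the sketch below. The helper name count_runs is hypothetical, and it assumes $char is a single character.

```perl
use strict;
use warnings;

# Replace each run of $char in $text with the length of that run.
# Assumes $char is a single character; \Q..\E keeps metacharacters literal.
sub count_runs {
    my ($char, $text) = @_;
    return $text =~ s/\Q$char\E+/length($&)/gre;
}

print count_runs('l', 'Hello, World!'), "\n";   # He2o, Wor1d!
print count_runs('+', 'a++b+c'), "\n";          # a2b1c
```

The second call shows that a metacharacter such as + is handled like any other character once it is escaped.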
If you don't escape $c one way or another, and your one-liner is run with ( as first input for instance, you'll get the following error:
Quantifier follows nothing in regex; marked by <-- HERE in m/(+ <-- HERE / at -e line 1, <> line 2
Regarding what $c=<>.chomp actually means in Perl (since this is a valid Perl code that can make sense in some contexts):
$c=<>.chomp means <> concatenated to chomp, where chomp without arguments is understood as chomp($_). And chomp returns the total number of characters removed, and since $_ is empty, no characters are removed, which means that this chomp returns 0. So you are basically writing $c=<>.0, which means that if your input is l\n, you end up with l\n0 instead of l.
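This can be checked without typing any input by reading from an in-memory filehandle. The sketch below is only an illustration: the handle $fh and the explicit empty $_ are assumptions made so that it runs quietly under warnings.

```perl
use strict;
use warnings;

# Read from an in-memory filehandle so the sketch needs no real STDIN.
my $input = "l\n";
open my $fh, '<', \$input or die "cannot open in-memory handle: $!";

local $_ = '';            # nothing for the bare chomp to remove
my $c = <$fh> . chomp;    # the line read, concatenated with chomp's return value (0)
close $fh;

print "|$c|\n";
```

Here $c ends up as "l\n0": the newline is still there, and the 0 returned by chomp has been appended to it.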
One way to debug this kind of thing yourself is to:
Enable warnings with the -w flag. In that case, it would have printed
Use of uninitialized value $_ in scalar chomp at -e line 1, <> line 1.
This is arguably not the most helpful warning ever, but it would have helped you get an idea of where your mistake was.
Print variables to be sure that they contain what you expect. For instance, you could do perl -wE '$c=<>.chomp;print"|$c|"', which would print:
|l
0|
This should help give you an idea of what was wrong.

Matching the last digits of a number in Perl

I have a file in which there are a lot of GUIDs mentioned like this
Dlg1={929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7}-SdWelcome-0
I wanted to replace the last eight digits of these GUIDs with the last eight digits of a new GUID which is already generated using a tool. What I have tried so far follows.
Read the last eight digits of the generated GUID like this:
$GUID =~ /[0-9a-fA-F]{8}/;
Assign it to a new variable like:
$newGUID = $1;
Now try to replace this with the old GUID inside the file:
if ($line =~ /^.* {(.*)}/) {
$line =~ s/[0-9a-fA-F]{8}}/$newGUID/;
}
But it does not seem to be working. It replaces the last eight digits of the old GUID with 32 digits of the new GUID. How can I fix this?
It replaces the last 8 digits of the old GUID with all 32 digits of the new GUID. Any ideas how to achieve this?
You now have this:
$line =~s/[0-9a-fA-F]{8}}/$newGUID/;
You say that replaces the last eight characters of your GUID with the entire 32 digit new GUID. That means you're finding and replacing the right characters, but what you're replacing them with is wrong.
What is $newGUID equal to? Is it an entire 32 digit GUID? If so, you need to pull off the last 8 characters.
Two things I would recommend.
If you are using a hexadecimal number in your regular expression, use [[:xdigit:]] and not [0-9a-fA-F]. Both are pretty much equivalent, but [[:xdigit:]] is cleaner and easier to understand.
In Perl, we love regular expressions. Heck, Perl regular expression syntax has invaded and found homes in almost all other programming languages. However, regular expressions can be difficult to get right, to test, and to understand. Sometimes there is a better way of doing something than a regular expression, one that is cleaner and easier to understand.
In this case, you should use substr rather than regular expressions. You know exactly what you want, and you know the location in the string. The substr command would make what you're doing easier to understand and even cleaner:
use feature 'say';

use constant {
    GUID_RE => qr/^[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}$/,
};

my $old_guid = '929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7';
my $new_guid = 'oooooooo-oooo-oooo-oooo-ooooXXXXXXXX';

# Regular expressions are great for verifying formats!
if ( not $old_guid =~ GUID_RE ) {
    die qq(Old GUID "$old_guid" is not a GUID string);
}
if ( not $new_guid =~ GUID_RE ) {    # Yes, I know this will die in this case
    die qq(New GUID "$new_guid" is not a GUID string);
}

# Easy to understand, I'm removing the last eight characters of $old_guid
# and appending the last eight digits of $new_guid
my $munged_guid = substr( $old_guid, 0, -8 ) . substr( $new_guid, -8 );
say $munged_guid;    # Prints 929EC5C7-0A40-4BE4-8F0A-60C3XXXXXXXX
I'm using regular expressions to verify that the GUID are correctly formatted which is a great task for regular expressions.
I define a GUID_RE constant. You can look to see how it's defined and verify that it describes the correct format (8 hex digits, then 4, 4, 4, and 12 hex digits, all separated by dashes).
Then, I can use that GUID_RE constant in my program, and it's easy to see what I'm doing. Is my GUID actually in the correct GUID format?
Using substr instead of regular expressions makes it easy to see exactly what I am doing. I am removing the last eight characters from $old_guid and appending the last eight characters of $new_guid.
Again, your immediate issue is that your s/.../.../ is finding the right characters, but your substitution string isn't correct. However, this isn't the best use for regular expressions.
I think your problem is that you're not correctly setting $1 to the last eight digits (if it's coming from that regex, it would match the first eight digits and isn't setting any groups). You could instead try something like $newGUID = substr($GUID, -8);. I also think something like $GUIDTail makes more sense for the variable since it doesn't store an entire GUID.
Also, at the moment you're eating the closing curly brace. You should either include that in newGuid/guidTail, include it in the s/// call, or change the curly in the match to (?=\}) (which represents match this but don't include it in the match).
P.S.: You're assuming that there's only one GUID on the line. You may want to add a global modifier to the substitution if there's any chance of multiple GUIDs (or otherwise disambiguate which one you want to modify; as written, it will just replace the first one).
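Putting the substr and lookahead suggestions together, a minimal sketch could look like this. The new GUID value is made up for illustration, and the names $guid_tail and $updated are not from the question.

```perl
use strict;
use warnings;

my $line     = 'Dlg1={929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7}-SdWelcome-0';
my $new_guid = '11111111-2222-3333-4444-555566667777';   # made-up GUID

# Keep only the last eight characters of the new GUID.
my $guid_tail = substr($new_guid, -8);

# (?=\}) requires a closing brace after the eight hex digits but leaves it
# out of the match, so the brace survives the substitution.
(my $updated = $line) =~ s/[[:xdigit:]]{8}(?=\})/$guid_tail/;

print "$updated\n";   # Dlg1={929EC5C7-0A40-4BE4-8F0A-60C366667777}-SdWelcome-0
```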
Here's a small code snippet that demonstrates the principle I think you are after. First off, I start with a given string, and take the last 8 characters of it and store it in a new variable, $insert. Then I perform a somewhat strict substitution on the input data (here in the internal file handle DATA, which is convenient when demonstrating), and print the altered string.
The regex in the substitution looks for curly brackets { ... } with a mixture of hex digits [:xdigit:] and dashes \- between them ([[:xdigit:]\-]+), followed by 8 hex digits. The \K escape allows us to "keep" the matched string before it, so all we need to do is insert our stored string, and replace the closing curly bracket.
If you wish to try this on a file, change <DATA> to <> and run it like so:
perl script.pl input
Code:
use strict;
use warnings;
my $new = "929EC5C7-0A40-4BE4-8F0A-1234567890";
my $insert = substr($new, -8);
while (<DATA>) {
    s/\{[[:xdigit:]\-]+\K[[:xdigit:]]{8}\}/$insert}/i;
    print;
}
__DATA__
Dlg1={929EC5C7-0A40-4BE4-8F0A-60C3CB4A62A7}-SdWelcome-0
Output:
Dlg1={929EC5C7-0A40-4BE4-8F0A-60C334567890}-SdWelcome-0

Sensethising domains

So I'm trying to put all numbered domains into one element of a hash doing this:
### Domanis ###
my $dom = $name;
$dom =~ /(\w+\.\w+)$/; #this regex get the domain names only
my $temp = $1;
if ($temp =~ /(^d+\.\d+)/) { # this regex will take out the domains with number
my $foo = $1;
$foo = "OTHER";
$domain{$foo}++;
}
else {
$domain{$temp}++;
}
where $name will be something like:
something.something.72.154
something.something.72.155
something.something.72.173
something.something.72.175
something.something.73.194
something.something.73.205
something.something.73.214
something.something.abbnebraska.com
something.something.cableone.net
something.something.com.br
something.something.cox.net
something.something.googlebot.com
My code currently prints this:
72.175
73.194
73.205
73.214
abbnebraska.com
cableone.net
com.br
cox.net
googlebot.com
lstn.net
but I want it to print like this:
abbnebraska.com
cableone.net
com.br
cox.net
googlebot.com
OTHER
lstn.net
where OTHER is all the numbered domains, so any ideas how?
You really shouldn't need to split the variable into two, e.g. this regex will match the case you want to trap:
/\d{1,3}\.\d{1,3}$/ -- returns true if the string ends with two groups of 1-3 digits separated by a dot
But if you only need to separate out the domains that are not numbered, you could just check whether the last character of the domain is a letter, because TLDs cannot contain numbers, so you would do something like
/[a-zA-Z]$/ -- if it returns true, it is not a numbered domain (provided you've stripped spaces and newlines)
But I suppose it is better to be more specific in the regex, which also better illustrates the logic you are looking for in your script, so I'd use the former regex.
And actually you could do something like this:
if (my ($domain) = $name =~ /\.(\w+\.[a-zA-Z]+)$/)
{
    # the domain is assigned to the variable $domain
} else {
    # it is a number domain
}
Take what it currently outputs, and use the regex:
/^\d+\.\d+$/
If it matches this, then it's a pair of numbers, so remove it.
This way you'll be able to keep any words with numbers in them.
Please, please indent your code correctly, and use whitespace to separate out various bits and pieces. It'll make your code so much easier to read.
Interestingly, you mentioned that you're getting the wrong output, but the section of the code you post has no print, printf, or say statement. It looks like you're attempting to count up the various domain names.
If these are the value of $name, there are several issues here:
if ($temp =~ /(^d+\.\d+)/) {
Matches nothing. This is saying that your string starts with one or more letter d followed by a period followed by one or more digits. The ^ anchors your regular expression to the beginning of the string.
I think, but not 100% sure, you want this:
if ( $temp =~ /\d\.\d/ ) {
This will find all cases where there are two digits with a period in between them. This is the sub-pattern to /\d+\.\d+/, so both regular expressions will match the same thing.
The
$dom =~ /(\w+\.\w+)$/;
Is matching, at the end of the string $dom, one or more letters, digits, or underscores, followed by a period, followed by one or more letters, digits, or underscores. Is that what you want?
I also believe this may indicate an error of some sort:
my $foo = $1;
$foo = "OTHER";
$domain{$foo} ++;
This sets $foo to whatever the match on $temp captured, but then immediately resets $foo to OTHER and increments $domain{OTHER}.
We need a sample of your initial data, and maybe the actual routine that prints your output.
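For reference, here is one way the counting could be written so that numbered domains are grouped under OTHER. This is only a sketch: the @names list is a shortened, made-up version of the sample data in the question, and the printing loop is an assumption about what the missing output routine does.

```perl
use strict;
use warnings;

# Shortened sample input, modelled on the names in the question.
my @names = qw(
    something.something.72.154
    something.something.73.194
    something.something.cox.net
    something.something.com.br
    something.something.cox.net
);

my %domain;
for my $name (@names) {
    # Take the last two dot-separated parts of the name.
    my ($last_two) = $name =~ /(\w+\.\w+)$/ or next;

    # Purely numeric pairs are counted together under OTHER.
    my $key = $last_two =~ /^\d+\.\d+$/ ? 'OTHER' : $last_two;
    $domain{$key}++;
}

print "$_: $domain{$_}\n" for sort keys %domain;
```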

how do you match two strings in two different variables using regular expressions?

$a='program';
$b='programming';
if ($b=~ /[$a]/){print "true";}
this is not working
thanks every one i was a little confused
The [] in a regex denotes a character class, which matches any one of the characters listed inside it.
Your regex is equivalent to:
$b=~ /[program]/
which returns true as character p is found in $b.
To see whether the match happens you are printing true. Note that printing a bareword true (without quotes) will not show anything, so make sure the string is quoted.
But if you wanted to see whether one string is present inside another, you have to drop the [..]:
if ($b=~ /$a/) { print 'true';}
If variable $a contains any regex metacharacters, then the above match may fail or match the wrong thing. To fix that, place the variable between \Q and \E so that any metacharacters in it are escaped:
if ($b=~ /\Q$a\E/) { print 'true';}
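The difference between the character class and the literal match can be seen in this short sketch. It uses $needle rather than $a, only because $a and $b are also the variables used by sort; the test strings are made up.

```perl
use strict;
use warnings;

my $needle = 'program';

# Character class: true as soon as any single character of $needle appears.
my $class_hit = 'xyzp' =~ /[$needle]/ ? 1 : 0;                   # 1, because of the "p"

# Literal substring: true only when the whole of $needle appears.
my $literal_miss = 'xyzp'        =~ /\Q$needle\E/ ? 1 : 0;       # 0
my $literal_hit  = 'programming' =~ /\Q$needle\E/ ? 1 : 0;       # 1

print "$class_hit $literal_miss $literal_hit\n";   # 1 0 1
```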
Assuming either variable may come from external input, please quote the variables inside the regex:
if ($b=~ /\Q$a\E/){print "true";}
You then won't get burned when the pattern you'll be looking for will contain "reserved characters" like any of -[]{}().
(Apart from the missing semicolons:) Why do you put $a in square brackets? This makes it a list of possible characters. Try:
$b =~ /\Q${a}\E/
Update
To answer your remarks regarding = and =~:
=~ is the matching operator, and specifies the variable to which you are applying the regex ($b) in your example above. If you omit =~, then Perl will automatically use an implied $_ =~.
In list context, a regular expression match returns a list containing the captured matches. You usually assign this to a list, such as in ($match1, $match2) = $b =~ /.../;. If, on the other hand, you assign the result to a scalar, then the scalar will simply be true or false, depending on whether the match succeeded.
So if you write $b = /\Q$a\E/, you'll end up with $b = $_ =~ /\Q$a\E/.
$a='program';
$b='programming';
if ( $b =~ /\Q$a\E/) {
print "match found\n";
}
If you're just looking for whether one string is contained within another and don't need to use any character classes, quantifiers, etc., then there's really no need to fire up the regex engine to do an exact literal match. Consider using index instead:
#!/usr/bin/env perl
use strict;
use warnings;
my $target = 'program';
my $string = 'programming';
if (index($string, $target) > -1) {
    print "target is in string\n";
}