Slashes and hashes in Perl and metacharacters - regex

I asked earlier about escaping special characters and understand the rules around // and ##, yet the example below doesn't work. By my understanding I need to escape the escape character, because I am searching for the literal \ between the names rather than using it in its usual meaning. I'm stumped. Please help. This has me in knots, despite probably appearing easy to the masses. I know I could have written it as $userInfo =~ #\#;
#!C:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
$userInfo = "firstname\middlename\lastname.";
if($userInfo =~ m/\\/){
print("Found it");
}
else{
print("No match found");
}

The problem there is that you have to escape the backslash in your $userInfo assignment too:
$userInfo = "firstname\\middlename\\lastname.";

You are trying to search a string which contains the literal backslash character \. Double-quoted strings process backslash escapes, so the backslashes never make it into the string. Use single quotes instead.
use warnings;
use strict;
my $userInfo = 'firstname\middlename\lastname.';
if ($userInfo =~ m/\\/){
print("Found it");
}
else{
print("No match found");
}
The warnings pragma would have generated a warning message.
See also: Quote and Quote-like Operators
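
To see what the two quoting styles do to the string from the question, here is a minimal sketch that counts the backslashes that actually end up in each version. The variable names ($double, $single, $in_double, $in_single) are mine, not from the question, and the 'misc' warning category is switched off only so the demo runs quietly.

```perl
use strict;
use warnings;
no warnings 'misc';    # silence "Unrecognized escape \m passed through" for this demo

# Double quotes process backslash escapes, so no backslash survives here.
my $double = "firstname\middlename\lastname.";
# Single quotes keep the backslashes literally.
my $single = 'firstname\middlename\lastname.';

# tr/\\// with an empty replacement list just counts the backslashes.
my $in_double = ($double =~ tr/\\//);
my $in_single = ($single =~ tr/\\//);

print "double-quoted: $in_double backslash(es)\n";
print "single-quoted: $in_single backslash(es)\n";
print "regex matches the single-quoted string\n" if $single =~ m/\\/;
```

The double-quoted string ends up with no backslash at all, which is why m/\\/ has nothing to match in the original script.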

I agree with toolic, if you can, use single quotes.
It will save some preprocessing time needed for string interpolation.
However, if you really need to escape special characters, you can write it like this:
#!C:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
$userInfo = "firstname\\middlename\\lastname."; #please note escaped backslashes
if($userInfo =~ m/\\/)
{
print("Found it");
}
else
{
print("No match found");
}

Related

Regex matching by scalars in perl

I am using regular expression using scalars here. First time though. I will put the code. It should be self evident
#!/usr/bin/perl
my $regex = "PM*C";
my $var = "PM_MY_CALC";
if($var =~ m/$regex/){
print "match \n";
}
else{
print "no match\n";
}
The output that I get is "no match"..
am i missing something obvious here? obviously It did not match any other stuff.. so just made both the regex and the variable to be checked equal.. still no match.
I have tried doing this too..
if($var =~ $regex ){
based on some search from perlMonks.
am i missing something obvious here?
You're missing how regular expressions work. They don't work how shell filename expansion works.
Your regex uses * which means "zero or more of the preceding character". So M* matches nothing, 'M', 'MM', 'MMM', etc.
You wanted to match "PM" followed by any number of any character followed by "C". The correct regex for that is PM.*C. A dot (.) means "match (almost) any character" and (as I said above) * matches zero or more of that.
I recommend reading the Perl Regular Expression tutorial.
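
As a quick check of the difference, the sketch below runs both patterns against the string from the question. The names $glob_style and $regex_style are only illustrative.

```perl
use strict;
use warnings;

my $var = "PM_MY_CALC";

# "PM*C" means: P, then zero or more M, then C. The string contains no
# "PC", "PMC", "PMMC", ... so this does not match.
my $glob_style = ($var =~ m/PM*C/) ? 1 : 0;

# "PM.*C" means: P, M, then any run of characters, then C.
my $regex_style = ($var =~ m/PM.*C/) ? 1 : 0;

print "PM*C  : ", ($glob_style  ? "match" : "no match"), "\n";
print "PM.*C : ", ($regex_style ? "match" : "no match"), "\n";
```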

Strings with quotemeta enabled not able to match specific regex

For following strings with quotemeta enabled, the if statements are not able to match .cpp and .o file names. Am I doing anything wrong here.
E\:\\P4\\NTG5\\PATHOLOGY_products\\arm\-qnx\-m650\-4\.4\.2\-osz\-trc\-dbg\\gen\\deliveries\\ntg5\\arm\\api\\sys\\most\\pf\\mss\\src\\private\\DSIDSYSMOSTServerMoCCAStream\.cpp\
E\:\\P4\\NTG5\\PATHOLOGY_products\\arm\-qnx\-m650\-4\.4\.2\-osz\-trc\-dbg\\bin\\deliveries\\ntg5\\arm\\api\\sys\\most\\pf\\mss\\src\\DSIDSYSMOSTServerMoCCAStream\.o\
if ($a_path =~ m/[\\>](\w+\.(?:cpp|c))/) {
$compile_line_array->source_filename($a_path);
$compile_line_array->include_list_index($include_path_cnt);
$j=0;
last;
}
if($a_path =~ m/[\\>](\w+\.(?:o))/) {
$compile_line_array->object_file($a_path);
}
The regexes match a word character followed by a .; if your strings have a backslash before every ., they will not match.
Somehow, you are not thinking about this correctly: "quotemeta" isn't something that is enabled or disabled, it is an operator that sticks backslashes before some characters in your string. Why are you using it in the first place?
Why do you have your filenames run through quotemeta? As you've demonstrated, that's going to backslash escape all your .'s. Therefore if that's what you want to match against, you'll have to add some backslashes to your regex.
if ($a_path =~ m/[\\>](\w+\\\.(?:cpp|c))/) {
or
if($a_path =~ m/[\\>](\w+\\\.(?:o))/) {
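
A small sketch of what quotemeta does to a path, and why the original pattern stops matching afterwards. The path below is a shortened, hypothetical stand-in for the ones in the question, and the variable names are mine. If the quoting is not actually needed, matching against the unquoted path is probably simpler than adding backslashes to the regex.

```perl
use strict;
use warnings;

# Hypothetical path, shortened from the ones in the question.
my $raw    = 'E:\P4\src\Stream.cpp';
my $quoted = quotemeta $raw;    # every non-word character gets a backslash

my $re = qr/[\\>](\w+\.(?:cpp|c))/;

# The original pattern works on the raw path ...
my ($from_raw) = $raw =~ $re;
# ... but not on the quotemeta'd copy, where "." has become "\.".
my ($from_quoted) = $quoted =~ $re;

print "quoted path:  $quoted\n";
print "raw match:    ", $from_raw    // 'none', "\n";
print "quoted match: ", $from_quoted // 'none', "\n";
```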

Search for a pattern in a string

I need to determine whether
+review CR @someuser
exists in a string. How can I do that?
You will need to escape the metacharacters and use a matching regex m//.
if ($string =~ /\+review CR \@someuser/) {
# do something
}
Note that you cannot use the \Q ... \E escape sequence on its own to escape the metacharacters, since @someuser will still be interpolated. You could use it for the +, but then you would still need to escape the @, so this way is simpler. You can also use the quotemeta function. However, in this case that might be overkill.
Read more about this in perldoc perlop
Use index:
$search_string = "+review CR \@someuser";
if (index($string, $search_string) != -1) {
# found
}
Or, if you use a regex, you'll want to make sure the '+' and '@' are properly escaped:
if ( $string =~ m#\+review CR \@someuser# ) {
# found
}
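
Both approaches can be tried side by side. The sketch below assumes a hypothetical search string that mixes a quantifier character, an array sigil and a comment character; it is kept in a single-quoted variable so nothing is interpolated when it is defined, and \Q ... \E is then applied to the variable rather than to literal text. The names $needle and $haystack are illustrative only.

```perl
use strict;
use warnings;

# Single quotes: "+", "@" and "#" are all stored literally.
my $needle   = '+review CR @someuser #42';
my $haystack = 'Please add +review CR @someuser #42 to the commit message';

# Plain substring search, no regex involved.
my $by_index = (index($haystack, $needle) != -1) ? 1 : 0;

# Regex search; \Q...\E backslash-escapes the interpolated value.
my $by_regex = ($haystack =~ /\Q$needle\E/) ? 1 : 0;

print "index: ", ($by_index ? "found" : "not found"), "\n";
print "regex: ", ($by_regex ? "found" : "not found"), "\n";
```

For a fixed substring like this, index avoids the escaping question entirely; the regex form is mainly useful when the literal is part of a larger pattern.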

What regex should be used to avoid symbols before text in perl?

I have the following variable. I only want to print yes if the variable has "imoport/canada/campingplaces/tobermory" on a line with no # or anything else in front of it. What should I put in a regex for this kind of thing?
my $textfile = "# imoport/canada/campingplaces/tobermory
imoport/canada/campingplaces/tobermory
#imoport/canada/campingplaces/tobermory";
my $textNeeded = "imoport/canada/campingplaces/tobermory";
This is what I am using:
if ($textfile =~ m/$textNeeded/i) {
print "yes working";
}
Note: I am getting data from different text files, so some text files might just have "#imoport/canada/campingplaces/tobermory". I want to avoid those.
Despite the quite vague problem description, I think I have puzzled out what you mean. You mean you may have lines where the text is commented out with #, and you want to avoid matching those.
print "yes" if $textfile =~ /^\s*$textNeeded/im;
This will match wherever your string appears at the start of a line inside $textfile, optionally preceded by whitespace. The /m option makes the regex multiline, meaning that ^ and $ match at the start and end of each line inside a larger string, not just at the start and end of the whole string.
You may wish to be wary of regex meta characters in your search string. If for example your search string is foo[bar].txt, those brackets will be interpreted as a character class instead. In which case you would use
/^\s*\Q$textNeeded\E/im
instead. The \Q ... \E will make the text inside match only literal characters.
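
A short sketch of that pattern against two sample inputs, one with a live line and one where every line is commented out. The variables $mixed and $commented are made up for the illustration.

```perl
use strict;
use warnings;

my $textNeeded = "imoport/canada/campingplaces/tobermory";

# One live line among commented-out ones, as in the question.
my $mixed = "# $textNeeded\n$textNeeded\n#$textNeeded";
# Only commented-out lines.
my $commented = "# $textNeeded\n#$textNeeded";

my $hit_mixed     = ($mixed     =~ /^\s*\Q$textNeeded\E/im) ? 1 : 0;
my $hit_commented = ($commented =~ /^\s*\Q$textNeeded\E/im) ? 1 : 0;

print "mixed file:     ", ($hit_mixed     ? "yes" : "no"), "\n";
print "commented file: ", ($hit_commented ? "yes" : "no"), "\n";
```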
I think you need an anchor to say you want a match only if your target string appears at the beginning of a line. This uses the caret symbol (^), together with the /m modifier so that ^ also matches after each newline inside the string:
if ($textfile =~ m/^$textNeeded/im) {
print "yes working";
}
This won't report a match if you have spaces or tabs before your textNeeded string.
To simply return the rows having no leading hash, something like this:
my $textfile = "# imoport/canada/campingplaces/tobermory
imoport/canada/campingplaces/tobermory
#imoport/canada/campingplaces/tobermory";
for (split /^/, $textfile) {
print $_ if(m/^\s*[a-zA-Z].*/);
}
Returns:
imoport/canada/campingplaces/tobermory

Perl Regex negation for multiple words

I need to exclude some URLs for a jMeter test:
Don't exclude:
http://foo/bar/is/valid/with/this
http://foo/bar/is/also/valid/with/that
exclude:
http://foo/bar/is/not/valid/with/?=action
http://foo/bar/is/not/valid/with/?=action
http://foo/bar/is/not/valid/with/specialword
Please help me?
My following regex isn't working:
foo/(\?=|\?action|\?form_action|specialword).*
First problem: / is the general delimiter so escape it with \/ or alter the delimiter.
Second problem: It will match only when one of the alternatives comes directly after foo/ (foo/?action and so on), so you need to include a wildcard before the parentheses: foo\/.*(\?=|\?action|\?form_action|specialword).*
So:
/foo\/.*(\?=|\?action|\?form_action|specialword).*/
Next problem is that this will match the opposite: Your excludes. You can either finetune your regex to do the inverse OR you can handle this in your language (i.e. if there is no match, do this and that).
Always pay attention to special characters in regex.
There are countless ways to shoot yourself in the foot with regular expressions. You could write some kind of "parser" using /g and /c in a loop, but why bother? It seems like you are already having trouble with the current regular expression.
Break the problem down into smaller parts and everything will be less complicated. You could write yourself some kind of filter for grep like:
use feature 'say';
use URI;
sub filter {
my $u = shift;
my $uri = URI->new($u);
return undef if $uri->query; # any query string excludes the URL
return undef if grep { $_ eq 'specialword' } $uri->path_segments;
return $u;
}
say for grep {filter $_} @urls;
I wouldn't cling that hard to a regular expression, especially if others have to read the code too...
Change the regex delimiter to something other than '/' so you don't have to escape it in your matches. You might do:
m{//foo/.+(?:\?=action|\?form_action|specialword)$};
The ?: denotes grouping-only.
Using this, you could say:
print unless m{//foo/.+(?:\?=action|\?form_action|specialword)$};
Your alternation is wrong. foo/(\?=|\?action|\?form_action|specialword) matches any of
foo/?=
foo/?action
foo/?form_action
foo/specialword
so you need instead
m{foo/.*(?:\?=action|\?=form_action|specialword)}
The .* is necessary to account for the possible bar/is/valid/with/this after /foo/.
Note that I have changed your ( .. ) to the non-capturing (?: .. ) and I have used braces for the regex delimiter to avoid having to escape the slashes in the expression.
Finally, you need to write either
unless ($url =~ m{/foo/.*(?:\?=action|\?=form_action|specialword)}) { ... }
or
if ($url !~ m{/foo/.*(?:\?=action|\?=form_action|specialword)}) { ... }
since the regex matches URLs that are to be discarded.
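
Applied to the sample URLs from the question, the negated match can be sketched with grep. The names @urls, $exclude and @kept are illustrative, and the duplicate ?=action URL is listed only once.

```perl
use strict;
use warnings;

my @urls = (
    'http://foo/bar/is/valid/with/this',
    'http://foo/bar/is/also/valid/with/that',
    'http://foo/bar/is/not/valid/with/?=action',
    'http://foo/bar/is/not/valid/with/specialword',
);

# Pattern describing the URLs that should be thrown away.
my $exclude = qr{/foo/.*(?:\?=action|\?=form_action|specialword)};

# Keep every URL the exclusion pattern does NOT match.
my @kept = grep { $_ !~ $exclude } @urls;

print "$_\n" for @kept;
```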