Search for a pattern in a string - regex

I need to determine whether
+review CR #someuser
exists in a string. How can I do that?

You will need to escape the metacharacters and use the match operator, m//.
if ($string =~ /\+review CR \#someuser/) {
# do something
}
Strictly speaking, only the + has to be escaped: # is special only under the /x modifier, so escaping it is harmless but optional. Instead of escaping by hand you can also wrap the literal text in the \Q ... \E escape sequence or pass it through the quotemeta function, although for a pattern this short that might be overkill.
Read more about this in perldoc perlop.

Use index:
$search_string = "+review CR \#someuser";
if (index($string, $search_string) != -1) {
# found
}
Or, if you use a regex, you'll want to make sure the '+' and '#' are properly escaped:
if ( $string =~ m#\+review CR \#someuser# ) {
# found
}
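As a rough side-by-side sketch of the approaches mentioned in these answers (index, a hand-escaped regex, and quotemeta), assuming a made-up sample value for $string:

```perl
use strict;
use warnings;

# Hypothetical sample input; only the search text comes from the question.
my $string = 'Please +review CR #someuser before merging';
my $needle = '+review CR #someuser';   # single quotes: nothing to escape here

# 1. Plain substring search, no regex involved.
my $found_index = index($string, $needle) != -1;

# 2. Regex with the metacharacters escaped by hand.
my $found_regex = $string =~ /\+review CR \#someuser/ ? 1 : 0;

# 3. Regex built from the literal text via quotemeta.
my $escaped     = quotemeta $needle;
my $found_quote = $string =~ /$escaped/ ? 1 : 0;

print "index: $found_index, regex: $found_regex, quotemeta: $found_quote\n";
```

For a fixed literal like this one, index avoids the escaping question entirely; the regex forms only pay off once the search text needs anchors or alternatives.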

Regular expression chaining/mixing in perl

Consider the following:
my $p1 = "(a|e|o)";
my $p2 = "(x|y|z)";
$text =~ s/($p1)( )[$p2]([0-9])/some action to be done/g;
Is the regular expression pattern in string form equal to the concatenation of the elements in the above? That is, can the above be written as
$text =~ s/((a|e|o))( )[(x|y|z)]([0-9])/ some action to be done/g;
Yes: variables in a pattern are interpolated as in a double-quoted string, so the two expressions you show are equivalent. See the discussion in the perlretut tutorial.
(I suggest using the qr operator for that instead. See the discussion of that in perlretut as well.)
But that pattern clearly isn't right.
Why the double parentheses, ((a|e|o))? Either keep the alternation in the variable and capture it in the regex
my $p1 = 'a|e|o'; # then use in a regex as
/($p1)/ # gets interpolated into: /(a|e|o)/
or indicate capture in the variable but then drop parens in the regex
my $p1 = '(a|e|o)'; # use as
/$p1/ # (same as above)
The capturing parentheses do their job either way, and in both cases the match (a or e or o) is captured into the corresponding variable, $1 in your expression since this is the first capture.
A pattern of [(x|y|z)] matches any one of the characters (, x, |, y, z or ): [...] is a character class, which matches any single character listed inside (a few characters have a special meaning there). So, again, either use the alternation and capture in your variable
my $p2 = '(x|y|z)'; # then use as
/$p2/
or do it using the character class
my $p2 = 'xyz'; # and use as
/([$p2])/ # --> /([xyz])/
So altogether you'd have something like
use warnings;
use strict;
use feature 'say';
my $text = shift // q(e z7);
my $p1 = 'a|e|o';
my $p2 = 'xyz';
$text =~ s/($p1)(\s)([$p2])([0-9])/replacement/g;
say $_ // 'undef' for $1, $2, $3, $4;
I added \s instead of a literal single space, and I capture the character-class match with () (the pattern from the question doesn't), since that seems to be wanted.
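The qr suggestion above could look roughly like the following. This is only a sketch: the variable names and the sample text are made up for illustration, and character classes stand in for the single-letter alternations.

```perl
use strict;
use warnings;

# Hypothetical sub-patterns, compiled once with qr//.
my $vowel = qr/[aeo]/;   # same set of characters as the alternation a|e|o
my $tail  = qr/[xyz]/;   # character class for x, y or z

my $text = 'e z7 and o x3';   # assumed sample input

# The capture parentheses live in the outer regex, one pair per piece.
my @pairs;
while ($text =~ /($vowel)\s($tail)([0-9])/g) {
    push @pairs, "$1-$2-$3";
}
print "$_\n" for @pairs;
```

Because each qr// object is a self-contained sub-pattern, it can be dropped into a larger regex without worrying about how its contents combine with the surrounding text.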
Neither snippet is valid Perl code. They are therefore equivalent, but only in the sense that neither will compile.
But say you have a valid m//, s/// or qr// operator. Then yes, variables in the pattern would be handled as you describe.
For example,
my $p1 = "(a|e|o)";
my $p2 = "(x|y|z)";
$text =~ /($p1)( )[$p2]([0-9])/g;
is equivalent to
$text =~ /((a|e|o))( )[(x|y|z)]([0-9])/g;
As mentioned in an answer to a previous question of yours, (x|y|z) is surely a bug, and should be xyz.
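To see why (x|y|z) inside square brackets is suspicious, here is a small check (the list of candidate characters is chosen just for this illustration) of which single characters the interpolated class accepts:

```perl
use strict;
use warnings;

my $p2 = '(x|y|z)';

# Inside [...] the parentheses and pipes are ordinary characters,
# so the class below is really [(x|y|z)].
my @candidates = ('x', 'y', 'z', '(', '|', ')', 'a');
my @hits       = grep { /\A[$p2]\z/ } @candidates;

print "matched: @hits\n";
```

Everything except 'a' is accepted, including the punctuation, which is rarely what is intended.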

Why is "Search pattern not terminated" in this snippet?

I am trying to update a piece of code to remove any non-alphanumeric character, assign the resultant string to a new variable, and rewrite my HTML to include that value in a new meta tag:
if ( $main::url =~ m/index:Devices/ )
{
my $prodname = getMetaValue(\$doc,'Product_Name');
$prodname =~ tr/[^a-zA-Z0-9 ];
$strippedname =~ $prodname;
$doc =~ s{</head>}{<meta name='Stripped_Name' content='$strippedname' />\n</head>}is;
}
The last line throws a "Search pattern not terminated" error, and I can't figure out why. I use a similar method that does work elsewhere in the script:
if ( $main::url =~ m/index:Devices/ )
{
my $prodname = getMetaValue(\$doc,'Product_Name');
my $brandname = getMetaValue(\$doc,'Manufacturer_Name');
my $devicefullname = $brandname.' '.$prodname;
$doc =~ s{</head>}{<meta name='Device_Full_Name' content='$devicefullname' />\n</head>}is;
}
Any idea why the special character removal script fails me?
Thank you!
The syntax of the tr operator is tr/CHARS/REPLACEMENT/. Further, it performs a transliteration (not regex match) which normally replaces the given literal characters, and in a rather particular way.
But you can do what you want with tr, as it allows ranges and has the /c modifier (complement), used here together with /d (delete):
$prodname =~ tr/a-zA-Z0-9 //dc;
From Quote-Like-Operators in perlop
If the /c modifier is specified, the SEARCHLIST character set is complemented.
However, using tr/// (especially with /c) is a bit obscure compared with s///, which you also use later in the code. Using s/// would make it clearer:
$prodname =~ s/[^a-zA-Z0-9 ]//g;
The /g modifier makes it remove all occurrences of the characters specified by [^...].
The regex itself can also be written as
s/[^a-z0-9 ]//gi;
but see Negation in perlrecharclass for notes on using /i with a negated class and Unicode. For an efficiency improvement we can add the + quantifier, s/[...]+//gi, as all occurrences need to be removed anyway. Note that tr/// should be much faster here.
With POSIX character classes this can be written as s/[^[:alnum:] ]//g;
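A quick way to convince yourself that the tr///cd and s///g forms do the same job is to run both on a copy of the same string. The product name below is invented for the example:

```perl
use strict;
use warnings;

my $prodname = 'Acme Router-X 3000 (v2)!';   # hypothetical product name

# tr with /c (complement) and /d (delete): drop everything NOT in the list.
(my $via_tr = $prodname) =~ tr/a-zA-Z0-9 //cd;

# The same clean-up written as a substitution.
(my $via_s = $prodname) =~ s/[^a-zA-Z0-9 ]+//g;

print "tr: $via_tr\n";
print "s:  $via_s\n";
```

Note that copying the result into a new variable takes a plain assignment (=); the binding operator =~ used for $strippedname in the question performs a match instead.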
tr/// needs three instances of the delimiter, not just one.
$prodname =~ tr/[^a-zA-Z0-9 ];
Moreover, [ means the literal square bracket in tr. Maybe you wanted m// or s///?

Strings with quotemeta enabled not able to match specific regex

For the following strings with quotemeta enabled, the if statements are not able to match .cpp and .o file names. Am I doing anything wrong here?
E\:\\P4\\NTG5\\PATHOLOGY_products\\arm\-qnx\-m650\-4\.4\.2\-osz\-trc\-dbg\\gen\\deliveries\\ntg5\\arm\\api\\sys\\most\\pf\\mss\\src\\private\\DSIDSYSMOSTServerMoCCAStream\.cpp\
E\:\\P4\\NTG5\\PATHOLOGY_products\\arm\-qnx\-m650\-4\.4\.2\-osz\-trc\-dbg\\bin\\deliveries\\ntg5\\arm\\api\\sys\\most\\pf\\mss\\src\\DSIDSYSMOSTServerMoCCAStream\.o\
if ($a_path =~ m/[\\>](\w+\.(?:cpp|c))/) {
$compile_line_array->source_filename($a_path);
$compile_line_array->include_list_index($include_path_cnt);
$j=0;
last;
}
if($a_path =~ m/[\\>](\w+\.(?:o))/) {
$compile_line_array->object_file($a_path);
}
The regexes match one or more word characters followed immediately by a .; since your strings have a backslash before every ., they will not match.
Somehow, you are not thinking about this correctly: "quotemeta" isn't something that is enabled or disabled, it is an operator that sticks backslashes before some characters in your string. Why are you using it in the first place?
Why do you have your filenames run through quotemeta? As you've demonstrated, that's going to backslash escape all your .'s. Therefore if that's what you want to match against, you'll have to add some backslashes to your regex.
if ($a_path =~ m/[\\>](\w+\\\.(?:cpp|c))/) {
and
if($a_path =~ m/[\\>](\w+\\\.(?:o))/) {
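The effect described in these answers can be reproduced on a shortened path. This is a sketch only; the path below is invented and much shorter than the ones in the question:

```perl
use strict;
use warnings;

my $raw    = 'E:\\P4\\src\\Stream.cpp';   # hypothetical path: E:\P4\src\Stream.cpp
my $quoted = quotemeta $raw;              # becomes E\:\\P4\\src\\Stream\.cpp

# The pattern from the question, compiled once.
my $re = qr/[\\>](\w+\.(?:cpp|c))/;

my ($from_raw)    = $raw    =~ $re;   # file name is found in the raw path
my ($from_quoted) = $quoted =~ $re;   # no match: a backslash now sits before the dot

# Allowing for the extra backslash makes the quoted string match again.
my ($adjusted) = $quoted =~ /[\\>](\w+\\\.(?:cpp|c))/;

print "raw:      ", $from_raw    // 'no match', "\n";
print "quoted:   ", $from_quoted // 'no match', "\n";
print "adjusted: ", $adjusted    // 'no match', "\n";
```

Matching against the unquoted path is usually the simpler fix, since the captured name then needs no further clean-up.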

Perl Regex negation for multiple words

I need to exclude some URLs for a jMeter test:
Don't exclude:
http://foo/bar/is/valid/with/this
http://foo/bar/is/also/valid/with/that
exclude:
http://foo/bar/is/not/valid/with/?=action
http://foo/bar/is/not/valid/with/?=action
http://foo/bar/is/not/valid/with/specialword
Please help me?
My following regex isn't working:
foo/(\?=|\?action|\?form_action|specialword).*
First problem: / is the general delimiter so escape it with \/ or alter the delimiter.
Second problem: it will match only foo/?action and so on; you need to include a wildcard before the parentheses: foo\/.*(\?=|\?action|\?form_action|specialword).*
So:
/foo\/.*(\?=|\?action|\?form_action|specialword).*/
The next problem is that this matches the opposite of what you want: your excludes. You can either fine-tune your regex to do the inverse or handle this in your language (i.e. if there is no match, do this and that).
Always pay attention to special characters in regex.
There are countless ways to shoot yourself in the foot with regular expressions. You could write some kind of "parser" using /g and /c in a loop, but why bother? It seems like you are already having trouble with the current regular expression.
Break the problem down into smaller parts and everything will be less complicated. You could write yourself some kind of filter for grep like:
use URI;
use feature 'say';
sub filter {
my $u = shift;
my $uri = URI->new($u);
return undef if $uri->query; # reject anything with a query string
return undef if grep { $_ eq 'specialword' } $uri->path_segments;
return $u;
}
say for grep { filter($_) } @urls;
I wouldn't cling that hard to a regular expression, especially if others have to read the code too...
Change the regex delimiter to something other than '/' so you don't have to escape it in your matches. You might do:
m{//foo/.+(?:\?=action|\?form_action|specialword)$};
The ?: denotes grouping-only.
Using this, you could say:
print unless m{//foo/.+(?:\?=action|\?form_action|specialword)$};
Your alternation is wrong. foo/(\?=|\?action|\?form_action|specialword) matches any of
foo/?=
foo/?action
foo/?form_action
foo/specialword
so you need instead
m{foo/.*(?:\?=action|\?=form_action|specialword)}
The .* is necessary to account for the possible bar/is/valid/with/this after /foo/.
Note that I have changed your ( .. ) to the non-capturing (?: .. ) and I have used braces for the regex delimiter to avoid having to escape the slashes in the expression.
Finally, you need to write either
unless ($url =~ m{/foo/.*(?:\?=action|\?=form_action|specialword)}) { ... }
or
if ($url !~ m{/foo/.*(?:\?=action|\?=form_action|specialword)}) { ... }
since the regex matches URLs that are to be discarded.
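Applied to a list, the same negated match works as a grep filter. A minimal sketch, using the URLs from the question as sample data and a made-up variable name for the compiled pattern:

```perl
use strict;
use warnings;

my @urls = (
    'http://foo/bar/is/valid/with/this',
    'http://foo/bar/is/also/valid/with/that',
    'http://foo/bar/is/not/valid/with/?=action',
    'http://foo/bar/is/not/valid/with/specialword',
);

# Pattern for URLs that should be dropped.
my $exclude = qr{/foo/.*(?:\?=action|\?=form_action|specialword)};

# Keep only the URLs that do NOT match the exclusion pattern.
my @kept = grep { $_ !~ $exclude } @urls;

print "$_\n" for @kept;
```

Keeping the pattern positive and negating at the call site tends to be easier to read than building the negation into the regex itself.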

Slashes and hashes in Perl and metacharacters

I asked earlier about escaping special characters and understand the rules around // and ##, yet the example below doesn't work, and by my understanding I need to escape the escape character. It's being searched for as a match for its usual meaning of \ between the names. I'm stumped. Please help. This has me in knots, despite probably appearing easy to the masses. I know I could have written it as $userInfo =~ #\#;
#!C:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
$userInfo = "firstname\middlename\lastname.";
if($userInfo =~ m/\\/){
print("Found it");
}
else{
print("No match found");
}
The problem there is that you have to escape the backslash in your $userInfo assignment too:
$userInfo = "firstname\\middlename\\lastname.";
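You are trying to search a string which contains the literal backslash character \. Double quotes interpolate, so the \m and \l in your assignment are processed as escape sequences and no backslash ends up in the string. Use single quotes instead.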
use warnings;
use strict;
my $userInfo = 'firstname\middlename\lastname.';
if ($userInfo =~ m/\\/){
print("Found it");
}
else{
print("No match found");
}
The warnings pragma would have generated a warning message.
See also: Quote and Quote-like Operators
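The difference between the quoting styles can be made visible by building the same text three ways. This is a small sketch with shortened, made-up names; warnings are switched off only around the deliberately broken assignment:

```perl
use strict;
use warnings;

my $double;
{
    no warnings;   # this line deliberately triggers "Unrecognized escape \m passed through"
    $double = "first\middle\last";    # \m becomes m, \l lowercases the next character
}
my $single  = 'first\middle\last';    # backslashes kept as written
my $escaped = "first\\middle\\last";  # doubled backslashes survive double quotes

for my $pair ([double => $double], [single => $single], [escaped => $escaped]) {
    my ($label, $value) = @$pair;
    printf "%-8s %-20s %s\n", $label, $value,
        $value =~ m/\\/ ? 'has backslash' : 'no backslash';
}
```

Only the first form loses its backslashes, which is why the match in the question never succeeds.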
I agree with toolic, if you can, use single quotes.
It will save some preprocessing time needed for string interpolation.
However, if you really need to escape special characters, you can write it like this:
#!C:\strawberry\perl\bin\perl.exe
#strict
#diagnostics
$userInfo = "firstname\\middlename\\lastname."; # please note the escaped backslashes
if($userInfo =~ m/\\/)
{
print("Found it");
}
else
{
print("No match found");
}