Regex matching by scalars in Perl

I am using a regular expression stored in a scalar for the first time. Here is the code; it should be self-evident:
#!/usr/bin/perl
my $regex = "PM*C";
my $var = "PM_MY_CALC";
if ($var =~ m/$regex/) {
    print "match \n";
}
else {
    print "no match\n";
}
The output I get is "no match".
Am I missing something obvious here? It did not match anything else either, so I tried making the regex and the variable being checked the same, and still got no match.
I have also tried this:
if($var =~ $regex ){
based on some searching on PerlMonks.

"Am I missing something obvious here?"
You're missing how regular expressions work. They don't work how shell filename expansion works.
Your regex uses *, which means "zero or more of the preceding character". So M* matches the empty string, 'M', 'MM', 'MMM', etc.
You wanted to match "PM" followed by any number of any character followed by "C". The correct regex for that is PM.*C. A dot (.) means "match (almost) any character" and (as I said above) * matches zero or more of that.
I recommend reading the Perl Regular Expression tutorial.
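A minimal sketch of the difference, reusing $var from the question. The names $shell_style and $regex_style are only labels I picked for this illustration:

```perl
use strict;
use warnings;

my $var = "PM_MY_CALC";

# "PM*C" means: P, then zero or more M, then C. "PM_MY_CALC" has no such
# run, because "_" follows "PM" where the pattern needs another M or a C.
my $shell_style = "PM*C";

# "PM.*C" means: P, M, then any characters, then C.
my $regex_style = "PM.*C";

for my $regex ($shell_style, $regex_style) {
    my $result = $var =~ /$regex/ ? "match" : "no match";
    print "$regex: $result\n";
}
```

The first pattern should report "no match" and the second "match".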

Related

Pre-compiled regex with special characters matching

I'm trying to match if a word such as *FOO (* as a normal character) is in a line. My input is a C++ source code. I need to use a pre-compiled regex for this due to program flow requirements, so I tried the following:
$pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;
And I use it like this:
if ($line =~ m/$pattern/) { ... }
It works and catches lines containing *FOO such as hey *FOO.BAR but also matches lines such as:
//FOO programming using stuff and things
which I want to ignore. What am I missing? Is \* not the right way to escape * in a pre-compiled regex in Perl? If *FOO is stored in $word and the pattern looks like this:
$pattern = qr/[^a-zA-Z](\\$word)[^a-zA-Z]|^\s*(\\$word)[^a-zA-Z]/;
Is that different from the previous pattern? Because I tried both and the result seems to be the same.
I found a way to bypass this problem by removing the first char of $word and escaping * in the pattern, but if $word = "**.?FOO" for example, how do I create a qr// with $word so that all the meta-characters are escaped?
You do need to escape the *. One way to do it is by the quotemeta \Q operator:
use warnings;
use strict;
my $qr = qr/\Q*FOO/;
while (<DATA>) { print if /$qr/ }
__DATA__
//FOO programming using stuff and things
hey *FOO.BAR
Note that this escapes all ASCII non-"word" characters through the rest of the pattern. If you need to limit its action to only a part of the pattern then stop it using \E. Please see linked docs.
The above determines whether *FOO is in the line, regardless of whether it is a word or a part of one. It is not clear to me which is needed. Once that is specified the pattern can be adjusted.
Note that /\*FOO/ works, too. What you tried probably failed because of everything else you are trying to match, whose purpose I do not understand. If you only need to detect whether the pattern is present, the above does it. If there is a more specific requirement, please clarify.
As for the examples: for me the string //FOO... is not matched by the first $pattern you show. The second one interpolates $word as is after a literal backslash, so the * in it is not escaped; it is also much too convoluted. Regexes can really tie one in nasty knots when pushed; I suggest keeping them as simple as possible.
Question 1:
my $word = '*FOO';
my $pattern = qr/\\$word/;
is equivalent to
my $pattern = qr/\\*FOO/; # zero or more '\' followed by 'FOO'
The $word is simply interpolated as is.
To get something equivalent to
my $pattern = qr/\*FOO/;
you should use
my $word = '*FOO';
my $pattern = qr/\Q$word\E/;
By default, an interpolated variable is treated as a mini regular expression: metacharacters in it such as *, +, ? are still interpreted as metacharacters. \Q...\E adds a backslash before every character not matching /[A-Za-z_0-9]/, so any metacharacters in the interpolated variable are treated as literal characters. Refer to perldoc -f quotemeta.
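For the "**.?FOO" case from the question, a small sketch along these lines may help. The test lines are made up for illustration:

```perl
use strict;
use warnings;

my $word    = '**.?FOO';
my $pattern = qr/\Q$word\E/;   # every metacharacter in $word is now literal

for my $line ('x = **.?FOO;', '//FOO programming using stuff and things') {
    my $result = $line =~ $pattern ? "match" : "no match";
    print "$result: $line\n";
}

# quotemeta() is the function form of \Q...\E.
print quotemeta($word), "\n";   # prints \*\*\.\?FOO
```

Without \Q, a pattern starting with ** would not even compile, since a quantifier would have nothing to apply to.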
Question 2
I tried
my $pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;
my $line = '//FOO programming using stuff and things';
if($line =~ m/$pattern/){
print "$&\n";
}
else{
print "No match!";
}
and it printed "No match!". I can't explain how you got it to match.

Regular Expression - Perl

I am trying to extract a substring from a string using a regular expression, but I get an error because my regular expression is not working. Can anyone help me write a correct one?
Here are the strings I am trying to write the regular expression for:
MSM8_BD_V4.3_1-1_idle-Kr_Run3.xlsx
MSM8_BD_V4.3_2-6_mp3-Kr_Run2.xlsx
MSM8_BD_V4.3_Camera_snap-7.xlsx
MSM8_BD_V4.3_Camera_snap-8.xlsx
MSM8_BD_V4.3_Radio_202.16-0.xlsx
I am trying to get the bold part of each string.
Below is the regular expression I tried:
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
my ($captured) = $line =~ /MSM8939_BD_V4\.\3\_[d]*(.+?)\w/gx;
print "$captured\n";
[d] matches nothing but the literal letter d. You want \d, without the brackets, to match a digit. However, it looks like you also want to include underscores. That would be [\d_].
Try this:
/^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/
If I run this on your input (with e.g. perl -nE 'say $1 if /^MSM8_BD_V4\.3_[\d_]*-?([^-]+)/'), I get this output:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
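To see what each piece of that pattern does, here is the same regex spelled out with /x and comments, run over the five sample names. The variables @names, $re and @captured are my own names, not from the question:

```perl
use strict;
use warnings;

my @names = qw(
    MSM8_BD_V4.3_1-1_idle-Kr_Run3.xlsx
    MSM8_BD_V4.3_2-6_mp3-Kr_Run2.xlsx
    MSM8_BD_V4.3_Camera_snap-7.xlsx
    MSM8_BD_V4.3_Camera_snap-8.xlsx
    MSM8_BD_V4.3_Radio_202.16-0.xlsx
);

my $re = qr/
    ^MSM8_BD_V4\.3_   # fixed prefix, with the dot escaped so it is literal
    [\d_]*            # optional run of digits and underscores, e.g. "1"
    -?                # optional hyphen after that run
    ([^-]+)           # capture everything up to the next hyphen
/x;

my @captured;
for my $name (@names) {
    push @captured, $1 if $name =~ $re;
}
print "$_\n" for @captured;
```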
my $line = "MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx";
for (qw(
MSM8939_BD_V4.3_1-1_idle-Kratos_Run3.xlsx
MSM8939_BD_V4.3_2-6_mp3-Kratos_Run2.xlsx
MSM8939_BD_V4.3_Camera_snap-7.xlsx
MSM8939_BD_V4.3_Camera_snap-8.xlsx
MSM8939_BD_V4.3_Radio_202.16-0.xlsx
)) {
my ($captured) = ($_ =~ /.*[-_]([^\W_]+_[\w.]+)-/gx);
print "$captured\n";
}
Use a greedy pattern to go as far as possible, then grab the last two strings that look like what you want which are still followed by a hyphen.
As does the other answer which was just edited while I was typing, this produces:
1_idle
6_mp3
Camera_snap
Camera_snap
Radio_202.16
This one may be more general in that the beginning of the substring is not hard-coded, i.e., you could use it in other cases which did not necessarily start with MSM8_BD_V4.3.

Extracting a substring in Perl?

I have a string in a variable:
$mystr = "some text %PARTS/dir1/dir2/myfile.abc some more text";
Now %PARTS is literally present in the string, it is not a variable or hash.
I want to extract the sub-string %PARTS/dir1/dir2/myfile.abc from it. I created the following reg expression. I am just a beginner in Perl. So please let me know if I have done anything wrong.
my $local_file = substr ($mystr, index($mystr, '%PARTS'), index($mystr, /.*%PARTS ?/));
I even tried this:
my $local_file = substr ($mystr, index($mystr, '%PARTS'), index($mystr, /.*%PARTS' '?/));
But both give nothing if I print $local_file.
What might be wrong here?
Thank You.
UPDATE: Referred the following sites for using this method:
http://perlmeme.org/howtos/perlfunc/substr.html see example 1c
How to take substring of a given string until the first appearance of specified character?
The index function returns the first index of the occurrence of a substring in a string, else -1. It has nothing to do with regular expressions.
Regular expressions are applied to a string with the bind operator =~.
To extract the matched area of a regular expression, enclose the pattern in parens (a capture group). The matched substring will then be available in $1:
my $str = "some text %PARTS/dir1/dir2/myfile.abc some more text";
if ($str =~ /(%PARTS\S+)/) {
    my $local_file = $1;
    ...; # do something
} else {
    die "the match failed"; # do something else
}
The \S character class will match every non-space character.
To learn about regular expressions, you can look at the perlretut.
The index function is not related to regexps. Its arguments are just strings, not regexps. So your usage is wrong.
Regexps are a powerful feature of Perl and the most appropriate tool for this task:
my ($local_file) = $mystr =~ /(%PARTS[^ ]+)/;
See perlop for more information on the =~ operator.
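If you do want to stay with index and substr, a sketch like the following should work, assuming the file name ends at the first space after %PARTS. The variables $start and $end are introduced here for illustration:

```perl
use strict;
use warnings;

my $mystr = "some text %PARTS/dir1/dir2/myfile.abc some more text";

# index() takes plain strings: find where '%PARTS' starts, then the first
# space at or after that position.
my $start = index($mystr, '%PARTS');
my $end   = index($mystr, ' ', $start);
$end = length $mystr if $end == -1;   # no trailing space: take the rest

# The third argument of substr() is a length, not an end position.
my $local_file = $start == -1 ? undef : substr($mystr, $start, $end - $start);

print "$local_file\n" if defined $local_file;
```

The regex version is shorter, but this shows that index only ever deals with literal strings and positions.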

Regular expression which matches a specific pattern

I want to find a regular expression in Perl which matches a pattern such as this:
my $sumthing = "people say
for -->";
Over here after say there is a single newline character. So I need to find a regular expression which could match such a pattern which includes a newline within a pattern. Please help me to find this as I'm new to Perl & regular expression.
The possible methods I tried were these:
if (($sumthing !~ (/\n+$/)) && ($sumthing !~ (/^\n+/m)))
Kindly help me find an expression that matches this kind of pattern; I am not getting the desired output.
It's not clear what you want. Do you want to match that string exactly? If so, you could use
$sumthing =~ /^people say\nfor -->\z/
or
$sumthing eq "people say\nfor -->"
Or maybe what you need to know is that . matches any character including newline when /s is used?
/people .* -->/s
The following will check for anything, then a newline, then anything. I'm not sure I fully understood your question.
if($sumthing =~ m/.*\n.*/)
Have a look at the /s modifier, which causes . to match anything, including a newline.
my $str = "people say for\nsomething...";
$str =~ m{say(.*)}s and print "'$1'\n";
This would print:
' for
something...'
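A short sketch comparing the approaches above on the string from the question. The three result variables are names I made up:

```perl
use strict;
use warnings;

my $sumthing = "people say\nfor -->";

# Without /s, "." stops at the newline, so this cannot span both lines.
my $without_s = $sumthing =~ /people .* -->/  ? "match" : "no match";

# With /s, "." also matches "\n".
my $with_s    = $sumthing =~ /people .* -->/s ? "match" : "no match";

# An explicit \n in the pattern needs no modifier.
my $explicit  = $sumthing =~ /say\nfor/ ? "match" : "no match";

print "without /s:  $without_s\n";
print "with /s:     $with_s\n";
print "explicit \\n: $explicit\n";
```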

Regex to match suffixes to english words

I'm searching for the word "move" and I want to match "moved" as well when I print.
The way I'm going about this is:
if ($sentence =~ /($search_key)d$/i) {
$search_key = $search_keyd;
}
$subsentences[$i] =~ s/$search_key/ **$search_key** /i;
$subsentences[$i] =~ s/\b$parsewords[1]_\w+/ --$parsewords[1]--/i;
print "MATCH #$count\n",split(/_\S+/,$subsentences[$i]), "\n";
$count++;
This is part of a longer code so if anything is unclear let me know. The _ is because the words in the sentence are tagged (ex. I_NN move_VB to_PREP ....).
Where $search_keyd will be $search_key."d", which worked!
A nice addition would be to check if the word ended in e and therefore only a d would need to be appended. I'd guess it'd look something like this: e?$/d$
Even a general answer will suffice.
I'm new to Perl. So sorry if this is elementary. Thanks in advance!!!
If I understand you correctly, you want to search for "move" and add a highlight, but also include any variation of the basic word, such as "moves" "moved".
When you are replacing words in a text like this, you usually want to replace all occurrences, so you need the /g modifier on the regex, like so:
$subsentences[$i] =~ s/$search_key/ **$search_key** /ig
Also, you should make sure not to match parts of words. E.g. you want to match "move", but perhaps not "remove". For this, you can use \b to mark a word boundary:
$subsentences[$i] =~ s/\b$search_key/ **$search_key** /ig
In order to match certain suffixes, you need a character class with valid characters or combination of characters. move[sd] will find "moves" and "moved". However, for a word like "jump", you would need to be a bit more specific: "jump(s|ed)". Note that [sd] can be replaced with (s|d). So barring any bad spelling in your text, you can get away with:
$subsentences[$i] =~ s/\b$search_key(s|d|ed)/ **$search_key$1** /ig
Note that $1 holds whatever was captured by the first pair of parentheses.
To find the number of matching words:
my $matches = $subsentences[$i] =~ s/\b$search_key(s|d|ed)/ **$search_key$1** /ig
If you want to be more specific with the suffixes, i.e. make it not match badly spelled words like "moveed", you'd need to do some special matching. Something like:
my $suffix;
if ($search_key =~ /e$/i) { $suffix = '(s|d)' }
else { $suffix = '(s|ed)' }
my $matches = $subsentences[$i] =~ s/\b$search_key$suffix/ **$search_key$1** /ig;
It can probably become very complicated the more search words you add.
Some help about regexes here
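Putting those pieces together, here is a rough sketch. The highlight() sub and the sample sentence are hypothetical, and I am assuming tagged words of the form word_TAG as described in the question. It makes the suffix optional so the bare word is highlighted too, and it uses a lookahead for "_" because \b does not match between a letter and an underscore:

```perl
use strict;
use warnings;

# highlight() is a hypothetical helper, not part of the question's code.
# It wraps $search_key, with an optional s/d/ed suffix, in ** markers and
# returns the new text plus the number of replacements made.
sub highlight {
    my ($text, $search_key) = @_;
    my $suffix = $search_key =~ /e$/i ? 's|d' : 's|ed';
    # (?=_|\b): the word must end here, either at a tag separator or at
    # an ordinary word boundary.
    my $count = $text =~ s/\b(\Q$search_key\E(?:$suffix)?)(?=_|\b)/**$1**/ig;
    return ($text, $count || 0);
}

my ($out, $n) = highlight("I_NN moved_VBD to_PREP remove_VB it_PRP", "move");
print "$out\n";
print "replacements: $n\n";
```

Here "moved" should be highlighted while "remove" is left alone, since there is no word boundary before the "move" inside it.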
If what you want is to match all complete words which begin with your search term, i.e. 'move' matches 'move', 'moved', 'movers', etc, then you want to use a character class to detect the end of the word.
So, instead of:
if ($sentence =~ /($search_key)d$/i)
Try using:
if ($sentence =~ /($search_key\w*)\W$/i)
The \w* will match any number of standard word characters and the \W should prevent you from including other characters, such as whitespace or punctuation.
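One caveat: with \W$ the pattern only matches when the word is followed by a non-word character at the very end of the string. As a variation (my own sketch, with a made-up sentence), \b on both sides finds such words anywhere in the text:

```perl
use strict;
use warnings;

my $sentence   = "They moved, and the movers removed everything.";
my $search_key = "move";

# \b...\w*\b grabs every whole word that starts with the search term.
# "removed" is skipped because there is no word boundary before its "move".
my @words = $sentence =~ /\b(\Q$search_key\E\w*)\b/ig;

print "@words\n";   # prints: moved movers
```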