I have a variable which is like $var = 1.2.3 or $var = Variable/1.2.3. I am trying to write a regex that matches and stores in $1. The code goes as follows:
if ($var =~ m/[\w+\/\d+.\d+.\d+])/){
$a = $1;
}
I want to match in the above condition if any of the $var prevails. Please suggest me. Thank you.
Parentheses capture. It is error-prone to rely on $1, simply use the return value from the match operator.
for my $var ('1.2.3', 'Variable/1.2.3') {
if (my ($version) = $var =~ m{
(?:\A | /) # beginning of string or a slash
(\d+ [.] \d+ [.] \d+) # capture version number triple
\z # end of string
}msx) {
print ">>> $version <<<\n";
}
}
__END__
>>> 1.2.3 <<<
>>> 1.2.3 <<<
You should remove that character class. A character classes matches just a single character. It doesn't represent any sequence. Also, . is a meta-character in regex. To match it literally, you need to escape it.
And then, you need to make the part before / as optional, using ? quantifier, as it is not necessarily present in string:
if ($var =~ m/((?:\w+\/)?\d+\.\d+\.\d+)/){
$a = $1;
}
Just FYI, you can use any delimiter for match operator, so as to avoid escaping /:
m!((?:\w+/)?\d+\.\d+\.\d+)!
Try this:
if ($var =~ m/(?>[a-z_][a-z0-9_]*+\/)?+[0-9]++\.[0-9]++\.[0-9]++/i) {
$a = $&;
}
Related
The following regular expression gives me proper results when tried in Notepad++ editor but when tried with the below perl program I get wrong results. Right answer and explanation please.
The link to file I used for testing my pattern is as follows:
(http://sainikhil.me/stackoverflow/dictionaryWords.txt)
Regular expression: ^Pre(.*)al(\s*)$
Perl program:
use strict;
use warnings;
sub print_matches {
my $pattern = "^Pre(.*)al(\s*)\$";
my $file = shift;
open my $fp, $file;
while(my $line = <$fp>) {
if($line =~ m/$pattern/) {
print $line;
}
}
}
print_matches #ARGV;
A few thoughts:
You should not escape the dollar sign
The capturing group around the whitespaces is useless
Same for the capturing group around the dot .
which leads to:
^Pre.*al\s*$
If you don't want words like precious final to match (because of the middle whitespace, change regex to:
^Pre\S*al\s*$
Included in your code:
while(my $line = <$fp>) {
if($line =~ /^Pre\S*al\s*$/m) {
print $line;
}
}
You're getting messed up by assigning the pattern to a variable before using it as a regex and putting it in a double-quoted string when you do so.
This is why you need to escape the $, because, in a double-quoted string, a bare $ indicates that you want to interpolate the value of a variable. (e.g., my $str = "foo$bar";)
The reason this is causing you a problem is because the backslash in \s is treated as escaping the s - which gives you just plain s:
$ perl -E 'say "^Pre(.*)al(\s*)\$";'
^Pre(.*)al(s*)$
As a result, when you go to execute the regex, it's looking for zero or more ses rather than zero or more whitespace characters.
The most direct fix for this would be to escape the backslash:
$ perl -E 'say "^Pre(.*)al(\\s*)\$";'
^Pre(.*)al(\s*)$
A better fix would be to use single quotes instead of double quotes and don't escape the $:
$ perl -E "say '^Pre(.*)al(\s*)$';"
^Pre(.*)al(\s*)$
The best fix would be to use the qr (quote regex) operator instead of single or double quotes, although that makes it a little less human-readable if you print it out later to verify the content of the regex (which I assume to be why you're putting it into a variable in the first place):
$ perl -E "say qr/^Pre(.*)al(\s*)$/;"
(?^u:^Pre(.*)al(\s*)$)
Or, of course, just don't put it into a variable at all and do your matching with
if($line =~ m/^Pre(.*)al(\s*)$/) ...
Try removing trailing newline character(s):
while(my $line = <$fp>) {
$line =~ s/[\r\n]+$//s;
And, to match only words that begin with Pre and end with al, try this regular expression:
/^Pre\w*al$/
(\w means any letter of a word, not just any character)
And, if you want to match both Pre and pre, do a case-insensitive match:
/^Pre\w*al$/i
I have a perl regex which converts hyphens to spaces eg:-
$string =~ s/-/ /g;
I need to modify this to ignore specific hyphenated phrases and not replace the hyphen e.g. in a string like this:
"use-either-dvi-d-or-dvi-i"
I wish to NOT replace the hyphen in dvi-d and dvi-i so it reads:
"use either dvi-d or dvi-i"
I have tried various negative look ahead matches but failed miserably.
You can use this PCRE regex with verbs (*SKIP)(*F) to skip certain words from your match:
dvi-[id](*SKIP)(*F)|-
RegEx Demo
This will skip words dvi-i and dvi-d for splitting due to use of (*SKIP)(*F).
For your code:
$string =~ s/dvi-[id](*SKIP)(*F)|-/ /g;
Perl Code Demo
There is an alternate lookarounds based solution as well:
/(?<!dvi)-|-(?![di])/
Which basically means match hyphen if it is not preceded by dvi OR if it is not followed by d or i, thus making sure to not match - when we have dvi on LHS and [di] on RHS.
Perl code:
$string =~ s/(?<!dvi)-|-(?![di])/ /g;
Perl Code Demo 2
$string =~ s/(?<!dvi)-(?![id])|(?<=dvi)-(?![id])|(?<!dvi)-(?=[id])/ /g;
While using just (?<!dvi)-(?![id]) you will exclude also dvi-x or x-i, where x can be any character.
It is unlikely that you could get a simple and straightforward regex solution to this. However, you could try the following:
#!/usr/bin/env perl
use strict;
use warnings;
my %whitelist = map { $_ => 1 } qw( dvi-d dvi-i );
my $string = 'use-either-dvi-d-or-dvi-i';
while ( $string =~ m{ ( [^-]+ ) ( - ) ( [^-]+ ) }gx ) {
my $segment = substr($string, $-[0], $+[0] - $-[0]);
unless ( $whitelist{ $segment } ) {
substr( $string, $-[2], 1, ' ');
}
pos( $string ) = $-[ 3 ];
}
print $string, "\n";
The #- array contains the starting offsets of matched groups, and the #+ array contains the ends offsets. In both cases, element 0 refers to the whole match.
I had to resort to something like this because of how \G works:
Note also that s/// will refuse to overwrite part of a substitution that has already been replaced; so for example this will stop after the first iteration, rather than iterating its way backwards through the string:
$_ = "123456789";
pos = 6;
s/.(?=.\G)/X/g;
print; # prints 1234X6789, not XXXXX6789
Maybe #tchrist can figure out how to bend various assertions to his will.
we can ignore specific words using negative Look-ahead and negative Look-behind
Example :
(?!pattern)
is a negative look-ahead assertion
in your case the pattern is
$string =~ s/(?<!dvi)-(?<![id])/ /g;
output :
use either dvi-d or dvi-i
Reference : http://www.perlmonks.org/?node_id=518444
Hope this will help you.
How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'
Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string so we anchor with a ^ metacharacter at the beginning
We want to match anything but digits so we look at the character classes and find out this is \D
We want 1 or more of these so we use the + quantifier which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;
Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $``, and$'` have well known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-perfomance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so, you don't match an entire string that has no digits, you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
$str =~ /(\d)/; print $`;
This code print string, which stand before matching
perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_
I'm trying to match a regular expression in Perl. My code looks like the following:
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/$pattern/) {
print "Match found!"
}
The problem arises in that brackets indicate a character class (or so I read) when Perl tries to match the regex, and the match ends up failing. I know that I can escape the brackets with \[ or \], but that would require another block of code to go through the string and search for the brackets. Is there a way to have the brackets automatically ignored without escaping them individually?
Quick note: I can't just add the backslash, as this is just an example. In my real code, $source and $pattern are both coming from outside the Perl code (either URIEncoded or from a file).
\Q will disable metacharacters until \E is found or the end of the pattern.
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ($source =~ m/\Q$pattern/) {
print "Match found!"
}
http://www.anaesthetist.com/mnm/perl/Findex.htm
Use quotemeta():
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = quotemeta("Hello_[version]");
if ($source =~ m/$pattern/) {
print "Match found!"
}
You are using the Wrong Tool for the job.
You do not have a pattern! There are NO regex
characters in $pattern!
You have a literal string.
index() is for working with literal strings...
my $source = "Hello_[version]; Goodbye_[version]";
my $pattern = "Hello_[version]";
if ( index($source, $pattern) != -1 ) {
print "Match found!";
}
You can escape set of special characters in an expression by using the following command.
expression1 = 'text with special characters like $ % ( )';
expression1 =~s/[\?\*\+\^\$\[\]\\\(\)\{\}\|\-]/"\\$&"/eg ;
#This will escape all the special characters
print "expression1'; # text with special characters like \$ \% \( \)
I'm using a Perl program to extract text from a file. I have an array of strings which I use as delimiters for the text, e.g:
$pat = $arr[1] . '(.*?)' . $arr[2];
if ( $src =~ /$pat/ ) {
print $1;
}
However, two of the strings in the array are $450 and (Buy now). The problem with these is that the symbols in the strings represent end-of-string and capture group in Perl regular expressions, so the text doesn't parse as I intend.
Is there a way around this?
Try Perl's quotemeta function. Alternatively, use \Q and \E in your regex to turn off interpolation of values in the regex. See perlretut for more on \Q and \E - they may not be what you're looking for.
quotemeta escapes meta-characters so they are interpreted as literals. As a shortcut, you can use \Q...\E in double-quotish context to surround stuff that should be quoted:
$pat = quotemeta($arr[1]).'(.*?)'.quotemeta($arr[2]);
if($src=~$pat) { print $1 }
or
$pat = "\Q$arr[1]\E(.*?)\Q$arr[2]"; # \E not necessary at the end
if($src=~$pat) { print $1 }
or just
if ( $src =~ /\Q$arr[1]\E(.*?)\Q$arr[2]/ ) { print $1 }
Note that this isn't limited to interpolated variables; literal characters are affected too:
perl -wle'print "\Q.+?"'
\.\+\?
though obviously it happens after variable interpolation, so "\Q$foo" doesn't become '\$foo'.
Use quotemeta:
$pat = quotemeta($arr[1]) . '(.*?)' . quotemeta($arr[2]);
if ($src =~ $pat)
print $1;