perl - strings comparison and regex


What is the difference between the two lines?
if ($data =~ m/$str/) {
#### ^--- HERE
print "OK";
}
and
if ($data =~ /$str/) {
print "OK";
}
The whole difference is just an 'm'.

m indicates that you are using the match operator, as opposed to substitution (s), transliteration (tr) or the other operators that can be written with /. If you use / as the delimiter, the m is optional: a standalone /.../ is treated as m/.../. The m is mandatory if you want to use other characters as delimiters around the regexp, as in $str =~ m|$regexp|. This is useful for writing more readable code if your regexp contains lots of / characters, because you don't have to escape them.
Additionally, a few of the delimiters that can be specified with m change how the pattern is processed:
http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators
With the m you can use any pair of non-whitespace (ASCII) characters
as delimiters. This is particularly useful for matching path names
that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is
the delimiter, then a match-only-once rule applies, described in
m?PATTERN? below. If "'" (single quote) is the delimiter, no
interpolation is performed on the PATTERN. When using a character
valid in an identifier, whitespace is required after the m.
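
As a rough illustration of the point about path names, here is a minimal sketch that runs the same match with three different delimiters. The path and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';   # hypothetical sample path

# Default delimiter: every "/" inside the pattern must be escaped.
my $with_slashes = $path =~ /^\/usr\/local\// ? 1 : 0;

# Same pattern with m and braces: no escaping needed.
my $with_braces = $path =~ m{^/usr/local/} ? 1 : 0;

# Same pattern with m and "|" as the delimiter.
my $with_pipes = $path =~ m|^/usr/local/| ? 1 : 0;

print "slashes=$with_slashes braces=$with_braces pipes=$with_pipes\n";
```

All three forms match the same text; only the amount of escaping differs.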

Related

Perl search and replace using variables, string contains dot

I’m using a variable to search and replace a string using Perl.
I want to replace the string 23.0 with 23.0.1, so I tried this:
my $old="23.0";
my $new="23.0.1";
$_ =~ s/$old/$new/g;
The problem is that it also replaced the string 2310, so I tried:
my $old="23\.0"
and also /ee.
But can’t get the correct syntax for it to work. Can someone show me the correct syntax?

There are two things that will help you here:
The quotemeta function, which escapes regex metacharacters, and the \Q and \E escapes, which apply the same quoting to whatever is interpolated between them inside a pattern.
print quotemeta "23.0"; # prints 23\.0
Or:
my $old="23.0";
my $new="23.0.1";
my $str = "2310";
$str =~ s/\Q$old\E/$new/g;
print $str; # still prints 2310: the quoted dot no longer matches the "1"
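
To make the difference visible, here is a small sketch that runs the substitution with and without \Q...\E on two sample strings. The sample data and the array names are assumptions for illustration only.

```perl
use strict;
use warnings;

my $old = "23.0";
my $new = "23.0.1";

my @inputs = ("version 23.0", "build 2310");   # hypothetical sample data

my (@unquoted, @quoted);
for my $input (@inputs) {
    # Unquoted: "." matches any character, so "2310" is changed as well.
    (my $plain = $input) =~ s/$old/$new/g;

    # Quoted with \Q...\E: "." is matched literally.
    (my $safe = $input) =~ s/\Q$old\E/$new/g;

    push @unquoted, $plain;
    push @quoted,   $safe;
}

print "unquoted: @unquoted\n";
print "quoted:   @quoted\n";
```

Only the quoted version leaves "build 2310" alone.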

Just use single quotes and escape the dot.
my $old='23\.0';

To complement Sobrique's excellent answer, let me note that the reason your attempt with "23\.0" didn't work is that "23\.0" and "23.0" evaluate to the same string: in a double-quoted string literal, the backslash escape sequence \. simply evaluates to a plain dot (.).
There are several things you could do to avoid this:
If you indeed want to match a fixed string, and don't need or want to include any special regexp metacharacters in it, you can do as Sobrique suggests and use quotemeta or \Q to escape them.
In particular, this is almost always the correct solution if the string to be matched comes from user input. If you do want to allow some limited set of non-literal metacharacters, you can unescape those after running the pattern through quotemeta. For a simple example, here's a quick-and-dirty way to turn a basic glob-like pattern (using the metacharacters ? and * for "any character" and "any string of characters" respectively) into an equivalent regexp:
my $regexp = "^\Q$glob\E\$"; # quote and anchor the pattern
$regexp =~ s/\\\?/./g; # replace "?" (escaped to "\?" by \Q) with "."
$regexp =~ s/\\\*/.*/g; # replace "*" (escaped to "\*" by \Q) with ".*"
Conversely, if you want to have a literal regexp pattern in your code, without immediately matching it against something, you can use the qr// regexp-like quote operator, like this:
my $old = qr/\b23\.0(\.0)?\b/; # match 23.0 or 23.0.0 (but not 123.012!)
my $new = "23.0.1"; # just a literal string
s/$old/$new/g; # replace any string matching $old in $_ with $new
Note that qr// has other effects beyond just allowing you to use regexp syntax in a string literal: it actually pre-compiles the pattern into a special Regexp object, so that it doesn't need to be recompiled every time it's used later. In particular, as a side effect, the string representation of a qr// regexp literal will usually not exactly match the original content, although it will be equivalent as a regexp. For example, say qr/\b23\.0(\.0)?\b/ will, on my Perl version, output (?^u:\b23\.0(\.0)?\b).
You could also just use a normal double-quoted string literal, and double any backslashes in it, but that's (usually) less efficient than using qr//, and also less readable due to leaning toothpick syndrome.
Using a single-quoted string literal would be slightly better, since backslashes in a single-quoted string are only special when followed by another backslash or a single quote. Even so, readability can still suffer if you happen to need to match any literal backslashes in your regexp, not to mention that it's easy to create subtle bugs if you forget to double a backslash in those rare places where it's still needed.
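
For anyone who wants to try the two snippets above, here is a minimal sketch that wraps the glob conversion in a helper and then uses a qr// pattern in a substitution. The helper name glob_to_regexp, the sample glob and the sample strings are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the glob conversion shown above.
sub glob_to_regexp {
    my ($glob) = @_;
    my $regexp = "^\Q$glob\E\$";   # quote and anchor the pattern
    $regexp =~ s/\\\?/./g;         # "?" becomes "."
    $regexp =~ s/\\\*/.*/g;        # "*" becomes ".*"
    return qr/$regexp/;
}

my $glob_re = glob_to_regexp('report-?.*');
my @names   = ('report-1.txt', 'report-12.txt', 'xreport-1.txt');
my @hits    = grep { $_ =~ $glob_re } @names;
print "glob hits: @hits\n";

# A precompiled pattern from qr// used inside a substitution.
my $old  = qr/\b23\.0(\.0)?\b/;
my $new  = "23.0.1";
my $text = "23.0 23.0.0 123.012";
(my $updated = $text) =~ s/$old/$new/g;
print "$updated\n";
```

Only report-1.txt passes the glob, and only the first two version strings are rewritten, because \b keeps the pattern from matching inside 123.012.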

perl: how to check alphanumeric values and limit the string size to 30 by using regex

I am trying to write a regex in Perl that checks for alphanumeric values (spaces allowed) but excludes the underscore "_", and limits the number of characters to 30. This is what I tried, but it is not working. Could anyone tell me what I am doing wrong? This code even accepts special characters as alphanumeric values: $currLine = 'Kapil# 123' should not be a valid value.
** Apologies: where I wrote $currLine = "regex" I meant $currLine =~ "regex".
if ($currLine = /^[a-zA-Z0-9]{1,30}$/){
say "Line3 Good: ", $currLine;
} else {
say "Error in Line 3: Name not alphanumeric ";
}

$currLine = /^[a-zA-Z0-9]{1,30}$/
means
$currLine = $_ =~ /^[a-zA-Z0-9]{1,30}$/
You want to use
$currLine =~ /^[a-zA-Z0-9]{1,30}$/
Now on to the other problems.
You didn't allow spaces. (What follows allows whitespace. If you mean SPACE specifically, use that instead of \s).
You allow a trailing newline.
You allow 31 characters if the 31st is a newline.
You forbid many alphanumeric characters.
You forbid zero characters.
$currLine =~ /^[\p{Alnum}\s]{0,30}\z/
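
Here is a rough sketch comparing the original pattern with a stricter one on a few inputs. Note that it uses the literal-space variant mentioned in the parenthesis above (a space instead of \s), so that a trailing newline is rejected. The sample strings and labels are assumptions.

```perl
use strict;
use warnings;

my $original = qr/^[a-zA-Z0-9]{1,30}$/;
my $revised  = qr/^[\p{Alnum} ]{0,30}\z/;   # literal space instead of \s

my @samples = (
    [ 'with space',       "Kapil 123"  ],
    [ 'with #',           "Kapil# 123" ],
    [ 'trailing newline', "Kapil123\n" ],
    [ 'empty',            ""           ],
    [ '31 letters',       "a" x 31     ],
);

my %result;   # label => [ original matched?, revised matched? ]
for my $sample (@samples) {
    my ($label, $value) = @$sample;
    $result{$label} = [
        ($value =~ $original ? 1 : 0),
        ($value =~ $revised  ? 1 : 0),
    ];
    printf "%-17s original=%d revised=%d\n", $label, @{ $result{$label} };
}
```

The interesting row is the trailing newline: $ accepts it, \z does not.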

You are using = (assignment) where you should have =~ (bind).
Enabling warnings may have alerted you to this. The code you have is matching $_ and then assigning the results of the match to $currLine.

For your regular expression to match all alphanumeric values including spaces, you need to include whitespace (\s) inside your character class. You should also be using the binding operator =~ instead of = here.
if ( $currLine =~ /^[a-z0-9\s]{1,30}$/i ) { ...
Note: I included the i modifier for case-insensitive matching.

You are using the assignment operator (=) instead of the binding operator (=~). You should change the if statement to:
if ($currLine =~ /^[a-zA-Z0-9]{1,30}$/)
This can also be shortened to:
if ($currLine =~ /^[^\W_]{1,30}$/)
[^\W] matches exactly what \w matches. To exclude _, we add it to the negated character class, giving [^\W_]. Note, however, that this matches much more than [a-zA-Z0-9]: it also includes other Unicode characters that count as word characters. To restrict the regex to ASCII, add the /a character set modifier:
/^[^\W_]{1,30}$/a
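
A small sketch of the effect of /a, assuming Perl 5.14 or later (where the modifier was introduced). The Greek test string is a made-up input, chosen because its characters are word characters under Unicode rules but not ASCII.

```perl
use strict;
use warnings;

my $greek = "\x{3b1}\x{3b2}";   # two Greek letters, hypothetical input

# Without /a, Unicode word characters are accepted.
my $unicode_ok = $greek =~ /^[^\W_]{1,30}$/ ? 1 : 0;

# With /a, only ASCII letters and digits are accepted.
my $ascii_ok   = $greek    =~ /^[^\W_]{1,30}$/a ? 1 : 0;
my $plain_ok   = "abc123"  =~ /^[^\W_]{1,30}$/a ? 1 : 0;
my $underscore = "a_b"     =~ /^[^\W_]{1,30}$/a ? 1 : 0;

print "unicode=$unicode_ok ascii=$ascii_ok plain=$plain_ok underscore=$underscore\n";
```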

what does s{SOMESTR}{$myvar} mean in perl?

Someone has written something like the following code :
#! /usr/bin/perl
my $myVar = 'somecomplicatedString';
my $someString = 'mySystemvariable=SOMESTR';
if ( $someString =~ /SOMESTR/ ) {
$someString =~ s{SOMESTR}{$myVar};
}
# $someString now equals 'mySystemvariable=somecomplicatedString'
What is the difference between the s/// operator and the s{}{} operator?

You can use almost any delimiter with Perl's match operator, m//, or substitution operator, s///.
Other examples:
s#oldTest#newTest#
s/oldTest/newTest/
s!oldTest!newTest!
s~oldTest~newTest~
s{oldTest}{newTest} # Here we use appropriate opening and closing braces.
m/someText/
m!someText!
/someText/ # You can omit the `m` when `/` is delimiter
!someText! # This is wrong: you can't omit `m` with any other delimiter.
The major advantage you see with varying delimiter is that you can avoid escaping a delimiter in the text, by using a different character as delimiter.
So, using # as delimiter, you don't need to escape / in the string.
From perlop doc:
Under m// operator section:
If "/" is the delimiter then the initial m is optional. With the m you can use any pair of non-whitespace (ASCII) characters as delimiters. This is particularly useful for matching path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then a match-only-once rule applies, described in m?PATTERN? below.
Under s/// operator section:
Any non-whitespace delimiter may replace the slashes. Add space after the s when using a character allowed in identifiers.

It is the same operator, but using different delimiters which can be used to achieve better readability.
{} are convenient when using the /e modifier, because the replacement part is then a block of Perl code:
$string =~ s{(\d)}{
# ...
$1 + 1;
}e;
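
A minimal runnable sketch of both points, assuming made-up sample strings: s/// and s{}{} give the same result, and with /e the braces hold a block of code.

```perl
use strict;
use warnings;

my $myVar = 'somecomplicatedString';

# The same substitution written with two different delimiters.
my $with_slashes = 'mySystemvariable=SOMESTR';
my $with_braces  = 'mySystemvariable=SOMESTR';
$with_slashes =~ s/SOMESTR/$myVar/;
$with_braces  =~ s{SOMESTR}{$myVar};
print "$with_slashes\n$with_braces\n";

# With /e the replacement is evaluated as Perl code: add 1 to every digit.
my $string = "a1 b2 c9";
(my $bumped = $string) =~ s{(\d)}{
    $1 + 1;
}ge;
print "$bumped\n";
```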

Difference between Perl regular expression delimiters /.../ and #...#

Today I came across two different syntaxes for a Perl regular expression match.
# I have a date string
my $time = '2012-10-29';
# Already familiar "m//":
$time =~ m/^(\d{4}-\d\d-\d\d)$/;
# Completely new to me: "m##".
$time =~ m#^(\d{4}-\d\d-\d\d)#;
Now what is the difference between /expression/ and #expression#?

As everyone else said, you can use any delimiter after the m.
/ has one special feature: you can use it by itself, e.g.
$string =~ /regexp/;
is equivalent to:
$string =~ m/regexp/;

Perl allows you to use pretty much any characters to delimit strings, including regexes. This is especially useful if you need to match a pattern that contains a lot of slash characters:
$slashy =~ m/\/\//; #Bad
$slashy =~ m|//|; #Good
According to the documentation, the first of those is an example of "leaning toothpick syndrome".
Most, but not all, delimiter characters behave the same way. There is an important exception: m?...? is a special case that only matches a single time between calls to reset().
Another exception: if single quotes are used for the delimiter, no variable interpolation is done. You still have to escape $, though, as it is a special character matching the end of the line.
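
The two exceptions can be sketched roughly like this. The variable names and sample text are assumptions, and the comments describe the pattern each match actually sees.

```perl
use strict;
use warnings;

my $str  = 'abc';
my $text = 'price in $str units';   # single quotes: contains a literal "$str"

# m//: $str is interpolated, so the pattern is "abc".
my $interpolated = $text =~ m/$str/ ? 1 : 0;

# m'': no interpolation; \$ matches a literal dollar sign.
my $literal = $text =~ m'\$str' ? 1 : 0;

# m'' without the backslash: $ is the end-of-line anchor, followed by "str".
my $unescaped = $text =~ m'$str' ? 1 : 0;

# m??: matches only once until reset() is called.
my $count = 0;
for my $word (qw(foo foo foo)) {
    $count++ if $word =~ m?foo?;
}

print "interpolated=$interpolated literal=$literal unescaped=$unescaped count=$count\n";
```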

Nothing except what you have to escape in the regex. You can use any pair of matched characters you like.
$string = "http://example.com/";
$string =~ m!http://!;
$string =~ m#http://#;
$string =~ m{http://};
$string =~ m/http:\/\//;

After the match or search/replace operator (the m and s, respectively) you can use any character as the delimiter, e.g. the # in your case. This also works with bracketing pairs such as braces: s{ abc (.*) def }{ DEF $1 ABC }x.
The advantage is that you don't have to escape / (though you do have to escape whichever delimiter character you choose, of course). It's often used for clarity, especially when dealing with things like paths or protocols.

There is no difference; the "/" and "#" characters are used as delimiters for the expression. They simply mark the "boundary" of the expression, but are not part of the expression. In theory you can use most non-alphanumeric characters as a delimiter. Here is a link to the PHP manual (It doesn't matter that it is the PHP manual, the Regex syntax is the same, I just like it because it explains well) on Perl compatible regular expression syntax; read the part about delimiters

Which characters can be used as regular expression delimiters?

Which characters can be used as delimiters for a Perl regular expression? m/re/, m(re) and måreå all seem to work, but I'd like to know all possibilities.

From perlop:
With the m you can use any pair of non-whitespace characters as delimiters.
So anything goes, except whitespace. The full paragraph for this is:
If "/" is the delimiter then the initial m is optional. With the m you can use any pair of non-whitespace characters as delimiters. This is particularly useful for matching path names that contain "/", to avoid LTS (leaning toothpick syndrome). If "?" is the delimiter, then the match-only-once rule of ?PATTERN? applies. If "'" is the delimiter, no interpolation is performed on the PATTERN. When using a character valid in an identifier, whitespace is required after the m.

As is often the case, I wonder "can I write a Perl program to answer that question?".
Here is a pretty good first approximation of trying all of the printable ASCII chars:
#!/usr/bin/perl
use warnings;
use strict;
$_ = 'foo bar'; # something to match against
foreach my $ascii (32 .. 126) {
my $delim = chr $ascii;
next if $delim eq '?'; # avoid fatal error
foreach my $m ('m', 'm ') { # with and without space after "m"
my $code = $m . $delim . '(\w+)' . $delim . ';';
# print "$code\n";
my $match;
{
no warnings 'syntax';
($match) = eval $code;
}
print "[$delim] didn't compile with $m$delim$delim\n" if $@;
if (defined $match and $match ne 'foo') {
print "[$delim] didn't match correctly ($match)\n";
}
}
}

Just about any non-whitespace character can be used, though identifier characters have to be separated from the initial m by whitespace. When you use a single quote as the delimiter, however, it disables interpolation and most backslash escaping.

There is currently a bug in the lexer that sometimes prevents UTF-8 characters from being used as a delimiter, even though you can sneak Latin1 by it if you aren't in full Unicode mode.
