Regex to match last character of a string

I have a small problem: in Perl, I need to check whether the last character of a string is "a". I know that I can do it this way:
$test = "mama";
$test2 = substr $test, -1;
And then just check whether $test2 is not equal to "a". But how can I do this with a regex?

The $ matches the end of the string:
my $test = "mama";
print "Terminal 'a' in $test\n" if $test =~ /a$/;

In Perl, $ does not necessarily match the end of the string:
^ Match string start (or line, if /m is used)
$ Match string end (or line, if /m is used) or before newline
\b Match word boundary (between \w and \W)
\B Match except at word boundary (between \w and \w or \W and \W)
\A Match string start (regardless of /m)
\Z Match string end (before optional newline)
\z Match absolute string end
\G Match where previous m//g left off
\K Keep the stuff left of the \K, don't include it in $&
Therefore, to check if the last character of $s really is 'a', you must use:
if ($s =~ /a\z/) { ...
because
$ perl -E 'say "yes" if "a\n" =~ /a$/'
yes
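To make the difference between the anchors concrete, here is a minimal sketch that runs the same check with $, \Z and \z on strings with and without a trailing newline. The helper name ends_with_a is only illustrative and does not come from the answers above.

```perl
use strict;
use warnings;

# Report which end-of-string anchors accept a given string.
sub ends_with_a {
    my ($s) = @_;
    return {
        dollar  => ($s =~ /a$/  ? 1 : 0),   # end, or just before a final newline
        big_z   => ($s =~ /a\Z/ ? 1 : 0),   # end, or just before a final newline
        small_z => ($s =~ /a\z/ ? 1 : 0),   # absolute end of the string only
    };
}

for my $s ("mama", "mama\n", "mam") {
    my $r = ends_with_a($s);
    (my $shown = $s) =~ s/\n/\\n/g;         # make the newline visible
    printf "%-8s \$=%d \\Z=%d \\z=%d\n",
        $shown, @{$r}{qw(dollar big_z small_z)};
}
```

Only \z rejects "mama\n", which is why it is the safer choice when the last character really has to be "a".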

Related

Perl Regexp::Common package not matching certain real numbers when used with word boundary

The following code prints "34" instead of the expected ".34":
use strict;
use warnings;
use Regexp::Common;
my $regex = qr/\b($RE{num}{real})\s*/;
my $str = "This is .34 meters of cable";
if ($str =~ /$regex/) {
print $1;
}
Do I need to fix my regex? (The word boundary is needed, because without it the regex would match part of a string like xx34, which I don't want.)
Or is this a bug in Regexp::Common? I always thought that the longest match should win.
The word boundary is a context-dependent regex construct. When it is followed by a word char (letter, digit or _), that location must be preceded by either the start of the string or a non-word char. In this concrete case, the word boundary is followed by a non-word char (the dot) and thus requires a word char to appear right before it. The space before .34 is not a word char, so \b fails there and the match starts at the 3 instead.
You may use a non-ambiguous word boundary expressed with a negative lookbehind:
my $regex = qr/(?<!\w)($RE{num}{real})/;
               ^^^^^^^
The (?<!\w) negative lookbehind always denotes one thing: fail the match if there
is a word character immediately to the left of the current location.
Or, use a whitespace boundary if you want your matches to only occur after whitespace or start of string:
my $regex = qr/(?<!\S)($RE{num}{real})/;
               ^^^^^^^
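The three boundaries can be compared side by side with a short sketch. To keep it runnable without Regexp::Common, $num below is a simplified stand-in for $RE{num}{real} (an assumption: optional sign, digits, optional fraction, no exponent), and first_number is a hypothetical helper.

```perl
use strict;
use warnings;

# Simplified stand-in for $RE{num}{real} (assumption: sign, digits, optional
# fraction, no exponent), so the sketch runs without Regexp::Common.
my $num = qr/[-+]?(?=\.?\d)\d*(?:\.\d*)?/;

# Return the first number found to the right of the given boundary, or undef.
sub first_number {
    my ($boundary, $str) = @_;
    return $str =~ /$boundary($num)/ ? $1 : undef;
}

my @boundaries = (
    [ '\b',      qr/\b/      ],
    [ '(?<!\w)', qr/(?<!\w)/ ],
    [ '(?<!\S)', qr/(?<!\S)/ ],
);

for my $str ("This is .34 meters of cable", "xx34", "cable:.34") {
    for my $pair (@boundaries) {
        my ($label, $boundary) = @$pair;
        my $found = first_number($boundary, $str) // 'no match';
        printf "%-30s %-8s %s\n", "'$str'", $label, $found;
    }
}
```

With the first string, \b yields 34 while both lookbehinds yield .34. The "cable:.34" input shows where the two lookbehinds differ: (?<!\w) accepts the colon on the left, (?<!\S) does not.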
Try this pattern: (?:^| )(\d*\.?\d+)
Explanation:
(?:...) - non-capturing group
^| - match either ^ (the beginning of the string) or a space
\d* - match zero or more digits
\.? - match dot literally - zero or one
\d+ - match one or more digits
The matched number will be stored in the first capturing group.

How can I match start of the line or a character in Perl?

For example, this is ok:
my $str = 'I am $name. \$escape';
$str =~ s/[^\\]\K\$([a-z]+)/Bob/g;
print $str; # 'I am Bob. \$escape';
But the result below is not what I expected:
my $str = '$name';
$str =~ s/[^\\]\K\$([a-z]+)/Bob/g;
print $str; # '$name';
How can I correct this?
How can I match start of the line or a character in Perl?
The circumflex inside a character class loses the meaning of the start-of-string anchor. Instead of a character class, you need to use a non-capturing group:
$str =~ s/(?:^|\\)\K\$([a-z]+)/Bob/g;
          ^^^^^^^^
This (?:^|\\) will either assert the position at the string start or will match \.
For those who read the question as "match only if the $ symbol is not escaped", the solution is:
$str =~ s/(?<!\\)(?:\\\\)*\K\$([a-z]+)/Bob/g;
Here, the (?<!\\) zero-width assertion is a negative lookbehind that fails the match if the current position is preceded by a \ symbol, (?:\\\\)* consumes any escaped backslashes (if present) before the $, and the \K match reset operator discards all these backslashes from the match value.
If your goal is to match $ that are not escaped by a backslash, you can change your pattern to:
(?<!\\)(?:\\{2})*\K\$([a-z]+)
This way you don't have to use an alternation since the negative lookbehind matches a position not preceded by a backslash (that includes the start of the string).
In addition, (?:\\{2})* avoids missing cases where the backslash before a $ is itself escaped by another backslash. For example: \\$name
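The following sketch applies the lookbehind-based substitution to the inputs discussed in this thread. The wrapper name fill is hypothetical, and the strings are single-quoted so that the backslashes and $ reach the regex unchanged.

```perl
use strict;
use warnings;

# Replace $name-style placeholders unless the $ is escaped by a backslash.
# An even run of backslashes before $ counts as escaped backslashes, not as
# an escape of the $ itself, and is kept in the output thanks to \K.
sub fill {
    my ($text) = @_;
    (my $out = $text) =~ s/(?<!\\)(?:\\\\)*\K\$([a-z]+)/Bob/g;
    return $out;
}

for my $in ('$name', 'I am $name. \$escape', '\\\\$name') {
    printf "%-22s -> %s\n", $in, fill($in);
}
```

The first input is the case the original [^\\] pattern missed, because that character class needs an actual character before the $.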

Matching first letter of word

I want to match the first letter of a word in one string to another with the similar letter. In this example the letter H:
25HB matches to HC
I am using the match operator shown below:
my ($match) = ( $value =~ m/^d(\w)/ );
I want to skip the digits and match the first word character. How can I correct this?
That regex doesn't do what you think it does:
m/^d(\w)/
It matches the start of the line, then the letter d, then a single word character.
You may want:
m/^\d+(\w)/
Which will then match one or more digits from the start of line, and grab the first word character after that.
E.g.:
my $string = '25HC';
my ( $match ) =( $string =~ m/^\d+(\w)/ );
print $match,"\n";
Prints H
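Two edge cases are worth checking, because \w also matches digits and \d+ requires at least one digit. The sketch below compares the pattern above with an assumed variant that makes the digits optional and captures only a letter; both helper names are illustrative.

```perl
use strict;
use warnings;

# The answer's pattern: one or more digits, then any word character.
sub first_word_char {
    my ($s) = @_;
    return $s =~ /^\d+(\w)/ ? $1 : undef;
}

# Assumed variant: digits are optional and only a letter is captured.
sub first_letter {
    my ($s) = @_;
    return $s =~ /^\d*([[:alpha:]])/ ? $1 : undef;
}

for my $s ('25HC', 'HC', '2599') {
    printf "%-5s \\d+(\\w): %-6s \\d*([[:alpha:]]): %s\n", $s,
        first_word_char($s) // 'undef',
        first_letter($s)    // 'undef';
}
```

For 2599 the first pattern backtracks and captures the digit 9, and for HC it fails because there are no leading digits, so the variant may be closer to what the question describes.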
You are not clear about what you want. If you want to match the first letter in a string to the same letter later in the string:
m{
( # start a capture
[[:alpha:]] # match a single letter
) # end of capture
.*? # skip minimum number of any character
\1 # match the captured letter
}msx; # /m means multilines, /s means . matches newlines, /x means ignore whitespace in pattern
See perldoc perlre for more details.
Addendum:
If by word, you mean any alphanumeric sequence, this may be closer to what you want:
m{
\b # match a word boundary (start or end of a word)
\d* # greedy match any digits
( # start a capture
[[:alpha:]] # match a single letter
) # end of capture
.*? # skip minimum number of any character
\b # match a word boundary (start or end of a word)
\d* # greedy match any digits
\1 # match the captured letter
}msx; # /m means multilines, /s means . matches newlines, /x means ignore whitespace in pattern
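As a rough check of the pattern above, this sketch wraps it in a hypothetical helper and runs it on a single string holding both values from the question (joining them with a space is an assumption about how the input would be supplied).

```perl
use strict;
use warnings;

# Return the letter that starts a word (after optional digits) and also
# starts a later word in the same string, or undef when there is none.
sub shared_first_letter {
    my ($s) = @_;
    return $s =~ m{
        \b \d* ([[:alpha:]])   # first letter of a word, digits skipped
        .*?                    # as little as possible in between
        \b \d* \1              # a later word starting with the same letter
    }msx ? $1 : undef;
}

for my $s ('25HB HC', '25HB XC') {
    printf "%-8s => %s\n", $s, shared_first_letter($s) // 'no match';
}
```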
You could try ^.*?([A-Za-z]).
The following code returns:
ITEM: 22hb
MATCH: h
ITEM: 33HB
MATCH: H
ITEM: 3333
MATCH:
ITEM: 43 H
MATCH: H
ITEM: HB33
MATCH: H
Script.
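#!/usr/bin/perl
use strict;
use warnings;

my @array = ('22hb','33HB','3333','43 H','HB33');
for my $item (@array) {
    # List assignment captures $1, or leaves $match undefined when nothing matches.
    my ($match) = $item =~ /^.*?([A-Za-z])/;
    $match //= '';
    print "ITEM: $item\nMATCH: $match\n\n";
}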
I believe this is what you are looking for:
(If you can provide more clear example of what you are looking for we may be able to help you better)
The following code takes two strings and finds the first non-digit character common in both the strings:
my $string1 = '25HB';
my $string2 = 'HC';
#strip all digits
$string1 =~ s/\d//g;
foreach my $alpha (split //, $string1) {
# for each non-digit check if we find a match
if ($string2 =~ /\Q$alpha\E/) {   # \Q...\E keeps regex metacharacters literal
print "First matching non-numeric character: $alpha\n";
exit;
}
}

Matching regexp with boundary values in Perl

I am trying to match the regexp below with \b and with \W. It doesn't match with \b but it matches with \W.
my $response = "ABC-12-1-1::HELLO=TX,PROVFEADDR=\"\",ValueFORM=NAME-CITY-STREET-PRT,";
print "\n\n\n$response\n\n\n";
if ( $response =~ /PROVFEADDR=\b/ ) ##### matching with //PROVFEADDR=\W/
{
print "matched\n";
} else {
print "not matched\n";
}
Any clues?
Following the comments, I am editing the post a little.
I now understand why it matches with \W. Here is the problem that made me start using \b:
PROVFEADDR is a variable to match. In this particular case I have to match PROVFEADDR=. Earlier we were using \W+ instead of \b. The problem with \W+ arises when we have to match at the end of the string: \W+ expects at least one \W, which is not there when the key is the last thing in the string. So I replaced it with \b, which worked in that scenario. Any suggestion that can handle both cases?
The reason \b does not match is that it needs a word character on one side and a non-word character (or the edge of the string) on the other, and you have two non-word characters.
In your comments, you have mentioned that you are looking for a replacement for \W that also matches end of line, in which case a negative lookahead assertion can be used:
if($response =~ /PROVFEADDR=(?!\w)/)
It asserts that the next character is not a word character (letter, digit or underscore), so it also matches at the end of the line, where there is no next character.
In $response, the character after PROVFEADDR= is a double quote, which is not a word character, so it matches \W (non-word).
It doesn't match \b because it's not a word boundary. Compare it with:
if($response =~ /PROVFEADDR\b=/)
Here, between R and = is a word boundary.
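A small sketch can show how \b, \W+ and (?!\w) behave after PROVFEADDR= in the three situations discussed here. The shortened input strings and the helper key_matches are assumptions made for illustration.

```perl
use strict;
use warnings;

# Test "PROVFEADDR=" followed by the given trailing construct.
sub key_matches {
    my ($tail, $text) = @_;
    return $text =~ /PROVFEADDR=$tail/ ? 1 : 0;
}

my @inputs = (
    'HELLO=TX,PROVFEADDR="",ValueFORM=NAME',  # followed by a double quote
    'HELLO=TX,PROVFEADDR=',                   # at the very end of the string
    'HELLO=TX,PROVFEADDR=abc',                # followed by a word character
);

for my $text (@inputs) {
    printf "%-40s \\b=%d \\W+=%d (?!\\w)=%d\n", $text,
        key_matches(qr/\b/,     $text),
        key_matches(qr/\W+/,    $text),
        key_matches(qr/(?!\w)/, $text);
}
```

Only the negative lookahead accepts both the quoted value and the end-of-string case while still rejecting a following word character.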

Match exactly a word starting with $ (or another special character) contained in a string in Perl

How do I match exactly a word like $abc in the string "this is $$abc abc$abc $abc abc_$abc_ing";
You can use the regex:
(\$[a-z]+)
Explanation:
( : Start of group
\$ : A literal $. Since $ is a metacharacter we escape it.
[a-z]+ : one or more letters that is a word. You can use the modifier i
to match uppercase letters as well.
) : End of grouping
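Running the pattern with /g on the string from the question shows what it actually picks up. The second pattern is not from the answer: it is a sketch under the assumption that "exactly" means the token must stand alone between whitespace, expressed with lookarounds.

```perl
use strict;
use warnings;

my $text = 'this is $$abc abc$abc $abc abc_$abc_ing';

# The pattern from the answer: every "$" followed by lowercase letters.
my @anywhere = $text =~ /(\$[a-z]+)/g;

# Assumed stricter reading: no non-space character directly before or after.
my @standalone = $text =~ /(?<!\S)(\$[a-z]+)(?!\S)/g;

print "anywhere:   @anywhere\n";
print "standalone: @standalone\n";
```

The first pattern also finds $abc inside $$abc, abc$abc and abc_$abc_ing, so it returns four matches, while the stricter one returns only the standalone $abc.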