Replace a string or nothing in Perl - regex

I want to replace "metre" or "mt" with "m" using perl. How to do this? I was using this:
$string=~ s/[Mm][Ee\s][Tt][Rr\s][Ee\s]/m/g;
It is working for "metre", but not for "mt"

The i modifier lets you do a case-insensitive match.
$string =~ s/metre|mt/m/gi;
or
$string =~ s/m(?:etre|t)/m/gi;
The second form factors out the shared leading m and is more efficient.
Assuming that you are trying to replace the "word" metre or mt, a unit of length, with m, you would want to use the word boundary metacharacter \b, as @M42 pointed out in the comments. This will prevent matches like the mt in warmth.
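A minimal sketch of the word-boundary version, assuming the units appear as standalone words. The sample text and the variable names $text and $converted are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample text; "warmth" contains "mt" and must stay untouched.
my $text = "The warmth lasted 5 metre, then 10 MT more.";

# \b on both sides restricts the match to whole words;
# /i ignores case and /g replaces every occurrence.
(my $converted = $text) =~ s/\b(?:metre|mt)\b/m/gi;

print "$converted\n";
```

Without the two \b anchors, the mt inside warmth would also be rewritten.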

$string =~ s{metre|mt}{m}ig;
Explanation:
$string = your string
s = replace
{metre|mt} =old content
{m} =replace content
i =ignore case
g =do this action globally

Related

Pre-compiled regex with special characters matching

I'm trying to match if a word such as *FOO (* as a normal character) is in a line. My input is a C++ source code. I need to use a pre-compiled regex for this due to program flow requirements, so I tried the following:
$pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;
And I use it like this:
if ($line =~ m/$pattern/) { ... }
It works and catches lines containing *FOO such as hey *FOO.BAR but also matches lines such as:
//FOO programming using stuff and things
which I want to ignore. What am I missing? Is \* not the right way to escape * in a pre-compiled regex in perl? If *FOO is stored in $word and the pattern looks like this:
$pattern = qr/[^a-zA-Z](\\$word)[^a-zA-Z]|^\s*(\\$word)[^a-zA-Z]/;
Is that different from the previous pattern? Because I tried both and the result seems to be the same.
I found a way to bypass this problem by removing the first char of $word and escaping * in the pattern, but if $word = "**.?FOO" for example, how do I create a qr// with $word so that all the meta-characters are escaped?
You do need to escape the *. One way to do it is by the quotemeta \Q operator:
use warnings;
use strict;
my $qr = qr/\Q*FOO/;
while (<DATA>) { print if /$qr/ }
__DATA__
//FOO programming using stuff and things
hey *FOO.BAR
Note that this escapes all ASCII non-"word" characters through the rest of the pattern. If you need to limit its action to only a part of the pattern then stop it using \E. Please see linked docs.
The above determines whether *FOO is in the line, regardless of whether it is a word or a part of one. It is not clear to me which is needed. Once that is specified the pattern can be adjusted.
Note that /\*FOO/ works, too. What you tried probably failed because of all the rest that you are trying to match, whose purpose I do not understand. If you only need to detect whether the pattern is present, the above does it. If there is a more specific requirement, please clarify.
As for the examples: for me, the string //FOO... is not matched by the main (first) $pattern you show. In the second one, the \\ in front of $word is just a literal backslash, so $word is interpolated with its * still unescaped; the pattern is also much too convoluted. Regex can really tie one in nasty knots when pushed; I suggest keeping it as simple as possible.
Question 1:
my $word = '*FOO';
my $pattern = qr/\\$word/;
is equivalent to
my $pattern = qr/\\*FOO/; # zero or more '\' followed by 'FOO'
The $word is simply interpolated as is.
To get something equivalent to
my $pattern = qr/\*FOO/;
you should use
my $word = '*FOO';
my $pattern = qr/\Q$word\E/;
By default, an interpolated variable is treated as a mini regular expression, so metacharacters in the variable such as *, + and ? are still interpreted as metacharacters. \Q...\E adds a backslash before any ASCII character not matching /[A-Za-z_0-9]/, so any metacharacters in the interpolated variable are interpreted literally. Refer to perldoc.
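As a rough illustration of the escaping described above, here is a small sketch using the $word = "**.?FOO" example from the question. The sample lines and the names $escaped, @lines and @hits are assumptions made up for the demonstration.

```perl
use strict;
use warnings;

# Example word from the question; everything except F, O, O is a metacharacter.
my $word = '**.?FOO';

# quotemeta backslash-escapes each ASCII non-word character.
my $escaped = quotemeta $word;

# \Q...\E applies the same escaping when the variable is interpolated.
my $pattern = qr/\Q$word\E/;

# Hypothetical input lines: only the first contains the literal word.
my @lines = ('x = **.?FOO;', '//FOO programming using stuff and things');
my @hits  = grep { /$pattern/ } @lines;

print "$escaped\n";
print "$_\n" for @hits;
```

The pre-compiled $pattern can then be used in if ($line =~ m/$pattern/) { ... } exactly as in the question.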
Question 2
I tried
my $pattern = qr/[^a-zA-Z](\*FOO)[^a-zA-Z]|^\s*(\*FOO)[^a-zA-Z]/;
my $line = '//FOO programming using stuff and things';
if($line =~ m/$pattern/){
print "$&\n";
}
else{
print "No match!";
}
and it printed "No match!". I can't explain how you get it matched.

How to construct Perl regex to replace (lowercase followed by uppercase) by (lowercase. uppercase)?

I would like to write a perl regex to replace some pattern to other pattern. The source pattern is lowercase followed by uppercase. The destination pattern is the lowercase followed by dot, space and the uppercase.
Here is an example:
CamelCase
I want to change it to the following:
Camel. Case
Thanks in advance!
Another solution that also accepts UTF-8:
use Modern::Perl;
use utf8;
use Encode qw(encode);
foreach ( qw(CamelCase ČamelČasě ŽaBaČek) ) {
say Encode::encode('utf8', "$1. $2")
if( m{(\b\p{Upper}\w*)((?:\p{Lower}\w*\p{Upper}|\p{Upper}\w*\p{Lower})\w*\b)} );
}
produces
Camel. Case
Čamel. Časě
ŽaBa. Ček
based on: https://stackoverflow.com/a/6323679/632407
I suggest the following:
$StringToConvert =~ s/([A-Z][a-z]+)(?=[A-Z])/$1. /g;
The positive lookahead "(?=[A-Z])" is the key. The only words that are replaced are those that are immediately followed by another capital letter.
Also note the "g" which means to search the string globally. This results in the following:
CamelCase
Camel. Case
as well as
CamelCaseMultiple
Camel. Case. Multiple
And on and on for as many capitalized words there may be in the string.
Solution using split and join:
my $str = 'CamelCase';
my @words = split /(?=[A-Z])/, $str;
print join '. ', @words;

Enclose the last 4 characters of a PHP string in brackets

I have some strings like below
my-name-is-2547
this-is-stack-2012
hllo-how-2011
Now I want the above strings to be changed to something like the ones below using regex.
my-name-is-(2547)
this-is-stack-(2012)
hllo-how-(2011)
I don't want to use substr or other, only regex replace.
$pattern = '/(\d+)$/';
$replacement = '($1)';
echo preg_replace($pattern, $replacement, $string);
If you are sure that the numbers appear only at the end:
Regular expression:
(\d+)
This uses one capturing group. Replace with: ($1).
So the output will be:
my-name-is-(2547)
this-is-stack-(2012)
hllo-how-(2011)

Metaquoting patterns in a variable list

I have a list of patterns I want to look for in a string. These patterns are numerous and contain numerous metacharacters that I want to just match literally. So this is the perfect application for metaquoting with \Q..\E. The complication is that I need to join the variable list of patterns into a regular expression.
use strict;
use warnings;
# sample string to represent my problem
my $string = "{{a|!}} Abra\n{{b|!!}} {{b}} Hocus {{s|?}} Kedabra\n{{b|+?}} {{b|??}} Pocus\n {{s|?}}Alakazam\n";
# sample patterns to look for
my @patterns = qw({{a|!}} {{s|?}} {{s|+?}} {{b|?}});
# since these patterns can be anything, I join the resulting array into a variable-length regex
my $regex = join("|",@patterns);
my @matched = $string =~ /$regex(\s\w+\s)/; # Error in matching regex due to unquoted metacharacters
print join("", @matched); # intended result: Hocus\n Pocus\n
When I attempt to introduce metaquoting into the joining operation, they appear to have no effect.
# quote all patterns so that they match literally, but make sure the alternating metacharacter works as intended
my $qmregex = "\Q".join("\E|\Q", @patterns)."\E";
my @matched = $string =~ /$qmregex(\s\w+\s)/; # The same error
For some reason the metaquoting has no effect when it is included in the string I use as the regular expression. For me, they only work when they are added directly to a regex as in /\Q$anexpression\E/ but as far as I can tell this isn't an option for me. How do I get around this?
I don't understand your expected result, as Abra and Kedabra are the only strings preceded by any of the patterns.
To solve your problem you must escape each component of the regex separately as \Q and \E affect only the value of the string in which they appear, so "\Q" and "\E" are just the null string "" and "\E|\Q" is just "|". You could write
my $qmregex = join '|', map "\Q$_\E", @patterns;
but it is simpler to call the quotemeta function.
You must also enclose the list in a non-capturing group (?:...) to isolate the alternation, and apply the /g modifier to the regex match to find all occurrences within the string.
Try
use strict;
use warnings;
my $string = "{{a|!}} Abra\n{{b|!!}} {{b}} Hocus {{s|?}} Kedabra\n{{b|+?}} {{b|??}} Pocus\n {{s|?}}Alakazam\n";
my @patterns = qw( {{a|!}} {{s|?}} {{s|+?}} {{b|?}} );
my $regex = join '|', map quotemeta, @patterns;
my @matched = $string =~ /(?:$regex)(\s\w+\s)/g;
print @matched;
output
Abra
Kedabra
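To see why the original "\Q".join("\E|\Q", ...)."\E" attempt had no effect, the following sketch compares the two strings side by side. It uses a shortened, two-element pattern list, and the names $unquoted and $quoted are made up for illustration.

```perl
use strict;
use warnings;

my @patterns = qw( {{a|!}} {{s|?}} );

# \Q and \E are handled inside each string literal, and here the literals
# contain nothing to quote, so the patterns are joined unescaped.
# (perl may also warn about a useless \E in these literals.)
my $unquoted = "\Q" . join("\E|\Q", @patterns) . "\E";

# quotemeta works on the value of each pattern, so the metacharacters get escaped.
my $quoted = join '|', map { quotemeta($_) } @patterns;

print "$unquoted\n";
print "$quoted\n";
```

Only the second string is safe to drop into /(?:$quoted)(\s\w+\s)/g.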

How can I capture multiple matches from the same Perl regex?

I'm trying to parse a single string and get multiple chunks of data out from the same string with the same regex conditions. I'm parsing a single HTML doc that is static (For an undisclosed reason, I can't use an HTML parser to do the job.) I have an expression that looks like:
$string =~ /\<img\ssrc\="(.*)"/;
and I want to get the value of $1. However, in the one string, there are many img tags like this, so I need something like an array returned (@1?). Is this possible?
As Jim's answer says, use the /g modifier (in list context or in a loop).
But beware of greediness: you don't want the .* to match more than necessary (and don't escape < or =, they are not special).
while($string =~ /<img\s+src="(.*?)"/g ) {
...
}
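A small sketch of the greediness problem, assuming a made-up HTML fragment with two img tags on one line. The names $html, @greedy and @lazy are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical HTML fragment with two img tags on the same line.
my $html = '<img src="a.png" alt="A"> <img src="b.png" alt="B">';

# Greedy .* runs on to the last double quote in the string,
# swallowing the second tag into a single capture.
my @greedy = $html =~ /<img\s+src="(.*)"/g;

# Non-greedy .*? stops at the closing quote of each src attribute.
my @lazy = $html =~ /<img\s+src="(.*?)"/g;

print scalar(@greedy), " greedy match(es)\n";
print scalar(@lazy), " non-greedy match(es): @lazy\n";
```

With the greedy version the first capture swallows the second tag, so only one match is returned; the non-greedy version yields one capture per tag.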
@list = ($string =~ m/\<img\ssrc\="(.*)"/g);
The g modifier matches all occurrences in the string. List context returns all of the matches. See the m// operator in perlop.
You just need the global modifier /g at the end of the match. Then loop through until there are no matches remaining:
my @matches;
while ($string =~ /\<img\ssrc\="(.*)"/g) {
    push(@matches, $1);
}
Use the /g modifier and list context on the left, as in
@result = $string =~ /\<img\ssrc\="(.*)"/g;