Perl regex replace in same case

Perl regex replace in same case - regex

If you have a simple regex replace in perl as follows:
($line =~ s/JAM/AAA/g){
how would I modify it so that it looks at the match and makes the replacement the same case as the match for example:
'JAM' would become 'AAA'
and 'jam' would become 'aaa'

Unicode-based solution:
use Unicode::UCD qw(charinfo);
my %category_mapping = (
Lu # upper-case Letter
=> 'A',
Ll # lower-case Letter
=> 'a',
);
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'jam';
# returns aaa
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'JAM';
# returns AAA
Here the unhandled characters resp. their categories are a bit easier to see than in the other answers.

In Perl 5 you can do something like:
$line =~ s/JAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;
It handles all different cases of JAM, like Jam, and it's easy to add other words, eg:
$line =~ s/JAM|SPAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;

Something like this perhaps?
http://perldoc.perl.org/perlfaq6.html#How-do-I-substitute-case-insensitively-on-the-LHS-while-preserving-case-on-the-RHS%3f
Doing it in two-steps is probably a better/simpler idea...
Using the power of google I found this
The :samecase modifier, short :ii (since it's a variant of :i) preserve case.
my $x = 'Abcd';
$x ~~ s:ii/^../foo/;
say $x; # Foocd
$x = 'ABC'
$x ~~ s:ii/^../foo/;
say $x # FOO
This is very useful if you want to globally rename your module Foo, to Bar,
but for example in environment variables it is written as all uppercase.
With the :ii modifier the case is automatically preserved.

$line =~ s/JAM/{$& eq 'jam' ? 'aaa' : 'AAA'}/gie;

Related

Perl how do you assign a varanble to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my #Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push #Qmail, $msg->{Body};
}
}
for(#Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.

One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";

Use Below Perl variables to meet your requirements -
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blockes that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi

The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH (or $& for short) variable. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called #matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my #matches = ();
foreach (#Qmail) {
next unless /$regex|^\s*description/i;
push #matches_in_qmail $MATCH
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regexto the string ^\s*owner #.
Assign $sentence to value of running a match within $regex with the regular expression /^s*owner $/ (which won't match, if it did $sentence will be 1 but since it didn't it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.

I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge

You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".

If you have older perl on your system like me, perl 5.18 or earlier, and you use $ $& $' like codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.

How do I use a variable as a regex modifier in Perl?

I'm writing an abstraction function that will ask the user a given question and validate the answer based on a given regular expression. The question is repeated until the answer matches the validation regexp.
However, I also want the client to be able to specify whether the answer must match case-sensitively or not.
So something like this:
sub ask {
my ($prompt, $validationRe, $caseSensitive) = #_;
my $modifier = ($caseSensitive) ? "" : "i";
my $ans;
my $isValid;
do {
print $prompt;
$ans = <>;
chomp($ans);
# What I want to do that doesn't work:
# $isValid = $ans =~ /$validationRe/$modifier;
# What I have to do:
$isValid = ($caseSensitive) ?
($ans =~ /$validationRe/) :
($ans =~ /$validationRe/i);
} while (!$isValid);
return $ans;
}
Upshot: is there a way to dynamically specify a regular expression's modifiers?

Upshot: is there a way to dynamically specify a regular expression's modifiers?
From perldoc perlre:
"(?adlupimsx-imsx)"
"(?^alupimsx)"
One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by "-") for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
This is particularly useful for dynamic patterns, such as those read
in from a configuration file, taken from an argument, or specified in
a table somewhere. Consider the case where some patterns want to be
case-sensitive and some do not: The case-insensitive ones merely need
to include "(?i)" at the front of the pattern.
Which gives you something along the lines of
$isValid = $ans =~ m/(?$modifier)$validationRe/;
Just be sure to take the appropriate security precautions when accepting user input in this way.

You might also like the qr operator which quotes its STRING as a regular expression.
my $rex = qr/(?$mod)$pattern/;
$isValid = <STDIN> =~ $rex;

Get rid of your $caseSensitive parameter, as it will be useless in many cases. Instead, users of that function can encode the necessary information directly in the $validationRe regex.
When you create a regex object like qr/foo/, then the pattern is at that point compiled into instructions for the regex engine. If you stringify a regex object, you'll get a string that when interpolated back into a regex will have exactly the same behaviour as the original regex object. Most importantly, this means that all flags provided or omitted from the regex object literal will be preserved and can't be overridden! This is by design, so that a regex object will continue to behave identical no matter what context it is used in.
That's a bit dry, so let's use an example. Here is a match function that tries to apply a couple similar regexes to a list of strings. Which one will match?
use strict;
use warnings;
use feature 'say';
# This sub takes a string to match on, a regex, and a case insensitive marker.
# The regex will be recompiled to anchor at the start and end of the string.
sub match {
my ($str, $re, $i) = #_;
return $str =~ /\A$re\z/i if $i;
return $str =~ /\A$re\z/;
}
my #words = qw/foo FOO foO/;
my $real_regex = qr/foo/;
my $fake_regex = 'foo';
for my $re ($fake_regex, $real_regex) {
for my $i (0, 1) {
for my $word (#words) {
my $match = 0+ match($word, $re, $i);
my $output = qq("$word" =~ /$re/);
$output .= "i" if $i;
say "$output\t-->" . uc($match ? "match" : "fail");
}
}
}
Output:
"foo" =~ /foo/ -->MATCH
"FOO" =~ /foo/ -->FAIL
"foO" =~ /foo/ -->FAIL
"foo" =~ /foo/i -->MATCH
"FOO" =~ /foo/i -->MATCH
"foO" =~ /foo/i -->MATCH
"foo" =~ /(?^:foo)/ -->MATCH
"FOO" =~ /(?^:foo)/ -->FAIL
"foO" =~ /(?^:foo)/ -->FAIL
"foo" =~ /(?^:foo)/i -->MATCH
"FOO" =~ /(?^:foo)/i -->FAIL
"foO" =~ /(?^:foo)/i -->FAIL
First, we should notice that the string representation of regex objects has this weird (?^:...) form. In a non-capturing group (?: ... ), modifiers for the pattern inside the group can be added or removed between the question mark and colon, while the ^ indicates the default set of flags.
Now when we look at the fake regex that's actually just a string being interpolated, we can see that the addition of the /i flag makes a difference as expected. But when we use a real regex object, it doesn't change anything: The outside /i cannot override the (?^: ... ) flags.
It is probably best to assume that all regexes already are regex objects and should not be interfered with. If you load the regex patterns from a file, you should require the regexes to use the (?: ... ) syntax to apply flages (e.g. (?^i:foo) as an equivalent to qr/foo/i). E.g. loading one regex per line from a file handle could look like:
my #regexes;
while (<$fh>) {
chomp;
push #regexes, qr/$_/; # will die here on regex syntax errors
}

You need to use the eval function. The below code will work:
sub ask {
my ($prompt, $validationRe, $caseSensitive) = #_;
my $modifier = ($caseSensitive) ? "" : "i";
my $ans;
my $isValid;
do {
print $prompt;
$ans = <>;
chomp($ans);
# What I want to do that doesn't work:
# $isValid = $ans =~ /$validationRe/$modifier;
$isValid = eval "$ans =~ /$validationRe/$modifier";
# What I have to do:
#$isValid = ($caseSensitive) ?
# ($ans =~ /$validationRe/) :
# ($ans =~ /$validationRe/i);
} while (!$isValid);
return $ans;
}

How do I tokenise a word given tokens that are subsumed incompletely in the word?

I understand how to use regex in Perl in the following way:
$str =~ s/expression/replacement/g;
I understand that if any part of the expression is enclosed in parentheses, it can be used and captured in the replacement part, like this:
$str =~ s/(a)/($1)dosomething/;
But is there a way to capture the ($1) above outside of the regex expression?
I have a full word which is a string of consonants, e.g. bEdmA, its vowelized version baEodamaA (where a and o are vowels), as well its split up form of two tokens, separated by space, bEd maA. I want to just pick up the vowelized form of the tokens from the full word, like so: beEoda, maA. I'm trying to capture the token within the full word expression, so I have:
$unvowelizedword = "bEdmA";
$tokens[0] = "bEd", $tokens[1] = "mA";
$vowelizedword = "baEodamA";
foreach $t(#tokens) {
#find the token within the full word, and capture its vowels
}
I'm trying to do something like this:
$vowelizedword = m/($t)/;
This is completely wrong for two reasons: the token $t is not present in exactly its own form, such as bEd, but something like m/b.E.d/ would be more relevant. Also, how do I capture it in a variable outside the regular expression?
The real question is: how can I capture the vowelized sequences baEoda and maA, given the tokens bEd, mA from the full word beEodamaA?
Edit
I realized from all the answers that I missed out two important details.
Vowels are optional. So if the tokens are : "Al" and "ywm", and the fully vowelized word is "Alyawmi", then the output tokens would be "Al" and "yawmi".
I only mentioned two vowels, but there are more, including symbols made up of two characters, like '~a'. The full list (although I don't think I need to mention it here) is:
#vowels = ('a', 'i', 'u', 'o', '~', '~a', '~i', '~u', 'N', 'F', 'K', '~N', '~K');

The following seems to do what you want:
#!/usr/bin/env perl
use warnings;
use strict;
my #tokens = ('bEd', 'mA');
my $vowelizedword = "beEodamaA";
my #regex = map { join('.?', split //) . '.?' } #tokens;
my $regex = join('|', #regex);
$regex = qr/($regex)/;
while (my ($matched) = $vowelizedword =~ $regex) {
$vowelizedword =~ s{$regex}{};
print "matched $matched\n";
}
Update as per your updated question (vowels are optional). It works from the end of the string so you'll have to gather the tokens into an array and print them in reverse:
#!/usr/bin/env perl
use warnings;
use strict;
my #tokens = ('bEd', 'mA', 'Al', 'ywm');
my $vowelizedword = "beEodamaA Alyawmi"; # Caveat: Without the space it won't work.
my #regex = map { join('.?', split //) . '.?$' } #tokens;
my $regex = join('|', #regex);
$regex = qr/($regex)/;
while (my ($matched) = $vowelizedword =~ $regex) {
$vowelizedword =~ s{$regex}{};
print "matched $matched\n";
}

Use the m// operator in so-called "list context", as this:
my #tokens = ($input =~ m/capturing_regex_here/modifiershere);

ETA: From what I understand now, what you were trying to say is that you want to match an optional vowel after each character of the tokens.
With this, you can tweak the $vowels variable to only contain the letters you seek. Optionally, you may also just use . to capture any character.
use strict;
use warnings;
use Data::Dumper;
my #tokens = ("bEd", "mA");
my $full = "baEodamA";
my $vowels = "[aeiouy]";
my #matches;
for my $rx (#tokens) {
$rx =~ s/.\K/$vowels?/g;
if ($full =~ /$rx/) {
push #matches, $full =~ /$rx/g;
}
}
print Dumper \#matches;
Output:
$VAR1 = [
'baEoda',
'mA'
];
Note that
... $full =~ /$rx/g;
does not require capturing groups in the regex.

I suspect that there is an easier way to do whatever you're trying to accomplish. The trick is not to make the regex generation code so tricky that you forget what it's actually doing.
I can only begin to guess at your task, but from your single example, it looks like you want to check that the two subtokens are in the larger token, ignoring certain characters. I'm going to guess that those sub tokens have to be in order and can't have anything else between them besides those vowel characters.
To match the tokens, I can use the \G anchor with the /g global flag in scalar context. This anchors the match to the character one after the end of the last match for the same scalar. This way allows me to have separate patterns for each sub token. This is much easier to manage since I only need to change the list of values in #subtokens.
Once you go through each of the pairs and find which ones match all the patterns, I can extract the original string from the pair.
use v5.14;
my $vowels = '[ao]*';
my #subtokens = qw(bEd mA);
# prepare the subtoken regular expressions
my #patterns = map {
my $s = join "$vowels", map quotemeta, (split( // ), '');
qr/$s/;
} #subtokens;
my #tokens = qw( baEodamA mAabaEod baEoda mAbaEoda );
my #grand_matches;
TOKEN: foreach my $token ( #tokens ) {
say "-------\nMatching $token..........";
my #matches;
PATTERN: foreach my $pattern ( #patterns ) {
say "Position is ", pos($token) // 0;
# scalar context /g and \G
next TOKEN unless $token =~ /\G($pattern)/g;
push #matches, $1;
say "Matched with $pattern";
}
push #grand_matches, [ $token, \#matches ];
}
# Now report the original
foreach my $tuple ( #grand_matches ) {
say "$tuple->[0] has both fragments: #{$tuple->[1]}";
}
Now, here's the nice thing about this structure. I've probably guessed wrong about your task. If I have, it's easy to fix without changing the setup. Let's say that the subtokens don't have to be in order. That's an easy change to the pattern I created. I just get rid of the
\G anchor and the /g flag;
next TOKEN unless $token =~ /($pattern)/;
Or, suppose that the tokens have to be in order, but other things may be between them. I can insert a .*? to match that stuff, effectively skipping over it:
next TOKEN unless $token =~ /\G.*?($pattern)/g;
It would be much nicer if I could manage all of this from the map where I create the patterns, but the /g flag isn't a pattern flag. It has to go with the operator.
I find it much easier to manage changing requirements when I don't wrap everything in a single regular expression.

Assuming the tokens need to appear in order and without anything (other than a vowel) between them:
my #tokens = ( "bEd", "mA" );
my $vowelizedword = "baEodamaA";
my $vowels = '[ao]';
my (#vowelized_sequences) = $vowelizedword =~ ( '^' . join( '', map "(" . join( $vowels, split( //, $_ ) ) . "(?:$vowels)?)", #tokens ) . '\\z' );
print for #vowelized_sequences;

Match regex and assign results in single line of code

I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it?
I want to essentially combine lines 2 and 3 in a single line of code:
$variable = "some string";
$variable =~ /(find something).*/;
$variable = $1;
Is there a shorter/simpler way to do this? Am I missing something?

my($variable) = "some string" =~ /(e\s*str)/;
This works because
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 …).
and because my($variable) = ... (note the parentheses around the scalar) supplies list context to the match.
If the pattern fails to match, $variable gets the undefined value.

Why do you want it to be shorter? Does is really matter?
$variable = $1 if $variable =~ /(find something).*/;
If you are worried about the variable name or doing this repeatedly, wrap the thing in a subroutine and forget about it:
some_sub( $variable, qr/pattern/ );
sub some_sub { $_[0] = $1 if eval { $_[0] =~ m/$_[1]/ }; $1 };
However you implement it, the point of the subroutine is to make it reuseable so you give a particular set of lines a short name that stands in their place.
Several other answers mention a destructive substitution:
( my $new = $variable ) =~ s/pattern/replacement/;
I tend to keep the original data around, and Perl v5.14 has an /r flag that leaves the original alone and returns a new string with the replacement (instead of the count of replacements):
my $match = $variable =~ s/pattern/replacement/r;

Well, you could say
my $variable;
($variable) = ($variable = "find something soon") =~ /(find something).*/;
or
(my $variable = "find something soon") =~ s/^.*?(find something).*/$1/;

You can do substitution as:
$a = 'stackoverflow';
$a =~ s/(\w+)overflow/$1/;
$a is now "stack"

From Perl Cookbook 2nd ed
6.1 Copying and Substituting Simultaneously
$dst = $src;
$dst =~ s/this/that/;
becomes
($dst = $src) =~ s/this/that/;
I just assumed everyone did it this way, amazed that no one gave this answer.

Almost ....
You can combine the match and retrieve the matched value with a substitution.
$variable =~ s/.*(find something).*/$1/;
AFAIK, You will always have to copy the value though, unless you do not care to clobber the original.

$variable2 = "stackoverflow";
(my $variable1) = ($variable2 =~ /stack(\w+)/);
$variable1 now equals "overflow".

I do this:
#!/usr/bin/perl
$target = "n: 123";
my ($target) = $target =~ /n:\s*(\d+)/g;
print $target; # the var $target now is "123"

Also, to amplify the accepted answer using the ternary operator to allow you to specify a default if there is no match:
my $match = $variable =~ /(*pattern*).*/ ? $1 : *defaultValue*;

How do I remove all hyphens with a Perl regex?

I thought this would have done it...
$rowfetch = $DBS->{Row}->GetCharValue("meetdays");
$rowfetch = /[-]/gi;
printline($rowfetch);
But it seems that I'm missing a small yet critical piece of the regex syntax.
$rowfetch is always something along the lines of:
------S
-M-W---
--T-TF-
etc... to represent the days of the week a meeting happens

$rowfetch =~ s/-//gi
That's what you need for your second line there. You're just finding stuff, not actually changing it without the "s" prefix.
You also need to use the regex operator "=~" for this.

Here is what your code presently does:
# Assign 'rowfetch' to the value fetched from:
# The function 'GetCharValue' which is a method of:
# An Value in A Hash Identified by the key "Row" in:
# Either a Hash-Ref or a Blessed Hash-Ref
# Where 'GetCharValue' is given the parameter "meetdays"
$rowfetch = $DBS->{Row}->GetCharValue("meetdays");
# Assign $rowfetch to the number of times
# the default variable ( $_ ) matched the expression /[-]/
$rowfetch = /[-]/gi;
# Print the number of times.
printline($rowfetch);
Which is equivalent to having written the following code:
$rowfetch = ( $_ =~ /[-]/ )
printline( $rowfetch );
The magic you are looking for is the
=~
Token instead of
=
The former is a Regex operator, and the latter is an assignment operator.
There are many different regex operators too:
if( $subject =~ m/expression/ ){
}
Will make the given codeblock execute only if $subject matches the given expression, and
$subject =~ s/foo/bar/gi
Replaces ( s/) all instances of "foo" with "bar", case-insentitively (/i), and repeating the replacement more than once(/g), on the variable $subject.

Using the tr operator is faster than using a s/// regex substitution.
$rowfetch =~ tr/-//d;
Benchmark:
use Benchmark qw(cmpthese);
my $s = 'foo-bar-baz-blee-goo-glab-blech';
cmpthese(-5, {
trd => sub { (my $a = $s) =~ tr/-//d },
sub => sub { (my $a = $s) =~ s/-//g },
});
Results on my system:
Rate sub trd
sub 300754/s -- -79%
trd 1429005/s 375% --

Off-topic, but without the hyphens, how will you know whether a "T" is Tuesday or Thursday?

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Perl regex replace in same case - regex

If you have a simple regex replace in perl as follows: ($line =~ s/JAM/AAA/g){ how would I modify it so that it looks at the match and makes the replacement the same case as the match for example: 'JAM' would become 'AAA' and 'jam' would become 'aaa'

In Perl 5 you can do something like: $line =~ s/JAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie; It handles all different cases of JAM, like Jam, and it's easy to add other words, eg: $line =~ s/JAM|SPAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;

$line =~ s/JAM/{$& eq 'jam' ? 'aaa' : 'AAA'}/gie;

Related

Perl how do you assign a varanble to a regex match result

How do I use a variable as a regex modifier in Perl?

How do I tokenise a word given tokens that are subsumed incompletely in the word?

Match regex and assign results in single line of code

How do I remove all hyphens with a Perl regex?

Categories

Resources