Multiple substitutions with a single regular expression in perl

Say I have the following in Perl:
my $string;
$string =~ s/ /\\ /g;
$string =~ s/'/\\'/g;
$string =~ s/`/\\`/g;
Can the above substitutions be performed with a single combined regular expression instead of 3 separate ones?

$string =~ s/([ '`])/\\$1/g;
This uses a character class [ '`] to match a space, ' or `, and parentheses () to capture the matched character. $1 then inserts the captured character into the replacement.
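As a quick check of the combined form, the sketch below wraps it in a helper and runs it on a sample string. The helper name escape_chars and the sample input are assumptions made for illustration; neither comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: backslash-escape spaces, single quotes and backticks.
sub escape_chars {
    my ($string) = @_;
    # $1 holds whichever of the three characters matched.
    $string =~ s/([ '`])/\\$1/g;
    return $string;
}

my $escaped = escape_chars(q{it's a `test`});
print "$escaped\n";   # prints: it\'s\ a\ \`test\`
```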

Separate substitutions may be much more efficient than a single complex one (e.g. when working with fixed substrings). In such cases you can make the code shorter, like this:
my $string;
for ($string) {
s/ /\\ /g;
s/'/\\'/g;
s/`/\\`/g;
}
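The loop form works because for aliases $_ to $string, and s/// operates on $_ when no target is given. A minimal runnable sketch, with a sample value chosen only for illustration:

```perl
use strict;
use warnings;

my $string = q{a b'c`d};

# for aliases $_ to $string, so each s/// (which acts on $_ by default)
# modifies $string itself rather than a copy.
for ($string) {
    s/ /\\ /g;
    s/'/\\'/g;
    s/`/\\`/g;
}

print "$string\n";   # prints: a\ b\'c\`d
```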

Although it's arguably easier to read the way you have it now, you can perform these substitutions at once by using a loop, or combining them in one expression:
# loop
$string =~ s/$_/\\$_/g foreach (' ', "'", '`');
# combined
$string =~ s/([ '`])/\\$1/g;
By the way, you can make your substitutions a little easier to read by avoiding "leaning toothpick syndrome", as the various regex operators allow you to use a variety of delimiters:
$string =~ s{ }{\\ }g;
$string =~ s{'}{\\'}g;
$string =~ s{`}{\\`}g;

Related

Limit the translation to just one word in a phrase?

I'm new to Perl, coming from Python, and wonder whether there is a simple way to limit a translation or replacement to just one word in a phrase.
In the example below, the first word is correctly translated to gazelle, but the word kind is also changed to lind. Is there a simple way to do the translation without resorting to a loop? Thanks.
my $string = 'gazekke is one kind of antelope';
my $count = ($string =~ tr/k/l/);
print "There are $count changes \n";
print $string; # gazelle is one lind of antelope <-- kind becomes lind too!

I don't know of an option for tr to stop translation after the first word.
But you can use a regex with backreferences for this.
use strict;
my $string = 'gazekke is one kind of antelope';
# Match first word in $1 and rest of sentence in $2.
$string =~ m/(\w+)(.*)/;
# Translate all k's to l's in the first word.
(my $translated = $1) =~ tr/k/l/;
# Concatenate the translated first word with the rest
$string = "$translated$2";
print $string;
Outputs: gazelle is one kind of antelope

Pick the first match (a word in this case), which is precisely what a regex does without /g, and replace all the wanted characters in that word by running code in the replacement side, which /e enables:
$string =~ s{(\w+)}{ $1 =~ s/k/l/gr }e;
In the inner substitution, the /r modifier makes it return the changed string without modifying the original. That is also what allows a substitution to run on $1, which is read-only and cannot be modified in place.
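Since only single characters are being swapped here, the inner substitution could arguably be a transliteration instead. A minimal sketch, assuming Perl 5.14 or later (where tr also accepts /r), using the sample string from the question:

```perl
use strict;
use warnings;

my $string = 'gazekke is one kind of antelope';

# Without /g only the first word is matched; /e runs the replacement as code,
# and tr/k/l/r returns a transliterated copy of $1 without modifying it.
$string =~ s{(\w+)}{ $1 =~ tr/k/l/r }e;

print "$string\n";   # prints: gazelle is one kind of antelope
```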

tr is a character class transliterator. For anything else you would use regex.
$string =~ s/gazekke/gazelle/;
You can put a code block as the second half of s/// to do more complicated replacements or transmogrifications.
$string =~ s{([A-Za-z]+)}{ $should_be_mangled{$1} ? mangler($1) : $1 }ge; # unlisted words are put back unchanged
Edit:
Here's how you would first locate a phrase and then work on it.
$phrase_regex = qr/(?|(gazekke) is one kind of antelope|(etc))/;
$string =~ s{($phrase_regex)}{
my $match = $1;
my $word = $2;
$match =~ s{$word}{
my $new = $new_word_map{$word};
&additional_mangling($new);
$new;
}e;
$match;
}ge;
Here's the Perl regex documentation.
https://perldoc.perl.org/perlre
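To make the code-block replacement concrete, here is a small self-contained sketch. The lookup table and the sample sentence are assumptions invented for this example; the point is only that the replacement must return the original word when there is nothing to change, otherwise the word is deleted.

```perl
use strict;
use warnings;

# Hypothetical lookup table of misspelled words and their corrections.
my %new_word_map = (gazekke => 'gazelle', antekope => 'antelope');

my $string = 'gazekke is one kind of antekope';

# With /e the replacement is evaluated as Perl code for every match (/g).
# Words without an entry in the table are put back unchanged.
$string =~ s{([A-Za-z]+)}{ exists $new_word_map{$1} ? $new_word_map{$1} : $1 }ge;

print "$string\n";   # prints: gazelle is one kind of antelope
```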

How to capture every match in a global regex substitution?

I realize it is possible to achieve this with a slight workaround, but I am hoping there is a simpler way (since I often make use of this type of expression).
Given the example string:
my $str = "An example: sentence!*";
A regex can be used to match each punctuation mark and capture them in an array.
Thereafter, I can simply repeat the regex and replace the matches as in the following code:
push(@matches, $1) while $str =~ /([\*\!:;])/g;
$str =~ s/([\*\!:;])//g;
Would it be possible to combine this into a single step in Perl where substitution occurs globally while also keeping tabs on the replaced matches?

You can embed code to run in your regular expression:
my @matches;
my $str = 'An example: sentence!*';
$str =~ s/([\*\!:;])(?{push @matches, $1})//g;
But with a match this simple, I'd just do the captures and substitution separately.

Yes, it's possible.
my @matches;
$str =~ s/[*!:;]/ push @matches, $&; "" /eg;
However, I'm not convinced that the above is faster or clearer than the following:
my @matches = $str =~ /[*!:;]/g;
$str =~ tr/*!:;//d;
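For reference, a self-contained version of the single-pass approach together with its output. It uses the capture $1 rather than $&, which is a choice made for this sketch and not something the answer prescribes.

```perl
use strict;
use warnings;

my $str = 'An example: sentence!*';
my @matches;

# /e evaluates the replacement as code: record the matched character,
# then return the empty string so the character is removed.
$str =~ s/([*!:;])/push @matches, $1; ''/ge;

print "$str\n";        # prints: An example sentence
print "@matches\n";    # prints: : ! *
```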

Use:
use feature 'say';
use Data::Dumper;
my $str = "An example: sentence!*";
my @matches = $str =~ /([\*\!:;])/g;
say Dumper \@matches;
$str =~ tr/*!:;//d;
Output:
$VAR1 = [
':',
'!',
'*'
];

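Is this what you're looking for?
my ($str, @matches) = ("An example: sentence!*");
# first method (records only the last character removed, because $1 is read after s///g has finished):
($str =~ s/([\*\!:;])//g) && push(@matches, $1);
# second method (removes one character per iteration, so every match is recorded):
push(@matches, $1) while ($str =~ s/([\*\!:;])//);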

Try:
my $str = "An example: sentence!*";
my @mys;
push(@mys, ($str=~m/([^\w\s])/g));
print join "\n", @mys;
Thanks.

How to replace stuff in a Perl regex

I have a string $text and want to modify it with a regex. The string contains multiple sections like <NAME>John</NAME>.
I want to search for those sections, which I would normally do with something like
$text =~ m/<NAME>(.*?)<\/NAME>/g
but then make sure that there are no leading and trailing blanks and no leading non-word characters, which I would normally ensure with something like
$temp =~ s/^\s+|\s+$//g; # trim leading and trailing whitespaces
$temp =~ s/^\W*//g; # remove all leading non-word chars
Now my question is: How do I actually make this happen? Is it possible to use a s/// regex instead of the m//?

This is possible in a single substitution, but it's unnecessarily complex. I suggest you do a two-tier substitution using an executable replacement.
my $text = '<NAME> %^John^%
</NAME>';
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{
(my $new = $1) =~ s/\A\s+|\s+\z//g;
$new =~ s/\A\W+//;
$new;
}egx;
print $text;
output
<NAME>John^%</NAME>
This is even simpler if you have version 14 or later of Perl 5, and want to use the non-destructive ( /r modifier) substitution mode.
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{ $1 =~ s/\A\s+|\s+\z//gr =~ s/\A\W+//r }exg;
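The same two-tier idea applied to a string with more than one section, as a minimal sketch. The sample input is invented for illustration, and /x is needed because the pattern is written with spaces for readability.

```perl
use strict;
use warnings;

# Sample input invented for illustration.
my $text = "<NAME>  John </NAME> and <NAME>\n--Mary\n</NAME>";

# For each section: copy the captured text, trim both ends, then drop any
# leading non-word characters. The last expression is the replacement.
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{
    (my $name = $1) =~ s/\A\s+|\s+\z//g;
    $name =~ s/\A\W+//;
    $name;
}egx;

print "$text\n";   # prints: <NAME>John</NAME> and <NAME>Mary</NAME>
```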

If I understand correctly, what you want to do is merely "clean up" the text inside the tag (insofar as it's possible to "parse" XML using regular expressions). This should do the trick:
$text =~ s/(<NAME>)\s*\W*(.*?)\s*(<\/NAME>)/$1$2$3/sgi;

Use variable as RegEx pattern

I'd like to use a variable as a RegEx pattern for matching filenames:
my $file = "test~";
my $regex1 = '^.+\Q~\E$';
my $regex2 = '^.+\\Q~\\E$';
print int($file =~ m/$regex1/)."\n";
print int($file =~ m/$regex2/)."\n";
print int($file =~ m/^.+\Q~\E$/)."\n";
The result:
0
0
1
Can anyone explain to me how I can use a variable as a RegEx pattern?

As the documentation says:
$re = qr/$pattern/;
$string =~ /foo${re}bar/; # can be interpolated in other patterns
$string =~ $re; # or used standalone
$string =~ /$re/; # or this way
So, use the qr quote-like operator.

You cannot use \Q in a single-quoted / non-interpolated string. It must be seen by the lexer.
Anyway, tilde isn’t a meta-character.
Add use re "debug" and you will see what is actually happening.
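If part of the pattern really does need quoting and it arrives in a plain string, one option is the built-in quotemeta function combined with qr. This is a sketch under that assumption; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $file    = 'test~';
my $literal = '~';   # text that should be matched literally

# quotemeta() does at run time what \Q...\E does inside an interpolating
# pattern, so it also works when the pieces come from plain strings.
my $regex = '^.+' . quotemeta($literal) . '$';
my $re    = qr/$regex/;   # compile once, reuse as often as needed

my $matches_tilde = ($file  =~ $re) ? 1 : 0;
my $matches_plain = ('test' =~ $re) ? 1 : 0;
print "$matches_tilde $matches_plain\n";   # prints: 1 0
```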

Perl ignore whitespace on replacement side of regular expression substitution

Suppose I have $str = "onetwo".
I would like to write a regex substitution that ignores whitespace (which makes it more readable):
$str =~ s/
one
two
/
three
four
/x
Instead of "threefour", this produces "\nthree\nfour\n" (where \n is a newline). Basically the /x option ignores whitespace for the matching side of the substitution but not the replacement side. How can I ignore whitespace on the replacement side as well?

s{...}{...} is basically s{...}{qq{...}}e. If you don't want qq{...}, you'll need to replace it with something else.
s/
one
two
/
'three' .
'four'
/ex
Or even:
s/
one
two
/
clean('
three
four
')
/ex
A possible implementation of clean:
sub clean {
my ($s) = @_;
$s =~ s/^[ \t]+//mg;
$s =~ s/^\s+//;
$s =~ s/\s+\z//;
return $s;
}
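Another way to lay out the replacement freely, sketched here as an alternative to string concatenation, is to build it from a word list. Using join with qw is an assumption of this example and not part of the answer above.

```perl
use strict;
use warnings;

my $str = 'onetwo';

# /x ignores whitespace in the pattern; /e turns the replacement into code,
# so it can be laid out freely. join with an empty separator glues the
# pieces together without any of the surrounding whitespace.
$str =~ s/
    one
    two
/
    join '', qw(three four)
/ex;

print "$str\n";   # prints: threefour
```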
