perl: substitute pattern with pattern of different size - regex

Here is my string, A B C D, and I want to replace A with 123 and C with 456, for example. However, this doesn't work:
$string=~ s/A|B|C|D/123|B|456|D/;
I want 123 B 456 D, but I get 123|B|456|D B C D.
Maybe that's because my two patterns have different numbers of characters?
Is there a way to substitute patterns of different sizes using some other piece of code? Thanks a lot.

You're getting what I'd expect you to get. Your regex looks for one occurrence of either an 'A' or a 'B' or a 'C' or a 'D' and replaces it with the literal string '123|B|456|D'. Hence 'A B C D' -> '123|B|456|D B C D'.
So it finds the first occurrence, the 'A', and replaces it with the string you specified. Alternation with | works in the pattern, but the pipe characters have no special meaning in the replacement; they are copied literally.
What you need to do is create a mapping from input to output, like so:
my %map = ( A => '123', C => '456' );
Then you need to use it in the replacements. Let's give you a search expression:
my $search = join( '|', keys %map );
Now let's write the substitution (I prefer braces as delimiters when I write substitutions that have code in them):
$string =~ s{($search)}{ $map{$1} }ge;
The g switch makes the substitution apply to every match in the string rather than just the first, and the e switch tells Perl to evaluate the replacement expression as Perl code.
The output is '123 B 456 D'.
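A self-contained version of the hash-lookup approach might look like the sketch below. It adds two things the answer doesn't mention, so treat them as assumptions: quotemeta on the keys (so keys containing regex metacharacters match literally) and sorting the keys longest-first (so a longer key wins over a shorter key that is its prefix). The sub name replace_with_map is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every occurrence of each key of %$map
# with its value, in a single pass over $string.
sub replace_with_map {
    my ($string, $map) = @_;
    return $string unless %$map;    # an empty alternation would match everywhere
    # Longest keys first, each escaped so it is matched literally.
    my $search = join '|', map { quotemeta } sort { length($b) <=> length($a) } keys %$map;
    $string =~ s{($search)}{ $map->{$1} }ge;
    return $string;
}

my %map = ( A => '123', C => '456' );
print replace_with_map('A B C D', \%map), "\n";    # 123 B 456 D
```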

Something like this, using the /e (eval) modifier (untested):
$string=~ s/(A)|C/ length($1) ? '123': '456'/eg;
The /e flag on s/// tells Perl to evaluate the replacement
side as code that returns a value.
In this case it runs a ternary conditional in the replacement code.
It's sort of like an inline regex callback.
It can get more complicated, though, since the flag can be doubled (s///eeg), so
it's best to refer to the docs.
Remember, eval is really evil, misspelled!!
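To make the s///eeg remark concrete: with a single /e the replacement side is run as code; with /ee, the string that code returns is itself evaluated as Perl code. A minimal sketch, assuming the replacement template is held in a variable (the names $template and wrap_words are illustrative only). Because the second evaluation is a string eval, this should never be fed an untrusted template.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every word in angle brackets.
sub wrap_words {
    my ($string) = @_;
    # A string holding Perl source; $1 is not expanded here
    # because of the single quotes.
    my $template = '"<$1>"';
    # First /e: evaluate $template, yielding the source text "<$1>".
    # Second /e: evaluate that text as Perl, interpolating $1.
    $string =~ s/(\w+)/$template/gee;
    return $string;
}

print wrap_words('A B C D'), "\n";    # <A> <B> <C> <D>
```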

The easiest way to do this is with two substitutions (here A and B stand for your two patterns):
$string =~ s/A/123/g;
$string =~ s/B/456/g;
or even (using an inline for loop as a shorthand to apply multiple substitutions to one string):
s/A/123/g, s/B/456/g for $string;
Of course, for more complicated patterns, this may not produce the same results as doing both substitutions in one pass; in particular, this may happen if the patterns can overlap (as in A = YZ, B = XY), or if pattern B can match the string substituted for pattern A.
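The second pitfall (pattern B matching text produced by pattern A) is easy to reproduce with a made-up mapping a -> b, b -> c; the strings here are illustrative only:

```perl
use strict;
use warnings;

# Illustrative mapping where the second pattern ("b") can match
# text produced by the first substitution ("a" -> "b").
my %map = ( a => 'b', b => 'c' );

# Two sequential passes: the "b" created by the first pass is
# rewritten again by the second.
my $two_pass = 'ab';
$two_pass =~ s/a/b/g;
$two_pass =~ s/b/c/g;

# One pass with a lookup hash: each original character is replaced once.
my $one_pass = 'ab';
$one_pass =~ s/(a|b)/$map{$1}/g;

print "$two_pass\n";    # cc
print "$one_pass\n";    # bc
```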
If you wish to do this in one pass, the most general solution is to use the /e modifier, which causes the replacement to be evaluated as Perl code, as in:
$string =~ s/(A|B)/ $1 eq 'A' ? '123' : '456' /eg;
You can even include multiple expressions, separated by semicolons, inside the substitution; the last expression's value is what will be substituted into the string. If you do this, you may find it useful to use paired delimiters for readability, like this:
$string =~ s{(A|B)}{
my $foo = "";
$foo = '123' if $1 eq 'A';
$foo = '456' if $1 eq 'B';
$foo; # <-- this is what gets substituted for the pattern
}eg;
If your patterns are constant strings (as in the simple examples above), an even more efficient solution is to use a look-up hash, as in:
my %map = ('A' => '123', 'B' => '456');
$string =~ s/(A|B)/$map{$1}/g;
With this method, you don't even need the /e modifier (although, for this specific example, adding it would make no difference). The advantage of using /e is that it lets you implement more complicated rules for choosing the replacement than a simple hash lookup would allow.
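As one example of a rule that a plain hash lookup can't express on its own, the sketch below keeps the hash but adds a fallback for letters that have no entry. The bracketing rule and the sub name apply_rule are made up for illustration:

```perl
use strict;
use warnings;

my %map = ( A => '123', C => '456' );

# Hypothetical rule: mapped letters get their value from %map,
# any other capital letter is wrapped in brackets.
sub apply_rule {
    my ($string) = @_;
    $string =~ s/([A-Z])/exists $map{$1} ? $map{$1} : "[$1]"/ge;
    return $string;
}

print apply_rule('A B C D'), "\n";    # 123 [B] 456 [D]
```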

String="A B C D"
echo "$String" | perl -pe 's/A/123/; s/C/456/'

Related

perl substitute string characters using a hash and tr

I need an efficient way to replace every character in a string with another character, based on a hash map.
Currently I am using a regex s/// and that is working fine. Can I use tr instead, since I only need a character-by-character conversion?
This is what I am trying:
my %map = ( a => 9 , b => 4 , c => 8 );
my $str = 'abc';
my $str2 = $str;
$str2 =~ s/(.)/$map{$1}/g; # $str2 =~ tr /(.)/$map{$1}/ Does not work
print " $str => $str2\n";
If you need to replace exactly 1 character by 1 character, tr is ideal for you:
#!/usr/bin/perl
use strict;
use warnings;
my $str = 'abcd';
my $str2 = $str;
$str2 =~ tr /abc/948/;
print " $str => $str2\n";
Unlike the code in your question, it doesn't delete "d" (which has no entry in %map). Output:
abcd => 948d
No, one cannot do that with tr. That tool is very different from regex.
Its entry in Quote Like Operators in perlop says
Transliterates all occurrences of the characters found (or not found if the /c modifier is specified) in the search list with the positionally corresponding character in the replacement list [...]
and further down it adds
Characters may be literals, or (if the delimiters aren't single quotes) any of the escape sequences accepted in double-quoted strings. But there is never any variable interpolation, so "$" and "#" are always treated as literals. [...]
So one surely can't have a hash evaluated, nor match on a regex pattern in the first place.
The lack of even basic variable interpolation is explained at the very end
... the transliteration table is built at compile time, ...
The docs then explain, with an example, how to use eval if variables must be used with tr.
In this case you'd first need to build two variables, one holding the sequence of characters to replace (the keys) and the other the sequence of their replacement characters (the values), and then use them in tr via eval as in the docs. Better yet, you'd wrap it in a sub, as in ikegami's comment. Here is a related page.
But that is the opposite of what the question is after: an approach simpler than that basic regex.
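For reference, the eval-based tr described above could be sketched roughly as follows. This is not ikegami's code (which isn't reproduced here); make_translator is a hypothetical name, and the sketch assumes every key and value in the hash is a single character.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a tr/// from a hash of single-character
# mappings and return it as a code ref.
sub make_translator {
    my (%map) = @_;
    my @keys = sort keys %map;
    # quotemeta protects characters such as "/", "-" and "\" that are
    # special inside tr///.
    my $from = quotemeta join '', @keys;
    my $to   = quotemeta join '', @map{@keys};
    # tr/// lists are fixed at compile time, so build the code as a
    # string and compile it once with eval.
    my $sub = eval "sub { my \$s = shift; \$s =~ tr/$from/$to/; return \$s }"
        or die "Could not build translator: $@";
    return $sub;
}

my $translate = make_translator( a => 9, b => 4, c => 8 );
print $translate->('abcd'), "\n";    # 948d
```

The eval runs once per hash, so the returned sub can be called many times without recompiling.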

Swapping letters with regexp

How can I swap the letter o with the letter e and e with o?
I tried this, but I don't think it's a good way to do it. Is there a better way?
my $str = 'Absolute force';
$str =~ s/e/___eee___/g;
$str =~ s/o/e/g;
$str =~ s/___eee___/o/g;
Output: Abseluto ferco
Use the transliteration operator:
$str =~ y/oe/eo/;
E.g.
$ echo "Absolute force" | perl -pe 'y/oe/eo/'
Abseluto ferco
As has already been said, the way to do this is the transliteration operator
tr/SEARCHLIST/REPLACEMENTLIST/cdsr
y/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated.
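Two details of that description can be sketched on the question's own string: the return value is a count, and (an addition not covered by the quote, available since Perl 5.14) the /r flag returns a modified copy instead of changing the original.

```perl
use strict;
use warnings;

my $str = 'Absolute force';

# tr/// with /r leaves $str untouched and returns the swapped copy.
my $swapped = $str =~ tr/oe/eo/r;

# An empty replacement list with no /d flag makes tr count the
# matching characters without changing them.
my $count = ($str =~ tr/oe//);

print "$swapped\n";    # Abseluto ferco
print "$count\n";      # 4
```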
However, I want to commend you on your creative use of regular expressions. Your solution works, although the placeholder string _ee_ would've been sufficient.
tr only helps with single-character replacements, though, so I'd like to quickly show you how to use regular expressions for a more complicated mass replacement. Basically, you use the /e modifier to execute the RHS as code. The following will also do the replacement you were aiming for:
my $str = 'Absolute force';
$str =~ s/([eo])/$1 eq 'e' ? 'o' : 'e'/eg;
print $str;
Outputs:
Abseluto ferco
Note how the LHS (left hand side) matches both o and e, and then the RHS (right hand side) does a test to see which matched and returns the opposite for replacement.
Now, it's common to have a list of words that you want to replace, so it's convenient to just build a hash of your from/to values and then dynamically build the regular expression. The following does that:
my $str = 'Hello, foo. How about baz? Never forget bar.';
my %words = (
foo => 'bar',
bar => 'baz',
baz => 'foo',
);
my $wordlist_re = '(?:' . join('|', map quotemeta, keys %words) . ')';
$str =~ s/\b($wordlist_re)\b/$words{$1}/eg;
Outputs:
Hello, bar. How about foo? Never forget baz.
The above would have worked for your e and o case as well, but would have been overkill. Note how I use quotemeta to escape the keys in case they contain a regular expression special character. I also intentionally used a non-capturing group around them in $wordlist_re so that the variable can be dropped into any regex and behave as desired. I then put the capturing group inside the s/// because it's important to be able to see what's being captured in a regex without having to track down the value of an interpolated variable.
The tr/// operator is best. However, if you wanted to use the s/// operator (to handle more than just single letter substitutions), you could write
$ echo 'Absolute force' | perl -pe 's/(e)|o/$1 ? "o" : "e"/eg'
Abseluto ferco
The capturing parentheses avoid the redundant $1 eq 'e' test in Miller's answer.
From man sed:
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
And the tr command can do this too:
$ echo "Absolute force" | tr 'oe' 'eo'
Abseluto ferco

Identify which part of a Perl regular expression was used when finding a match in a string

Background
Perl provides all sorts of built-in variables to get bits of a string which matched a regular expression (e.g. $MATCH, $&, or ${^MATCH} for the part of the string that matched the regex, $PREMATCH, $`, and ${^PREMATCH} for the part of the string before the part that matched, etc).
Question
Is there any way to get the portion of the regular expression which actually was used to match $MATCH?
Example
For example, say I have
my $string = "gC rL Ht Ns B lR cG sN tH";
my $re = qr/\b(a|b|c)\b/ip; # /p makes ${^PREMATCH} and ${^POSTMATCH} reliable on Perls before 5.20
$string =~ $re;
print "${^PREMATCH}\n";
print "$&\n";
print "${^POSTMATCH}\n";
The output will be
gC rL Ht Ns
B
lR cG sN tH
Desired output
The part of the regex (/\b(a|b|c)\b/i) which matched the string was b, or perhaps more properly \bb\b, with the case-insensitive switch i. How can I get b (ideally) or \bb\b? I can't find any built-in variable which stores any part of the regex that matched, only parts of the string.
Answer
Thanks to the great hint in choroba's answer, it seems that using named capture groups and the %+ built-in variable will work:
$ perl -MData::Dumper -e '
"gC rL Ht Ns B lR cG sN tH" =~ /\b((?<a>a)|(?<b>b)|(?<c>c))\b/i;
print Dumper keys %+;'
$VAR1 = 'b';
It is not possible in general, because regular expressions can be very complex. For example, the string bydgijjj matches (?:ax|by)[cd]*(ef|g[hi](?:j{2,}|klm)); what would you want it to return? Can you imagine how complex that gets?
You have to construct the regular expression in a way it will tell you:
"gC rL Ht Ns B lR cG sN tH" =~ /\b((a)|(b)|(c))\b/i;
print "a:$2\nb:$3\nc:$4\n"
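Building on that idea, a small sketch: generate one capturing group per alternative, then use the @- array (which holds undef for groups that did not take part in the match) to find which alternative matched. The alternatives list and the name which_alternative are assumptions for illustration. Note that it returns the alternative as written in the list (b), not the matched text (B).

```perl
use strict;
use warnings;

# Hypothetical helper: return the alternative (from @$alts) that produced
# the first match in $string, or undef if nothing matched.
sub which_alternative {
    my ($string, $alts) = @_;
    # One capturing group per alternative: \b(?:(a)|(b)|(c))\b
    my $re = '\b(?:' . join('|', map { "(\Q$_\E)" } @$alts) . ')\b';
    return unless $string =~ /$re/i;
    # $-[$n] is undef when group $n did not participate in the match;
    # $#- is the index of the last group that did.
    my ($group) = grep { defined $-[$_] } 1 .. $#-;
    return $alts->[ $group - 1 ];
}

print which_alternative('gC rL Ht Ns B lR cG sN tH', [qw(a b c)]), "\n";    # b
```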

How do I swap using a single s/// all occurrences of two substrings?

How do I write a regex to replace a with b and b with a so that a&b will become b&a?
UPDATE
It should replace all occurrences of a with b and each b with a. The a&b is just an example to illustrate what I want. Sorry for the confusion.
You can use capturing groups and positional replacements to do this:
pax$ echo 'a&b' | perl -pe 's/([^&]*)&(.*)/$2&$1/'
b&a
The (blah blah) bits capture the "blah blah" and save it in a register for later use in the replacement string. When referenced in the replacement as $1, $2, and so on, the captured text is placed in the result.
So that regex simply captures all the non-ampersand characters into register 1, matches an ampersand, and then captures the rest of the string into register 2.
In the substitution, it gives you register 2, an ampersand, then register 1.
If, as you mention in a comment, you want to handle both & and |, you need to capture and use the operator as well:
pax$ echo 'a&b' | perl -pe 's/([^&|]*)([&|])(.*)/$3$2$1/'
b&a
pax$ echo 'a|b' | perl -pe 's/([^&|]*)([&|])(.*)/$3$2$1/'
b|a
You can see that the positional replacements are slightly different now since you're capturing more groups but the concept is still identical.
The traditional generic way is:
my $string = "abcdefedcba";
my %subst = ( 'a' => 'b', 'b' => 'a' );
$string =~ s/(@{[ join '|', map quotemeta, sort { length($b) <=> length($a) } keys %subst ]})/$subst{$1}/g;
print $string;
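Sorting by length matters when one key is a prefix of another, because alternation takes the first alternative that matches, not the longest one. A sketch with a made-up mapping (the keys ab and a are illustrative only):

```perl
use strict;
use warnings;

my %subst = ( ab => 'X', a => 'Y' );

# Shorter key listed first: "a" always wins, so "ab" is never matched.
(my $short_first = 'aab') =~ s/(a|ab)/$subst{$1}/g;

# Longest key first, built the same way as the generic version.
my $alt = join '|', map { quotemeta } sort { length($b) <=> length($a) } keys %subst;
(my $long_first = 'aab') =~ s/($alt)/$subst{$1}/g;

print "$short_first\n";    # YYb
print "$long_first\n";     # YX
```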
Non-generically:
$string =~ s/(a|b)/{a=>'b',b=>'a'}->{$1}/ge;
Per your updated instructions:
$str =~ tr/ab/ba/;
Another possibility is to use three s/// substitutions:
$_ = 'a&b';
# First, change every 'a' to something that does not appear in your string
s/a/\0/g;
# Then, change 'b' to 'a'
s/b/a/g;
# And now change your special character to b
s/\0/b/g;
I'd like to suggest a variation of the answer to "Beginner Regex: Multiple Replaces":
$text =~ s/(cat|tomatoes)/ ${{ qw<tomatoes cat cat tomatoes> }}{$1} /ge;

Perl regex replace in same case

If you have a simple regex replace in Perl as follows:
$line =~ s/JAM/AAA/g;
how would I modify it so that it looks at the match and gives the replacement the same case as the match? For example:
'JAM' would become 'AAA'
and 'jam' would become 'aaa'
Unicode-based solution:
use Unicode::UCD qw(charinfo);
my %category_mapping = (
Lu => 'A', # upper-case letter
Ll => 'a', # lower-case letter
);
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'jam';
# returns aaa
join q(), map { $category_mapping{charinfo(ord $_)->{category}} } split //, 'JAM';
# returns AAA
Here the unhandled characters (or rather, their categories) are a bit easier to see than in the other answers.
In Perl 5 you can do something like:
$line =~ s/JAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;
It handles all different cases of JAM, like Jam, and it's easy to add other words, eg:
$line =~ s/JAM|SPAM/$_=$&; tr!A-Z!A!; tr!a-z!a!; $_/gie;
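One caveat: assigning to $_ inside the replacement overwrites the global $_ (for example, a loop variable in the surrounding code). A variant that avoids this, assuming Perl 5.14 or later for the /r flag on tr:

```perl
use strict;
use warnings;

my $line = 'JAM Jam jam spam';

# $& is the matched text; each tr///r returns a modified copy, so
# uppercase letters become A and lowercase letters become a without
# assigning to $_ or $&.
$line =~ s{JAM|SPAM}{ $& =~ tr/A-Z/A/r =~ tr/a-z/a/r }gie;

print "$line\n";    # AAA Aaa aaa aaaa
```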
Something like this perhaps?
http://perldoc.perl.org/perlfaq6.html#How-do-I-substitute-case-insensitively-on-the-LHS-while-preserving-case-on-the-RHS%3f
Doing it in two-steps is probably a better/simpler idea...
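A minimal sketch of such a case-copying helper, in the spirit of the perlfaq6 entry but not copied from it (same_case is a hypothetical name): each character of the replacement takes the case of the character at the same position in the match, and once the match runs out, the case of its last character is reused.

```perl
use strict;
use warnings;

# Hypothetical helper: give $to the letter case of $from, position by position.
sub same_case {
    my ($from, $to) = @_;
    my @from = split //, $from;
    my @to   = split //, $to;
    for my $i (0 .. $#to) {
        # Past the end of $from, reuse the case of its last character.
        my $model = $from[ $i <= $#from ? $i : $#from ];
        $to[$i] = $model =~ /\p{Lu}/ ? uc $to[$i] : lc $to[$i];
    }
    return join '', @to;
}

my $line = 'JAM jam Jam';
$line =~ s/(jam)/same_case($1, 'aaa')/gie;
print "$line\n";    # AAA aaa Aaa
```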
Using the power of Google, I found this (note that it is Perl 6, now called Raku, rather than Perl 5):
The :samecase modifier, :ii for short (since it's a variant of :i), preserves case.
my $x = 'Abcd';
$x ~~ s:ii/^../foo/;
say $x; # Foocd
$x = 'ABC';
$x ~~ s:ii/^../foo/;
say $x; # FOOC
This is very useful if you want to globally rename your module Foo to Bar,
but, for example, environment variables write the name in all uppercase.
With the :ii modifier, the case is automatically preserved.
$line =~ s/JAM/$& eq 'jam' ? 'aaa' : 'AAA'/gie;