perl substitute string characters using a hash and tr - regex

I need an efficient way to replace every character in a string with another character, based on a hash map.
Currently I am using the regex s/// and that works fine. Can I use tr instead, since I just need character-by-character conversion?
This is what I am trying:
my %map = ( a => 9 , b => 4 , c => 8 );
my $str = 'abc';
my $str2 = $str;
$str2 =~ s/(.)/$map{$1}/g; # $str2 =~ tr /(.)/$map{$1}/ Does not work
print " $str => $str2\n";

If you need to replace exactly 1 character by 1 character, tr is ideal for you:
#!/usr/bin/perl
use strict;
use warnings;
my $str = 'abcd';
my $str2 = $str;
$str2 =~ tr /abc/948/;
print " $str => $str2\n";
Unlike the code from your question, this leaves "d" alone; the s/// version deletes any character that has no entry in the hash. Output:
abcd => 948d

No, one cannot do that with tr. That tool is very different from regex.
Its entry in Quote Like Operators in perlop says
Transliterates all occurrences of the characters found (or not found if the /c modifier is specified) in the search list with the positionally corresponding character in the replacement list [...]
and further down it adds
Characters may be literals, or (if the delimiters aren't single quotes) any of the escape sequences accepted in double-quoted strings. But there is never any variable interpolation, so "$" and "#" are always treated as literals. [...]
So one surely can't have a hash evaluated, nor match on a regex pattern in the first place.
The lack of even basic variable interpolation is explained at the very end
... the transliteration table is built at compile time, ...
We are then told of using eval, with an example, if we must use variables with tr.
In this case you'd need to first build variables, one a sequence of characters to replace (keys) and the other a sequence of their replacement characters (values), and then use them in tr via eval like in the docs. Better yet, you'd build a sub with it as in ikegami's comment. Here is a related page.
But this is the opposite of finding an approach simpler than that basic regex, the question's point.
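For completeness, here is a minimal sketch of the eval approach described above. It assumes the hash holds only single-character keys and values, and the helper name make_translator is hypothetical, not something from the docs.

```perl
use strict;
use warnings;

# Hypothetical helper: build a tr-based translator from a hash of
# single-character keys to single-character values.
sub make_translator {
    my (%map) = @_;
    my @from = sort keys %map;

    # quotemeta escapes characters such as "-", "/" and "\" that would
    # otherwise be special inside tr
    my $search  = join '', map { quotemeta } @from;
    my $replace = join '', map { quotemeta $map{$_} } @from;

    # tr builds its table at compile time, so the operator is compiled
    # once here through a string eval and reused afterwards
    my $code = eval "sub { (my \$s = shift) =~ tr/$search/$replace/; \$s }"
        or die "Could not build translator: $@";
    return $code;
}

my $translate = make_translator( a => 9, b => 4, c => 8 );
print $translate->('abcd'), "\n";    # prints 948d
```

The string eval runs only once, when the translator is built, so each later call costs no more than a plain tr.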

Related

Perl Regex: How to parse string from " to" without \"?

I have to parse current line "abc\",","\"," by regex in Perl,
and get this result "abc\"," and "\","
I do this
while (/(\s*)/gc) {
if (m{\G(["])([^\1]+)\1,}gc){
say $2;
}
}
but it is wrong, because this regexp runs on to the last ",
My question is: how can I skip over the \" and stop at the first ", ?
The following program performs matches according to your specification:
while (<>) {
    @arr = ();
    while (/("(?:\\"|[^"])*")/) {
        push @arr, $1;
        $_ = $';
    }
    print join(' ', @arr), "\n";
}
Input file input.txt:
"abc", "def"
"abc\",","\","
Output:
$ ./test.pl < input.txt
"abc" "def"
"abc\"," "\","
The pattern could be made stricter: in this form it accepts a lot of input that may not be desirable, but it serves as a starting point. Also, a CSV file is better parsed with the corresponding module than with regular expressions, but you have not said whether your input really is CSV.
Don't reinvent the wheel. If you have CSV, use a CSV parser.
use Text::CSV_XS qw( );
my $string = '"abc\",","\","';
# escape_char is set because this input escapes quotes with a backslash
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1, escape_char => '\\' });
$csv->parse($string);
my @fields = $csv->fields();
Regexes aren't the best tool for this task. The standard Text::ParseWords module does this easily.
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Text::ParseWords;
my $line = '"abc\",","\","';
my @fields = parse_line(',', 1, $line);
for (0 .. $#fields) {
say "$_: $fields[$_]"
}
The output is:
0: "abc\","
1: "\","
split /(?<!\\)",(?<!\\)"/, $_
(First clean the boundaries of $_ with s/^"// && s/"$//; because the outer quotes enclose the whole line rather than belong to any field.)
This returns the array you want directly, with no explicit loop, because split does the iteration internally. You might add \s* around the comma, depending on how the string is supplied.
One further case is worth noting, although you didn't mention it.
If \" stands for ", then \\ probably stands for \, so sequences such as \\\" and \\" can occur. The second one (more generally, an even number of backslashes before ") is hard to handle with a one-line regex, because look-behind must be of fixed length. The form (?<!\\(?:\\\\)*)" would correctly recognize a delimiter quote that follows an escaped backslash, as in \\", but it is not supported, so less efficient code than the split above would be needed. This only matters if \\ must be interpreted as an escape as well.
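One way around the fixed-width look-behind limit is to match whole quoted fields instead of splitting on the delimiters. The sketch below assumes that a backslash escapes whatever single character follows it; the helper name quoted_fields is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the contents of every double-quoted field.
# Inside the quotes, take either a backslash plus the character it escapes,
# or any character that is neither a quote nor a backslash.
sub quoted_fields {
    my ($line) = @_;
    return $line =~ /"((?:\\.|[^"\\])*)"/g;
}

# The middle field holds an escaped backslash directly before its closing quote
my @fields = quoted_fields('"abc\",","\\\\","\","');
print "$_\n" for @fields;
```

Because the pattern consumes escape pairs from left to right, a quote preceded by an even number of backslashes is correctly seen as a closing quote.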

Remove all characters except numeric and period

I have a string v1.2NDM. I'm trying to use regex to get 1.2.
my $string = "v1.2NDM";
$string =~ s/[^0-9.]//;
print $string;
output: 1.2NDM but I'm trying to get 1.2.
You can remove characters with the transliteration operator:
$string =~ y/0-9.//cd;
/c means complement - match any character not specified in the search list.
/d means delete characters for which no replacement is specified in the replace list (all matching characters in this case).
Use it like this with g or global flag:
$string =~ s/[^0-9.]+//g;
It will output 1.2 now. Also better to use + after character class for efficiency reasons.
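The two answers can be compared side by side in a short sketch. The /r modifier used here needs Perl 5.14 or later, and the variable names are only illustrative.

```perl
use strict;
use warnings;

my $string = 'v1.2NDM';

# /c complements the search list, /d deletes the matched characters,
# /r returns the result and leaves $string untouched
my $via_tr = $string =~ tr/0-9.//cdr;

# The same idea with a substitution; /g removes every run, not just the first
( my $via_s = $string ) =~ s/[^0-9.]+//g;

print "$via_tr $via_s\n";    # prints 1.2 1.2
```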

perl: substitute pattern with pattern of different size

here is my string A B C D and I want to replace A by 123 and C by 456 for example. However this doesn't work.
$string=~ s/A|B|C|D/123|B|456|D/;
I want 123 B 456 D but I get 123|B|456|D B C D
Probably because the number of characters in my two patterns is different.
Is there a way to substitute patterns of different size using some other piece of code ? Thanks a lot.
You're getting what I'd expect you to get. Your regex looks for one occurrence of either an 'A' or a 'B' or a 'C' or a 'D' and replaces it with the literal string '123|B|456|D'. Hence 'A B C D' -> '123|B|456|D B C D'
So it finds the first occurrence, the 'A' and replaces it with the string you specified. An alternation matches various strings, but the pipe characters mean nothing in the replacement slot.
What you need to do is create a mapping from input to output, like so:
my %map = ( A => '123', C => '456' );
Then you need to use it in the replacements. Let's give you a search expression:
my $search = join( '|', keys %map );
Now let's write the substitution (I prefer braces when I code substitutions that have code in them):
$string =~ s{($search)}{ $map{$1} }ge;
The g switch means we match every part of the string we can, and the e switch tells Perl to evaluate the replacement expression as Perl code.
The output is '123 B 456 D'
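Put together as a runnable sketch, with two precautions that are my own additions rather than part of the answer: quotemeta keeps regex metacharacters in the keys literal, and longer keys are tried first so a short key cannot shadow a longer one.

```perl
use strict;
use warnings;

my $string = 'A B C D';
my %map    = ( A => '123', C => '456' );

# Build the alternation from the hash keys, longest first, escaped
my $search = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) }
    keys %map;

# A plain hash lookup interpolates in the replacement, so /e is optional here
$string =~ s/($search)/$map{$1}/g;

print "$string\n";    # prints 123 B 456 D
```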
Something like this using eval (untested).
$string=~ s/(A)|C/ length($1) ? '123': '456'/eg;
Using the eval flag in the s/// form means to evaluate the replacement
side as a line of code that returns a value.
In this case it executes a ternary conditional in the replacement code.
It's sort of like an inline regex callback.
It's much more complicated though since it can be like s///eeg so
better to refer to the docs.
Remember, eval is really evil, misspelled !!
The easiest way to do this is with two substitutions:
$string =~ s/A/123/g;
$string =~ s/B/456/g;
or even (using an inline for loop as a shorthand to apply multiple substitutions to one string):
s/A/123/g, s/B/456/g for $string;
Of course, for more complicated patterns, this may not produce the same results as doing both substitutions in one pass; in particular, this may happen if the patterns can overlap (as in A = YZ, B = XY), or if pattern B can match the string substituted for pattern A.
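The caveat is easy to reproduce when each replacement is itself one of the patterns. The A/B swap below is an illustrative assumption, not part of the question.

```perl
use strict;
use warnings;

my %map = ( A => 'B', B => 'A' );

# Sequential: the second substitution also rewrites the Bs produced by the first
my $sequential = 'A B';
s/A/B/g, s/B/A/g for $sequential;

# One pass: each position is matched and replaced exactly once
( my $one_pass = 'A B' ) =~ s/(A|B)/$map{$1}/g;

print "$sequential | $one_pass\n";    # prints A A | B A
```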
If you wish to do this in one pass, the most general solution is to use the /e modifier, which causes the replacement to be interpreted as Perl code, as in:
$string =~ s/(A|B)/ $1 eq 'A' ? '123' : '456' /eg;
You can even include multiple expressions, separated by semicolons, inside the substitution; the last expression's value is what will be substituted into the string. If you do this, you may find it useful to use paired delimiters for readability, like this:
$string =~ s{(A|B)}{
my $foo = "";
$foo = '123' if $1 eq 'A';
$foo = '456' if $1 eq 'B';
$foo; # <-- this is what gets substituted for the pattern
}eg;
If your patterns are constant strings (as in the simple examples above), an even more efficient solution is to use a look-up hash, as in:
my %map = ('A' => '123', 'B' => '456');
$string =~ s/(A|B)/$map{$1}/g;
With this method, you don't even need the /e modifier (although, for this specific example, adding it would make no difference). The advantage of using /e is that it lets you implement more complicated rules for choosing the replacement than a simple hash lookup would allow.
String="A B C D"
echo "$String" | perl -pe 's/A/123/' | perl -pe 's/C/456/'

Swapping letters with regexp

How can I swap the letter o with the letter e and e with o?
I just tried this but I don't think this is a good way of doing this. Is there a better way?
my $str = 'Absolute force';
$str =~ s/e/___eee___/g;
$str =~ s/o/e/g;
$str =~ s/___eee___/o/g;
Output: Abseluto ferco
Use the transliteration operator:
$str =~ y/oe/eo/;
E.g.
$ echo "Absolute force" | perl -pe 'y/oe/eo/'
Abseluto ferco
As has already been said, the way to do this is the transliteration operator
tr/SEARCHLIST/REPLACEMENTLIST/cdsr
y/SEARCHLIST/REPLACEMENTLIST/cdsr
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated.
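The return value mentioned in that description can be captured directly, as in this small sketch (the variable name $changed is only illustrative):

```perl
use strict;
use warnings;

my $str = 'Absolute force';

# tr modifies $str in place and returns how many characters it transliterated
my $changed = ( $str =~ tr/oe/eo/ );

print "$str ($changed characters changed)\n";    # prints Abseluto ferco (4 characters changed)
```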
However, I want to commend you on your creative use of regular expressions. Your solution works, although the placeholder string _ee_ would've been sufficient.
tr is only going to help you for character replacements though, so I'd like to quickly teach you how to utilize regular expressions for a more complicated mass replacement. Basically, you just use the /e tag to execute code in the RHS. The following will also do the replacement you were aiming for:
my $str = 'Absolute force';
$str =~ s/([eo])/$1 eq 'e' ? 'o' : 'e'/eg;
print $str;
Outputs:
Abseluto ferco
Note how the LHS (left hand side) matches both o and e, and then the RHS (right hand side) does a test to see which matched and returns the opposite for replacement.
Now, it's common to have a list of words that you want to replace, so it's convenient to just build a hash of your from/to values and then dynamically build the regular expression. The following does that:
my $str = 'Hello, foo. How about baz? Never forget bar.';
my %words = (
foo => 'bar',
bar => 'baz',
baz => 'foo',
);
my $wordlist_re = '(?:' . join('|', map quotemeta, keys %words) . ')';
$str =~ s/\b($wordlist_re)\b/$words{$1}/eg;
Outputs:
Hello, bar. How about foo? Never forget baz.
The above could've worked for your e and o case as well, but would've been overkill. Note how I use quotemeta to escape the keys in case they contain a regular expression special character. I also intentionally used a non-capturing group around them in $wordlist_re so that the variable could be dropped into any regex and behave as desired. I then put the capturing group inside the s/// because it's important to be able to see what's being captured in a regex without having to backtrack to the value of an interpolated variable.
The tr/// operator is best. However, if you wanted to use the s/// operator (to handle more than just single letter substitutions), you could write
$ echo 'Absolute force' | perl -pe 's/(e)|o/$1 ? "o" : "e"/eg'
Abseluto ferco
The capturing parentheses avoid the redundant $1 eq 'e' test in @Miller's answer.
from man sed:
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
and tr command can do this too:
$ echo "Absolute force" | tr 'oe' 'eo'
Abseluto ferco

Convert multiple Unicode in a string to character

Problem -- I have a string, say 'Buna$002C_TexasBuna$002C_Texas', where each $ is followed by a four-digit Unicode code point in hex. I want to replace each of these with the character it represents.
In Perl, a code point written as "\x{002C}" in a double-quoted string is converted to its respective Unicode character. Below is the sample code.
#!/usr/bin/perl
my $string = "Hello \x{263A}!\n";
@arr = split //, $string;
print "@arr";
I am processing a file which contain 10 million of records. So I have these strings in a scalar variable. To do the same as above I am substituting $4_digit_unicode to \x{4_digit_unicode} as below.
$str = 'Buna$002C_TexasBuna$002C_Texas';
$str =~s/\$(.{4})/\\x\{$1\}/g;
$str = "$str";
It gives me
Buna\x{002C}_TexasBuna\x{002C}_Texas
It is because at $str = "$str", line $str is being interpolated, but not its value. So \x{002C} is not being interpolated by Perl.
Is there a way to force Perl so that it will also interpolate the contents of $str too?
OR
Is there another method to achieve this? I do not want to take out each of the Unicodes then pack it using pack "U4",0x002C and then substitute it back. But something in one line (like the below unsuccessful attempt) is OK.
$str =~ s/\$(.{4})/pack("U4",$1)/g;
I know the above is wrong; but can I do something like above?
For the input string $str = 'Buna$002C_TexasBuna$002C_Texas', the desired output is Buna,_TexasBuna,_Texas.
This gives the desired result:
use strict;
use warnings;
use feature 'say';
my $str = 'Buna$002C_TexasBuna$002C_Texas';
$str =~s/\$(.{4})/chr(hex($1))/eg;
say $str;
The main interesting item is the e in s///eg. The e means to treat the replacement text as code to be executed. The hex() converts a string of hexadecimal characters to a number. The chr() converts a number to a character. The replace line might be better written as below to avoid trying to convert a dollar followed by non-hexadecimal characters.
$str =~s/\$([0-9a-f]{4})/chr(hex($1))/egi;
You can execute statements such as pack in the replacement string, you just have to use the e regular expression modifier.
Or you can do this
$str =~ s/\$(.{4})/@{[ pack("U", hex $1) ]}/g;
If those two options don't work please let me know, take a look at this Stackoverflow question for more information.
"\x{263A}" (quotes included) is a string literal, a piece of code that produces a string containing the lone character 263A when it's evaluated by the interpreter (by being part of the script passed to perl to be evaluated).
"\\x\{$1\}" (quotes included), on the other hand, produces a string consisting of \, x, {, the contents of $1, and }.
The latter is the string you are producing. You appear to be attempting to produce Perl code, but it's not valid Perl code -- it's missing the quotes -- and you never have the code interpreted by perl.
$str =~ s/\$(.{4})/\\x\{$1\}/g;
is short for
$str =~ s/\$(.{4})/ "\\x\{$1\}" /eg;
which is completely different than
$str =~ s/\$(.{4})/ "\x{263A}" /eg;
It looks like you were going for the following:
$str =~ s/\$(.{4})/ eval qq{"\\x\{$1\}"} /eg;
But there are much simpler ways of producing the desired string, such as
$str =~ s/\$(.{4})/ pack "U", hex $1 /eg;
or better yet,
$str =~ s/\$(.{4})/ chr hex $1 /eg;
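For use over many records, the substitution can be wrapped in a small function. This is a sketch under two assumptions: the code points are always exactly four hex digits, and the output is wanted as UTF-8. The helper name decode_dollar_hex is hypothetical.

```perl
use strict;
use warnings;

# Characters above U+00FF need an encoding layer on output
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: replace each "$" followed by exactly four hex digits
# with the character at that code point; anything else is left alone.
sub decode_dollar_hex {
    my ($str) = @_;
    $str =~ s/\$([0-9A-Fa-f]{4})/chr hex $1/eg;
    return $str;
}

print decode_dollar_hex('Buna$002C_TexasBuna$002C_Texas'), "\n";    # prints Buna,_TexasBuna,_Texas
```

Restricting the capture to hex digits means a stray dollar sign, as in 'cost $zzzz', passes through unchanged.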