Consider the following string wizard. I want to find if it is in another string in any order and in any case.
I tried the following
while(<>){if($_=~/(?:([wizard])(?!.*\1)){6}/i){print"0"}else{print"1"}}
For the inputs
Garry Kasparov
Bobby Fischer
Vladimir Kramnik
Wayne Drimaz
Lionel Messi
La Signora
It printed 111111 but it must have been 111011.
So, I tried this instead (for the same inputs)
while(<>){if($_=~/(?=[wizard]{6})(?!.*(.).*\1).*/i){print"0"}else{print"1"}}
It again printed 111111.
In input number 4, we can make WaDriaz but only one a is needed. Anyway, we can spell wizard by rearranging and in any case. Why is it not working?
What is wrong with my code?
The following should be quite fast (especially if you inline the subs):
use feature qw( fc say );
sub make_key {
my %counts;
++$counts{$_} for split //, fc($_[0]) =~ s/\PL//rg;
return \%counts;
}
sub search {
my ($substr, $str) = #_;
$str = make_key($str);
no warnings qw( uninitialized );
return !( grep { $str->{$_} < $substr->{$_} } keys(%$substr) );
}
my $substr = make_key("wizard");
while (<>) {
chomp;
say search($substr, $_) ? 0 : 1;
}
Unlike virtually all of the previous solutions, this one doesn't consider latte to be in late.
The following is a regex based solution (with some prep). This should also be quite fast (especially if you inline the subs).
use feature qw( fc say );
sub make_re {
my $pat = join ".*?", map quotemeta, sort split //, fc($_[0]) =~ s/\PL//rg;
return qr/$pat/s;
}
sub search {
my ($substr, $str) = #_;
return ( join "", sort split //, $str ) =~ $substr;
}
my $substr = make_re("wizard"); # qr/a.*?d.*?i.*?r.*?w.*?z/is
while (<>) {
chomp;
say search($substr, $_) ? 0 : 1;
}
Finally, a purely regexp-based solution.
use feature qw( fc say );
sub make_re {
my %counts;
++$counts{$_} for split //, fc($_[0]) =~ s/\PL//rg;
my $pat =
join "",
map { "(?=".( ( ".*?" . quotemeta($_) ) x $counts{$_} ).")" }
#sort
keys(%counts);
return qr/^$pat/s;
}
my $re = make_re("wizard"); # qr/^(?=.*?a)(?=.*?d)(?=.*?i)(?=.*?r)(?=.*?w)(?=.*?z)/is
while (<>) {
say /$re/ ? 0 : 1;
}
Unlike virtually all of the previous solutions, none of mine consider latte to be in late.
Here is a pure regex: do a positive lookahead for each character
use warnings;
use strict;
use feature 'say';
use List::Util qw(uniq); # before v. 1.45 in List::MoreUtils
my $string = shift // q(wizard);
my $patt = join '', map { qq{(?=.*\Q$_\E)} } uniq split //, $string;
# say $patt;
#--> (?=.*w)(?=.*i)(?=.*z)(?=.*a)(?=.*r)(?=.*d) (for wizard)
while (<DATA>) {
say "Found '$string' in: $_" if /^$patt/is;
}
__DATA__
Garry Kasparov
Bobby Fischer
Vladimir Kramnik
Wayne Drimaz
Lionel Messi
La Signora
Being all in one regex, with an anchored lookahead and no overhead, this should be very fast.
The \Q...\E are there in case the search string contains regex-sensitive characters.
This code considers words with repeated characters (latte, rare) to "fit" a word without that (later). It is clarified in comments that this is indeed the wanted behavior: repeated characters need only be found once in the target (letter matches later etc).
I find canonicalization of the inputs and the pattern to be a more generalizable and understandable approach:
#!/usr/bin/env perl
use strict;
use warnings;
sub canonchars {
my %c;
$c{$_} = undef for map lc, grep /\S/, split //, $_[0];
sort keys %c;
}
sub pattern {
map "$_.*", canonchars($_[0]);
}
my %canonical;
while (my $line = <DATA>) {
last unless $line =~ /\S/;
push $canonical{join '', canonchars($line)}->#*, $line;
}
my $pat = qr/#{[join '', pattern('wizard')]}/;
for my $k (keys %canonical) {
if ($k =~ $pat) {
print for $canonical{$k}->#*;
}
}
__DATA__
Garry Kasparov
Bobby Fischer
Vladimir Kramnik
Wayne Drimaz
Lionel Messi
La Signora
Output:
C:\Temp> perl t.pl
Wayne Drimaz
There is a lot of logic you are trying to fit into a regular expression pattern, and every edge case you find and fix will make it more complicated and more fragile.
It's just a simple matter of using a positive lookahead for every character.
my #stars = (
'Garry Kasparov',
'Bobby Fischer',
'Vladimir Kramnik',
'Wayne Drimaz',
'Lionel Messi',
'La Signora'
);
say /^(?=.*w)(?=.*i)(?=.*z)(?=.*a)(?=.*r)(?=.*d)/i ? 0 : 1 for #stars;
This outputs 111011.
No need for regular expressions... they just complicate things, especially if you're not looking for a string you know ahead of time. Just look for each character in turn after normalizing their cases.
#!/usr/bin/env perl
use strict;
use warnings;
sub contains_chars {
my ($needle, $haystack) = #_;
$haystack = lc $haystack;
my %positions;
for my $char (split //, lc $needle) {
my $p = index $haystack, $char, $positions{$char}//0;
return 1 if $p < 0;
$positions{$char} = $p + 1;
}
return 0;
}
while (<DATA>) {
print contains_chars("wIzArD", $_);
}
print "\n";
__DATA__
Garry Kasparov
Bobby Fischer
Vladimir Kramnik
Wayne Drimaz
Lionel Messi
La Signora
This a black art way that out of order matching can occur.
Each character in the source string is visited only once.
No need to recurse the string from the beginning for each letter.
use strict;
use warnings;
while (my $line = <DATA>) {
if ( $line =~ /
(?:
.*?
(?:
(?(1)(?!))(w)
| (?(2)(?!))(i)
| (?(3)(?!))(z)
| (?(4)(?!))(a)
| (?(5)(?!))(r)
| (?(6)(?!))(d)
)
){6}/ix ) { print $line, "\n" }
}
__DATA__
Garry Kasparov
Bobby Fischer
Vladimir Kramnik
Wayne Drimaz
Lionel Messi
La Signora
Wayne Drimaz
Captures can exist in two states, defined or undefined.
The essence of this black art is to use the state of captures as flags
to insure all the items are matched in the out-of-order state.
The above can also be written this way with the same result.
use strict;
use warnings;
while (my $line = <DATA>) {
if ( $line =~ /
(?im)
^
(?>
.*?
(?:
w ( ) # (1)
| i ( ) # (2)
| z ( ) # (3)
| a ( ) # (4)
| r ( ) # (5)
| d ( ) # (6)
)
)+
(?= \1 \2 \3 \4 \5 \6 )
/x ) { print $line, "\n" }
}
__DATA__
Garry Kasparov
Bobby Fischer
Vladimir Kramnik
Wayne Drimaz
Lionel Messi
La Signora
Just adding a simple variant of /w/ && /i/ && /z/ ... solution I mentioned in comments. If you want this solution to match a lot of different strings, instead of lining the regexes together with &&, you can simply loop over the characters. A useful tool is to use the &&= operator to mimic the behaviour of a long string of conditions. This will also allow us to short-circuit the matching if we find a mismatch, giving us a speed benefit.
For example:
/a/ && /b/ && /c/
Equivalent to
my $match = 1;
for my $w (qw(a b c)) {
$match &&= (/$w/); # $match = ($match && /$w/)
}
To remember the count of letters, i.e. whether latte should be considered to be a substring of late, you can simply use the substitution operator s/// instead of match operator m//. I added the multi-letter criteria, and added two test cases to demonstrate.
I like this solution because of the simplicity, but thereby not said it is the best one.
use strict;
use warnings;
my $word = "wizzard";
while (<DATA>) {
print search($_, $word), " $_";
}
sub search {
my ($str, $substr) = #_;
my $match = 1; # assume true
for my $w (split //, $substr) { # for each char in substr...
$match &&= ($str =~ s/\Q$w//i); # ...remove character
return 0 if not $match; # ...return false if no match found
}
return 1 if $match;
}
__DATA__
wizard
wizzard
Garry Kasparov
Bobby Fischer
Vladimir Kramnik
Wayne Drimaz
Wayne Drimazz
Lionel Messi
La Signora
Output:
0 wizard
1 wizzard
0 Garry Kasparov
0 Bobby Fischer
0 Vladimir Kramnik
0 Wayne Drimaz
1 Wayne Drimazz
0 Lionel Messi
0 La Signora
If you do not care about multi-letter matching, just replace s/// with m//.
I have a very large text file whose lines are comma-separated values. Some values are missing. For each line, I'd like to print the index and value of all the non-empty fields.
For example, a line might look like
,,10.3,,,,5.2,3.1,,,,,,,
in which case the output I want is
2,10.3,6,5.2,7,3.1
I know how to accomplish this by first splitting the input into an array, and then going through the array with a for loop, but these are huge files (multi-gigabyte) and I'm wondering if there is a faster way. (e.g. using some advanced regexp)
I haven't benchmarked it yet, but I would assume
my $line = ",,10.3,,,,5.2,3.1,,,,,,,";
my $index = 0;
print join ",",
map {join ",", #$_}
grep $_->[1],
map {[$index++, $_]}
split ",", $line;
is faster than some advanced regexp.
The problem is that as long as you have to know the index, you still have to keep track of those missing entries somehow.
Something like this might not be too slow though:
my ($i, #vars);
while ($line =~ s/^(,*)([^,]+)//) {
push #vars, $i += length($1), $2;
}
print join ",", #vars;
You could probably leave out the first capturing group and use pos() to work out the index.
Here's a comparison of my two suggestions and sin's with 1M iterations:
Rate flesk1 sin flesk2
flesk1 87336/s -- -8% -27%
sin 94518/s 8% -- -21%
flesk2 120337/s 38% 27% --
Seems like my regex works better than I thought.
You might be able to mix and match regex and code -
$line =~ /(?{($cnt,#ary)=(0,)})^(?:([^,]+)(?{push #ary,$cnt; push #ary,$^N})|,(?{$cnt++}))+/x
and print join( ',', #ary);
expanded -
$line =~ /
(?{($cnt,#ary)=(0,)})
^(?:
([^,]+) (?{push #ary,$cnt; push #ary,$^N})
| , (?{$cnt++})
)+
/x
and print join( ',', #ary);
some benchmarks
With a slight tweak of flesk's and sln's (look for fleskNew and slnNew),
the winner is the fleskNew when the substitution operator is removed.
code -
use Benchmark qw( cmpthese ) ;
$samp = "x,,10.3,,q,,5.2,3.1,,,ghy,g,,l,p";
$line = $samp;
cmpthese( -5, {
flesk1 => sub{
$index = 0;
join ",",
map {join ",", #$_}
grep $_->[1],
map {[$index++, $_]}
split ",", $line;
},
flesk2 => sub{
($i, #vars) = (0,);
while ($line =~ s/^(,*)([^,]+)//) {
push #vars, $i += length($1), $2;
}
$line = $samp;
},
fleskNew => sub{
($i, #vars) = (0,);
while ($line =~ /(,*)([^,]+)/g) {
push #vars, $i += length($1), $2;
}
},
sln1 => sub{
$line =~ /
(?{($cnt,#ary)=(0,)})
^(?:
([^,]+) (?{push #ary,$cnt; push #ary,$^N})
| , (?{$cnt++})
)+
/x
},
slnNew => sub{
$line =~ /
(?{($cnt,#ary)=(0,)})
(?:
(,*) (?{$cnt += length($^N)})
([^,]+) (?{push #ary, $cnt,$^N})
)+
/x
},
} );
numbers -
Rate flesk1 sln1 flesk2 slnNew fleskNew
flesk1 20325/s -- -51% -52% -56% -60%
sln1 41312/s 103% -- -1% -10% -19%
flesk2 41916/s 106% 1% -- -9% -17%
slnNew 45978/s 126% 11% 10% -- -9%
fleskNew 50792/s 150% 23% 21% 10% --
some benchmarks 2
Adds Birei's in-line replacment and trim (all-in-one) solution.
Abberations:
Flesk1 is modified to remove the final 'join' as it is not included in
the other regex solutions. This gives it a chance to bench better.
Birei deviates in the bench as it modifies the original string to be the final solution.
That aspect can't be taken out. The difference between Birei1 and BireiNew is that the
new one removes the final ','.
Flesk2, Birei1 and BireiNew have the additional overhead of restoring the original string
due to the substitution operator.
The winner still looks like FleskNew ..
code -
use Benchmark qw( cmpthese ) ;
$samp = "x,,10.3,,q,,5.2,3.1,,,ghy,g,,l,p";
$line = $samp;
cmpthese( -5, {
flesk1a => sub{
$index = 0;
map {join ",", #$_}
grep $_->[1],
map {[$index++, $_]}
split ",", $line;
},
flesk2 => sub{
($i, #vars) = (0,);
while ($line =~ s/^(,*)([^,]+)//) {
push #vars, $i += length($1), $2;
}
$line = $samp;
},
fleskNew => sub{
($i, #vars) = (0,);
while ($line =~ /(,*)([^,]+)/g) {
push #vars, $i += length($1), $2;
}
},
sln1 => sub{
$line =~ /
(?{($cnt,#ary)=(0,)})
^(?:
([^,]+) (?{push #ary,$cnt; push #ary,$^N})
| , (?{$cnt++})
)+
/x
},
slnNew => sub{
$line =~ /
(?{($cnt,#ary)=(0,)})
(?:
(,*) (?{$cnt += length($^N)})
([^,]+) (?{push #ary, $cnt,$^N})
)+
/x
},
Birei1 => sub{
$i = -1;
$line =~
s/
(?(?=,+)
( (?: , (?{ ++$i }) )+ )
| (?<no_comma> [^,]+ ,? ) (?{ ++$i })
)
/
defined $+{no_comma} ? $i . qq[,] . $+{no_comma} : qq[]
/xge;
$line = $samp;
},
BireiNew => sub{
$i = 0;
$line =~
s/
(?: , (?{++$i}) )*
(?<data> [^,]* )
(?: ,*$ )?
(?= (?<trailing_comma> ,?) )
/
length $+{data} ? "$i,$+{data}$+{trailing_comma}" : ""
/xeg;
$line = $samp;
},
} );
results -
Rate BireiNew Birei1 flesk1a flesk2 sln1 slnNew fleskNew
BireiNew 6030/s -- -18% -74% -85% -86% -87% -88%
Birei1 7389/s 23% -- -68% -82% -82% -84% -85%
flesk1a 22931/s 280% 210% -- -44% -45% -51% -54%
flesk2 40933/s 579% 454% 79% -- -2% -13% -17%
sln1 41752/s 592% 465% 82% 2% -- -11% -16%
slnNew 47088/s 681% 537% 105% 15% 13% -- -5%
fleskNew 49563/s 722% 571% 116% 21% 19% 5% --
Using a regex (although I'm sure it can be simpler):
s/(?(?=,+)((?:,(?{ ++$i }))+)|(?<no_comma>[^,]+,?)(?{ ++$i }))/defined $+{no_comma} ? $i . qq[,] . $+{no_comma} : qq[]/ge;
Explanation:
s/PATTERN/REPLACEMENT/ge # g -> Apply to all occurrences
# e -> Evaluate replacement as a expression.
(?
(?=,+) # Check for one or more commas.
((?:,(?{ ++$i }))+) # If (?=,+) was true, increment variable '$i' with each comma found.
|
(?<no_comma>[^,]+,?)(?{ ++$i }) # If (?=,+) was false, get number between comma and increment the $i variable only once.
)
defined $+{no_comma} # If 'no_comma' was set in 'pattern' expression...
$i . qq[,] . $+{no_comma} # insert the position just before it.
qq[] # If wasn't set, it means that pattern matched only commas, so remove then.
My test:
Content of script.pl:
use warnings;
use strict;
while ( <DATA> ) {
our $i = -1;
chomp;
printf qq[Orig = $_\n];
s/(?(?=,+)((?:,(?{ ++$i }))+)|(?<no_comma>[^,]+,?)(?{ ++$i }))/defined $+{no_comma} ? $i . qq[,] . $+{no_comma} : qq[]/ge;
# s/,\Z//;
printf qq[Mod = $_\n\n];
}
__DATA__
,,10.3,,,,5.2,3.1,,,,,,,
10.3,,,,5.2,3.1,,,,,,,
,10.3,,,,5.2,3.1
,,10.3,5.2,3.1,
Run the script like:
perl script.pl
And output:
Orig = ,,10.3,,,,5.2,3.1,,,,,,,
Mod = 2,10.3,6,5.2,7,3.1,
Orig = 10.3,,,,5.2,3.1,,,,,,,
Mod = 0,10.3,4,5.2,5,3.1,
Orig = ,10.3,,,,5.2,3.1
Mod = 1,10.3,5,5.2,6,3.1
Orig = ,,10.3,5.2,3.1,
Mod = 2,10.3,3,5.2,4,3.1,
As you can see, it keeps last comma. I don't know how to remove it without an extra regex, just uncomment s/,\Z//; in previous code.
I need some Perl regular expression help. The following snippet of code:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my #results = ();
1 while $str =~ s/(.{2}\b$word\b.{2})/push(#results,"$1\n")/e;
print #results;
Produces the following output:
A plus B
D plus E
2 plus F
H plus I
4 plus J
5 plus K
What I want to see is this, where a character already matched can appear in a new match in a different context:
A plus B
D plus E
E plus F
H plus I
I plus J
J plus K
How do I change the regular expression to get this result? Thanks --- Dan
General advice: Don't use s/// when you want m//. Be specific in what you match.
The answer is pos:
#!/usr/bin/perl -l
use strict;
use warnings;
my $str = 'In this example, ' . 'A plus B equals C, ' .
'D plus E plus F equals G ' .
'and H plus I plus J plus K equals L';
my $word = "plus";
my #results;
while ( $str =~ /([A-Z] $word [A-Z])/g ) {
push #results, $1;
pos($str) -= 1;
}
print "'$_'" for #results;
Output:
C:\Temp> b
'A plus B'
'D plus E'
'E plus F'
'H plus I'
'I plus J'
'J plus K'
You can use a m//g instead of s/// and assign to the pos function to rewind the match location before the second term:
use strict;
use warnings;
my $str = 'In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L';
my $word = 'plus';
my #results;
while ($str =~ /(.{2}\b$word\b(.{2}))/g) {
push #results, "$1\n";
pos $str -= length $2;
}
print #results;
Another option is to use a lookahead:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E "
. "plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my $chars = 2;
my #results = ();
push #results, $1
while $str =~ /(?=((.{0,$chars}?\b$word\b).{0,$chars}))\2/g;
print "'$_'\n" for #results;
Within the lookahead, capturing group 1 matches the word along with a variable number of leading and trailing context characters, up to whatever maximum you've set. When the lookahead finishes, the backreference \2 matches "for real" whatever was captured by group 2, which is the same as group 1 except that it stops at the end of the word. That sets pos where you want it, without requiring you to calculate how many characters you actually matched after the word.
Given the "Full Disclosure" comment (but assuming .{0,35}, not .{35}), I'd do
use List::Util qw/max min/;
my $context = 35;
while ( $str =~ /\b$word\b/g ) {
my $pre = substr( $str, max(0, $-[0] - $context), min( $-[0], $context ) );
my $post = substr( $str, $+[0], $context );
my $match = substr( $str, $-[0], $+[0] - $-[0] );
$pre =~ s/.*\n//s;
$post =~ s/\n.*//s;
push #results, "$pre$match$post";
}
print for #results;
You'd skip the substitutions if you really meant (?s:.{0,35}).
Here's one way to do it:
use strict;
use warnings;
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
my $word = "plus";
my #results = ();
my $i = 0;
while (substr($str, $i) =~ /(.{2}\b$word\b.{2})/) {
push #results, "$1\n";
$i += $-[0] + 1;
}
print #results;
It's not terribly Perl-ish, but it works and it doesn't use too many obscure regular expression tricks. However, you might have to look up the function of the special variable #- in perlvar.
don't have to use regex. basically, just split up the string, use a loop to go over each items, check for "plus" , then get the word from before and after.
my $str = "In this example, A plus B equals C, D plus E plus F equals G and H plus I plus J plus K equals L";
#s = split /\s+/,$str;
for($i=0;$i<=scalar #s;$i++){
if ( "$s[$i]" eq "plus" ){
print "$s[$i-1] plus $s[$i+1]\n";
}
}
I have the following use case, input present in the file as:
Line1 : AA BB CC DD EE
I want to replace this with
1 2 3 4 5
Output
Line1: 1 2 3 4 5
In one regular expression in Perl, can I do this
I was trying this but was unsucessful
my #arr1 = ("AA", "BB", "CC", "DD", "EE");
open F2, $file;
my $count = 0;
while (<F2>) {
my $str = $_;
$str =~ s/$arr[$count]/$count+1/g;
print to file
}
close(F2);
This doesn't do the trick any ideas
If I understand correctly, you want to replace every word with number (incremented by 1 after every word). Here is program with tests:
#!/usr/bin/perl
use strict;
use warnings;
use Test::More qw(no_plan);
sub replace {
my $str=shift;
my $count=1;
$str=~s/\w+/$count++/ge;
return $str;
}
is(replace('AA AA DD EE'),'1 2 3 4');
is(replace('A B C D E'),'1 2 3 4 5');
You need to do something to modify the file in place, which you are not currently doing. The easiest option would be to use File::Inplace (or to output to a second file).
Additionally you are not looping over the array, but over the lines on the file so it'll replace only $arr[0] for 1 on each line.
use strict;
use warnings;
use File::Inplace;
my #replacees = ("AA", "BB", "CC", "DD", "EE");
my $editor = new File::Inplace(file => "file.txt", regex => "\n");
while (my ($line) = $editor->next_line) {
my $count = 1
for my $replacee (#replacees) {
if ($line =~ m/$replacee/) {
$line =~ s/$replacee/$count/g;
}
$count = $count + 1;
}
$editor->replace_line($line);
}
$editor->commit;
As for writing to the same file, please note Vinko answer. As for replacing strings, please check this snippet:
my #arr1 = ("AA", "BB", "CC", "DD", "EE");
my %replacements = map { ($arr1[$_] => $_ + 1) } (0..$#arr1);
my $regexp = join( '|', sort { length($b) <=> length($a) } #arr1);
open F2, $file;
while (<F2>) {
my $str = $_;
$str =~ s/($regexp)/$replacements{$1}/ge;
print $str;
}
close(F2);
Important parts:
my %replacements = map { ($arr1[$_] => $_ + 1) } (0..$#arr1);
It builds hash with keys from #arr1, and values are the index of given value in #arr1 incremented by 1.
For example, for #arr1 = ("a", "b", "d", "c"); %replacements will be: ("a" => 1, "b", => 2, "c" => 4, "d" => 3);
my $regexp = join( '|', sort { length($b) <=> length($a) } #arr1);
This builds base regexp for finding all words from #arr1. The sort part orders words by their length descending. So, for #arr1 = ("a", "ba", "bac") $regexp will be 'bac|ba|a'.
This ordering is important as otherwise there would be problems if any of the words would be prefix of any other word (as with "ba" and "bac" in my example).
As a last word, usage of filehandles as FH is rather discouraged, as these are globals, and generate "interesting" problems in more complex programs. Instead use open like this:
open my $fh, 'filename';
or better yet:
open my $fh, '<', 'filename';
First, a correction:
while (<F2>) {
my $str = $_;
If you want the line read to end up in $str, there is no reason to involve $_ in the process:
while ( my $str = ) {
which also brings up the point made by depesz that you should use lexical filehandles rather than package global bareword filehandles.
Now, looking at your loop:
my $count = 0;
while (my $str = <$input_fh>) {
$str =~ s/$arr[$count]/$count+1/g;
# ...
}
there seems to be an implicit assumption that there cannot be more lines in the file than the number of elements in #foo. In which case, you need not use $count: $. would do just fine. Say you are on the second line. Your code says you want to replace all occurrences of BB on that line with 2 which is different than what you describe verbally.
This is an important point: Any code you post ought to be consistent with the verbal description.
Anyway, here is one way:
rty.pl
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;
my ($input) = #ARGV;
write_file(
$input, [
map { s/( ([A-Z]) \2 )/ord($2) - ord('A') + 1/gex; $_ } read_file $input
]
);
__END__
test.data:
Line1 : AA BB CC DD EE
Line1 : AA BB CC DD EE
Line1 : AA BB CC DD EE
Line1 : AA BB CC DD EE
$ rty.pl test.data
test.data after script invocation:
Line1 : 1 2 3 4 5
Line1 : 1 2 3 4 5
Line1 : 1 2 3 4 5
Line1 : 1 2 3 4 5
Either way it will work
my %arr2 = ('AA'=>1, 'BB'=>2,'CC'=>3,'DD'=>4,'EE'=>5,'FF'=>6);
open F2, "t1.txt";
open F3, ">out.txt";
while () {
my $str = $;
print F3 join ' ' ,map {s/$/$arr2{$}/g; $} split / /,$str;
print F3 "\n";
}
close(F2);
close(F3);
or
my #arr1 = ("AA", "BB", "CC", "DD", "EE","FF");
my %hashArr = map { ($arr1[$] => $ + 1) } (0..$#arr1);
open F2, "t1.txt";
open F3, ">out.txt";
while () {
my $str = $;
print F3 join ' ' ,map {s/$/$hashArr{$}/g; $} split / /,$str;
print F3 "\n";
}
close(F2);
close(F3);