How can I substitute only in matched pattern and put it back in same variable using Perl?
For example:
my $str = "a.b.AA pat1 BB hgf AA pat1 BB jkl CC pat1 don't change pat1";
I want to match pat1 between AA and BB and replace it with Original string PAT2. However, I don't want to replace pat1 anywhere else in the same string
Expected output string:
a.b.AA PAT2 BB hgf AA PAT2 BB jkl CC pat1 don't change pat1
I am sure there should be some good way to do it; please advise.
Original string:
my $ORG_str = 'A.B.C.\\valid.A .\\valid.A.B.C .\\valid.X.Y.Z .p.q.r.s';
Expected String:
my $EXP_op = 'A.B.C.\\valid?A .\\valid?A?B?C .\\valid?X?Y?Z .p.q.r.s';
Substitute character . to ? only if it is between backslash \ and whitespace .
Look into look-around regexes.
s/(?<=AA )pat1(?= BB)/pat2/g
This matches and replaces a pat1 surrounded by AA and BB.
Not very simple with one single regexp, so I used divide and conquer to compute the result. This is a small recursive function that is replacing a single '.' per group of ('\' ' ')
The iteration ends when there is nothing to replace
sub replace {
my ($input) = #_;
my $result = $input;
$result =~ s/(\\\S*?)\.(.*? )/$1?$2/g;
return $result if $result eq $input;
return replace($result);
}
The function with some test cases
use strict;
my $ORG_str= 'A.B.C.\\\\valid.A .\\\\valid.A.B.C .\\\\valid.X.Y.Z .p.q.r.s';
my $EXP_op ='A.B.C.\\\\valid?A .\\\\valid?A?B?C .\\\\valid?X?Y?Z .p.q.r.s';
sub replace {
my ($input) = #_;
my $result = $input;
$result =~ s/(\\\S*?)\.(.*? )/$1?$2/g;
return $result if $result eq $input;
return replace($result);
}
my $check;
my $result;
my $expected;
$check = 'abcd'; $expected = $check;
$result = replace($check);
assert($result eq $expected, "'$check' gives '$expected'");
$check = 'ab\xxx. cd'; $expected = 'ab\xxx? cd';
$result = replace($check);
assert($result eq $expected, "'$check' gives '$expected'");
$check = 'ab\x.x.x. cd'; $expected = 'ab\x?x?x? cd';
$result = replace($check);
assert($result eq $expected, "'$check' gives '$expected'");
$check = 'ab\x.x.x. cd\y.y.y.'; $expected = 'ab\x?x?x? cd\y.y.y.';
$result = replace($check);
assert($result eq $expected, "'$check' gives '$expected'");
$check = 'ab\x.x.x. cd\xxx.xxx..xxx...x \y.y.y.'; $expected = 'ab\x?x?x? cd\xxx?xxx??xxx???x \y.y.y.';
$result = replace($check);
assert($result eq $expected, "'$check' gives '$expected'");
$check = '. ..\.. ...\.. ...\.. ...\..'; $expected = '. ..\?? ...\?? ...\?? ...\..';
$result = replace($check);
assert($result eq $expected, "'$check' gives '$expected'");
$check = $ORG_str; $expected = $EXP_op;
$result = replace($check);
assert($result eq $expected, "'$check' gives '$expected'");
sub assert {
my ($cond, $mesg) = #_;
print "checking $mesg ... ";
die "\nFAIL: $mesg" unless $cond;
print "OK\n";
}
The result
checking 'abcd' gives 'abcd' ... OK
checking 'ab\xxx. cd' gives 'ab\xxx? cd' ... OK
checking 'ab\x.x.x. cd' gives 'ab\x?x?x? cd' ... OK
checking 'ab\x.x.x. cd\y.y.y.' gives 'ab\x?x?x? cd\y.y.y.' ... OK
checking 'ab\x.x.x. cd\xxx.xxx..xxx...x \y.y.y.' gives 'ab\x?x?x? cd\xxx?xxx??xxx???x \y.y.y.' ... OK
checking '. ..\.. ...\.. ...\.. ...\..' gives '. ..\?? ...\?? ...\?? ...\..' ... OK
checking 'A.B.C.\\valid.A .\\valid.A.B.C .\\valid.X.Y.Z .p.q.r.s' gives 'A.B.C.\\valid?A .\\valid?A?B?C .\\valid?X?Y?Z .p.q.r.s' ... OK
\\\\[^. ]*\K|(?!^)\G\.([^. ]*)
You can try this.Replace by ?$1.See demo.
https://regex101.com/r/mT0iE7/28
The resultant string will not be exactly same as you want but you can easily do a clean up.
\?(?=\?)
Replace by empty string and you have what you want.See demo.
https://regex101.com/r/mT0iE7/29
Related
if ($search =~ /\W/){ #if search pattern has any special character
$sentence =~ s/\Q$search\E\b/$replace/g; #\Q..\E will consider special characters. \b is for word boundary
}
else {
$sentence =~ s/\b$search\b/$replace/g; #no need \Q..\E if not spl characters
}
}
print $sentence;
the else part is causing a problem as it is replacing +time1 as well as time1. Is there a single regex expression to take care of this situation so that I need not use if else?
my %map = (
"time1" => "...",
"+time1" => "...",
"time11" => "...",
"+time11" => "...",
);
my $pat =
join "|",
map quotemeta,
sort { length($b) <=> length($a) }
keys(%map);
s/($pat)/$map{$1}/g
#!/usr/bin/perl
use Data::Dumper;
$text = 'time1+............time1.............time11.............+time11...............';
print Dumper $text;
$text =~ s/(.*)(time1)([^0-9].*)/$1$3/g;
print Dumper $text;
Output is:
$VAR1 = 'time1+............time1.............time11.............+time11...............';
$VAR1 = 'time1+.........................time11.............+time11...............';
If depends on the expected results.
Alternatively, to replace only time1:
#!/usr/bin/perl
use Data::Dumper;
$text = '............time1............+time1.............time11.............+time11...............';
print Dumper $text;
$text =~ s/(.*)([^\+]time1)([^0-9].*)/$1$3/g;
print Dumper $text;
Outputs:
$VAR1 = '............time1............+time1.............time11.............+time11...............';
$VAR1 = '.......................+time1.............time11.............+time11...............';
The problem:
Find pieces of text in a file enclosed by # and replace the inside
Input:
#abc# abc #ABC#
cba #cba CBA#
Deisred output:
абц abc АБЦ
cba цба ЦБА
I have the following:
#!/usr/bin/perl
use strict;
use warnings;
use Encode;
my $output;
open FILE,"<", 'test.txt';
while (<FILE>) {
chomp(my #chars = split(//, $_));
for (#chars) {
my #char;
$_ =~ s/a/chr(0x430)/eg;
$_ =~ s/b/chr(0x431)/eg;
$_ =~ s/c/chr(0x446)/eg;
$_ =~ s/d/chr(0x434)/eg;
$_ =~ s/e/chr(0x435)/eg;
$_ =~ s/A/chr(0x410)/eg;
$_ =~ s/B/chr(0x411)/eg;
$_ =~ s/C/chr(0x426)/eg;
push #char, $_;
$output = join "", #char;
print encode("utf-8",$output);}
print "\n";
}
close FILE;
But I'm stuck on how to process further
Thanks for help in advance!
Kluther
Here my solution. (you will fixed it, yes. It is prototype)
for (my $data = <DATA>){
$data=~s/[#]([\s\w]+)[#]/func($1)/ge;
print $data;
# while($data=~m/[#]([\s\w]+)[#]/g){
# print "marked: ",$1,"\n";
# print "position:", pos();
# }
# print "not marked: ";
}
sub func{
#do your magic here ;)
return "<< #_ >>";
}
__DATA__
#abc# abc #ABC# cba #cba CBA#
What happens here?
First, I read data. You can do it yourself.
for (my $data = <DATA>){...}
Next, I need to search your pattern and replace it.
What should I do?
Use substition operator: s/pattern/replace/
But in interesting form:
s/pattern/func($1)/ge
Key g mean Global Search
Key e mean Evaluate
So, I think, that you need to write your own func function ;)
Maybe better to use transliteration operator: tr/listOfSymbolsToBeReplaced/listOfSymbolsThatBePlacedInstead/
With minimal changes to your algorithm you need to keep track of whether you are inside the #marks or not. so add something like this
my $bConvert = 0;
chomp(my #chars = split(//, $_));
for (#chars) {
my $char = $_;
if (/#/) {
$bConvert = ($bConvert + 1) % 2;
next;
}
elsif ($bConvert) {
$char =~ s/a/chr(0x430)/eg;
$char =~ s/b/chr(0x431)/eg;
$char =~ s/c/chr(0x446)/eg;
$char =~ s/d/chr(0x434)/eg;
$char =~ s/e/chr(0x435)/eg;
$char =~ s/A/chr(0x410)/eg;
$char =~ s/B/chr(0x411)/eg;
$char =~ s/C/chr(0x426)/eg;
}
print encode("utf-8",$char);
}
Try this after $output is processed.
$output =~ s/\#//g;
my #split_output = split(//, $output);
$output = "";
my $len = scalar(#split_output) ;
while ($len--) {
$output .= shift(#split_output);
}
print $output;
It can be done with a single regex and no splitting of the string:
use strict;
use warnings;
use Encode;
my %chars = (
a => chr(0x430),
b => chr(0x431),
c => chr(0x446),
d => chr(0x434),
e => chr(0x435),
A => chr(0x410),
B => chr(0x411),
C => chr(0x426),
);
my $regex = '(' . join ('|', keys %chars) . ')';
while (<DATA>) {
1 while ($_ =~ s|\#(?!\s)[^#]*?\K$regex(?=[^#]*(?!\s)\#)|$chars{$1}|eg);
print encode("utf-8",$_);
}
It does require repeated runs of the regex due to the overlapping nature of the matches.
I have a string like "ab.cde.fg.hi", and I want to split it into two strings.
"ab.cde.fg"
".hi"
How to do so? I got some code written that will get me the 2nd string but how do I retrieve the remaining?
$mystring = "ab.cde.fg";
$mystring =~ m/.*(\..+)/;
print "$1\n";
my ($first, $second) = $string =~ /(.*)(\..*)/;
You can also use split:
my ($first, $second) = split /(?=\.[^.]+$)/, $string;
Are you sure you aren’t looking for...
($name,$path,$suffix) = File::Basename::fileparse($fullname,#suffixlist);
my #parts = /(.*)\.(.*)/s;
my #parts = split /\.(?!.*\.)/s;
my #parts = split /\.(?=[^.]*\z)/s;
Update: I misread. The "." should be included in the second part, but it's not in the above. The above should be:
my #parts = /(.*)(\..*)/s;
my #parts = split /(?=\.(?!.*\.))/s;
my #parts = split /(?=\.[^.]*\z)/s;
To promote my idea to use rindex to get
1) "ab.cde.fg"
2) ".hi"
from "ab.cde.fg.hi", I wrote this script to make experiments easier:
use strict;
use diagnostics;
use warnings;
use English;
my #tests = (
[ 'ab.cde.fg.hi', 'ab.cde.fg|.hi' ]
, [ 'abxcdexfg.hi', 'abxcdexfg|.hi' ]
);
for my $test (#tests) {
my $src = $test->[0];
my $exp = $test->[1];
printf "-----|%s| ==> |%s|-----\n", $src, $exp;
for my $op (
[ 'ikegami 1' , sub { shift =~ /(.*)\.(.*)/s; } ]
, [ 'ikegami 2' , sub { split( /\.(?!.*\.\z)/s, shift) } ]
, [ 'rindex' , sub { my $p = rindex( $_[0], '.' );
( substr($_[0], 0, $p)
, substr($_[0], $p)
); }
]
) {
my ($head, $tail) = $op->[1]( $src );
my $res = join '|', ($head, $tail);
my $ok = $exp eq $res ? 'ok' : "fail: $exp expected.";
printf "%12s: %-20s => %-20s : %s\n", $op->[0], $src, $res, $ok;
}
}
output:
-----|ab.cde.fg.hi| ==> |ab.cde.fg|.hi|-----
ikegami 1: ab.cde.fg.hi => ab.cde.fg|hi : fail: ab.cde.fg|.hi expected.
ikegami 2: ab.cde.fg.hi => ab|cde : fail: ab.cde.fg|.hi expected.
rindex: ab.cde.fg.hi => ab.cde.fg|.hi : ok
-----|abxcdexfg.hi| ==> |abxcdexfg|.hi|-----
ikegami 1: abxcdexfg.hi => abxcdexfg|hi : fail: abxcdexfg|.hi expected.
ikegami 2: abxcdexfg.hi => abxcdexfg|hi : fail: abxcdexfg|.hi expected.
rindex: abxcdexfg.hi => abxcdexfg|.hi : ok
Following up from an earlier question on extracting the n'th regex match, I now need to substitute the match, if found.
I thought that I could define the extraction subroutine and call it in the substitution with the /e modifier. I was obviously wrong (admittedly, I had an XY problem).
use strict;
use warnings;
sub extract_quoted { # à la codaddict
my ($string, $index) = #_;
while($string =~ /'(.*?)'/g) {
$index--;
return $1 if(! $index);
}
return;
}
my $string = "'How can I','use' 'PERL','to process this' 'line'";
extract_quoted ( $string, 3 );
$string =~ s/&extract_quoted($string,2)/'Perl'/e;
print $string; # Prints 'How can I','use' 'PERL','to process this' 'line'
There are, of course, many other issues with this technique:
What if there are identical matches at different positions?
What if the match isn't found?
In light of this situation, I'm wondering in what ways this could be implemented.
EDIT: leonbloy came up with this solution first. If your tempted to upvote it, upvote leonbloy's first.
Somewhat inspired by leonbloy's (earlier) answer:
$line = "'How can I','use' 'PERL' 'to process this';'line'";
$n = 3;
$replacement = "Perl";
print "Old line: $line\n";
$z = 0;
$line =~ s/'(.*?)'/++$z==$n ? "'$replacement'" : "'$1'"/ge;
print "New line: $line\n";
Old line: 'How can I','use' 'PERL' 'to process this';'line'
New line: 'How can I','use' 'Perl' 'to process this';'line'
Or you can do something as this
use strict;
use warnings;
my $string = "'How can I','use' .... 'perl','to process this' 'line'";
my $cont =0;
sub replacen { # auxiliar function: replaces string if incremented counter equals $index
my ($index,$original,$replacement) = #_;
$cont++;
return $cont == $index ? $replacement: $original;
}
#replace the $index n'th match (1-based counting) from $string by $rep
sub replace_quoted {
my ($string, $index,$replacement) = #_;
$cont = 0; # initialize match counter
$string =~ s/'(.*?)'/replacen($index,$1,$replacement)/eg;
return $string;
}
my $result = replace_quoted ( $string, 3 ,"PERL");
print "RESULT: $result\n";
A little ugly the "global" $cont variable, that could be polished, but you get the idea.
Update: a more compact version:
use strict;
my $string = "'How can I','use' .... 'perl','to process this' 'line'";
#replace the $index n'th match (1-based counting) from $string by $replacement
sub replace_quoted {
my ($string, $index,$replacement) = #_;
my $cont = 0; # initialize match counter
$string =~ s/'(.*?)'/$cont++ == $index ? $replacement : $1/eg;
return $string;
}
my $result = replace_quoted ( $string, 3 ,"PERL");
print "RESULT: $result\n";
If the regex isn't too much more complicated than what you have, you could follow a split with an edit and a join:
$line = "'How can I','use' 'PERL','to process this' 'line'";
$n = 3;
$new_text = "'Perl'";
#f = split /('.*?')/, $line;
# odd fields of #f contain regex matches
# even fields contain the text between matches
$f[2*$n-1] = $new_text;
$new_line = join '', #f;
See perldoc perlvar:
use strict; use warnings;
use Test::More tests => 5;
my %src = (
q{'I want to' 'extract the word' 'PERL','from this string'}
=> q{'I want to' 'extract the word' 'Perl','from this string'},
q{'What about', 'getting','PERL','from','here','?'}
=> q{'What about', 'getting','Perl','from','here','?'},
q{'How can I','use' 'PERL','to process this' 'line'}
=> q{'How can I','use' 'Perl','to process this' 'line'},
q{Invalid} => q{Invalid},
q{'Another invalid string'} => q{'Another invalid string'}
);
while ( my ($src, $target) = each %src ) {
ok($target eq subst_n($src, 3, 'Perl'), $src)
}
sub subst_n {
my ($src, $index, $replacement) = #_;
return $src unless $index > 0;
while ( $src =~ /'.*?'/g ) {
-- $index or return join(q{'},
substr($src, 0, $-[0]),
$replacement,
substr($src, $+[0])
);
}
return $src;
}
Output:
C:\Temp> pw
1..5
ok 1 - 'Another invalid string'
ok 2 - 'How can I','use' 'PERL','to process this' 'line'
ok 3 - Invalid
ok 4 - 'What about', 'getting','PERL','from','here','?'
ok 5 - 'I want to' 'extract the word' 'PERL','from this string'
Of course, you need to decide what happens if an invalid $index is passed or if the required match is not found. I just return the original string in the code above.
Reworking an answer to an earlier question, match n-1 times and then replace the next. Memoizing patterns spares poor Perl having to recompile the same patterns over and over.
my $_quoted = qr/'[^']+'/; # ' fix Stack Overflow highlighting
my %_cache;
sub replace_nth_quoted {
my($string,$index,$replace) = #_;
my $pat = $_cache{$index} ||=
qr/ ^
( # $1
(?:.*?$_quoted.*?) # match quoted substrings...
{#{[$index-1]}} # $index-1 times
)
$_quoted # the ${index}th match
/x;
$string =~ s/$pat/$1$replace/;
$string;
}
For example
my $string = "'How can I','use' 'PERL','to process this' 'line'";
print replace_nth_quoted($string, 3, "'Perl'"), "\n";
outputs
'How can I','use' 'Perl','to process this' 'line'
I have a string such as 'xxox-x' that I want to mask each line in a file against as such:
x's are ignored (or just set to a known value)
o's remain unchanged
the - is a variable length field that will keep everything else unchanged
therefore mask 'xxox-x' against 'deadbeef' would yield 'xxaxbeex'
the same mask 'xxox-x' against 'deadabbabeef' would yield 'xxaxabbabeex'
How can I do this succinctly preferrably using s operator?
$mask =~ s/-/'o' x (length $str - length $mask)/e;
$str =~ s/(.)/substr($mask, pos $str, 1) eq 'o' ? $1 : 'x'/eg;
$ perl -pe 's/^..(.).(.+).$/xx$1x$2x/;'
deadbeef
xxaxbeex
deadabbabeef
xxaxabbabeex
Compile your pattern into a Perl sub:
sub compile {
use feature 'switch';
my($pattern) = #_;
die "illegal pattern" unless $pattern =~ /^[-xo]+$/;
my($search,$replace);
my $i = 0;
for (split //, $pattern) {
given ($_) {
when ("x") {
$search .= "."; $replace .= "x";
}
when ("o") {
$search .= "(?<sub$i>.)";
$replace .= "\$+{sub$i}";
++$i;
}
when ("-") {
$search .= "(?<sub$i>.*)";
$replace .= "\$+{sub$i}";
++$i;
}
}
}
my $code = q{
sub {
local($_) = #_;
s/^SEARCH$/REPLACE/s;
$_;
}
};
$code =~ s/SEARCH/$search/;
$code =~ s/REPLACE/$replace/;
#print $code;
local $#;
my $sub = eval $code;
die $# if $#;
$sub;
}
To be more concise, you could write
sub _patref { '$+{sub' . $_[0]++ . '}' }
sub compile {
my($pattern) = #_;
die "illegal pattern" unless $pattern =~ /^[-xo]+$/;
my %gen = (
'x' => sub { $_[1] .= '.'; $_[2] .= 'x' },
'o' => sub { $_[1] .= "(?<sub$_[0]>.)"; $_[2] .= &_patref },
'-' => sub { $_[1] .= "(?<sub$_[0]>.*)"; $_[2] .= &_patref },
);
my($i,$search,$replace) = (0,"","");
$gen{$1}->($i,$search,$replace)
while $pattern =~ /(.)/g;
eval "sub { local(\$_) = \#_; s/\\A$search\\z/$replace/; \$_ }"
or die $#;
}
Testing it:
use v5.10;
my $replace = compile "xxox-x";
my #tests = (
[ deadbeef => "xxaxbeex" ],
[ deadabbabeef => "xxaxabbabeex" ],
);
for (#tests) {
my($input,$expect) = #$_;
my $got = $replace->($input);
print "$input => $got : ", ($got eq $expect ? "PASS" : "FAIL"), "\n";
}
Output:
deadbeef => xxaxbeex : PASS
deadabbabeef => xxaxabbabeex : PASS
Note that you'll need Perl 5.10.x for given ... when.
x can be translated to . and o to (.) whereas - becomes (.+?):
#!/usr/bin/perl
use strict; use warnings;
my %s = qw( deadbeef xxaxbeex deadabbabeef xxaxabbabeex);
for my $k ( keys %s ) {
(my $x = $k) =~ s/^..(.).(.+?).\z/xx$1x$2x/;
print +($x eq $s{$k} ? 'good' : 'bad'), "\n";
}
heres a quick stab at a regex generator.. maybe somebody can refactor something pretty from it?
#!/usr/bin/perl
use strict;
use Test::Most qw( no_plan );
my $mask = 'xxox-x';
is( mask( $mask, 'deadbeef' ), 'xxaxbeex' );
is( mask( $mask, 'deadabbabeef' ), 'xxaxabbabeex' );
sub mask {
my ($mask, $string) = #_;
my $regex = $mask;
my $capture_index = 1;
my $mask_rules = {
'x' => '.',
'o' => '(.)',
'-' => '(.+)',
};
$regex =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;
$mask =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;
$mask =~ s/\./x/g;
$mask =~ s/\([^)]+\)/'$' . $capture_index++/eg;
eval " \$string =~ s/^$regex\$/$mask/ ";
$string;
}
Here's a character by character solution using substr rather that split. It should be efficient for long strings since it skips processing the middle part of the string (when there is a dash).
sub apply_mask {
my $mask = shift;
my $string = shift;
my ($head, $tail) = split /-/, $mask;
for( 0 .. length($head) - 1 ) {
my $m = substr $head, $_, 1;
next if $m eq 'o';
die "Bad char $m\n" if $m ne 'x';
substr($string, $_, 1) = 'x';
}
return $string unless defined $tail;
$tail = reverse $tail;
my $last_char = length($string) - 1;
for( 0 .. length($tail) - 1 ) {
my $m = substr $tail, $_, 1;
next if $m eq 'o';
die "Bad char $m\n" if $m ne 'x';
substr($string, $last_char - $_, 1) = 'x';
}
return $string;
}
sub mask {
local $_ = $_[0];
my $mask = $_[1];
$mask =~ s/-/'o' x (length($_)-(length($mask)-1))/e;
s/(.)/substr($mask, pos, 1) eq 'o' && $1/eg;
return $_;
}
Used tidbits from a couple answers ... this is what I ended up with.
EDIT: update from comments