Unable to replace a string via regex through a subroutine - regex

I am trying to replace square brackets in a string with an empty string. which means if a string is [SECTION], I want to convert it to SECTION.
I have tried this and it works,
my $sectionName =~ tr/[]//d;
print "$sectionName\n";
However, when I tried to create a general subroutine for replacing strings, it didn't work out. Here's what I tried,
sub strReplace
{
my $string = shift;
my $target = shift;
my $replacement = shift;
$target = quotemeta $target;
$replacement = quotemeta $replacement;
$string =~ tr/$target/$replacement/d;
return $string;
}
I am calling the sub like this,
# the string "[SECTION]" below is intended to be replaced by a variable
my $sectionName = strReplace("[SECTION]", "[]", "");
print "$sectionName\n";
However, instead of getting the replaced string, I am still getting the old one, i.e, [SECTION]. What am I doing wrong? (PS: Perl version 5.14.2)

Perl's tr/// operator does not support variables. You can find various strategies to work around this here: Perl's tr/// is not doing what I want
To summarize, you have two main options:
Wrap your tr/// in an eval.
Convert your tr/// into a substitution using s///.

If your main case for strReplace is actually just to remove characters, I'd write a less-general-purpose sub that does that. Otherwise, a s/// conversion that can both remove and replace looks like this:
sub strReplace
{
my $string = shift;
my $target = shift;
my $replacement = shift;
my %replacement;
#replacement{ split //, $target } = split //, $replacement;
$string =~ s{ ([\Q$target\E]) }{ $replacement{$1} // '' }gxe;
return $string;
}
The substitution repeatedly (because of the /g flag) looks for [\Q$target\E] (a character in a class of any the characters in $target, any special characters automatically escaped if necessary by \Q...\E), and replaces it with the value found by looking in the hash, or just removes it if it wasn't found in the hash.

Related

how can I take a dynamic variable in value of JSON?

I need use the find and replace through regular expression like following
use strict;
no strict 'refs';
use warnings;
use JSON;
use Encode qw( encode decode encode_utf8 decode_utf8);
my $data =
{
"find_replace" => [
{ "find" => "(.+?)&",
"replace"=> "$1"
}
]
};
my $find_replace_arr = $data->{'find_replace'};
my $string = "http://www.website.com/test.html&code=236523";
my $find = $find_replace_arr->[0]->{find};
my $replace = $find_replace_arr->[0]->{replace};
$string =~ s/$find/$replace/isge;
print $string;
exit();
in this code, I only want to "http://www.website.com/test.html" from the string.
I am not able to get replace (key)'s value dynamically, which is $1.
You can Run the above code.
This code throw the Error Use of uninitialized value $1 in string
Some things to consider. First, the regex ([^&]+) may not give the desired result, as it is really going to capture and replace with the same capture.. resulting in the same output string (confusing I bet).
Next, the replace string "$1"has to be quoted again and e modifier has to be doubled.
So try this:
my $data =
{
"find_replace" => [
{ "find" => "^(.+?)&.*",
"replace"=> '"$1"'
}
]
};
my $find_replace_arr = $data->{'find_replace'};
my $string = "http://www.website.com/test.html&code=236523";
my $find = $find_replace_arr->[0]->{find};
my $replace = $find_replace_arr->[0]->{replace};
$string =~ s/$find/$replace/isgee;
print $string;
exit();
Notice the new regex, ^(.+?)&.* will match the entire string, but the capture (...) will be the result to replace.

Matching a variable in a string in Perl from the end

I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ // I want to match the char, searching from the end, $doesn't work
print "success";
}
}
Thank you for your assistance.
There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead will require the absence (=will fail the match if found) of a $char after any 0+ chars other than linebreak chars (use s modifier if you are running the regex against a multiline string input).
The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list
I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = #_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$ARGV[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift #_;
my $char = quotemeta reverse shift #_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";

Regex and the characters case

Okay, I got a rather simple one (at least seems simple). I have a multi lined string and I am just playing around with replacing different words with something else. Let me show you...
#!/usr/bin/perl -w
use strict;
$_ = "That is my coat.\nCoats are very expensive.";
s/coat/Hat/igm;
print;
The output would be
That is my Hat
Hats are very expensive...
The "hat" on the first line shouldn't be capitalized. Are there any tricks that can make the casing compliant with how english is written? Thanks :)
see how-to-replace-string-and-preserve-its-uppercase-lowercase
For more detail go to How do I substitute case insensitively on the LHS while preserving case on the RHS?
You can use the e modifier to s/// to do the trick:
s/(coat)/ucfirst($1) eq $1 ? 'Hat' : 'hat'/igme;
For one, you should use \b (word boundary) to match only the whole word. For example s/hat/coat/ would change That to Tcoat without leading \b. Now for your question. With the flag /e you can use Perl code in the replacement part of the regex. So you can write a Perl function that checks the case of the match and then set the case of the replacement properly:
my $s = "That is my coat.\nCoats are very expensive.";
$s =~ s/(\bcoat)/&same_case($1, "hat")/igme;
print $s, "\n";
sub same_case {
my ($match, $replacement) = #_;
# if match starts with uppercase character, apply ucfirst to replacement
if($match =~ /^[A-Z]/) {
return ucfirst($replacement);
}
else {
return $replacement;
}
}
Prints:
That is my hat.
Hats are very expensive.
This may solve your problem:
#!/usr/bin/perl -w
use strict;
sub smartSubstitute {
my $target = shift;
my $pattern = shift;
my $replacement = shift;
$pattern = ucfirst $pattern;
$replacement = ucfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
$pattern = lcfirst $pattern;
$replacement = lcfirst $replacement;
$target =~ s/$pattern/$replacement/gm;
return $target;
}
my $x = "That is my coat.\nCoats are very expansive.";
my $y = smartSubstitute($x, "coat", "Hat");
print $y, "\n";

How can I escape meta-characters when I interpolate a variable in Perl's match operator?

Suppose I have a file containing lines I'm trying to match against:
foo
quux
bar
In my code, I have another array:
foo
baz
quux
Let's say we iterate through the file, calling each element $word, and the internal list we are checking against, #arr.
if( grep {$_ =~ m/^$word$/i} #arr)
This works correctly, but in the somewhat possible case where we have an test case of fo. in the file, the . operates as a wildcard operator in the regex, and fo. then matches foo, which is not acceptable.
This is of course because Perl is interpolating the variable into a regex.
The question:
How do I force Perl to use the variable literally?
Use \Q...\E to escape special symbols directly in perl string after variable value interpolation:
if( grep {$_ =~ m/^\Q$word\E$/i} #arr)
From perlfaq6's answer to How do I match a regular expression that's in a variable?:
We don't have to hard-code patterns into the match operator (or anything else that works with regular expressions). We can put the pattern in a variable for later use.
The match operator is a double quote context, so you can interpolate your variable just like a double quoted string. In this case, you read the regular expression as user input and store it in $regex. Once you have the pattern in $regex, you use that variable in the match operator.
chomp( my $regex = <STDIN> );
if( $string =~ m/$regex/ ) { ... }
Any regular expression special characters in $regex are still special, and the pattern still has to be valid or Perl will complain. For instance, in this pattern there is an unpaired parenthesis.
my $regex = "Unmatched ( paren";
"Two parens to bind them all" =~ m/$regex/;
When Perl compiles the regular expression, it treats the parenthesis as the start of a memory match. When it doesn't find the closing parenthesis, it complains:
Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE paren/ at script line 3.
You can get around this in several ways depending on our situation. First, if you don't want any of the characters in the string to be special, you can escape them with quotemeta before you use the string.
chomp( my $regex = <STDIN> );
$regex = quotemeta( $regex );
if( $string =~ m/$regex/ ) { ... }
You can also do this directly in the match operator using the \Q and \E sequences. The \Q tells Perl where to start escaping special characters, and the \E tells it where to stop (see perlop for more details).
chomp( my $regex = <STDIN> );
if( $string =~ m/\Q$regex\E/ ) { ... }
Alternately, you can use qr//, the regular expression quote operator (see perlop for more details). It quotes and perhaps compiles the pattern, and you can apply regular expression flags to the pattern.
chomp( my $input = <STDIN> );
my $regex = qr/$input/is;
$string =~ m/$regex/ # same as m/$input/is;
You might also want to trap any errors by wrapping an eval block around the whole thing.
chomp( my $input = <STDIN> );
eval {
if( $string =~ m/\Q$input\E/ ) { ... }
};
warn $# if $#;
Or...
my $regex = eval { qr/$input/is };
if( defined $regex ) {
$string =~ m/$regex/;
}
else {
warn $#;
}
The correct answer is - don't use regexps. I'm not saying regexps are bad, but using them for (what equals to) simple equality check is overkill.
Use: grep { lc($_) eq lc($word) } #arr and be happy.
Quotemeta
Returns the value of EXPR with all non-"word" characters backslashed.
http://perldoc.perl.org/functions/quotemeta.html
I don't think you want a regex in this case since you aren't matching a pattern. You're looking for a literal sequence of characters that you already know. Build a hash with the values to match and use that to filter #arr:
open my $fh, '<', $filename or die "...";
my %hash = map { chomp; lc($_), 1 } <$fh>;
foreach my $item ( #arr )
{
next unless exists $hash{ lc($item) };
print "I matched [$item]\n";
}

Using Perl, how can I build a dynamic regexp by passing in an argument to a subroutine?

I would like to create subroutine with a dynamically created regxp. Here is what I have so far:
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
&theSub($_);
}
sub theSub {
my $int = #_;
my $var2 = $var =~ m/(??{$int})/;
print "$var2\n";
}
It looks like it will work, but it seems that once the $int in the regex gets evaluated for the first time, it's there forever.
Is there anyway to do something similar to this, but have the regex pick up the new argument each time the sub is called?
The easiest way to fix your code is to add parentheses around my, and remove ??{. Here is the fixed program:
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
theSub($_);
}
sub theSub {
my($int) = #_;
my($var2) = $var =~ m/($int)/;
print "$var2\n";
}
One of the problematic lines in your code was my $int = #_, which was equivalent to my $int = 1, because it evaluated #_ in scalar context, yielding the number of elements in #_. To get the first argument of your sub, use my($int) = #_;, which evaluates #_ in list context, or fetch the first element using my $int = $_[0];, or fetch+remove the first element using my $int = shift;
There was a similar problem in the my $var2 = line, you need the parentheses there as well to evaluate the regexp match in list context, yielding the list of ($1, $2, ...), and assigning $var2 = $1.
The construct (??{...}) you were trying to use had the opposite effect to what you wanted: (among doing other things) it compiled your regexp the first time it was used for matching. For regexps containing $ or #, but not containing ??{...}, Perl recompiles the regexp automatically for each match, unless you specify the o flag (e.g. m/$int/o).
The construct (??{...}) means: use Perl code ... to generate a regexp, and insert that regexp here. To get more information, search for ??{ on http://perldoc.perl.org/perlre.html . The reason why it didn't work in your example is that you would have needed an extra layer of parentheses to capture $1, but even with my ($var2) = $var =~ m/((??{$int}))/ it wouldn't have worked, because ??{ has an undocumented property: it forces the compilation of its argument the first time the regexp is used for matching, so my ($var2) = $var =~ m/((??{$int + 5}))/ would have always matched 6.
my $int = #_;
This will give you the count of parameters, always '1' in your case.
I think you want
my $int = shift;
To dynamically pass a regexp to a function, rather than dynamically build it in the function, use qr//.
#!/usr/bin/perl
use strict;
my $var = 1234567890;
foreach (1 .. 9){
&theSub(qr/$int/);
}
sub theSub {
my($regexp) = #_;
my($var2) = ($var =~ $regexp);
print "$var2\n";
}
qr// accepts the same trailing arguments that m// does: i, m, s, and x
my $int is the scalar context, he has ($int) for the list context and that puts $_[0] into $int. In the following only 10 is put into $int and the rest 11 to 99 are lost.
my ($int)=(10..99);
print $int;
10