Working from an example found else where on stackoverflow.com I am attempting to replace on the Nth instance of a regex match on a string in Perl. My code is as follows:
#!/usr/bin/env perl
use strict;
use warnings;
my $num_args = $#ARGV +1;
if($num_args != 3) {
print "\nUsage: replace_integer.pl occurance replacement to_replace";
print "\nE.g. `./replace_integer.pl 1 \"INTEGER_PLACEHOLDER\" \"method(0 , 1, 6);\"`";
print "\nWould output: \"method(INTEGER_PLACEMENT , 1, 6);\"\n";
exit;
}
my $string =$ARGV[2];
my $cont =0;
sub replacen {
my ($index,$original,$replacement) = #_;
$cont++;
return $cont == $index ? $replacement: $original;
}
sub replace_quoted {
my ($string, $index,$replacement) = #_;
$cont = 0; # initialize match counter
$string =~ s/[0-9]+/replacen($index,$1,$replacement)/eg;
return $string;
}
my $result = replace_quoted ( $string, $ARGV[0] ,$ARGV[1]);
print "RESULT: $result\n";
For
./replace_integer.pl 3 "INTEGER_PLACEHOLDER" "method(0, 1 ,6);"
I'd expect an output of
RESULT: method(0, 1 ,INTEGER_PLACEHOLDER);
Unfortunately I get an output of
RESULT: method(, ,INTEGER_PLACEHOLDER);
With these warnings/errors
Use of uninitialized value in substitution iterator at ./replace_integer.pl line 26.
Use of uninitialized value in substitution iterator at ./replace_integer.pl line 26.
Line 26 is the following line:
$string =~ s/[0-9]+/replacen($index,$1,$replacement)/eg;
I have determined this is due to $1 being uninitialised. To my understanding $1 should have the value of the last match. Given my very simple regex ([0-9]+) I see no reason why it should be uninitialised.
I am aware there are easier ways to find and replace the Nth instance in sed but I will require Perl's back and forward references once this hurdle is overcome (not supported by sed)
Does anyone know the cause of this error and how to fix it?
I am using Perl v5.18.2 , built for x86_64-linux-gnu-thread-multi
Thank you for your time.
$1 is only set after you capture a pattern, for example:
$foo =~ /([0-9]+)/;
# $1 equals whatever was matched between the parens above
Try wrapping your matching in parens to capture it to $1
I would write it like this
The while loop iterates over occurrences of the \d+ pattern in the string. When the Nth occurrence is found the last match is replaced using a call to substr using the values in built-in arrays #- (the start of the last match) and #+ (the end of the last match)
#!/usr/bin/env perl
use strict;
use warnings;
#ARGV = ( 3, 'INTEGER_PLACEHOLDER', 'method(0, 1, 6);' );
if ( #ARGV != 3 ) {
print qq{\nUsage: replace_integer.pl occurrence replacement to_replace};
print qq{\nE.g. `./replace_integer.pl 1 "INTEGER_PLACEHOLDER" "method(0 , 1, 6);"`};
print qq{\nWould output: "method(INTEGER_PLACEMENT , 1, 6);"\n};
exit;
}
my ( $occurrence, $replacement, $string ) = #ARGV;
my $n;
while ( $string =~ /\d+/g ) {
next unless ++$n == $occurrence;
substr $string, $-[0], $+[0]-$-[0], $replacement;
last;
}
print "RESULT: $string\n";
output
$ replace_integer.pl 3 INTEGER_PLACEHOLDER 'method(0, 1, 6);'
RESULT: method(0, 1, INTEGER_PLACEHOLDER);
$ replace_integer.pl 2 INTEGER_PLACEHOLDER 'method(0, 1, 6);'
RESULT: method(0, INTEGER_PLACEHOLDER, 6);
$ replace_integer.pl 1 INTEGER_PLACEHOLDER 'method(0, 1, 6);'
RESULT: method(INTEGER_PLACEHOLDER, 1, 6);
Related
I want to match a variable character in a given string, but from the end.
Ideas on how to do this action?
for example:
sub removeCharFromEnd {
my $string = shift;
my $char = shift;
if($string =~ m/$char/){ // I want to match the char, searching from the end, $doesn't work
print "success";
}
}
Thank you for your assistance.
There is no regex modifier that would force Perl regex engine to parse the string from right to left. Thus, the most convenient way to achieve that is via a negative lookahead:
m/$char(?!.*$char)/
The (?!.*$char) negative lookahead will require the absence (=will fail the match if found) of a $char after any 0+ chars other than linebreak chars (use s modifier if you are running the regex against a multiline string input).
The regex engine works from left to right.
You can use the natural greediness of quantifiers to reach the end of the string and find the last char with the backtracking mechanism:
if($string =~ m/.*\K$char/s) { ...
\K marks the position of the match result beginning.
Other ways:
you can also reverse the string and use your previous pattern.
you can search all occurrences and take the last item in the list
I'm having trouble understanding what you want. Your subroutine is called removeCharFromEnd, so perhaps you want to remove $char from $string if it appears at the end of the string
You can do that like this
sub removeCharFromEnd {
my ( $string, $char ) = #_;
if ( $string =~ s/$char\z// ) {
print "success";
}
$string;
}
Or perhaps you want to remove the last occurrence of $char wherever it is. You can do that with
s/.*\K$char//
The subroutine I have written returns the modified string, so you would have to assign the result to a variable to save it. You can write
my $s = 'abc';
$s = removeCharFromEnd($s, 'c');
say $s;
output
ab
If you just want to modify the string in place then you should write
$ARGV[0] =~ s/$char\z//
using whichever substitution you choose. Then you can do this
my $s = 'abc';
removeCharFromEnd($s, 'c');
say $s;
This produces the same output
To get Perl to search from the end of a string, reverse the string.
sub removeCharFromEnd {
my $string = reverse shift #_;
my $char = quotemeta reverse shift #_;
$string =~ s/$char//;
$string = reverse $string;
return $string;
}
print removeCharFromEnd(qw( abcabc b )), "\n";
print removeCharFromEnd(qw( abcdefabcdef c )), "\n";
print removeCharFromEnd(qw( !"/$%?&*!"/$%?&* $ )), "\n";
I am trying to find out the number of occurrences of "The/the". Below is the code I tried"
print ("Enter the String.\n");
$inputline = <STDIN>;
chop($inputline);
$regex="\[Tt\]he";
if($inputline ne "")
{
#splitarr= split(/$regex/,$inputline);
}
$scalar=#splitarr;
print $scalar;
The string is :
Hello the how are you the wanna work on the project but i the u the
The
The output that it gives is 7. However with the string :
Hello the how are you the wanna work on the project but i the u the
the output is 5. I suspect my regex. Can anyone help in pointing out what's wrong.
I get the correct number - 6 - for the first string
However your method is wrong, because if you count the number of pieces you get by splitting on the regex pattern it will give you different values depending on whether the word appears at the beginning of the string. You should also put word boundaries \b into your regular expression to prevent the regex from matching something like theory
Also, it is unnecessary to escape the square brackets, and you can use the /i modifier to do a case-independent match
Try something like this instead
use strict;
use warnings;
print 'Enter the String: ';
my $inputline = <>;
chomp $inputline;
my $regex = 'the';
if ( $inputline ne '' ) {
my #matches = $inputline =~ /\b$regex\b/gi;
print scalar #matches, " occurrences\n";
}
With split, you're counting the substrings between the the's. Use match instead:
#!/usr/bin/perl
use warnings;
use strict;
my $regex = qr/[Tt]he/;
for my $string ('Hello the how are you the wanna work on the project but i the u the The',
'Hello the how are you the wanna work on the project but i the u the',
'the theological cathedral'
) {
my $count = () = $string =~ /$regex/g;
print $count, "\n";
my #between = split /$regex/, $string;
print 0 + #between, "\n";
print join '|', #between;
print "\n";
}
Note that both methods return the same number for the two inputs you mentioned (and the first one returns 6, not 7).
The following snippet uses a code side-effect to increment a counter, followed by an always-failing match to keep searching. It produces the correct answer for matches that overlap (e.g. "aaaa" contains "aa" 3 times, not 2). The split-based answers don't get that right.
my $i;
my $string;
$i = 0;
$string = "aaaa";
$string =~ /aa(?{$i++})(?!)/;
print "'$string' contains /aa/ x $i (should be 3)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the The";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 6)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 5)\n";
What you need is 'countof' operator to count the number of matches:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/[Tt]he/g;
print $count;
If you want to select only the word the or The, add word boundary:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/\b[Tt]he\b/g;
print $count;
I am using the following code to search for a substring and print it out with a few characters before and after it. Somehow Perl takes issue with me using $1 and complains about
Use of uninitialized value $1 in concatenation (.) or string.
I cannot figure out why...can you?
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}
You have to capture $word into regex group $1 by using parentheses,
while ($lines =~ m/\b($word)\b/g)
When you use $1, you are asking the code to use the first captured group from the regex and since your regex doesn't have any, well, that variable won't exist.
You can either refer to the whole match with $& or you add a capture group to your regex and keep using $1.
i.e. Either:
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/\b$word\b/g ) { # as long as pattern is found...
print "$word\ ";
print "$&";
print substr ($lines, max(pos($lines)-length($&)-$context, 0), length($&)+$context); # check: am I possibly violating any boundaries here
}
Or
use List::Util qw[min max];
my $word = "test";
my $lines = "this is just a test to find something out";
my $context = 3;
while ($lines =~ m/(\b$word\b)/g ) { # as long as pattern is found...
print "$word\ ";
print "$1";
print substr ($lines, max(pos($lines)-length($1)-$context, 0), length($1)+$context); # check: am I possibly violating any boundaries here
}
Note: It doesn't matter whether you use (\b$word\b) or (\b$word)\b or \b($word\b) or \b($word)\b here because \b is a 'string' of 0 length.
When you want to address a matched part in regex, put it in parenthes. Than you'll be able to address this mathced part via $1 variable (for first pair of parenthes), $2 (for the second pair) and so on.
The values $1, $2 and so on hold the strings found by capture groups. When a match is performed all of these variables are set to undef. The code in the question does not have any capture groups and hence $1 is never given a value, it is undefined.
Running the code below shows the effect. Initially $1, $2 and $3 are not defined. The first match sets $1 and $2 but not $3. The second match sets only $1 but not that $2 is cleared to be undefined. The third match has no capture groups and all three are undefined.
use strict;
use warnings;
sub show
{
printf "\$1: %s\n", (defined $1 ? $1 : "-undef-");
printf "\$2: %s\n", (defined $2 ? $2 : "-undef-");
printf "\$3: %s\n", (defined $3 ? $3 : "-undef-");
print "\n";
}
my $text = "abcdefghij";
show();
$text =~ m/ab(cd)ef(gh)ij/; # First match
show();
$text =~ m/ab(cd)efghij/; # Second match
show();
$text =~ m/abcdefghij/; # Third match
show();
$1 will have no value unless you are actually capturing something.
You can adjust your boundary collection method to using lookahead and lookbehinds.
use strict;
use warnings;
my $lines = "this is just a test to find something out";
my $word = "test";
my $extra = 10;
while ($lines =~ m/(?:(?<=(.{$extra}))|(.{0,$extra}))\b(\Q$word\E)\b(?=(.{0,$extra}))/gs ) {
my $pre = $1 // $2;
my $word = $3;
my $post = $4;
print "'...$pre<$word>$post...'\n";
}
Outputs:
'...is just a <test> to find s...'
This conditional must match either telco_imac_city or telco_hier_city. When it succeeds I need to extract up to the second underscore of the value that was matched.
I can make it work with this code
if ( ($value =~ /(telco_imac_)city/) || ($value =~ /(telco_hier_)city/) ) {
print "value is: \"$1\"\n";
}
But if possible I would rather use a single regex like this
$value = $ARGV[0];
if ( $value =~ /(telco_imac_)city|(telco_hier_)city/ ) {
print "value is: \"$1\"\n";
}
But if I pass the value telco_hier_city I get this output on testing the second value
Use of uninitialized value $1 in concatenation (.) or string at ./test.pl line 19.
value is: ""
What am I doing wrong?
while (<$input>){
chomp;
print "$1\n" if /(telco_hier|telco_imac)_city/;
}
Perl capture groups are numbered based on the matches in a single statement. Your input, telco_hier_city, matches the second capture of that single regex (/(telco_imac_)city|(telco_hier_)city/), meaning you'd need to use $2:
my $value = $ARGV[0];
if ( $value =~ /(telco_imac_)city|(telco_hier_)city/ ) {
print "value is: \"$2\"\n";
}
Output:
$> ./conditionalIfRegex.pl telco_hier_city
value is: "telco_hier_"
Because there was no match in your first capture group ((telco_imac_)), $1 is uninitialized, as expected.
To fix your original code, use FlyingFrog's regex:
my $value = $ARGV[0];
if ( $value =~ /(telco_hier_|telco_imac_)city/ ) {
print "value is: \"$1\"\n";
}
Output:
$> ./conditionalIfRegex.pl telco_hier_city
value is: "telco_hier_"
$> ./conditionalIfRegex.pl telco_imac_city
value is: "telco_imac_"
I'm looping through a series of regexes and matching it against lines in a file, like this:
for my $regex (#{$regexs_ref}) {
LINE: for (#rawfile) {
/#$regex/ && do {
# do something here
next LINE;
};
}
}
Is there a way for me to know how many matches I've got (so I can process it accordingly..)?
If not maybe this is the wrong approach..? Of course, instead of looping through every regex, I could just write one recipe for each regex. But I don't know what's the best practice?
If you do your matching in list context (i.e., basically assigning to a list), you get all of your matches and groupings in a list. Then you can just use that list in scalar context to get the number of matches.
Or am I misunderstanding the question?
Example:
my #list = /$my_regex/g;
if (#list)
{
# do stuff
print "Number of matches: " . scalar #list . "\n";
}
You will need to keep track of that yourself. Here is one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
my #regexes = (
qr/b/,
qr/a/,
qr/foo/,
qr/quux/,
);
my %matches = map { $_ => 0 } #regexes;
while (my $line = <DATA>) {
for my $regex (#regexes) {
next unless $line =~ /$regex/;
$matches{$regex}++;
}
}
for my $regex (#regexes) {
print "$regex matched $matches{$regex} times\n";
}
__DATA__
foo
bar
baz
In CA::Parser's processing associated with matches for /$CA::Regex::Parser{Kills}{all}/, you're using captures $1 all the way through $10, and most of the rest use fewer. If by the number of matches you mean the number of captures (the highest n for which $n has a value), you could use Perl's special #- array (emphasis added):
#LAST_MATCH_START
#-
$-[0] is the offset of the start of the last successful match. $-[n] is the offset of the start of the substring matched by n-th subpattern, or undef if the subpattern did not match.
Thus after a match against $_, $& coincides with substr $_, $-[0], $+[0] - $-[0]. Similarly, $n coincides with
substr $_, $-[n], $+[n] - $-[n]
if $-[n] is defined, and $+ coincides with
substr $_, $-[$#-], $+[$#-] - $-[$#-]
One can use $#- to find the last matched subgroup in the last successful match. Contrast with $#+, the number of subgroups in the regular expression. Compare with #+.
This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope. $-[0] is the offset into the string of the beginning of the entire match. The n-th element of this array holds the offset of the nth submatch, so $-[1] is the offset where $1 begins, $-[2] the offset where $2 begins, and so on.
After a match against some variable $var:
$` is the same as substr($var, 0, $-[0])
$& is the same as substr($var, $-[0], $+[0] - $-[0])
$' is the same as substr($var, $+[0])
$1 is the same as substr($var, $-[1], $+[1] - $-[1])
$2 is the same as substr($var, $-[2], $+[2] - $-[2])
$3 is the same as substr($var, $-[3], $+[3] - $-[3])
Example usage:
#! /usr/bin/perl
use warnings;
use strict;
my #patterns = (
qr/(foo(bar(baz)))/,
qr/(quux)/,
);
chomp(my #rawfile = <DATA>);
foreach my $pattern (#patterns) {
LINE: for (#rawfile) {
/$pattern/ && do {
my $captures = $#-;
my $s = $captures == 1 ? "" : "s";
print "$_: got $captures capture$s\n";
};
}
}
__DATA__
quux quux quux
foobarbaz
Output:
foobarbaz: got 3 captures
quux quux quux: got 1 capture
How about below code:
my $string = "12345yx67hjui89";
my $count = () = $string =~ /\d/g;
print "$count\n";
It prints 9 here as expected.