How can I keep only the numbers in a variable in Perl? - regex

I have a variable like this below:
G12345(##)
How can I keep in the variable only the numbers 12345. I have done it before in PHP but I cannot find a way in Perl.

$v =~ s/\D//g; should do the trick.
(Regular expression substitute "Not a number" with "nothing", globally)

This can also be done without regular expressions: Transliterate: tr///
use warnings;
use strict;
my $s = 'G12345(##)';
$s =~ tr/0-9//cd;
print "$s\n";
__END__
12345

Substitute any non-numeric characters with an empty string (\D is non-numeric):
$var =~ s/\D+//g;

You can also do it this way:
my ( $number ) = $string =~ /(\d+)/;
This means that were there some other digits to occur after the '(##)' --for whatever reason, that you would not suddenly concatenate those digits to the number that lies between 'G' and '('. So the capture method makes sure you get the first set of contiguous digits.

Related

Limit the translation to just one word in a phrase?

Coming new to Perl world from Python, and wonder if there is a simple way to limit the translation or replace to just one word in a phrase?
In the example, the 2nd word kind also got changed to lind. Is there a simple way to do the translation without diving into some looping? Thanks.
The first word has been correctly translated to gazelle, but 2nd word has been changed too as you can see.
my $string = 'gazekke is one kind of antelope';
my $count = ($string =~ tr/k/l/);
print "There are $count changes \n";
print $string; # gazelle is one lind of antelope <-- kind becomes lind too!
I don't know of an option for tr to stop translation after the first word.
But you can use a regex with backreferences for this.
use strict;
my $string = 'gazekke is one kind of antelope';
# Match first word in $1 and rest of sentence in $2.
$string =~ m/(\w+)(.*)/;
# Translate all k's to l's in the first word.
(my $translated = $1) =~ tr/k/l/;
# Concatenate the translated first word with the rest
$string = "$translated$2";
print $string;
Outputs: gazelle is one kind of antelope
Pick the first match (a word in this case), precisely what regex does when without /g, and in that word replace all wanted characters, by running code in the replacement side, by /e
$string =~ s{(\w+)}{ $1 =~ s/k/l/gr }e;
In the regex in the replacement side, /r modifier makes it handily return the changed string and doesn't change the original, what also allows a substitution to run on $1 (which can't be modified as is a read-only).
tr is a character class transliterator. For anything else you would use regex.
$string =~ s/gazekke/gazelle/;
You can put a code block as the second half of s/// to do more complicated replacements or transmogrifications.
$string =~ s{([A-Za-z]+)}{ &mangler($1) if $should_be_mangled{$1}; }ge;
Edit:
Here's how you would first locate a phrase and then work on it.
$phrase_regex = qr/(?|(gazekke) is one kind of antelope|(etc))/;
$string =~ s{($phrase_regex)}{
my $match = $1;
my $word = $2;
$match =~ s{$word}{
my $new = $new_word_map{$word};
&additional_mangling($new);
$new;
}e;
$match;
}ge;
Here's the Perl regex documentation.
https://perldoc.perl.org/perlre

extract string between two dots

I have a string of the following format:
word1.word2.word3
What are the ways to extract word2 from that string in perl?
I tried the following expression but it assigns 1 to sub:
#perleval $vars{sub} = $vars{string} =~ /.(.*)./; 0#
EDIT:
I have tried several suggestions, but still get the value of 1. I suspect that the entire expression above has a problem in addition to parsing. However, when I do simple assignment, I get the correct result:
#perleval $vars{sub} = $vars{string} ; 0#
assigns word1.word2.word3 to variable sub
. has a special meaning in regular expressions, so it needs to be escaped.
.* could match more than intended. [^.]* is safer.
The match operator (//) simply returns true/false in scalar context.
You can use any of the following:
$vars{sub} = $vars{string} =~ /\.([^.]*)\./ ? $1 : undef;
$vars{sub} = ( $vars{string} =~ /\.([^.]*)\./ )[0];
( $vars{sub} ) = $vars{string} =~ /\.([^.]*)\./;
The first one allows you to provide a default if there's no match.
Try:
/\.([^\.]+)\./
. has a special meaning and would need to be escaped. Then you would want to capture the values between the dots, so use a negative character class like ([^\.]+) meaning at least one non-dot. if you use (.*) you will get:
word1.stuff1.stuff2.stuff3.word2 to result in:
stuff1.stuff2.stuff3
But maybe you want that?
Here is my little example, I do find the perl one liners a little harder to read at times so I break it out:
use strict;
use warnings;
if ("stuff1.stuff2.stuff3" =~ m/\.([^.]+)\./) {
my $value = $1;
print $value;
}
else {
print "no match";
}
result
stuff2
. has a special meaning: any character (see the expression between your parentheses)
Therefore you have to escape it (\.) if you search a literal dot:
/\.(.*)\./
You've got to make sure you're asking for a list when you do the search.
my $x= $string =~ /look for (pattern)/ ;
sets $x to 1
my ($x)= $string =~ /look for (pattern)/ ;
sets $x to pattern.

Match the nth longest possible string in Perl

The pattern matching quantifiers of a Perl regular expression are "greedy" (they match the longest possible string). To force the match to be "ungreedy", a ? can be appended to the pattern quantifier (*, +).
Here is an example:
#!/usr/bin/perl
$string="111s11111s";
#-- greedy match
$string =~ /^(.*)s/;
print "$1\n"; # prints 111s11111
#-- ungreedy match
$string =~ /^(.*?)s/;
print "$1\n"; # prints 111
But how one can find the second, third and .. possible string match in Perl? Make a simple example of yours --if need a better one.
Utilize a conditional expression, a code expression, and backtracking control verbs.
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
The above will use greedy matching, but will cause the largest match to intentionally fail. If you wanted the 3rd largest, you could just set the number of skips to 2.
Demonstrated below:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "111s11111s11111s";
$string =~ /^(.*)s/;
print "Greedy match - $1\n";
$string =~ /^(.*?)s/;
print "Ungreedy match - $1\n";
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
print "2nd Greedy match - $1\n";
Outputs:
Greedy match - 111s11111s11111
Ungreedy match - 111
2nd Greedy match - 111s11111
When using such advanced features, it is important to have a full understanding of regular expressions to predict the results. This particular case works because the regex is fixed on one end with ^. That means that we know that each subsequent match is also one shorter than the previous. However, if both ends could shift, we could not necessarily predict order.
If that were the case, then you find them all, and then you sort them:
use strict;
use warnings;
my $string = "111s11111s";
my #seqs;
$string =~ /^(.*)s(?{push #seqs, $1})(*FAIL)/;
my #sorted = sort {length $b <=> length $a} #seqs;
use Data::Dump;
dd #sorted;
Outputs:
("111s11111s11111", "111s11111", 111)
Note for Perl versions prior to v5.18
Perl v5.18 introduced a change, /(?{})/ and /(??{})/ have been heavily reworked, that enabled the scope of lexical variables to work properly in code expressions as utilized above. Before then, the above code would result in the following errors, as demonstrated in this subroutine version run under v5.16.2:
Variable "$skips" will not stay shared at (re_eval 1) line 1.
Variable "#seqs" will not stay shared at (re_eval 2) line 1.
The fix for older implementations of RE code expressions is to declare the variables with our, and for further good coding practices, to localize them when initialized. This is demonstrated in this modified subroutine version run under v5.16.2, or as put below:
local our #seqs;
$string =~ /^(.*)s(?{push #seqs, $1})(*FAIL)/;
Start by getting all possible matches.
my $string = "111s1111s11111s";
local our #matches;
$string =~ /^(.*)s(?{ push #matches, $1 })(?!)/;
This finds
111s1111s11111
111s1111
111
Then, it's just a matter of finding out which one is the second longuest and filtering out the others.
use List::MoreUtils qw( uniq );
my $target_length = ( sort { $b <=> $a } uniq map length, #matches )[1];
#matches = uniq grep { length($_) == $target_length } #matches
if $target_length;

How can I extract a substring up to the first digit?

How can I find the first substring until I find the first digit?
Example:
my $string = 'AAAA_BBBB_12_13_14' ;
Result expected: 'AAAA_BBBB_'
Judging from the tags you want to use a regular expression. So let's build this up.
We want to match from the beginning of the string so we anchor with a ^ metacharacter at the beginning
We want to match anything but digits so we look at the character classes and find out this is \D
We want 1 or more of these so we use the + quantifier which means 1 or more of the previous part of the pattern.
This gives us the following regular expression:
^\D+
Which we can use in code like so:
my $string = 'AAAA_BBBB_12_13_14';
$string =~ /^\D+/;
my $result = $&;
Most people got half of the answer right, but they missed several key points.
You can only trust the match variables after a successful match. Don't use them unless you know you had a successful match.
The $&, $``, and$'` have well known performance penalties across all regexes in your program.
You need to anchor the match to the beginning of the string. Since Perl now has user-settable default match flags, you want to stay away from the ^ beginning of line anchor. The \A beginning of string anchor won't change what it does even with default flags.
This would work:
my $substring = $string =~ m/\A(\D+)/ ? $1 : undef;
If you really wanted to use something like $&, use Perl 5.10's per-match version instead. The /p switch provides non-global-perfomance-sucking versions:
my $substring = $string =~ m/\A\D+/p ? ${^MATCH} : undef;
If you're worried about what might be in \D, you can specify the character class yourself instead of using the shortcut:
my $substring = $string =~ m/\A[^0-9]+/p ? ${^MATCH} : undef;
I don't particularly like the conditional operator here, so I would probably use the match in list context:
my( $substring ) = $string =~ m/\A([^0-9]+)/;
If there must be a number in the string (so, you don't match an entire string that has no digits, you can throw in a lookahead, which won't be part of the capture:
my( $substring ) = $string =~ m/\A([^0-9]+)(?=[0-9])/;
$str =~ /(\d)/; print $`;
This code print string, which stand before matching
perl -le '$string=q(AAAA_BBBB_12_13_14);$string=~m{(\D+)} and print $1'
AAAA_BBBB_

What regular expression do I use for replacing numbers with leading zeros?

I have a bunch of strings like this:
my $string1 = "xg0000";
my $string2 = "fx0015";
What do I do to increase the number in the string by 1 but also maintain the leading zeros to keep the length of the string the same.
I tried this:
$string =~ s/(\d+)/0 x length(int($1)) . ($1+1)/e;
It doesn't seem to work on all numbers. Is regex what I'm supposet to use to do this or is there a better way?
How about a little perl magic? The ++ operator will work even on strings, and 0000 will magically turn into 0001.
Now, we can't modify $1 since it is readonly, but we can use an intermediate variable.
use strict;
use warnings;
my $string = "xg0000";
$string =~ s/(\d+)/my $x=$1; ++$x/e;
Update:
I didn't think of this before, but it actually works without a regex:
C:\perl>perl -we "$str = 'xg0000'; print ++$str;"
xg0001
Still does not solve the problem DavidO pointed out, with 9999. You would have to decide what to do with those numbers. Perl has a rather interesting solution for it:
C:\perl>perl -we "$str = 'xg9999'; print ++$str;"
xh0000
You can do it with sprintf too, and use the length you compute from the number of digits that you capture:
use strict;
use warnings;
my $string = "xg00000";
foreach ( 0 .. 9 ) {
$string =~ s/([0-9]+)\z/
my $l = length $1;
sprintf "%0${l}d", $1 + 1;
/e;
print "$string\n";
}
This is a really bad task to solve with a regexp. Increasing a number can change an unlimited number of digits, and can in fact also change the number of non-zero digits! Unless you have sworn an oath to use only regexes for a year, use regex to extract the number and then sprintf "%06d" $x+1 to regenerate the new number with the desired width.