Perl alphabetic sort with alphanumeric string and regex - regex

I am trying to sort alphanumeric string with perl and am facing the following problem:
Strings are like this: "XXXXX-1.0.0" where numbers can be on one or two digits and are representing release versions.
My problem is that when I have the two following strings:
- XXXXX-1.9.9
- XXXXX-1.10.0
XXXXX-1.9.9 is considered as greater than XXXXX-1.10.0 since 9 is greater than 1.
So I am trying to force numbers to be on two digits with a regexp.
There is the piece of code I am testing:
my $string = "XXXXXX-1.9.9";
my $pre = "0";
$string =~ s/(\.|-)+(\d{1})($|\.)/$1$pre$2$3/g;
print "$string\n";
This gives me the result "XXXXXX-01.9.09" which is not what I am looking for since it will not be sorted correctly. So I have to do that:
my $string = "XXXXXX-1.9.9";
my $pre = "0";
$string =~ s/(\.|-)+(\d{1})($|\.)/$1$pre$2$3/g;
$string =~ s/(\.|-)+(\d{1})($|\.)/$1$pre$2$3/g;
print "$string\n";
To get "XXXXXX-01.09.09"
My question is double:
- Is there a way to sort my strings with Perl without using regex?
- If I have to use regex is there a way to write it so that I do not have to execute it twice?
Thank you in advance.

You can use look around assertions to make sure there are not digits around the digit you are replacing without moving the position over the neighbouring characters.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my #strings = qw( XXXXX-1.9.9
XXXXX-1.9.10
XXXXX-1.10.0 );
s/(?<=[^0-9]) ([0-9]) (?=[^0-9]|$) /0$1/xg for #strings;
#strings = sort #strings;
s/(?<=[^0-9]) 0+ ([0-9]) /$1/xg for #strings;
say for #strings;

Related

Replace only the second occurance of string in a line in perl regex

I have a string like "ven|ven|vett|vejj|ven|ven". Treat each "|" delimiter for each column.
By splitting the string with "|" saving all the columns in array and reading each column into $str
So, I'm trying to do this as
$string =~ s/$str/venky/g if $str =~ /ven/i; # it will do globally.
Which not met the requirement.
On-demand basis, I need to replace string at the particular number of occurrence of the string.
For example, I've a request to change 2nd occurrence of "ven" to venky.
Then how can I met this requirement simply? Is it some-thing like
$string =~ s/ven/venky/2;
As of my knowledge we have 'o' for replace once and 'g' for globally. I'm struggling for the solution to get the replacement at particular occurrence. And I should not use pos() to get the position, because string keeps on change. It becomes difficult to trace it every-time. That's my intention.
Please help me on this regard.
There is no flag that you can add to the regex that will do this.
The easiest way would be to split and loop. However, if you insist to use one regex, it is doable:
/^(?:[^v]|v[^e]|ve[^n])*ven(?:[^v]|v[^e]|ve[^n])*\Kven/
If you want to replace the Nth occurrence instead of the second, you can do:
/^(?:(?:[^v]|v[^e]|ve[^n])*ven){N-1}(?:[^v]|v[^e]|ve[^n])*\Kven/
The general idea:
(?:[^v]|v[^e]|ve[^n])* - matches any string that isn't part of ven
\K is a cool matcher that drops everything matched so far, so you can sort of use it as a lookbehind with variable length
Currently you're replacing every instance of'ven' with 'venky' if your string contains a match for ven, which of course it does.
What I assume you're trying to do is to substitute 'ven' for 'venky' within your string if it's the second element:
my $string = 'ven|ven|vett|vejj|ven|ven';
my #elements = split(/\|/, $string);
my $count;
foreach (#elements){
$count++;
s/$_/venky/g if /ven/i and $count == 2;
}
print join('|', #elements);
print "\n";
Your approach was already pretty good. What you described makes sense, but I think you are having trouble implementing it.
I created a function to do the work. It takes 4 arguments:
$string is the string we want to work on
$n is the nth occurance you want to replace
$needle is the thing you want to replace – thing needle in a haystack
Note that right now we allow to pass stuff that might contain regular expressions. So you would have to use quotemeta on it or match with /\Q$needle\E/
$replacement is the replacement for the $needle
The idea is to split up the string, then check each element if it matches the pattern ($needle) and keep track of how many have matched. If the nth one is reached, replace it and stop processing. Then put the string back together.
use strict;
use warnings;
use feature 'say';
say replace_nth_occurance("ven|ven|vett|vejj|ven|ven", 2, 'ven', 'venky');
sub replace_nth_occurance {
my ($string, $n, $needle, $replacement) = #_;
# take the string appart
my #elements = split /\|/, $string;
my $count = 0; # keep track of ...
foreach my $e (#elements) {
$count++ if $e =~ m/$needle/; # ... how many matches we've found
if ($count == $n) {
$e =~ s/$needle/$replacement/; # replace
last; # and stop processing
}
}
# put it back into the pipe-separated format
return join '|', #elements;
}
Output:
ven|venky|vett|vejj|ven|ven
To replace the n'th occurrence of "ven" to "venky":
my $n = 3;
my $test = "seven given ravens";
$test =~ s/ven/--$n == 0 ? "venky" : $&/eg;
This uses the ability with the /e flag to specify the substitution part as an expression.

Match the nth longest possible string in Perl

The pattern matching quantifiers of a Perl regular expression are "greedy" (they match the longest possible string). To force the match to be "ungreedy", a ? can be appended to the pattern quantifier (*, +).
Here is an example:
#!/usr/bin/perl
$string="111s11111s";
#-- greedy match
$string =~ /^(.*)s/;
print "$1\n"; # prints 111s11111
#-- ungreedy match
$string =~ /^(.*?)s/;
print "$1\n"; # prints 111
But how one can find the second, third and .. possible string match in Perl? Make a simple example of yours --if need a better one.
Utilize a conditional expression, a code expression, and backtracking control verbs.
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
The above will use greedy matching, but will cause the largest match to intentionally fail. If you wanted the 3rd largest, you could just set the number of skips to 2.
Demonstrated below:
#!/usr/bin/perl
use strict;
use warnings;
my $string = "111s11111s11111s";
$string =~ /^(.*)s/;
print "Greedy match - $1\n";
$string =~ /^(.*?)s/;
print "Ungreedy match - $1\n";
my $skips = 1;
$string =~ /^(.*)s(?(?{$skips-- > 0})(*FAIL))/;
print "2nd Greedy match - $1\n";
Outputs:
Greedy match - 111s11111s11111
Ungreedy match - 111
2nd Greedy match - 111s11111
When using such advanced features, it is important to have a full understanding of regular expressions to predict the results. This particular case works because the regex is fixed on one end with ^. That means that we know that each subsequent match is also one shorter than the previous. However, if both ends could shift, we could not necessarily predict order.
If that were the case, then you find them all, and then you sort them:
use strict;
use warnings;
my $string = "111s11111s";
my #seqs;
$string =~ /^(.*)s(?{push #seqs, $1})(*FAIL)/;
my #sorted = sort {length $b <=> length $a} #seqs;
use Data::Dump;
dd #sorted;
Outputs:
("111s11111s11111", "111s11111", 111)
Note for Perl versions prior to v5.18
Perl v5.18 introduced a change, /(?{})/ and /(??{})/ have been heavily reworked, that enabled the scope of lexical variables to work properly in code expressions as utilized above. Before then, the above code would result in the following errors, as demonstrated in this subroutine version run under v5.16.2:
Variable "$skips" will not stay shared at (re_eval 1) line 1.
Variable "#seqs" will not stay shared at (re_eval 2) line 1.
The fix for older implementations of RE code expressions is to declare the variables with our, and for further good coding practices, to localize them when initialized. This is demonstrated in this modified subroutine version run under v5.16.2, or as put below:
local our #seqs;
$string =~ /^(.*)s(?{push #seqs, $1})(*FAIL)/;
Start by getting all possible matches.
my $string = "111s1111s11111s";
local our #matches;
$string =~ /^(.*)s(?{ push #matches, $1 })(?!)/;
This finds
111s1111s11111
111s1111
111
Then, it's just a matter of finding out which one is the second longuest and filtering out the others.
use List::MoreUtils qw( uniq );
my $target_length = ( sort { $b <=> $a } uniq map length, #matches )[1];
#matches = uniq grep { length($_) == $target_length } #matches
if $target_length;

Perl regexp to match random character/number combination?

I'm trying to match patterns like these with perl regexp:
_b04it4_
_bg4n5p_
_qp9bp_
_hp32z7_
...that is, underscore followed by some combination of characters and numbers.
I guess the "rule" is that there are >=1 [a-z] characters and >=1 [0-9] character/number, and no spaces, "mixed in any combination", between two underscore-characters.
And want to replace this with something, eg. "_X_".
I'd appreciate some help with this .. My own attempts are looking horrible and don't work very well :)
For at least 1 letter and number:
_(?=[^_]*[a-z])(?=[^_]*\d)[a-z\d]+_
RegExr Example
(?=[^_]*[a-z]) checks for the presence of a letter between the two _
(?=[^_]*\d) checks for the presence on a number between the two _
_[a-z\d]+_ does the actual match
How about:
_(?=.*[a-z])(?=.*[0-9])[0-9a-z]+_
Another way without lookaheads:
_([a-z]+[0-9]|[0-9]+[a-z])[a-z0-9]*_
Something like this is easily solved if you separate the conditions into multiple regex's. The first matching the basic constraints, and the 2nd to ensure that at least 1 char and 1 digit are in the match.
use strict;
use warnings;
while (<DATA>) {
chomp;
my $before = my $after = $_;
$after =~ s{_([a-z0-9]+)_}{
my $chars = $1;
# Require 1 digit and 1 letter in the match before replacing.
($chars =~ /[a-z]/ && $chars =~ /[0-9]/) ? "_X_" : "_${chars}_"
}e;
printf "%-12s -> %-12s\n", $before, $after;
}
__DATA__
_b04it4_
_bg4n5p_
_qp9bp_
_hp32z7_
_nonumbers_
_012345_
_1 space_
How about this:
use strict;
my ($replacement, #input) = ('X', qw(_b04it4_ _bg4n5p_ _qp9bp_ _hp32z7_));
my #output = map {'_'.$replacement.'_'} grep {/^_[a-z0-9]+_$/ && /[a-z]+/ && /[0-9]+/} #input;
print "$_\n" foreach #output;

Interpreting powers in perl using regex

I'm trying to read in two numbers separated by a : and perform a comparison.
Below is some code that illustrates my problems:
use strict;
use warnings;
my #nums = qw(1.23:2.13 0.1:2.11 1.17772e+06:1.32 2:10.2);
for my $number (#nums){
print "actual numbers $number\n";
my ($c, $e) = ($1, $2) if $number =~ /(\d+\.\d+|\d+):(\d+\.\d+|\d+)/;
print "regex matches: $c:$e\n";
}
Which outputs:
actual numbers 1.23:2.13
regex matches: 1.23:2.13
actual numbers 0.1:2.11
regex matches: 0.1:2.11
actual numbers 1.17772e+06:1.32
regex matches: 06:1.32 # not capturing 1.17772e+06
actual numbers 2:10.2
regex matches: 2:10.2
My question is: How can I a) capture 1.17772e+06 and b) evaluate it as a number?
From perldata:
/^(?:[+-]?)(?=\d|\.\d)\d*(?:\.\d*)?(?:[Ee](?:[+-]?\d+))?$/
Or,
use Regexp::Common;
/$RE{num}{real}/
(These assume you want Perl's definition of a number.)
I would just use the split function (split /:/) here.

What regular expression do I use for replacing numbers with leading zeros?

I have a bunch of strings like this:
my $string1 = "xg0000";
my $string2 = "fx0015";
What do I do to increase the number in the string by 1 but also maintain the leading zeros to keep the length of the string the same.
I tried this:
$string =~ s/(\d+)/0 x length(int($1)) . ($1+1)/e;
It doesn't seem to work on all numbers. Is regex what I'm supposet to use to do this or is there a better way?
How about a little perl magic? The ++ operator will work even on strings, and 0000 will magically turn into 0001.
Now, we can't modify $1 since it is readonly, but we can use an intermediate variable.
use strict;
use warnings;
my $string = "xg0000";
$string =~ s/(\d+)/my $x=$1; ++$x/e;
Update:
I didn't think of this before, but it actually works without a regex:
C:\perl>perl -we "$str = 'xg0000'; print ++$str;"
xg0001
Still does not solve the problem DavidO pointed out, with 9999. You would have to decide what to do with those numbers. Perl has a rather interesting solution for it:
C:\perl>perl -we "$str = 'xg9999'; print ++$str;"
xh0000
You can do it with sprintf too, and use the length you compute from the number of digits that you capture:
use strict;
use warnings;
my $string = "xg00000";
foreach ( 0 .. 9 ) {
$string =~ s/([0-9]+)\z/
my $l = length $1;
sprintf "%0${l}d", $1 + 1;
/e;
print "$string\n";
}
This is a really bad task to solve with a regexp. Increasing a number can change an unlimited number of digits, and can in fact also change the number of non-zero digits! Unless you have sworn an oath to use only regexes for a year, use regex to extract the number and then sprintf "%06d" $x+1 to regenerate the new number with the desired width.