regex for a number, number incremented by one and two - regex

Give a number x, I wonder if there is any regex that matches for x and x+1 and x+2.
Thanks,

The best approach would probably be to do something like:
my $x = 3;
my $regex = join "|", $x, $x+1, $x+2;
for (0 .. 10) {
print "$_\n" if /$regex/;
}
But if you want, you can use interpolation directly within the regexp:
my $x = 3;
for (0 .. 10) {
print "$_\n" if /$x|${\($x+1)}|${\($x+2)}/;
}
Output for both:
3
4
5
I personally think the latter is a lot less readable though.

String contains:
my $pat = join '|', $x, $x+1, $x+2;
$s =~ /(?<![0-9])(?:$pat)(?![0-9])/ # Assumes non-negative integers
Exact match:
my $pat = join '|', $x, $x+1, $x+2;
$y =~ /^(?:$pat)\z/
$y == $x || $y == $x+1 || $y == $x+2 # Most straightfoward
$x <= $y && $y <= $x+2 # Possibly clearest
Exact match (More exotic):
grep $y == $x + $_, 0..2
$y ~~ [ map $x_+$_, 0..2 ]

You could use (??{...}):
use re qw'eval';
/^ (?: $x | (??{ $x+1 }) | (??{ $x+2 }) ) $/x;
I would like to say that it make more sense to use $":
local $" #" # fix highlighting
= '|';
/^#{[ $x, $x+1, $x+2 ]}$/;
/^#{[ $x .. $x+2 ]}$/;
my #match = ( $x, $x+1, $x+2 );
/^#match$/;

I first thought of using index like so:
index( $source, $x + 2 );
But then, for $x=1, it just looks for a '3' anywhere in the string, matching 31, 23. So it appears that you might want to use a regex to make sure that it's an isolated string of digits.
/(?<!\d)${\( $x + 2 )}(?!\d)/

Given your other variable--let's call it $y--why not just check whether or not $y==$x, $y-$x==1, or $y-$x==2?
As ghoti pointed out in his/her comment, regular expressions aren't the tool for this.

For X = 10, regex should be \b(?:11|12)\b

Related

Getting first two strings between slashes

I have a string, alpha/beta/charlie/delta
I'm trying to extract out the string alpha/beta including the forward slash.
I'm able to accomplish this with split and joining the first and second result, but I feel like a regex might be better suited.
Depending on how many slashes there are as well will determine how many strings I need to grab, e.g. if there's 4 slashes get the first two strings, if there's 5, then grab first three. Again, my problem is extracting the slash with the string.
As Mathias already noticed - Split+Join is a perfectly valid solution:
$StringArray = #(
'alpha/beta/charlie/delta',
'alpha/beta/charlie/delta/omega'
'alpha/beta/charlie/gamma/delta/omega'
)
foreach ($String in $StringArray) {
$StringSplit = $String -split '/'
($StringSplit | Select-Object -First ($StringSplit.Count - 2) ) -join '/'
}
A little long, but I did it without regex:
$string = 'alpha/beta/charlie/delta/gamma'
# Count number of '/'
$count = 0
for( $i = 0; $i -lt $string.Length; $i++ ) {
if( $string[ $i ] -eq '/' ) {
$count = $count + 1
}
}
# Depending on the number of '/' you can create a mathematical equation, or simply do an if-else ladder.
# In this case, if count of '/' = 3, get first 2 strings, if count = 4, get first 3 strings.
function parse-strings {
Param (
$number_of_slashes,
$string
)
$all_slash = $number_of_slashes
$to_get = $number_of_slashes - 1
$counter = 0
for( $j = 0; $j -lt $string.Length; $j++ ) {
if( $string[ $j ] -eq '/' ) {
$counter = $counter + 1
}
if( $counter -eq $to_get ) {
( $string[ 0 .. ( $j - 1 ) ] -join "" )
break
}
}
}
parse-strings -number_of_slashes $count -string $string
You can try the .split() .net method where you define in parentheses where to split (on which character).
Then use the join operator “-join” to join your elements from the array
For your matter of concern use it like this:
$string = 'alpha/beta/charlie/delta/gamma'
$string = $string.split('/')
$string = "$($string[0])" + "/" + "$($string[1])"
$string
And so on...

Matching all characters in a string except one in any position

How to match (preferably in perl) all strings that match the query string except one character?
Query: TLAQLLLDK
Want to match: xLAQLLLDK, TxAQLLLDK, TLxQLLLDK, etc.
Where 'x' is any capital letter '[A-Z]'.
Use alternation operator.
^(?:[A-Z]LAQLLLDK|T[A-Z]AQLLLDK|TL[A-Z]QLLLDK|.....)$
Likewise fill all..
You can do that by writing a terrible regular expression, which will be horribly slow to build and even slower to execute, or you can just don't use regexes for things like these and write a function that just compares both strings character after character, allows for one "mistake" and returns True only if there was exactly one mistake.
How to match (preferably in perl) all strings that match the query string except one character?
Expanding the answer of #Avinash, by generating the required regular expression dynamically at run time:
my $query = 'TLAQLLLDK';
my $re_proto = '(' . join( '|', map { (my$x=$query)=~s/^(.{$_})./$1\[A-Za-z]/; $x; } (0 .. length($query)-1) ) . ')';
my $re = qr/^$re_proto$/;
my #input = qw(xLAQLLLDK TxAQLLLDK TLxQLLLDK);
my #matches = grep { /$re/ } #input;
print "#matches\n";
(I had to include the [a-z] too, since your example input uses the x as the marker.)
If you need to do that very often, I would advise to cache the generated regular expressions.
Is this what you are looking for?
#!/usr/bin/perl
use strict;
my #str = ("ULAQLLLDK","TAAQLLLDK","TLCQLLLDK","TLAQLLLDK");
while(<#str>){
if (/[A-S,U-Z]LAQLLLDK|T[A-K,M-Z]AQLLLDK|TL[B-Z]QLLLDK/ ){
print "$_\n";
}
}
output:
ULAQLLLDK
TAAQLLLDK
TLCQLLLDK
There are only 9 x 25 = 225 such strings, so you may as well generate them all and put them in a hash for comparison
use strict;
use warnings;
use 5.010;
my %matches;
my $s = 'TLAQLLLDK';
for my $i (0 .. length($s) - 1) {
my $c = substr $s, $i, 1;
for my $cc ('A' .. 'Z') {
substr(my $ss = $s, $i, 1) = $cc;
++$matches{$ss} unless $cc eq $c;
}
}
printf "%d matches found\n", scalar keys %matches;
for ( qw/ TLAQLLLDK TLAQLXLDK / ) {
printf "\$matches{%s} = %s\n", $_, $matches{$_} // 'undef';
}
output
225 matches found
$matches{TLAQLLLDK} = undef
$matches{TLAQLXLDK} = 1

Replace only up to N matches on a line

In Perl, how to write a regular expression that replaces only up to N matches per string?
I.e., I'm looking for a middle ground between s/aa/bb/; and s/aa/bb/g;. I want to allow multiple substitutions, but only up to N times.
I can think of three reliable ways. The first is to replace everything after the Nth match with itself.
my $max = 5;
$s =~ s/(aa)/ $max-- > 0 ? 'bb' : $1 /eg;
That's not very efficient if there are far more than N matches. For that, we need to move the loop out of the regex engine. The next two methods are ways of doing that.
my $max = 5;
my $out = '';
$out .= $1 . 'bb' while $max-- && $in =~ /\G(.*?)aa/gcs;
$out .= $1 if $in =~ /\G(.*)/gcs;
And this time, in-place:
my $max = 5;
my $replace = 'bb';
while ($max-- && $s =~ s/\G.*?\Kaa/$replace/s) {
pos($s) = $-[0] + length($replace);
}
You might be tempted to do something like
my $max = 5;
$s =~ s/aa/bb/ for 1..$max;
but that approach will fail for other patterns and/or replacement expressions.
my $max = 5;
$s =~ s/aa/ba/ for 1..$max; # XXX Turns 'aaaaaaaa'
# into 'bbbbbaaa'
# instead of 'babababa'
And of course, starting from the beginning of the string every time could be expensive.
What you want is not posible in regular expressions. But you can put the replacement in a for-loop:
my $i;
my $aa = 'aaaaaaaaaaaaaaaaaaaa';
for ($i=0;$i<4;$i++) {
$aa =~ s/aa/bb/;
}
print "$aa\n";
result:
bbbbbbbbaaaaaaaaaaaa
You can use the /e flag which evaluates the right side as an expression:
my $n = 3;
$string =~ s/(aa)/$n-- > 0 ? "bb" : $1/ge;
Here's a solution using the /e modifier, with which you can use
perl code to generate the replacement string:
my $count = 0;
$string =~ s{ $pattern }
{
$count++;
if ($count < $limit ) {
$replace;
} else {
$&; # faking a no-op, replacing with the original match.
}
}xeg;
With perl 5.10 or later you can drop the $& (which has weird
performance complications) and use ${^MATCH} via the /p modifier
$string =~ s{ $pattern }
{
$count++;
if ($count < $limit ) {
$replace;
} else {
${^MATCH};
}
}xegp;
It's too bad you can't just do this, but you can't:
last if $count >= $limit;

How can I count characters in Perl?

I have the following Perl script counting the number of Fs and Ts in a string:
my $str = "GGGFFEEIIEETTGGG";
my $ft_count = 0;
$ft_count++ while($str =~ m/[FT]/g);
print "$ft_count\n";
Is there a more concise way to get the count (in other words, to combine line 2 and 3)?
my $ft_count = $str =~ tr/FT//;
See perlop.
If the REPLACEMENTLIST is empty, the
SEARCHLIST is replicated. This latter is useful for counting
characters in a class …
$cnt = $sky =~ tr/*/*/; # count the stars in $sky
$cnt = tr/0-9//; # count the digits in $_
Here's a benchmark:
use strict; use warnings;
use Benchmark qw( cmpthese );
my ($x, $y) = ("GGGFFEEIIEETTGGG" x 1000) x 2;
cmpthese -5, {
'tr' => sub {
my $cnt = $x =~ tr/FT//;
},
'm' => sub {
my $cnt = ()= $y =~ m/[FT]/g;
},
};
Rate tr m
Rate m tr
m 108/s -- -99%
tr 8118/s 7440% --
With ActiveState Perl 5.10.1.1006 on 32 Windows XP.
The difference seems to be starker with
C:\Temp> c:\opt\strawberry-5.12.1\perl\bin\perl.exe t.pl
Rate m tr
m 88.8/s -- -100%
tr 25507/s 28631% --
When the "m" operator has the /g flag AND is executed in list context, it returns a list of matching substrings. So another way to do this would be:
my #ft_matches = $str =~ m/[FT]/g;
my $ft_count = #ft_matches; # count elements of array
But that's still two lines. Another weirder trick that can make it shorter:
my $ft_count = () = $str =~ m/[FT]/g;
The "() =" forces the "m" to be in list context. Assigning a list with N elements to a list of zero variables doesn't actually do anything. But then when this assignment expression is used in a scalar context ($ft_count = ...), the right "=" operator returns the number of elements from its right-hand side - exactly what you want.
This is incredibly weird when first encountered, but the "=()=" idiom is a useful Perl trick to know, for "evaluate in list context, then get size of list".
Note: I have no data on which of these are more efficient when dealing with large strings. In fact, I suspect your original code might be best in that case.
Yes, you can use the CountOf secret operator:
my $ft_count = ()= $str =~ m/[FT]/g;
You can combine line 2, 3 and 4 into one like so:
my $str = "GGGFFEEIIEETTGGG";
print $str =~ s/[FT]//g; #Output 4;

Parsing attributes with regex in Perl

Here's a problem I ran into recently. I have attributes strings of the form
"x=1 and y=abc and z=c4g and ..."
Some attributes have numeric values, some have alpha values, some have mixed, some have dates, etc.
Every string is supposed to have "x=someval and y=anotherval" at the beginning, but some don't. I have three things I need to do.
Validate the strings to be certain that they have x and y.
Actually parse the values for x and y.
Get the rest of the string.
Given the example at the top, this would result in the following variables:
$x = 1;
$y = "abc";
$remainder = "z=c4g and ..."
My question is: Is there a (reasonably) simple way to parse these and validate with a single regular expression? i.e.:
if ($str =~ /someexpression/)
{
$x = $1;
$y = $2;
$remainder = $3;
}
Note that the string may consist of only x and y attributes. This is a valid string.
I'll post my solution as an answer, but it doesn't meet my single-regex preference.
Assuming you also want to do something with the other name=value pairs this is how I would do it ( using Perl version 5.10 ):
use 5.10.0;
use strict;
use warnings;
my %hash;
while(
$string =~ m{
(?: ^ | \G ) # start of string or previous match
\s*
(?<key> \w+ ) # word characters
=
(?<value> \S+ ) # non spaces
\s* # get to the start of the next match
(?: and )?
}xgi
){
$hash{$+{key}} = $+{value};
}
# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};
On older Perls ( at least Perl 5.6 );
use strict;
use warnings;
my %hash;
while(
$string =~ m{
(?: ^ | \G ) # start of string or previous match
\s*
( \w+ ) = ( \S+ )
\s* # get to the start of the next match
(?: and )?
}xgi
){
$hash{$1} = $2;
}
# to make sure that x & y exist
die unless exists $hash{x} and exists $hash{y};
These have the added benefit of continuing to work if you need to work with more data.
I'm not the best at regular expressions, but this seems pretty close to what you're looking for:
/x=(.+) and y=([^ ]+)( and (.*))?/
Except you use $1, $2, and $4. In use:
my #strs = ("x=1 and y=abc and z=c4g and w=v4l",
"x=yes and y=no",
"z=nox and w=noy");
foreach (#strs) {
if ($_ =~ /x=(.+) and y=([^ ]+)( and (.*))?/) {
$x = $1;
$y = $2;
$remainder = $4;
print "x: $x; y: $y; remainder: $remainder\n";
} else {
print "Failed.\n";
}
}
Output:
x: 1; y: abc; remainder: z=c4g and w=v4l
x: yes; y: no; remainder:
Failed.
This of course leaves out plenty of error checking, and I don't know everything about your inputs, but this seems to work.
As a fairly simple modification to Rudd's version,
/^x=(.+) and y=([^ ]+)(?: and (.*))?/
will allow you to use $1, $2 and $3 (the ?: makes it a noncapturing group), and will ensure that the string starts with "x=" rather than allowing a "not_x=" to match
If you have better knowledge of what the x and y values will be, this should be used to tighten the regex further:
my #strs = ("x=1 and y=abc and z=c4g and w=v4l",
"x=yes and y=no",
"z=nox and w=noy",
"not-x=nox and y=present",
"x=yes and w='there is no and y=something arg here'");
foreach (#strs) {
if ($_ =~ /^x=(.+) and y=([^ ]+)(?: and (.*))?/) {
$x = $1;
$y = $2;
$remainder = $3;
print "x: {$x}; y: {$y}; remainder: {$remainder}\n";
} else {
print "$_ Failed.\n";
}
}
Output:
x: {1}; y: {abc}; remainder: {z=c4g and w=v4l}
x: {yes}; y: {no}; remainder: {}
z=nox and w=noy Failed.
not-x=nox and y=present Failed.
x: {yes and w='there is no}; y: {something}; remainder: {}
Note that the missing part of the last test is due to the current version of the y test requiring no spaces, if the x test had the same restriction that string would have failed.
Rudd and Cebjyre have gotten you most of the way there but they both have certain problems:
Rudd suggested:
/x=(.+) and y=([^ ]+)( and (.*))?/
Cebjyre modified it to:
/^x=(.+) and y=([^ ]+)(?: and (.*))?/
The second version is better because it will not confuse "not_x=foo" with "x=foo" but will accept things such as "x=foo z=bar y=baz" and set $1 = "foo z=bar" which is undesirable.
This is probably what you are looking for:
/^x=(\w+) and y=(\w+)(?: and (.*))?/
This disallows anything between the x= and y= options, places and allows and optional " and..." which will be in $3
Here's basically what I did to solve this:
($x_str, $y_str, $remainder) = split(/ and /, $str, 3);
if ($x_str !~ /x=(.*)/)
{
# error
}
$x = $1;
if ($y_str !~ /y=(.*)/)
{
# error
}
$y = $1;
I've omitted some additional validation and error handling. This technique works, but it's not as concise or pretty as I would have liked. I'm hoping someone will have a better suggestion for me.