Checking version number validity with Perl - regex

I'm struggling with checking the validity of version numbers in Perl. A correct version number looks like this:
Starts with either v or ver,
After that a number, if it is 0, then no other numbers are allowed in this part (e.g. 10, 3993 and 0 are ok, 01 is not),
After that a full stop, a number, full stop, number, full stop and number.
I.e. a valid version number could look something like v0.123.45.678 or ver18.493.039.1.
I came up with the following regexp:
if ($ver_string !~ m/^v(er)?(0{1}\.)|([1-9]+\d*\.)\d+\.\d+\.\d+/)
{
#print error
}
But this does not work, because a version number like verer01.34.56.78 gets accepted. I can't understand this. I know Perl tends to be greedy, but shouldn't ^v(er)? make sure that there can be a maximum of one "er"? And why doesn't 0{1}\. match only "0.", instead of accepting "01." as well?
This regex actually caught the "rere" thing: m/^v(er)?[0-9.]+/ but I can't see where I allow it in my attempt.

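Your problem is that the or operator, |, splits the whole pattern in two. A | extends to the enclosing brackets or to the ends of the expression, rather than applying to just its two neighbouring items. As written, your pattern matches either ^v(er)?(0{1}\.) or ([1-9]+\d*\.)\d+\.\d+\.\d+, and verer01.34.56.78 satisfies the second, unanchored branch starting at "1.34.56.78".
You need to add some extra brackets to show which part of the expression you want or-ed. So a first step to fixing your pattern would be: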
^v(er)?((0{1}\.)|([1-9]+\d*\.))\d+\.\d+\.\d+
You also want to put a $ at the end to ensure there are no spurious characters at the end of the version number.
Also, the {1} is unnecessary, as it means "the previous item exactly once", which is the default. However, you could use {3} at the end of your pattern, as you want three dot-digit groups there.
Similarly, you don't need the + after the [1-9], as any further digits will be grabbed by the \d*.
And we can also remove the unnecessary brackets.
So you can simplify your pattern to the following:
^v(er)?(0|[1-9]\d*)(\.\d+){3}$
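As a quick check of the simplified pattern (with the dot escaped as \.), here is a minimal sketch. The sub name is_valid_version is made up for illustration, and the sample strings are the ones from the question plus two extra invalid ones.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the simplified pattern in a sub.
sub is_valid_version {
    my ($ver_string) = @_;
    return $ver_string =~ /^v(er)?(0|[1-9]\d*)(\.\d+){3}$/ ? 1 : 0;
}

# Samples from the question, plus two extra invalid strings.
for my $candidate (qw(v0.123.45.678 ver18.493.039.1 verer01.34.56.78 v01.5.5.5 ver101.5.5)) {
    printf "%-20s %s\n", $candidate, is_valid_version($candidate) ? 'ok' : 'rejected';
}
```

The first two strings should be reported as ok and the other three as rejected.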

You could do it with a single regexp, or you could do it in 2 steps, the second step being to check that the first number doesn't start with a 0.
BTW, I tend to use [0-9] instead of \d for numbers, there are lots of characters that are classified as numbers in the Unicode standard (and thus in Perl) that you may not want to deal with.
Here is some sample code; the version_ok sub is where everything happens.
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 7;
while( <DATA>)
{ chomp;
my( $version, $expected)= split /\s*=>\s*/;
is( version_ok( $version), $expected, $version);
}
sub version_ok
{ my( $version)=@_;
if( $version=~ m{^v(?:er)? # starts with v or ver
([0-9]+) # the first number
(?:\.[0-9]+){3} # 3 times dot number
$}x) # end
{ if( $1 =~ m{^0[0-9]})
{ return 0; } # no good: first number starts with 0
else
{ return 1; }
}
else
{ return 0; }
}
__DATA__
v0.123.45.678 => 1
ver18.493.039.1 => 1
verer01.34.56.78 => 0
v01.5.5.5 => 0
ver101.5.5.5 => 1
ver101.5.5. => 0
ver101.5.5 => 0

The regex may work for your test cases, but the CPAN module Perl::Version would seem like the best option, with two caveats:
I haven't tried it myself
the latest module release seems to have been in 2007, which kind of makes it a recursive problem

Related

How can I match only integers in Perl?

So I have an array that goes like this:
my @nums = (1,2,12,24,48,120,360);
I want to check whether there is an element in that array that is not an integer, without using a loop. It goes like this:
if(grep(!/[^0-9]|\^$/,@nums)){
die "Numbers are not in correct format.";
}else{
#Do something
}
Basically, the format should not be like this (Empty string is acceptable):
1A
A2
#A
#
#######
More examples:
1,2,3,A3 = Unacceptable
1,2,###,2 = unacceptable
1,2,3A,4 = Unacceptable
1, ,3,4=Acceptable
1,2,3,360 = acceptable
I know that there is another way, using looks_like_number. But I can't use that for some reason (outside of my control/setup reasons). That's why I used the regex method.
My question is: even though the numbers are not in the correct format (A60 for example), the condition always returns false. Basically, it ignores the incorrect format.
You say in the comments that you don't want to use modules because you can't install them, but there are many core modules that should come with Perl (although some systems screw this up).
zdim's answer in the comments is to look for anything that is not 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. That's the negated character class [^0-9]. A grep in scalar context returns the number of items that match:
my $found_non_ints = grep { /[^0-9]/ } @items;
Instead of that, I'd go back to the non-negated character class and match a string that has only zero or more digits. To do this, anchor the pattern to the absolute start and end of the string:
my $found_non_ints = grep { ! /\A[0-9]*\z/ } @items;
But this doesn't really match integers. It matches positive whole numbers (and zero). If you want to match negative numbers as well, allow an optional - at the start of the string:
my $found_non_ints = grep { ! /\A-?[0-9]*\z/ } @items;
That - would be a problem in the negated character class.
Also, you don't want the $ anchor here: it allows a possible newline to match at the end, and that's a non-digit (\Z behaves the same way at the end of the string). Also, the meaning of $ can change based on the setting of the /m flag, which might be set with default regex flags.
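To make the anchor difference concrete, here is a small sketch. The variable names are illustrative, and the value stands in for a line read without chomp.

```perl
use strict;
use warnings;

my $value = "123\n";   # digits followed by a trailing newline

my $dollar_matches  = $value =~ /^[0-9]*$/    ? 1 : 0;   # $ may match before a final newline
my $big_z_matches   = $value =~ /\A[0-9]*\Z/  ? 1 : 0;   # \Z also matches before a final newline
my $small_z_matches = $value =~ /\A[0-9]*\z/  ? 1 : 0;   # \z only matches at the very end

print "dollar: $dollar_matches, big Z: $big_z_matches, small z: $small_z_matches\n";
```

Only the \z version rejects the value, which is why it is the safer choice when the data has not been chomped.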
Here's a short program with your sample data. Note that you need to decide how to split up the list; does whitespace matter? I decided to remove whitespace around the comma:
#!perl
use v5.10;
while( <DATA> ) {
chomp;
my $found_non_ints = grep { ! /\A[0-9]*\z/ } split /\s*,\s*/;
say "$_ => Found $found_non_ints non-ints";
}
__DATA__
1A
A2
#A
#
1,2,3,A3
1,2,###,2
1,2,3A,4
1, ,3,4
1,,3,4
1,2,3,360
The solution proposed in the question gets close, except that the logic got reversed and there is an error in the regex pattern. One way to do it:
if ( grep { /[^0-9] | ^$/x } @nums ) { say 'not all integers' }
Regex explanation
[] is a character class: it matches any one of the characters listed inside (so [abc] matches either of a, b, or c) -- but when it starts with a ^ it matches any character not listed; so [^abc] matches any char not being either of a, b, or c. The pattern 0-9 inside a character class specifies all digits in that range (and we can also use a-z and A-Z)
So [^0-9] matches any character that is not a digit
Then that is or-ed by | with ^$: ^ matches the beginning of the string and $ matches the end of it. So ^$ matches a string without anything in it -- an empty string! We need to account for that, as [^0-9] doesn't, while an array element can be an empty string. (It can also be an undef, but from my understanding that is not possible with the actual data, and a regex on undef would draw a warning.)
Note that $ allows for a newline as well, and that ^ and $ may change their meaning if /m modifier is in use, matching on linefeeds inside a string. However, in all these cases we'd be matching a non-digit, which is precisely the point here
/x modifier makes it disregard literal spaces inside so we can space things out for easier reading. (It also allows for newlines and comments with #, so complex patterns can be organized and documented very nicely)
So that's all -- the regex tries to match anything that shouldn't be in an integer (assumed to be strictly positive in OP's data).
If it matches any such, in any one of the array elements, then grep returns a non-zero count in the boolean (scalar) context of if, and that is "true". So we caught a non-integer and we go into if's block to deal with that.
A little aside: we can also declare and populate an array right inside the if condition, to catch all those non-integers:
if ( my @non_ints = grep { /[^0-9] | ^$/x } @nums ) {
    say 'Non-integers: ', join ' ', map { "|$_|" } @non_ints;
}
This also reads more nicely, telling by the array name what we're after in that complicated condition: "non_ints." I put || around each item in print to be able to see an empty string.†
Now, when you put an exclamation mark in front of that regex, it reverses the true/false return from the regex and our code goes haywire. So drop that !.
The other error is in escaping the ^ by having \^. This would match a literal ^ character, robbing ^ of its special meaning as a pattern in regex, explained above. So drop that \.
One other way is in using an extremely useful List::Util library, which is "core" (so it is normally installed with Perl, even though that can get messed up).
Among a number of essential functions it gives us any, and with it we have
use List::Util qw(any);
if ( any { /[^0-9]|^$/ } @nums ) { say 'not all integers' }
I like any firstly because the name of the function includes at least a part of the needed logic, making code that much clearer and easier to comprehend: is there any element of @nums for which the code in the block is true? So any element which contains a non-digit? Precisely what is needed here.
Then, another advantage is that any will quit as soon as it finds one match, while grep continues through the whole list. But this efficiency advantage shows only on very large arrays or a lot of repeated checks. Also, on the other hand sometimes we want to count all instances.
I'd also like to point out some of any's siblings: none and notall. These names themselves also capture a good deal of logic, making otherwise possibly convoluted code that much clearer. Browse through this library to get accustomed to what is in there.
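A short sketch of those two siblings, assuming a List::Util recent enough to export them (1.33 or later, to my knowledge). The arrays @good and @bad are made-up sample data.

```perl
use strict;
use warnings;
use List::Util qw(none notall);

my @good = (1, 2, 12, 360);
my @bad  = (1, 2, '3A', 4);

# none: true when no element contains a non-digit
print "good: only digits\n" if none { /[^0-9]/ } @good;

# notall: true when at least one element fails the digits-only test
print "bad: not all digits\n" if notall { /\A[0-9]*\z/ } @bad;
```

Both lines should print for this sample data.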
† A program with your test data
use warnings;
use strict;
use feature 'say';
while (<DATA>) {
    chomp;
    my @nums = split /\s*,\s*/;
    say "Data: @nums";
    if ( my @non_ints = grep { /[^0-9] | ^$/x } @nums ) {
        say 'Non-ints: ', join ' ', map { "|$_|" } @non_ints;
    }
    say '---';
}
__DATA__
1A
A2
#A
#
1,2,3,A3
1,2,###,2
1,2,3A,4
1, ,3,4
1,2,3,360

Using perl Regular expressions I want to make sure a number comes in order

I want to use a regular expression to check a string to make sure 4 and 5 are in order. I thought I could do this by doing
'$string =~ m/.45./'
I think I am going wrong somewhere. I am very new to Perl. I would honestly like to put it in an array and search through it and find out that way, but I'm assuming there is a much easier way to do it with regex.
print "input please:\n";
$input = <STDIN>;
chop($input);
if ($input =~ m/45/ and $input =~ m/5./) {
print "works";
}
else {
print "nata";
}
EDIT: Added Info
I just want 4 and 5 in order. If a 5 comes first anywhere, that is a problem: say 322195458900023 is the number, then the 545 is a problem. A 5 always has to come right after a 4.
Assuming you want to match any string that contains two digits where the first digit is smaller than the second:
There is an obscure feature called "postponed regular expressions". We can include code inside a regular expression with
(??{CODE})
and the value of that code is interpolated into the regex.
The special verb (*FAIL) makes sure that the match fails (in fact only the current branch). We can combine this into the following one-liner:
perl -ne'print /(\d)(\d)(??{$1<$2 ? "" : "(*FAIL)"})/ ? "yes\n" :"no\n"'
It prints yes when the current line contains two digits where the first digit is smaller than the second digit, and no when this is not the case.
The regex explained:
m{
(\d) # match a number, save it in $1
(\d) # match another number, save it in $2
(??{ # start postponed regex
$1 < $2 # if $1 is smaller than $2
? "" # then return the empty string (i.e. succeed)
: "(*FAIL)" # else return the *FAIL verb
}) # close postponed regex
}x; # /x modifier so I could use spaces and comments
However, this is a bit advanced and masochistic; using an array is (1) far easier to understand, and (2) probably better anyway. But it is still possible using only regexes.
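For comparison, the array approach might look like the sketch below. The sub name has_ascending_pair is made up, and it keeps the same assumption as the regex: two adjacent digits where the first is smaller than the second.

```perl
use strict;
use warnings;

# Hypothetical helper: true if some digit is immediately followed by a larger digit.
sub has_ascending_pair {
    my ($string) = @_;
    my @chars = split //, $string;
    for my $i (0 .. $#chars - 1) {
        my ($first, $second) = @chars[$i, $i + 1];
        # skip pairs where either character is not an ASCII digit
        next unless $first =~ /\A[0-9]\z/ && $second =~ /\A[0-9]\z/;
        return 1 if $first < $second;
    }
    return 0;
}

printf "%s\n", has_ascending_pair('x45y')  ? 'yes' : 'no';
printf "%s\n", has_ascending_pair('54321') ? 'yes' : 'no';
```

It is longer than the one-liner, but each step is an ordinary comparison that is easy to change later.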
Edit
Here is a way to make sure that no 5 is followed by a 4:
/^(?:[^5]+|5(?=[^4]|$))*$/
This reads as: The string is composed from any number (zero or more) characters that are not a five, or a five that is followed by either a character that is not a four or the five is the end of the string.
This regex is also a possibility:
/^(?:[^45]+|45)*$/
it allows any characters in the string that are not 4 or 5, or the sequence 45. I.e., there are no single 4s or 5s allowed.
You just need to look for any 5 that is not preceded by a 4; if there is one, the string fails:
if( $str =~ /(?<!4)5/ ) {
#Fail
}
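Wrapped in a sub, the lookbehind check can be tried against the number from the question. This is only a sketch: the sub name fives_follow_fours is made up, and it only verifies that every 5 directly follows a 4, not that every 4 is followed by a 5.

```perl
use strict;
use warnings;

# Hypothetical helper: true when every 5 in the string directly follows a 4.
sub fives_follow_fours {
    my ($str) = @_;
    return $str =~ /(?<!4)5/ ? 0 : 1;   # a 5 without a 4 before it means failure
}

for my $sample ('322195458900023', '1452', '123') {
    printf "%-16s %s\n", $sample, fives_follow_fours($sample) ? 'ok' : 'fail';
}
```

The first sample should fail, because its first 5 follows a 9; the other two should pass.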

Is there a way to evaluate the number of times a Perl regular expression has matched?

I've been poring over perldoc perlre as well as the Regular Expressions Cookbook and related questions on Stack Overflow and I can't seem to find what appears to be a very useful expression: how do I know the number of current match?
There are expressions for the last closed group match ($^N), contents of match 3 (\g{3} if I understood the docs correctly), $', $& and $`. But there doesn't seem to be a variable I can use that simply tells me what the number of the current match is.
Is it really missing? If so, is there any explained technical reason why it is a hard thing to implement, or am I just not reading the perldoc carefully enough?
Please note that I'm interested in a built-in variable, NOT workarounds like using (?{$count++}).
For context, I'm trying to build a regular expression that would match only some instances of a match (e.g. match all occurrences of character "E" but do NOT match occurrences 3, 7 and 10 where 3, 7 and 10 are simply numbers in an array). I ran into this when trying to construct a more idiomatic answer to this SO question.
I want to avoid evaluating regexes as strings to actually insert 3, 7 and 10 into the regex itself.
I'm completely ignoring the actual utility or wisdom of using this for the other question.
I thought @- or @+ might do what you want since they hold the offsets of the numbered matches, but it looks like the regex engine already knows what the last index will be:
use v5.14;
use Data::Printer;
$_ = 'abc123abc345abc765abc987abc123';
my @matches = m/
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
    .*?
    ([0-9]+)
    (?{
        print 'Matched \$' . $#+ . " group with $^N\n";
        say p(@+);
    })
/x;
say "Matches: @matches";
This gives strings that show the last index as 2 even though it hasn't matched $2 yet.
Matched \$2 group with 123
[
[0] 6,
[1] 6,
[2] undef
]
Matched \$2 group with 345
[
[0] 12,
[1] 6,
[2] 12
]
Matches: 123 345
Notice that the first time around, $+[2] is undef, so that one hasn't been filled in yet. You might be able to do something with that, but I think that's probably getting away from the spirit of your question. If you were really fancy, you could create a tied scalar that has the value of the last defined index in @+, I guess.
I played around with this for a bit. Again, I know that this is not really what you are looking for, but I don't think that exists in the way you want it.
I had two thoughts. First, with a split using separator retention mode, you get the interstitial bits as the odd numbered elements in the output list. With the list from the split, you count which match you are on and put it back together how you like:
use v5.14;
$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my @bits = split /(\d+)/; # separator retention mode
my @skips = qw(3 7 10);
my $s;
while( my( $index, $value ) = each @bits ) {
    # shift indices to match number ( index = 2 n - 1 )
    if( $index % 2 and ! ( ( $index + 1 )/2 ~~ @skips ) ) {
        $s .= '^';
    }
    else {
        $s .= $value;
    }
}
I get:
ab^cdef^gh3ij^k^lmn^op7qr^stu^vw10xyz
I thought I really liked my split answer until I had the second thought. Does state work inside a substitution? It appears that it does:
use v5.14;
$_ = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my @skips = qw(3 7 10);
s/(\d+)/
    state $n = 0;
    $n++;
    $n ~~ @skips ? $1 : '$'
/eg;
say;
This gives me:
ab$cdef$gh3ij$k$lmn$op7qr$stu$vw10xyz
I don't think you can get much simpler than that, even if that magic variable existed.
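The smartmatch operator ~~ is deprecated in newer Perl releases (5.38 onward). Here is a sketch of the same counting idea without it, using a hash lookup and an ordinary lexical counter instead of state. The names %skip, $count and $result are just illustrative.

```perl
use strict;
use warnings;

my $text  = 'ab1cdef2gh3ij4k5lmn6op7qr8stu9vw10xyz';
my %skip  = map { $_ => 1 } (3, 7, 10);   # match numbers to leave untouched
my $count = 0;

# Replace every run of digits except the 3rd, 7th and 10th match.
(my $result = $text) =~ s/([0-9]+)/ $skip{ ++$count } ? $1 : '$' /eg;

print "$result\n";
```

This should print ab$cdef$gh3ij$k$lmn$op7qr$stu$vw10xyz, the same result as the substitution with state.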
I had a third thought which I didn't try. I wonder if state works inside a code assertion. It might, but then I'd have to figure out how to use one of those to make a match fail, which really means it has to skip over the bit that might have matched. That seems really complicated, which is probably what Borodin was pressuring you to show even in pseudocode.

How to replace lookahead in regex?

I wrote a regex that validates an input string. It must have a minimum length of 8 chars (composed by alphanumeric and punctuation chars) and it must have at least one digit and one alphabetic char. So I've come up with the regex:
^(?=.*[0-9])(?=.*[a-zA-Z])[a-zA-Z0-9-,._;:]{8,}$
Now I have to rewrite this regex in a language that doesn't support lookahead, how should I rewrite that regex?
Valid inputs are:
1foo,bar
foo,bar1
1fooobar
foooobar1
fooo11bar
1234x567
a1234567
Invalid inputs:
fooo,bar
1234-567
.1234567
There are two approaches. One is to compose a single expression which handles all possible alternatives:
^[a-zA-Z][0-9][a-zA-Z0-9-,._;:]{6,}$
|
^[a-zA-Z][a-zA-Z0-9-,._;:][0-9][a-zA-Z0-9-,._;:]{5,}$
|
^[a-zA-Z][a-zA-Z0-9-,._;:]{2}[0-9][a-zA-Z0-9-,._;:]{4,}$
etc. This is a combinatoric nightmare, but it would work.
A much simpler approach is to validate the same string twice using two expressions:
^[a-zA-Z0-9-,._;:]{8,}$ # check length and permitted characters
and
[a-zA-Z].*[0-9]|[0-9].*[a-zA-Z] # check required characters
EDIT: @briandfoy correctly points out that it will be more efficient to search for each required character separately:
[a-zA-Z] # check for required alpha
and
[0-9] # check for required digit
This question was originally tagged as perl, and that's how I answered it. For the oracle stuff, I have no idea how you'd do the same thing. However, I'd try to validate this stuff before it got that far.
I wouldn't do this in one regular expression. When you decide to change the rules, you'll have the same amount of work to craft the new regular expression. I wouldn't use lookarounds for this even if they were available since I wouldn't want to tolerate all the backtracking.
This looks like it's a lot of code, but the part that addresses your problem is just the subroutine. It has very simple patterns. When the password rules change, you add or delete patterns. It might be worth it to use study, but I didn't investigate that:
use v5.10;
use strict;
use Test::More;
my @valids = qw(
    1foo,bar
    foo,bar1
    1fooobar
    foooobar1
    fooo11bar
);
my @invalids = qw(
    fooo,bar
    short
    nodigitbutlong
    12345678
    ,,,,,,,,
);
sub is_good_password {
    my( $password ) = @_;
    state $rules = [
        qr/\A[A-Z0-9,._;:-]{8,}\z/i,
        qr/[0-9]/,
        qr/[A-Z]/i,
    ];
    foreach my $rule ( @$rules ) {
        return 0 unless $password =~ $rule;
    }
    return 1;
}
foreach my $valid ( @valids ) {
    ok( is_good_password( $valid ), "Password $valid is valid" );
}
foreach my $invalid ( @invalids ) {
    ok( ! is_good_password( $invalid ), "Password $invalid is invalid" );
}
done_testing();
The best I can come up with right now is
(.*[a-zA-Z].*[0-9].*|.*[0-9].*[a-zA-Z].*)
But you have to check the length of the string separately.
I would play around with these ideas to get the best performance:
should be faster for short valid inputs, but will be slower (backtrack) for inputs like "0a000000000000000000" or "aaaaaaaaaaaaaaa":
regexp_like(regexp_substr(input_string, '^[a-zA-Z0-9_,.;:-]{8,}$'),
'[0-9].*[a-zA-Z]|[a-zA-Z].*[0-9]')
should be faster if there are a lot of invalid inputs (don't miss [^...] on the 2nd line):
(length(input_string) >= 8 and
not regexp_like(input_string, '[^a-zA-Z0-9_,.;:-]') and
regexp_like(input_string, '[a-zA-Z]') and
regexp_like(input_string, '[0-9]'))

Regex to check fix length field with packed space

Say I have a text file to parse, which contains some fixed length content:
123jackysee        45678887
456charliewong     32145644
<3><------16------><--8---> # Not part of the data.
The first three characters are the ID, then comes a 16-character user name, then an 8-digit phone number.
I would like to write a regular expression to match and verify the input for each line. The one I came up with:
(\d{3})([A-Za-z ]{16})(\d{8})
The user name should contain 8-16 characters. But ([A-Za-z ]{16}) would also match a null value or spaces. I thought of ([A-Za-z]{8,16} {0,8}), but it would accept more than 16 characters. Any suggestions?
No, no, no, no! :-)
Why do people insist on trying to pack so much functionality into a single RE or SQL statement?
My suggestion, do something like:
Ensure the length is 27.
Extract the three components into separate strings (0-2, 3-18, 19-26).
Check that the first matches "\d{3}".
Check that the second matches "[A-Za-z]{8,} *".
Check that the third matches "\d{8}".
If you want the entire check to fit on one line of source code, put it in a function, isValidLine(), and call it.
Even something like this would do the trick:
def isValidLine(s):
    if s.len() != 27 return false
    return s.match("^\d{3}[A-Za-z]{8,} *\d{8}$")
Don't be fooled into thinking that's clean Python code, it's actually PaxLang, my own proprietary pseudo-code. Hopefully, it's clear enough, the first line checks to see that the length is 27, the second that it matches the given RE.
The middle field is automatically 16 characters total due to the first line and the fact that the other two fields are fixed-length in the RE. The RE also ensures that it's eight or more alphas followed by the right number of spaces.
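A rough Perl rendering of that pseudo-code, as a sketch only. The sub name is_valid_line is made up, the line is assumed to be chomped already, and [0-9] is used instead of \d so that only ASCII digits are accepted.

```perl
use strict;
use warnings;

# Hypothetical Perl version of the isValidLine() pseudo-code.
sub is_valid_line {
    my ($line) = @_;
    return 0 unless length($line) == 27;    # 3 + 16 + 8 columns
    return $line =~ /\A[0-9]{3}[A-Za-z]{8,} *[0-9]{8}\z/ ? 1 : 0;
}

# Build a correctly padded sample line: 8-letter name plus 8 spaces of padding.
my $sample = sprintf '%s%-16s%s', '123', 'jackysee', '45678887';
print is_valid_line($sample) ? "valid\n" : "invalid\n";
```

The length check is what pins the middle field to 16 columns; the pattern on its own would accept longer names.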
To do this sort of thing with a single RE would be some monstrosity like:
^\d{3}(([A-Za-z]{8} {8})
|([A-Za-z]{9} {7})
|([A-Za-z]{10} {6})
|([A-Za-z]{11} {5})
|([A-Za-z]{12} {4})
|([A-Za-z]{13} {3})
|([A-Za-z]{14} {2})
|([A-Za-z]{15} )
|([A-Za-z]{16}))
\d{8}$
You could do it by ensuring it passes two separate REs:
^\d{3}[A-Za-z]{8,} *\d{8}$
^.{27}$
but, since that last one is simply a length check, it's no different to the isValidLine() above.
I would use the regex you suggested with a small addition:
(\d{3})([A-Za-z]{3,16} {0,13})(\d{8})
which will match things that have a non-whitespace username but still allow space padding. The only addition is that you would then have to check the length of each input to verify the correct number of characters.
Hmm... Depending on the exact version of Regex you're running, consider:
(?P<id>\d{3})(?=[A-Za-z\s]{16}\d)(?P<username>[A-Za-z]{8,16})\s*(?P<phone>\d{8})
Not 100% sure this will work, and I've used the whitespace escape char instead of an actual space - I get nervous with just the space character myself, but you may want to be more restrictive.
See if it works. I'm only intermediate with RegEx myself, so I might be in error.
Check out the named groups syntax for your version of RegEx a) exists and b) matches the standard I've used above.
EDIT:
Just to expand what I'm trying to do (sorry to make your eyes bleed, Pax!) for those without a lot of RegEx experience:
(?P<id>\d{3})
This will try to match a named capture group - 'id' - that is three digits in length. Most versions of RegEx let you use named capture groups to extract the values you matched against. This lets you do validation and data capture at the same time. Different versions of RegEx have slightly different syntaxes for this - check out http://www.regular-expressions.info/named.html for more detail regarding your particular implementation.
(?=[A-Za-z\s]{16}\d)
The ?= is a lookahead operator. This looks ahead for the next sixteen characters, and will return true if they are all letters or whitespace characters AND are followed by a digit. The lookahead operator is zero length, so it doesn't actually return anything. Your RegEx string keeps going from the point the Lookahead started. Check out http://www.regular-expressions.info/lookaround.html for more detail on lookahead.
(?P<username>[A-Za-z]{8,16})\s*
If the lookahead passes, then we keep counting from the fourth character in. We want to find eight-to-sixteen characters, followed by zero or more whitespaces. The 'or more' is actually safe, as we've already made sure in the lookahead that there can't be more than sixteen characters in total before the next digit.
Finally,
(?P<phone>\d{8})
This should check the eight-digit phone number.
I'm a bit nervous that this won't exactly work - your version of RegEx may not support the named group syntax or the lookahead syntax that I'm used to.
I'm also a bit nervous that this Regex will successfully match an empty string. Different versions of Regex handle empty strings differently.
You may also want to consider anchoring this Regex between a ^ and $ to ensure you're matching against the whole line, and not just part of a bigger line.
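In Perl, an anchored version of this idea might look like the sketch below. It uses Perl's (?<name>...) spelling for named captures and a literal space rather than \s. The sub name parse_line is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of the three fields, or nothing on failure.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        \A
        (?<id>       [0-9]{3} )
        (?= [A-Za-z ]{16} [0-9] )       # next 16 columns are letters or spaces, then a digit
        (?<username> [A-Za-z]{8,16} )
        [ ]*                            # space padding up to column 19
        (?<phone>    [0-9]{8} )
        \z
    }x;
    return +{ %+ };                     # copy the named captures out of %+
}

my $fields = parse_line(sprintf '%s%-16s%s', '123', 'jackysee', '45678887');
print "id=$fields->{id} username=$fields->{username} phone=$fields->{phone}\n" if $fields;
```

With the anchors in place an empty string cannot match, since the pattern needs at least three digits, eight letters and eight more digits.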
Assuming you mean perl regex and if you allow '_' in the username:
perl -ne 'exit 1 unless /(\d{3})(\w{8,16})\s+(\d{8})/ && length == 28'
@OP, not every problem needs a regex. Your problem is pretty simple to check. Depending on what language you are using, there will be some sort of built-in string functions. Use them.
The following minimal example is done in Python.
import sys
for line in open("file"):
    line = line.rstrip("\n")
    # check that the first 3 chars are digits
    if not line[0:3].isdigit(): sys.exit()
    # check length of username (16-column field, space padded).
    name = line[3:19].rstrip()
    if len(name) < 8 or len(name) > 16: sys.exit()
    # check phone number length and whether it is all digits.
    phone = line[19:27]
    if len(phone) != 8 or not phone.isdigit(): sys.exit()
    print(line)
I also don't think you should try to pack all the functionality into a single regex. Here is one way to do it:
#!/usr/bin/perl
use strict;
use warnings;
while ( <DATA> ) {
chomp;
last unless /\S/;
my @fields = split;
if (
( my ($id, $name) = $fields[0] =~ /^([0-9]{3})([A-Za-z]{8,16})$/ )
and ( my ($phone) = $fields[1] =~ /^([0-9]{8})$/ )
) {
print "ID=$id\nNAME=$name\nPHONE=$phone\n";
}
else {
warn "Invalid line: $_\n";
}
}
__DATA__
123jackysee 45678887
456charliewong 32145644
678sdjkfhsdjhksadkjfhsdjjh 12345678
And here is another way:
#!/usr/bin/perl
use strict;
use warnings;
while ( <DATA> ) {
chomp;
last unless /\S/;
my ($id, $name, $phone) = unpack 'A3A16A8';
if ( is_valid_id($id)
and is_valid_name($name)
and is_valid_phone($phone)
) {
print "ID=$id\nNAME=$name\nPHONE=$phone\n";
}
else {
warn "Invalid line: $_\n";
}
}
sub is_valid_id { ($_[0]) = ($_[0] =~ /^([0-9]{3})$/) }
sub is_valid_name { ($_[0]) = ($_[0] =~ /^([A-Za-z]{8,16})\s*$/) }
sub is_valid_phone { ($_[0]) = ($_[0] =~ /^([0-9]{8})$/) }
__DATA__
123jackysee        45678887
456charliewong     32145644
678sdjkfhsdjhksadkjfhsdjjh 12345678
Generalizing:
#!/usr/bin/perl
use strict;
use warnings;
my %validators = (
id => make_validator( qr/^([0-9]{3})$/ ),
name => make_validator( qr/^([A-Za-z]{8,16})\s*$/ ),
phone => make_validator( qr/^([0-9]{8})$/ ),
);
INPUT:
while ( <DATA> ) {
chomp;
last unless /\S/;
my %fields;
@fields{qw(id name phone)} = unpack 'A3A16A8';
for my $field ( keys %fields ) {
unless ( $validators{$field}->($fields{$field}) ) {
warn "Invalid line: $_\n";
next INPUT;
}
}
print "$_ : $fields{$_}\n" for qw(id name phone);
}
sub make_validator {
my ($re) = @_;
return sub { ($_[0]) = ($_[0] =~ $re) };
}
__DATA__
123jackysee        45678887
456charliewong     32145644
678sdjkfhsdjhksadkjfhsdjjh 12345678
You can use lookahead: ^(\d{3})((?=[a-zA-Z]{8,})([a-zA-Z ]{16}))(\d{8})$
Testing:
123jackysee        45678887 Match
456charliewong     32145644 Match
789jop             12345678 No Match - username too short
999abcdefghijabcde12345678 No Match - username 'column' is less than 16 characters
999abcdefghijabcdef12345678 Match
999abcdefghijabcdefg12345678 No Match - username column is more than 16 characters