Perl comprehensive phone number regex [duplicate]

I have a file that contains phone numbers of the following formats:
(xxx) xxx.xxxx
(xxx).xxx.xxxx
(xxx) xxx-xxxx
(xxx)-xxx-xxxx
xxx.xxx.xxxx
xxx-xxx-xxxx
xxx xxx-xxxx
xxx xxx.xxxx
I must parse the file for phone numbers of those and ONLY those formats, and output them to a separate file. I'm using Perl, and so far I have what I think is a valid regex for two of these formats:
my $phone_regex = qr/^(\d{3}\-)?(\(\d{3}\))?\d{3}\-\d{4}$/;
But I'm not sure if this is correct, or how to do the rest all in one regex. Thank you!

Here you go
\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}
See a demo on regex101.com.
Broken down this is
\(? # "(", optional
\d{3} # three digits
\)? # ")", optional
[-. ] # one of "-", "." or " "
\d{3} # three digits
[-. ] # same as above
\d{4} # four digits
If you want, you can add a word boundary (\b) on the right side; this filters out matches that are immediately followed by further digits or letters.
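As a quick check of the pattern above, here is a minimal sketch in Python (an assumption on my part; the question uses Perl, but the pattern is the same in both). The sample numbers and the helper name is_phone are made up for illustration. It also shows that the pattern accepts a few formats outside the requested list, such as unbalanced parentheses.

```python
import re

# Pattern from the answer above; fullmatch requires the whole string to match.
PHONE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")

# One made-up sample per format listed in the question.
wanted = [
    "(555) 123.4567", "(555).123.4567", "(555) 123-4567", "(555)-123-4567",
    "555.123.4567", "555-123-4567", "555 123-4567", "555 123.4567",
]

def is_phone(text):
    """Return True if the entire string matches the pattern."""
    return PHONE.fullmatch(text) is not None

for number in wanted:
    print(number, is_phone(number))

# Formats outside the list that this pattern still accepts.
for number in ["(555 123-4567", "555) 123-4567", "555 123 4567"]:
    print(number, is_phone(number))
```

If the extra formats matter, a stricter pattern with an alternation for the parentheses is needed.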

You have needlessly escaped the hyphen; outside a character class it needs no backslash. The regex you are trying to create is this:
^\(?\d{3}\)?[ .-]\d{3}[ .-]\d{4}$
Explanation:
^ - Start of string
\(? - Optional starting parenthesis (
\d{3} - Followed by three digits
\)? - Optional closing parenthesis )
[ .-] - A single character either a space or . or -
\d{3} - Followed by three digits
[ .-] - Again a single character either a space or . or -
\d{4} - Followed by four digits
$ - End of string

Your current regex allows too much: it accepts xxx-(xxx) at the beginning. It also doesn't handle any of the . or space separated cases. You want exactly three sets of digits, with optional parentheses around the first set (an alternation handles this), and character classes to specify which separators are allowed.
Additionally, don't use \d, as it will match any Unicode digit. Since you likely only want to allow ASCII digits, use the character class [0-9] (there are other options, but this is the simplest).
Finally, $ allows a newline at the end of the string, so use \z instead which does not. Make sure if you are reading these from a file that you chomp them so they do not contain trailing newlines.
This leaves us with:
qr/^(?:[0-9]{3}|\([0-9]{3}\))[-. ][0-9]{3}[-.][0-9]{4}\z/
If you want to ensure that the two separators are the same if the first is a . or -, it is easiest to do this in multiple regex checks (these can be more lenient since we already validated the general format):
if ($str =~ m/^[0-9()]+ /
    or $str =~ m/^[0-9()]+\.[0-9]{3}\./
    or $str =~ m/^[0-9()]+-[0-9]{3}-/) {
    # allowed
}
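For comparison, the same two-stage check can be sketched in Python. This is a translation under assumptions, not the Perl above: Python spells Perl's \z as \Z, and the names FORMAT, SAME_SEPARATOR and is_valid_phone are invented for the example.

```python
import re

# Strict format check: ASCII digits only, balanced parentheses via
# alternation, and \Z (Python's counterpart of Perl's \z) as the end anchor.
FORMAT = re.compile(r"^(?:[0-9]{3}|\([0-9]{3}\))[-. ][0-9]{3}[-.][0-9]{4}\Z")

# Lenient follow-up checks: a space may be followed by either separator,
# while "." and "-" must be repeated.
SAME_SEPARATOR = [
    re.compile(r"^[0-9()]+ "),
    re.compile(r"^[0-9()]+\.[0-9]{3}\."),
    re.compile(r"^[0-9()]+-[0-9]{3}-"),
]

def is_valid_phone(text):
    """Return True if text passes the format check and the separator check."""
    if not FORMAT.search(text):
        return False
    return any(pattern.search(text) for pattern in SAME_SEPARATOR)

for sample in ["(555) 123-4567", "555.123.4567", "555.123-4567", "555-123-4567\n"]:
    print(repr(sample), is_valid_phone(sample))
```

The last sample keeps its trailing newline on purpose: with \Z it is rejected, which is why the lines should be stripped of newlines before matching.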

Related

Accept only numbers but ignore if two number groups have spaces in between them [duplicate]

I'm trying to develop a regex with the following rules:
it should accept solely numbers,
if the string contains any letters or any other special characters, the whole string should be rejected,
regarding spaces, there should only be one consecutive number group, which can be surrounded by spaces,
if there is more than one consecutive number group, with spaces in between the groups, the whole string should be rejected.
Example Cases:
accepted:
1234
[SPACE][SPACE]111[SPACE]
[SPACE]111[SPACE][SPACE]
declined:
1a234
aa1234aa
1234a
12#4
[SPACE]11[SPACE]111
[SPACE]11[SPACE]111#
So far, I've come up with this ([0-9]+[^\s]*) which can be seen here.
What modifications do I have to make to achieve the scenario above?
Use this:
^\s*\d+\s*$
All we need to do is accept one or more digits bounded by zero or more spaces on either side.
EDIT:
Just add a capturing group around the digits to use them later:
^\s*(\d+)\s*$
The pattern you tried, ([0-9]+[^\s]*), matches one or more digits followed by zero or more non-whitespace characters: the negated character class [^\s] matches any character except whitespace, so letters such as the trailing aa in 1234aa are matched too.
Because there are no anchors asserting the start (^) and the end ($) of the string, it can also match multiple times in the same string.
If you want to match spaces, instead of matching \s which could also match newlines, you could match a single space and repeat that 0+ times on the left and on the right side.
^ *[0-9]+ *$
If you only need the digits, you could use a capturing group
^ *([0-9]+) *$
^\s*[0-9]+\s*$
Notice that I've used [0-9] instead of \d.
[0-9] accepts only Western Arabic numerals (0 to 9).
\d may accept any Unicode digit, such as Eastern Arabic or Thai numerals (١, ٢, ٣, ๑, ๒, ๓, etc.); at least this is the case in XSD regexes when validating an XML file.
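To try the accepted and declined cases from the question, here is a small Python sketch (Python is my choice here; the question does not name a language). The helper name extract_number is hypothetical. It uses fullmatch instead of the ^ and $ anchors, and plain spaces rather than \s so that newlines are not accepted.

```python
import re

# One run of ASCII digits, optionally padded with plain spaces.
SINGLE_NUMBER = re.compile(r" *([0-9]+) *")

def extract_number(text):
    """Return the digit group if text is a single padded number, else None."""
    match = SINGLE_NUMBER.fullmatch(text)
    return match.group(1) if match else None

accepted = ["1234", "  111 ", " 111  "]
declined = ["1a234", "aa1234aa", "1234a", "12#4", " 11 111", " 11 111#"]

for text in accepted + declined:
    print(repr(text), extract_number(text))

# In Python 3, \d on a str pattern also matches non-ASCII digits.
print(re.fullmatch(r"\d+", "١٢٣") is not None)
print(re.fullmatch(r"[0-9]+", "١٢٣") is not None)
```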

Only allow 2 digits in a string using regex

I need regex that only allows a maximum of 2 digits (or whatever the desired limit is actually) to be entered into an input field.
The requirements for the field are as follows:
Allow a-z A-Z
Allow 0-9
Allow - and . characters
Allow spaces (\s)
Do not allow more than 2 digits
Do not allow any other special characters
I have managed to put together the following regex based on several answers on SO:
^(?:([a-zA-z\d\s\.\-])(?!([a-zA-Z]*\d.*){3}))*$
The above regex is really close. It works successfully for the following:
test 12 test
test12
test-test.12
But it allows an input of:
123 (but not 1234, so it's close).
It only needs to allow an input of 12 when only digits are entered into the field.
I would like some help in finding a more efficient and cleaner (if possible) solution than my current regex - but it must still be regex, no JS.
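You could use a positive lookahead like this (note that it requires exactly two digits rather than at most two, and that \w also admits the underscore):
(?=^(?:\D*\d\D*){2}$) # exactly two digits
^[- .\w]+$ # allowed characters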
See a demo on regex101.com.
You may use a negative lookahead anchored at the start that will make the match fail once there are 3 digits found anywhere in the string:
^(?!(?:[^0-9]*[0-9]){3})[a-zA-Z0-9\s.-]*$
 ^^^^^^^^^^^^^^^^^^^^^^^
See the regex demo
Details:
^ - start of string
(?!(?:[^0-9]*[0-9]){3}) - the negative lookahead failing the match if exactly 3 following sequences are found:
[^0-9]* - zero or more chars other than digits
[0-9] - a digit (thus, the digits do not have to be adjoining)
[a-zA-Z0-9\s.-]* - 0+ ASCII letters, digits, whitespace, . or - symbols
$ - end of string.
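Since the question mentions that the limit might change, here is a hedged Python sketch that builds the negative-lookahead pattern for an arbitrary maximum. The function names make_limit_pattern and allowed are invented for illustration; the pattern itself is the one explained above with the repeat count set to the limit plus one.

```python
import re

def make_limit_pattern(max_digits):
    """Build a pattern that rejects strings with more than max_digits digits."""
    # The lookahead fails as soon as (max_digits + 1) digits can be found.
    source = r"^(?!(?:[^0-9]*[0-9]){%d})[a-zA-Z0-9\s.-]*$" % (max_digits + 1)
    return re.compile(source)

MAX_TWO_DIGITS = make_limit_pattern(2)

def allowed(text, pattern=MAX_TWO_DIGITS):
    """Return True if text uses only allowed characters and few enough digits."""
    return pattern.search(text) is not None

for text in ["test 12 test", "test12", "test-test.12", "123", "1 a 2 b 3", "test_12"]:
    print(repr(text), allowed(text))
```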

Regex perl with letters and numbers

I need to extract strings from a text file that contains both letters and numbers. The lines start like this:
Report filename: ABCL00-67900010079415.rpt ______________________
All I need is the last 8 digits, so in this example that would be 10079415.
while (<DATA>) {
    if (/Report filename/) {
        my ($bagID) = ( m/(\d{8}+)./ );
        print $bagID;
    }
}
Right now this prints out the first 8 but I want the last 8.
You just need to escape the dot, so that the pattern matches the 8 digits that come immediately before the dot character.
my ($bagID) = ( m/(\d{8}+)\./ );
. is a special character in regex which matches any character. To match a literal dot, you must escape it.
To match the last of anything, just precede it with a wildcard that will match as many characters as possible:
my ($bag_id) = / .* (\d{8}) /x;
Note that I have also used the /x modifier so that the regex can contain insignificant whitespace for readability. Also, your \d{8}+ is what is called a possessive quantifier; it is used for optimising some regex constructions and makes no difference at the end of the pattern.
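The three variants discussed in this thread can be compared side by side. The sketch below is in Python rather than Perl (an assumption made so it is easy to run anywhere), and the variable names are made up; the sample line is the one from the question.

```python
import re

line = "Report filename: ABCL00-67900010079415.rpt ______________________"

# Greedy .* consumes as much as possible, so the group ends up on the
# last run of eight digits in the line.
last_eight = re.search(r".*(\d{8})", line).group(1)

# Alternative: anchor on the literal dot that follows the digits.
before_dot = re.search(r"(\d{8})\.", line).group(1)

# Unescaped dot (the original bug): the dot matches any character,
# so the first eight digits are captured instead.
first_eight = re.search(r"(\d{8}).", line).group(1)

print(last_eight, before_dot, first_eight)
```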

Regexp pattern Optional character [duplicate]

I want to match a string like 19740103-0379 or 197401030379, i.e. the dash is optional.
How do I accomplish this with regexp?
Usually you can just use -?. Alternatively, you can use -{0,1} but you should find that ? for "zero or one occurrences of" is supported just about everywhere.
pax> echo 19740103-0379 | egrep '19740103-?0379'
19740103-0379
pax> echo 197401030379 | egrep '19740103-?0379'
197401030379
If you want to accept 12 digits with any number of dashes in there anywhere, you might have to do something like:
-*([0-9]-*){12}
which is basically zero or more dashes followed by 12 occurrences of (a digit followed by zero or more dashes) and will match all sorts of wonderful things like:
--3-53453---34-4534---
(of course, you should use \d instead of [0-9] if your regex engine has support for that).
You could try different ones:
\d* matches a string consisting only of digits
\d*-\d* matches a string of format digits - dash - digits
[0-9\-]* matches a string consisting of only dashes and digits
You can combine them via | (or), so that you have for example (\d*)|(\d*-\d*): matches formats just digits and digits-dash-digits.
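For the specific shape in the question, a minimal Python sketch could look like this. It assumes the format is always eight digits, an optional dash, then four digits (my reading of the two examples, not something the question states), and the name is_id is hypothetical.

```python
import re

# Eight digits, an optional single dash, four digits (assumed layout).
OPTIONAL_DASH = re.compile(r"[0-9]{8}-?[0-9]{4}")

def is_id(text):
    """Return True if the whole string has the assumed 8-4 digit layout."""
    return OPTIONAL_DASH.fullmatch(text) is not None

for text in ["19740103-0379", "197401030379", "19740103--0379", "1974-01030379"]:
    print(text, is_id(text))
```

Using fullmatch here avoids the problem with unanchored patterns such as \d*, which can also match an empty string.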

BEGINNER: REGEX Match numeric sequence except where the word "CODE" exists on a line

I've been able to stumble my way through regular expressions for quite some time, but alas, I cannot help a friend in need.
My "friend" is trying to match all lines in a text file that match the following criteria:
Only a 7 to 10 digit number (0123456 or 0123456789)
Only a 7 to 10 digit number, then a dash, then another two digits (0123456-01 or 0123456789-01)
Match any of the above except where the words Code/code or Passcode/passcode is before the numbers to match (Such as "Access code: 16434629" or "Passcode 5253443-12")
EDIT: Only need the numbers that match, nothing else.
Here is the nastiest regex I have ever seen that "he" gave me:
^(?=.*?[^=/%:]\b\d{7,10}((\d?\d?)|(-\d\d))?\b)((?!Passcode|passcode|Code|code).)*$
...
Question: Is there a way to use a short regex to find all lines that meet the above criteria?
Assume PCRE. My friend thanks you in advance. ;-)
BTW - I have not been able to find any other questions listed in stackoverflow.com or superuser.com which can answer this question accurately.
EDIT: I'm using Kodos Python Regex Debugger to validate and test the regex.
(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?
Commented version:
(?<!                 # Begin zero-width negative lookbehind. (Makes sure the following pattern can't match before this position)
  (?:                # Begin non-capturing group
    [Pp]asscode      # Either Passcode or passcode
    |                # OR
    [Cc]ode          # Either Code or code
  )                  # End non-capturing group
  .*                 # Any characters
)                    # End lookbehind
[0-9]{7,10}          # 7 to 10 digits
(?:                  # Begin non-capturing group
  -[0-9]{2}          # dash followed by 2 digits
)                    # End non-capturing group
?                    # Make last group optional
Edit: final version after comment discussion (note that PCRE and Python's re module reject the unbounded lookbehind above):
/^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)/
(the result is in the first capture group)
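Since the question mentions a Python regex debugger, here is a hedged sketch of the final pattern run through Python's re module. The function name find_number and the sample lines are invented for the example.

```python
import re

# Final pattern from this answer: the negative lookahead rejects lines where
# Passcode/passcode/Code/code appears before the first digit.
NUMBER = re.compile(
    r"^(?!\D*(?:[Pp]asscode|[Cc]ode))\D*([0-9]{7,10}(?:-[0-9]{2})?)"
)

def find_number(line):
    """Return the first matching number on the line, or None."""
    match = NUMBER.search(line)
    return match.group(1) if match else None

samples = [
    "0123456",
    "0123456789-01",
    "Order 1234567",
    "Access code: 16434629",
    "Passcode 5253443-12",
]
for line in samples:
    print(repr(line), find_number(line))
```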
You can get by with a nasty regex you have to get help with ...
... or you can use two simple regexes. One that matches what you want, and one that filters what you don't want. Simpler and more readable.
Which one would you like to read?
$foo =~ /(?<!(?:[Pp]asscode|[Cc]ode).*)[0-9]{7,10}(?:-[0-9]{2})?/
or
$foo =~ /\d{7,10}(-\d{2})?/ and $foo !~ /code/i;
Edit: case-insensitivity.
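The two-regex idea carries over to Python almost unchanged. In the sketch below the names WANTED, UNWANTED and numbers_in are hypothetical, and the filter is a plain case-insensitive "code", which is my assumption for covering Code, code, Passcode and passcode in one test.

```python
import re

# One pattern for what we want, one for what we filter out.
WANTED = re.compile(r"\d{7,10}(?:-\d{2})?")
UNWANTED = re.compile(r"code", re.IGNORECASE)

def numbers_in(lines):
    """Yield the first wanted number from each line that is not filtered out."""
    for line in lines:
        if UNWANTED.search(line):
            continue
        match = WANTED.search(line)
        if match:
            yield match.group(0)

lines = [
    "0123456",
    "Access code: 16434629",
    "Passcode 5253443-12",
    "ref 0123456789-01",
    "Code 1234567",
]
print(list(numbers_in(lines)))
```

Unlike the single-regex version, this rejects a line if "code" appears anywhere on it, not only before the number; whether that is acceptable depends on the data.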