RegEx with counting digits and allow special chars - regex

I've done some searching but can't find the right regex.
I would like a regex for a text that contains only digits, whitespace and plus signs,
like: [0-9 +]
But with a min/max limit for only the digits in that text.
My suggestions ended up with something like this:
^[0-9 \+](?=(.*[0-9]){5,8})$
Should be OK:
"123 456 7"
"12345"
"+ 123 456 78"
Should not be ok:
"123456789"
"+ 124 578a"
"+123456789"
Anyone got a solution that might do the trick?
Edit:
I can see that my explanation of what I'm aiming for was too brief.
My regex conditions should be:
Must include between 5-8 digits
Allow whitespaces and plus signs

I'm guessing from your own regex that between 5 and 8 digits in a row, without whitespace in between, are allowed. If that's true, then the following regex might do the trick (example written in Python). It allows a single group of 5 to 8 digits. If there is more than one group, each group must have exactly 3 digits, except for the last group, which can have 1 to 3 digits. A single plus sign on the left, followed by whitespace, is optional.
Are you parsing phone numbers? :)
In [176]: regex = re.compile(r"""
^ # start of string
(?: \+\s )? # optional plus sign followed by whitespace
(?:
(?: \d{3}\s )+ # one or more groups of three digits followed by whitespace
\d{1,3} # one group of between one and three digits
| # ALTERNATIVE
\d{5,8} # one group of between five and eight digits
)
$ # end of string
""", flags=re.X)
# --- MATCHES ---
In [177]: regex.findall('123 456 7')
Out[177]: ['123 456 7']
In [178]: regex.findall('12345')
Out[178]: ['12345']
In [179]: regex.findall('+ 123 456 78')
Out[179]: ['+ 123 456 78']
In [200]: regex.findall('12345678')
Out[200]: ['12345678']
# --- NON-MATCHES ---
In [180]: regex.findall('123456789')
Out[180]: []
In [181]: regex.findall('+ 124 578a')
Out[181]: []
In [182]: regex.findall('+123456789')
Out[182]: []
In [198]: regex.findall('123')
Out[198]: []
In [24]: regex.findall('1234 556')
Out[24]: []

You can do something like this:
^(?:[ +]*[0-9]){5}(?:(?:[ +]*[0-9])?){3}$
See it here on Regexr
The first group (?:[ +]*[0-9]){5} matches the 5 required digits, each optionally preceded by any number of spaces and plus signs. The second part (?:(?:[ +]*[0-9])?){3} matches up to 3 optional digits, likewise preceded by any number of spaces and plus signs.
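As a rough check, the pattern can be run against the examples from the question. This is only a sketch in Python; the names `DIGITS_5_TO_8`, `should_match` and `should_fail` are made up for illustration.

```python
import re

# Pattern from the answer above: 5 required digits plus up to 3 optional ones.
# Each digit may be preceded by any number of spaces or plus signs.
DIGITS_5_TO_8 = re.compile(r"^(?:[ +]*[0-9]){5}(?:(?:[ +]*[0-9])?){3}$")

should_match = ["123 456 7", "12345", "+ 123 456 78"]
should_fail = ["123456789", "+ 124 578a", "+123456789"]

for text in should_match + should_fail:
    print(repr(text), bool(DIGITS_5_TO_8.match(text)))
```

Because every repetition ends in a digit, a string with trailing spaces or plus signs (for example "12345 ") is rejected by this pattern.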

You were very close - you need to anchor the lookahead to the start of input, and add a second negative lookahead for the upper bound of the quantity of digits:
^(?=(.*\d){5,8})(?!(.*\d){9,})[\d +]+$
Also, fyi you don't need to escape the plus sign within the character class, and [0-9] is \d
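A small Python sketch of the lookahead approach, compared against a plain character count. The helper `is_valid` is hypothetical and encodes my reading of the requirements (only digits, spaces and plus signs, with 5 to 8 digits in total).

```python
import re

# Lookahead-based pattern from the answer above.
LOOKAHEAD = re.compile(r"^(?=(.*\d){5,8})(?!(.*\d){9,})[\d +]+$")

def is_valid(text):
    """Cross-check without regex (assumed intent, ASCII digits only)."""
    allowed = all(ch in "0123456789 +" for ch in text)
    digit_count = sum(ch in "0123456789" for ch in text)
    return bool(text) and allowed and 5 <= digit_count <= 8

samples = ["123 456 7", "12345", "+ 123 456 78",
           "123456789", "+ 124 578a", "+123456789"]

for text in samples:
    print(repr(text), bool(LOOKAHEAD.match(text)), is_valid(text))
```

Both checks should agree on these samples: the first three are accepted and the last three are rejected.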

Related

Match all type of numbers

I need regular expression which extracts all numbers with different delimiters (single whitespace, comma, dot). Each number can use none or all of them.
Example:
text: 'numbers: 3.14 2 544 345,345.55 506 test 120 100 100'
output: '3.14', '2 544', '345,345.55', '506', '120 100 100'
I created this regex: \d+[(.|,|\s)\d+]+, but it does not work properly.
I assume the numbers you need to extract are separated with 2 or more whitespaces, else it would be impossible to differentiate between the end of the previous number and the start of a new one.
If you need to extract the numbers in the formats as shown above, XXX XXX.XXX or XXX,XXX,XXX.XX or XXX or XXX XXX XXX, you may use
\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b
See the regex demo
Details:
\b - leading word boundary
\d{1,3} - 1 to 3 digits
(?:[, ]\d{3})* - 0+ sequences of a comma or space ([, ]) and 3 digits (\d{3})
(?:\.\d+)? - an optional sequence of a dot followed with 1+ digits
\b - trailing word boundary
A less restrictive pattern would be the same as above, but with limiting quantifiers replaced with a +:
\b\d+(?:[, ]\d+)*(?:\.\d+)?\b
See this regex demo
It will also match numbers like 1234566 and 124354354.343344.
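For illustration, here is the stricter pattern applied with Python's `re.findall`. The sample text uses two spaces between numbers, as the answer assumes; the variable names are made up.

```python
import re

# Stricter pattern from the answer: groups of three digits after the first.
NUMBER = re.compile(r"\b\d{1,3}(?:[, ]\d{3})*(?:\.\d+)?\b")

# Numbers are separated by TWO spaces here (the assumption stated above).
text = "numbers:  3.14  2 544  345,345.55  506  test  120 100 100"
numbers = NUMBER.findall(text)
print(numbers)

# With single spaces, adjacent numbers can no longer be told apart.
merged = NUMBER.findall("3.14 2 544 345,345.55 506")
print(merged)
```

With single-space separators the second call merges "2 544" and "345,345.55" into one match, which is why the two-space assumption matters.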

Phone regex validation for Argentina

I figured out a regular expression for my country's phone numbers, but something is missing.
The rule here is: (Area Code) Prefix - Suffix
Area Code could be 3 to 5 digits
Prefix could be 2 to 4 digits.
Area Code + Prefix is 7 digits long.
Suffix is always 4 digits long
Total digits are 11.
I figured I could have 3 simple regex chained with an OR "|" like this:
/(\(?\d{3}\)?[- .]?\d{4}[- .]?\d\d\d\d)|(\(?\d{4}\)?[- .]?\d{3}[- .]?\d\d\d\d)|(\(?\d{5}\)?[- .]?\d{2}[- .]?\d\d\d\d)/
The problem is that \d\d\d\d does not restrict the suffix to exactly 4 digits. For example, (011) 4740-5000, which is a valid phone number, works fine, but if I append extra digits it is still reported as valid, e.g. (011) 4740-5000000000
You should use ^ and $ to match whole string
For example ^\d{4}$ will match exactly 4 digits not more not less.
Here is the complete regex pattern
^((\(?\d{3}\)? \d{4})|(\(?\d{4}\)? \d{3})|(\(?\d{5}\)? \d{2}))-\d{4}$
Online demo
Since, per your regex pattern, the delimiter can be -, . or a single space, try:
^((\(?\d{3}\)?[-. ]?\d{4})|(\(?\d{4}\)?[-. ]?\d{3})|(\(?\d{5}\)?[-. ]?\d{2}))[-. ]?\d{4}$
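A Python sketch of the difference the anchors make, using the unanchored pattern from the question and the anchored one above. The constant names are mine; `fullmatch` is shown as a Python-specific alternative to adding ^ and $.

```python
import re

# Unanchored pattern from the question (three alternatives joined with "|").
UNANCHORED = re.compile(
    r"(\(?\d{3}\)?[- .]?\d{4}[- .]?\d\d\d\d)"
    r"|(\(?\d{4}\)?[- .]?\d{3}[- .]?\d\d\d\d)"
    r"|(\(?\d{5}\)?[- .]?\d{2}[- .]?\d\d\d\d)"
)
# Anchored pattern from the answer.
ANCHORED = re.compile(
    r"^((\(?\d{3}\)?[-. ]?\d{4})|(\(?\d{4}\)?[-. ]?\d{3})|(\(?\d{5}\)?[-. ]?\d{2}))"
    r"[-. ]?\d{4}$"
)

good = "(011) 4740-5000"
bad = "(011) 4740-5000000000"

print(bool(UNANCHORED.search(bad)))     # matches a prefix of the string
print(bool(ANCHORED.search(good)), bool(ANCHORED.search(bad)))
print(bool(UNANCHORED.fullmatch(bad)))  # fullmatch requires the whole string
```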
This pattern works fine for me:
/^\\(?(\d{3,5})?\\)?\s?(15)?[\s|-]?(4)\d{2,3}[\s|-]?\d{4}$/
I've tested this in regex101:
/^((?:\(?\d{3}\)?[- .]?\d{4}|\(?\d{4}\)?[- .]?\d{3}|\(?\d{5}\)?[- .]?\d{2})[- .]?\d{4})$/
RegEx Demo
^ Matches the beginning of a string
( Beginning of capture group
(?: Beginning of non-capturing group
Your different options for area code & prefix
) End non-capturing group
[- .]?\d{4} The last four digits of the phone number
) End capture group
$ Matches the end of a string
If you're trying to validate such a phone number, then the following one should suit your needs:
^(?=.{15}$)[(]\d{3,5}[)] \d{2,4}-\d{4}$
Debuggex Demo
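The length lookahead works because this fixed format always has 15 characters: 11 digits plus "(", ")", one space and one hyphen. A minimal Python check, with sample numbers invented for illustration:

```python
import re

# (?=.{15}$) forces a total length of 15 characters.
FIXED_FORMAT = re.compile(r"^(?=.{15}$)[(]\d{3,5}[)] \d{2,4}-\d{4}$")

samples = [
    "(011) 4740-5000",    # 3 + 4 + 4 digits
    "(03783) 47-5000",    # 5 + 2 + 4 digits
    "(011) 474-5000",     # only 10 digits, 14 characters
    "(03783) 4740-5000",  # 13 digits, 17 characters
]
results = [bool(FIXED_FORMAT.match(s)) for s in samples]
print(results)
```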
You need to match the complete expression by indicating the start and end with anchors. You also don't need alternation for the different lengths.
/^(?=(\D*\d){11}$)\(?\d{3,5}\)?[- .]?\d{2,4}[- .]?\d{4}$/
Here's the breakdown:
(?=(\D*\d){11}$) is a positive lookahead ensuring that there are exactly 11 digits in total,
with any number of non-digits amongst them
\(?\d{3,5}\)?[- .]? matches 3-5 digits in parens (area code), followed by a separator
\d{2,4}[- .]? matches 2-4 digits (prefix), followed by a separator
\d{4} matches the suffix
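The breakdown above can be tried out in Python as follows. This is a sketch; the sample numbers are invented, and only the pattern itself comes from the answer.

```python
import re

# The lookahead counts exactly 11 digits; the rest checks the layout.
PHONE = re.compile(r"^(?=(\D*\d){11}$)\(?\d{3,5}\)?[- .]?\d{2,4}[- .]?\d{4}$")

samples = [
    "(011) 4740-5000",        # 11 digits
    "(03783) 47-5000",        # 11 digits
    "(011) 474-5000",         # 10 digits
    "(011) 4740-5000000000",  # too many digits
]
results = [bool(PHONE.match(s)) for s in samples]
print(results)
```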

Regex preg_match to neutralize a pricelist, keeping only digits, dots and commas*

I am using preg_match (PHP version 5.5.*) and want to ignore all alphabetic letters [a-zA-Z] and special symbols such as $ and -, only to match numbers, commas, dots. Whitespaces between numbers such as 6 000 should be matched. Commas after a number that is not followed by another number should be ignored, such as 6, would only match 6
Note that this is used in a single string and never in a list, like the sample below. I use the list to show what input and desired output is, "per line".
Sample input:
1
1,99
1.99
10
100
5999 dollars
2 USD
$2,99
Our price 2.99
Price: $ 20
200 $
20,-
6 999 USD
Desired output:
1
1,99
1.99
10
100
5999
2
2,99
2.99
20
200
20
6 999
I have tried /([0-9.,\s]+)/ but the output of 6 999 USD becomes 6.
Edit
The code we are using looks like this:
preg_match($regex, $value, $extractions);
array_shift($extractions);
$this->persist($extractions);
Demo
Update:
If you have &nbsp; instead of spaces, you can do two things. My recommendation is to just do a str_replace() first:
$number = str_replace('&nbsp;', ' ', $number);
The other option is to also check for &nbsp; alongside the [\s,] group:
[\d.](?:[\d.]|(?:[\s,]|&nbsp;)(?=\d))*
Example:
preg_match('/[\d.](?:[\d.]|[\s,](?=\d))*/', $number, $matches);
$number = reset($matches);
Explanation:
So I classified the valid characters (digits, spaces, commas, and periods) into two groups: [\d.] and [\s,]. A number must start with a digit or a period ($.99 == .99 != 99). Then we use a repeated non-capturing group (?:...)* to take care of our alternation and lookahead assertions. Any time there is a [\d.] we match it with no questions asked. Otherwise (|), if it is a [\s,], we assert that it is followed by a digit using a lookahead ((?=...)).
Demo
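The same extraction pattern can be checked in Python against the sample list from the question. This is a sketch that swaps PHP's preg_match for `re.search`; `extract_price` and `samples` are hypothetical names.

```python
import re

# Start with a digit or period; allow whitespace or a comma
# only when a digit follows.
PRICE = re.compile(r"[\d.](?:[\d.]|[\s,](?=\d))*")

def extract_price(value):
    """Return the first price-like run in value, or None."""
    match = PRICE.search(value)
    return match.group(0) if match else None

samples = {
    "1": "1", "1,99": "1,99", "1.99": "1.99", "10": "10", "100": "100",
    "5999 dollars": "5999", "2 USD": "2", "$2,99": "2,99",
    "Our price 2.99": "2.99", "Price: $ 20": "20", "200 $": "200",
    "20,-": "20", "6 999 USD": "6 999",
}

for raw, expected in samples.items():
    print(repr(raw), "->", repr(extract_price(raw)))
```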
Example:
preg_replace('/\s*[^\d\s,.]+\s*|,(?!\d)/', '', $number);
Explanation:
[^\d\s,.]+ will match 1+ characters that are not either a digit, whitespace, a comma, or a period. We put \s* on either side to grab any extra whitespace around these unwanted characters (like in "Our price "). The only unwanted character this doesn't match is a trailing comma. We use an alternation (|), then look for a comma, and then make sure that it is not followed by a digit using a negative lookahead ((?!...)).
Demo
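For comparison, a Python sketch of the replace-based approach, using `re.sub` in place of preg_replace. The helper name `clean_price` is made up.

```python
import re

# Remove runs of unwanted characters (with surrounding whitespace),
# and commas that are not followed by a digit.
NOISE = re.compile(r"\s*[^\d\s,.]+\s*|,(?!\d)")

def clean_price(value):
    """Strip everything that is not part of the number."""
    return NOISE.sub("", value)

for raw in ["5999 dollars", "$2,99", "Our price 2.99",
            "Price: $ 20", "200 $", "20,-", "6 999 USD"]:
    print(repr(raw), "->", repr(clean_price(raw)))
```

Unlike the match-based variant, this keeps every digit run in the string, so it only suits inputs that contain a single price.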

Regular expression to extract the year from a string

This should be simple but I can't seem to get it going. The purpose of it is to extract v3 tags from mp3 file names in Mp3tag.
I have these strings I want to extract the year.
Test String 1 (1994) -> extract 1994
34 Test String 2 (1995) -> extract 1995
Test (String) 3 (1996) -> extract 1996
I had ^(.+)\s\(([0-9]*)\)$ but obviously it's not giving me the results I was expecting. You can say that I'm not very good with regular expressions.
Thanks in advance
A suggestion for a more generic solution, not sure if that is what you need. Valid years will always have the form 19xx or 20xx, and the years will be separated with a word-break character (something other than a number or a letter):
\b(19|20)\d{2}\b
This doesn't really care where in the tag the year appears. A simpler version that doesn't assume anything more than 4 digits in the year would be this expression:
\b\d{4}\b
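The key here is the \b escape sequence, which matches a word boundary: a zero-width position between a word character (letters, digits and underscores) and a non-word character, or the start or end of the string next to a word character. Parentheses are non-word characters, so there is a boundary between ( and the first digit of the year.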
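A short Python sketch showing that \b is zero-width: the match contains only the four digits, not the surrounding parentheses. I use a non-capturing group (?:19|20) here so that the whole year is returned; the variable names are made up.

```python
import re

# Years of the form 19xx or 20xx, delimited by word boundaries.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

names = [
    "Test String 1 (1994)",
    "34 Test String 2 (1995)",
    "Test (String) 3 (1996)",
]
years = [YEAR.search(name).group(0) for name in names]
print(years)

# The boundary itself consumes nothing, so the match is 4 characters long.
match = YEAR.search("Test (1994)")
print(match.group(0), match.end() - match.start())
```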
Would also like to recommend this site:
http://www.regular-expressions.info/
You can use something like this \((\d{4})\)$. The first group will have your match.
Explanation
\( # Match the character “(” literally
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
{4} # Exactly 4 times
)
\) # Match the character “)” literally
$ # Assert position at the end of a line (at the end of the string or before a line break character)
You need to escape the parentheses. Also you can restrict that a year has only got 4 numbers:
^(.+)\s\(([0-9]{4})\)$
The year is in matchgroup 2.
I'd go with
^(.*)\s\(([0-9]{4})\)$
(assuming all years have 4 digits, use [0-9]+ if you have an unknown number of digits, but at least one, or [0-9]* if there could be no digits)
You're almost there with your regular expression.
What you really need is:
\s\((\d{4})\)$
Where:
\s is some whitespace
\( is a literal '('
( is the start of the match group
\d is a digit
{4} means four of the previous atom (i.e. four digits)
) is the end of the match group
\) is a literal ')'
$ is the end of the string
For best results, put into a function:
>>> import re
>>> def get_year(name):
...     return re.search(r'\s\((\d{4})\)$', name).group(1)
...
>>> for name in "Test String 1 (1994)", "34 Test String 2 (1995)", "Test (String) 3 (1996)":
...     print(get_year(name))
...
1994
1995
1996
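One caveat with a helper like this: `re.search` returns None when the name has no year at the end, so calling a method on the result raises AttributeError. A possible variant that returns None instead (the name `get_year_or_none` is hypothetical):

```python
import re

# Whitespace, then a 4-digit year in parentheses at the end of the string.
YEAR_AT_END = re.compile(r"\s\((\d{4})\)$")

def get_year_or_none(name):
    """Return the trailing year as a string, or None if there is none."""
    match = YEAR_AT_END.search(name)
    return match.group(1) if match else None

print(get_year_or_none("Test (String) 3 (1996)"))
print(get_year_or_none("No year in this name"))
```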

Regular expression for phone numbers

I'm trying:
\d{3}|\d{11}|\d{11}-\d{1}
to match three-digit numbers, eleven-digit numbers, eleven-digit followed by a hyphen, followed by one digit.
But, it only matches three digit numbers!
I also tried \d{3}|\d{11}|\d{11}-\d{1} but it doesn't work.
Any ideas?
There are many ways of punctuating phone numbers. Why don't you remove everything but the digits and check the length?
Note that there are several ways of indicating "extension":
+1 212 555 1212 ext.35
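A minimal Python sketch of the strip-and-count idea. The allowed lengths (3, 11, or 12 for eleven digits plus one after the hyphen) are my assumption based on the question; `digits_only` and `has_valid_length` are hypothetical helpers.

```python
import re

def digits_only(phone):
    """Drop every character that is not a digit."""
    return re.sub(r"\D", "", phone)

def has_valid_length(phone, allowed=(3, 11, 12)):
    """Check the digit count against the assumed allowed lengths."""
    return len(digits_only(phone)) in allowed

print(digits_only("+1 212 555 1212"))
print(has_valid_length("12345678901-2"))
print(has_valid_length("+1 212 555 1212 ext.35"))
```

Note that extension digits are counted too, so the last example has 13 digits and is rejected under these assumptions.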
If the first part of an alternation matches, then the regex engine doesn't even try the second part.
Presuming you want to match only three-digit, 11 digit, or 11 digit hyphen 1 digit numbers, then you can use lookarounds to ensure that the preceding and following characters aren't digits.
(?<!\d)(\d{3}|\d{11}|\d{11}-\d{1})(?!\d)
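The effect of alternation order can be seen in a quick Python experiment. With the lookarounds in place, \d{11} is still tried before \d{11}-\d{1}, so the hyphenated form loses its last part. Putting the longest alternative first is one way around that; the name `LONGEST_FIRST` and the reordering are my own sketch, not part of the answer.

```python
import re

ORIGINAL = re.compile(r"\d{3}|\d{11}|\d{11}-\d{1}")
LOOKAROUND = re.compile(r"(?<!\d)(\d{3}|\d{11}|\d{11}-\d{1})(?!\d)")
# Same idea, but with the longest alternative tried first.
LONGEST_FIRST = re.compile(r"(?<!\d)(?:\d{11}-\d|\d{11}|\d{3})(?!\d)")

text = "12345678901-2"
print(ORIGINAL.findall(text))       # three-digit chunks
print(LOOKAROUND.findall(text))     # stops before the hyphen
print(LONGEST_FIRST.findall(text))  # the full hyphenated number
```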
\d{7}+\d{4} will select an eleven digit number. I could not get \d{11} to actually work.
This should work: /(?:^|(?<=\D))(\d{3}|\d{11}|\d{11}-\d{1})(?:$|(?=\D))/
or combined /(?:^|(?<!\d))(\d{3}|\d{11}(?:-\d{1})?)(?:$|(?![\d-]))/
expanded:
/ (?:^ | (?<!\d)) # either start of string or not a digit before us
( # capture grp 1
\d{3} # a 3 digit number
| # or
\d{11} # a 11 digit number
(?:-\d{1})? # optional '-' plus 1 digit number
) # end capture grp 1
(?:$ | (?![\d-])) # either end of string or not a digit nor '-' after us
/
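The expanded form translates almost directly to Python's verbose mode. A sketch, assuming Python's `re` module rather than the original regex flavor; the sample strings are invented.

```python
import re

COMBINED = re.compile(r"""
    (?:^|(?<!\d))      # start of string, or no digit before us
    (                  # capture group 1
        \d{3}          # a 3-digit number
      |                # or
        \d{11}         # an 11-digit number
        (?:-\d{1})?    # optionally '-' plus one digit
    )                  # end capture group 1
    (?:$|(?![\d-]))    # end of string, or neither digit nor '-' after us
""", re.VERBOSE)

for text in ["123", "12345678901", "12345678901-2", "1234", "12345678901-"]:
    print(repr(text), COMBINED.findall(text))
```

The last sample is rejected because the trailing '-' is excluded by the final lookahead.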