Trying to put together regex that can match minimum 4 digits, maximum 16 digits, and those digits can be separated by characters: ()- x+ (but should not be part of the min/max count).
ie. "555-123-4567" would return true, "1-234" is true, "+44(55)123-3333" is true, "abcd1" is false, "1-()-4++++-()-6" is false.
Any way to do that with purely regex? Trying a couple expressions but not working.
what you need to do, is to match any number of the allowed characters, followed by a digit, followed by any number of the allowed characters, and match that same sequence between 4 an 16 times.
like this
^([()\- x+]*\d[()\- x+]*){4,16}$
http://rubular.com/r/6VhALkFPQZ
This:
/^[(]{0,1}[0-9]{3}[)]{0,1}[-\s\.]{0,1}[0-9]{3}[-\s\.]{0,1}[0-9]{4}$/
Works with these formats:
123-456-7890
(123) 456-7890
1234567890
123.456.7890
TL;DR
The OP has already accepted a regex solution. Below I present an alternative way of looking at the problem. Hopefully it helps the OP, but it's really aimed more at future visitors and followers of regex.
Don't Validate Logic with Regexps
Regular expressions work best for matching or extracting patterns, rather than for complex data validation. For example, the OP gives the following rules:
Trying to put together regex that can match minimum 4 digits, maximum 16 digits, and those digits can be separated by characters: ()- x+ (but should not be part of the min/max count).
but then says that 1-()-4++++4-()-66 should be false. However, it meets the rules for truth as originally defined by the OP. (NB: This example was later changed in the OP's question, but the point I'm making remains valid.)
Example: Using Code to Simplify the Regex Pattern Match
Logic should be encapsulated in short, testable pieces of code, not in complex regular expressions. For example, consider the following Ruby code:
numbers = [
'555-123-4567',
'1-234',
'+44(55)123-3333',
'abcd1',
'1-()-4++++4-()-66'
]
numbers.map { |num| num.delete '- x+()' }.grep /\A\d{4,16}\z/
#=> ["5551234567", "1234", "44551233333", "14466"]
Even if you aren't a Rubyist, the code should be easy to follow. This code strips out the characters that are irrelevant to our match, then checks that each string contains nothing but 4-16 digits anchored to the beginning and end of the string. Instead of validating a complex pattern, you're now just validating a simple pattern (e.g. all numbers) with a well-defined interval from 4 to 16. Furthermore, you can break this kind of logic up into smaller steps rather than simply calling long method chains, making this inherently more testable.
Example: Avoiding Regexp Validation Altogether
You could even go further by avoiding the regex for any sort of validation, and making your Boolean expressions more explicit. Consider the following:
numbers = [
'555-123-4567',
'1-234',
'+44(55)123-3333',
'abcd1',
'1-()-4++++4-()-66'
]
numbers.map do |num|
digits = num.scan /\d/
valid = digits.count >= 4 and digits.count <= 16
puts "#{num}: #{valid}"
end
This will print:
555-123-4567: true
1-234: true
+44(55)123-3333: true
abcd1: false
1-()-4++++4-()-66: true
To me, this seems like a much more robust and flexible way of solving the "phone number validation" question, which gets asked here on Stack Overflow in one form or another with amazing regularity. Your mileage may vary.
Related
I want to check if a number is 50 or more using a regular expression. This in itself is no problem but the number field has another regex checking the format of the entered number.
The number will be in the continental format: 123.456,78 (a dot between groups of three digits and always a comma with 2 digits at the end)
Examples:
100.000,00
50.000,00
50,00
34,34
etc.
I want to capture numbers which are 50 or more. So from the four examples above the first three should be matched.
I've come up with this rather complicated one and am wondering if there is an easier way to do this.
^(\d{1,3}[.]|[5-9][0-9]|\d{3}|[.]\d{1,3})*[,]\d{2}$
EDIT
I want to match continental numbers here. The numbers have this format due to internal regulations and specify a price.
Example: 1000 EUR would be written as 1.000,00 EUR
50000 as 50.000,00 and so on.
It's a matter of taste, obviously, but using a negative lookahead gives a simple solution.
^(?!([1-4]?\d),)[1-9](\d{1,2})?(\.\d{3})*,\d{2}\b
In words: starting from a boundary ignore all numbers that start with 1 digit OR 2 digits (the first being a 1,2,3 or 4), followed by a comma.
Check on regex101.com
Try:
EDIT ^(.{3,}|[5-9]\d),\d{2}$
It checks if:
there 3 chars or more before the ,
there are 2 numbers before the , and the first is between 5 and 9
and then a , and 2 numbers
Donno if it answer your question as it'll return true for:
aa50,00
1sdf,54
But this assumes that your original string is a number in the format you expect (as it was not a requirement in your question).
EDIT 3
The regex below tests if the number is valid referring to the continental format and if it's equal or greater than 50. See tests here.
Regex: ^((([1-9]\d{0,2}\.)(\d{3}\.){0,}\d{3})|([1-9]\d{2})|([5-9]\d)),\d{2}$
Explanation (d is a number):
([1-9]\d{0,2}\.): either d., dd. or ddd. one time with the first d between 1 and 9.
(\d{3}\.){0,}: ddd. zero or x time
\d{3}: ddd 3 digit
These 3 parts combined match any numbers equals or greater than 1000 like: 1.000, 22.002 or 100.000.000.
([1-9]\d{2}): any number between 100 and 999.
([5-9]\d)): a number between 5 and 9 followed by a number. Matches anything between 50 and 99.
So it's either the one of the parts above or this one.
Then ,\d{2}$ matches the comma and the two last digits.
I have named all inner groups, for better understanding what part of number is matched by each group. After you understand how it works, change all ?P<..> to ?:.
This one is for any dec number in the continental format.
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})*|0)(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})*,)|0,|,)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
test
This one is for the same with the limit number>=50
^(?P<common_int>(?P<int>(?P<int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<int_end>\.\d{3})+|(?P<int_short>[1-9]\d{2}|[5-9]\d))(?!,)|(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})+,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d))(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_start>\d{3}\.)*(?P<frac_end>\d{1,3})))?$
tests
If you always have the integer part under 999.999 and fractal part always 2 digits, it will be a bit more simple:
^(?P<dec_int_having_frac>(?P<dec_int>(?P<dec_int_start>[1-9]\d{1,2}|[1-9]\d|[1-9])(?P<dec_int_end>\.\d{3})?,)|(?P<dec_short_int>[1-9]\d{2}|[5-9]\d),)(?=\d)(?P<frac_from_comma>(?<=,)(?P<frac>(?P<frac_end>\d{1,2})))?$
test
If you can guarantee that the number is correctly formed -- that is, that the regex isn't expected to detect that 5,0.1 is invalid, then there are a limited number of passing cases:
ends with \d{3}
ends with [5-9]\d
contains \d{3},
contains [5-9]\d,
It's not actually necessary to do anything with \.
The easiest regex is to code for each of these individually:
(\d{3}$|[5-9]\d$|\d{3},|[5-9]\d)
You could make it more compact and efficient by merging some of the cases:
(\d{3}[$,]|[5-9]\d[$,])
If you need to also validate the format, you will need extra complexity. I would advise against attempting to do both in a single regex.
However unless you have a very good reason for having to do this with a regex, I recommend against it. Parse the string into an integer, and compare it with 50.
How can you create a regular expression that checks if a user input matches characters formally found in a currency syntax? (number, period/decimal place, comma, or dollar sign?).
The following can find all characters listed above except for the dollar sign, any idea how to properly structure this?
/([0-9.,])/g
The regex I use for currency validation is as follows:
^(\$)?([1-9]{1}[0-9]{0,2})(\,\d{3})*(\.\d{2})?$|^(\$)?([1-9]{1}[0-9]{0,2})(\d{3})*(\.\d{2})?$|^(0)?(\.\d{2})?$|^(\$0)?(\.\d{2})?$|^$
RegExr is a great website for testing and reviewing these strings (perhaps you could make a regex string that's less of a beast!)
Are you just trying to test the characters? In that case
[0-9,.$]+
will suffice. Or are you testing for the format $1,123,123.12 with the correct placements of commas and everything?
In that case you would need something more like
(\$?\d{1,3}(?:,\d{3})*(?:.\d{2})?)
should do.
You need to define what you want your regex to match, more formally than "matches characters formally found in a currency syntax". We don't know which currencies you're interested in. We don't know how strict you need it to be.
Maybe you'll come up with something like:
These elements must come in this order:
A currency symbol ('£', '€' or '$') (your requirement might specify more currencies)
1 or more numeric digits
A period or a comma
Exactly two numeric digits
Once you have a specification like that, it's easy to translate into a regular expression:
[£€$] // one of these chars.
\d+ // '+' means 'one or more'
[.,] // '[]' means 'any one of these'.
\d\d // Two digits. Could also be written as '\d{2}'
Or concatenated together:
[£€$]\d+[.,]\d\d
If you've learned about escaping special characters like $ and ., you may be surprised not to see it done here. Within [], they lose their special meaning.
(There are dialects of regex -- check the documentation for whatever implementation you're using)
Your requirements may be different though. The example I've given doesn't match:
$ 12.00
$12
USD12
¥200.00
25¢
$0.00005
20 μBTC
44 dollars
£1/19/11¾d ("one pound, nineteen shillings and elevenpence three farthings")
Work out your requirement, then write your code to meet it.
you should set \ before special chars, also you should set star(0+) or plus(1+) for match full currency chars, for example:
/([0-9\.,]*)/g
or for real price how 200,00 where all time exist 2 symbols after comma:
/(([0-9]+)(\.|,)([0-9]){2})/g
I came across the regular expression not containing 101 as follows:
0∗1∗0∗+(1+00+000)∗+(0+1+0+)∗
I was unable to understand how the author come up with this regex. So I just thought of string which did not contain 101:
01000100
I seems that above string will not be matched by above regex. But I was unsure. So tried translating to equivalent pcre regex on regex101.com, but failed there too (as it can be seen my regex does not even matches string containing single 1.
Whats wrong with my translation? Is above regex indeed correct? If not what will be the correct regex?
Here is a bit shorter expression ^0*(1|00+)*0*$
https://www.regex101.com/r/gG3wP5/1
Explanation:
(1|00+)* we can mix zeroes and ones as long as zeroes occur in groups
^0*...0*$ we can have as many zeroes as we want in prefix/suffix
Direct translation of the original regexp would be like
^(0*1*0*|(1|00|000)*|(0+1+0+)*)$
Update
This seems like artificially complicated version of the above regexp:
(1|00|000)* is the same as (1|00+)*
it is almost the solution, but it does not match strings 0, 01.., and ..10
0*1*0* doesn't match strings with 101 inside, but matches 0 and some of 01.., and ..10
we still need to match those of 01.., and ..10 which have 0 & 1 mixed inside, e.g. 01001.. or ..10010
(0+1+0+)* matches some of the remaining cases but there are still some valid strings unmatched
e.g. 10010 is the shortest string that is not matched by all of the cases.
So, this solution is overly complicated and not complete.
read the explanation in the right side tab in regex101 it tells you what your regex does( I think you misunderstood what list operator does) , inside a list operator ( [ ) , the other characters such as ( won't be metacharacters anymore so the expression [(0*1*0*)[1(00)(000)] will be equivalent to [01()*[] which means it matches 0 or 1 or ( or ) or [
The correct translation of the regular expression 0∗1∗0∗+(1+00+000)∗+(0+1+0+)∗
will be as follows:
^((?:0*1*0*)|(?:1|00|000)*|(?:0+1+0+)*)$
regex101
Debuggex Demo
What your regex [(0*1*0*)[1(00)(000)]*(0+1+0+)*] does:
[(0*1*0*)[1(00)(000)]* -> matches any of characters 0,(,),*,[ zero or more times followed by
(0+1+0+)* --> matches the pattern 0+1+0+ 0 or more times followed by
] --> matches the character ]
so you expression is equivalent to
[([)01](0+1+0+)*] which is not a regular expression to match strings that do not contain 101
0* 1* ( (00+000)* 1*)* (ε+0)
i think this expression covers all cases because --
any number apart from 1 can be broken into constituent 2's and 3's i.e. any number n=2*i+3*j. So there can be any number of 0's between 2 consecutive 1's apart from one 0.Hence, 101 cannot be obtained.
ε+0 for expressions ending in one 0.
The RE for language not containing 101 as sub-string can also be written as (0*1*00)*.0*.1*.0*
This may me a smaller one then what you are using. Try to make use of this.
Regular Expression I got (0+10)1. (looks simple :P)
I just considered all cases to make this.
you consider two 1's we have to end up with continuous 1's
case 1: 11111111111111...
case 2: 0000000011111111111111...(once we take two 1's we cant accept 0's so one and only chance is to continue with 1's)
if you consider only one 1 which was followed by 0 So, no issue and after one 1 we can have any number of 0's.
case 3: 00000000 10100100010000100000100000 1111111111
=>(0*+10*)1
final answer (0+10)1.
Thanks for your patience.
There's a long natural number that can be grouped to smaller numbers by the 0 (zero) delimiter.
Example: 4201100370880
This would divide to Group1: 42, Group2: 110, Group3: 370880
There are 3 groups, groups never start with 0 and are at least 1 char long. Also the last groups is "as is", meaning it's not terminated by a tailing 0.
This is what I came up with, but it only works for certain inputs (like 420110037880):
(\d+)0([1-9][0-9]{1,2})0([1-9]\d+)
This shows I'm attempting to declare the 2nd group's length to min2 max3, but I'm thinking the correct solution should not care about it. If the delimiter was non-numeric I could probably tackle it, but I'm stumped.
All right, factoring in comment information, try splitting on a regex (this may vary based on what language you're using - .split(/.../) in JavaScript, preg_split in PHP, etc.)
The regex you want to split on is: 0(?!0). This translates to "a zero that is not followed by a zero". I believe this will solve your splitting problem.
If your language allows a limit parameter (PHP does), set it to 3. If not, you will need to do something like this (JavaScript):
result = input.split(/0(?!0)/);
result = result.slice(0,2).concat(result.slice(2).join("0"));
The following one should suit your needs:
^(.*?)0(?!0)(.*?)0(?!0)(.*)$
Visualization by Debuggex
The following regex works:
(\d+?)0(?!0) with the g modifier
Demo: http://regex101.com/r/rS4dE5
For only three matches, you can do:
(\d+?)0(?!0)(\d+?)0(?!0)(.*)
I want to create a regex that will match any of these values
7-5
6-6 ((0-99) - (0-99))
6-4
6-3
6-2
6-1
6-0
0-6
1-6
2-6
3-6
4-6
the 6-6 example is a special case, here are some examples of values:
6-6 (23-8)
6-6 (4-25)
6-6 (56-34)
Is it possible to make one regex that can do this?
If so, is it possible to further extend that regex for the 6-6 special case such that the the difference between the two numbers within the parentheses is equal to 2 or -2?
I could easily write this with procedural code, but i'm really curious if someone can devise a regex for this.
Lastly, if it could be further extended such that the individual digits were in their own match groups I'd be amazed. An example would be for 7-5, i could have a match group that just had the value 7, and another that had the value 5. However for 6-6 (24-26) I'd like a match group that had the first six, a match group for the second 6, a match group for the 24 and a match group for the 26.
This may be impossible, but some of you can probably get this part of the way there.
Good luck, and thanks for the help.
NO. The answer is "We can't," and the reason is because you're trying to use a hammer to dig a hole.
The problem with writing one long "clever" (this word causes a knee-jerk reaction in many people who are far more anti-regex than I) regex is that, six months from now, you'll have forgotten those clever regex features that you used so heavily, and you'll have written six months worth of code related to something else, and you'll get back to your impressive regex and have to tweak one detail, and you'll say, "WTF?"
This is what (I understand) you want, in Perl:
# data is in $_
if(/7-5|6-[0-4]|[0-4]-6|6-6 \((\d{1,2})-(\d{1,2})\)/) {
if($1 and $2 and abs($1 - $2) == 2) {
# we have the right difference
}
}
Some might say that the given regex is a bit much, but I don't think it's too bad. If the \d{1,2} bit is a little too obscure you could use \d\d? (which is what I used at first, but didn't like the repetition).
You can do it like this:
7-5|6-[0-4]|[0-5]-6|6-6 \(\d\d?-\d\d?\)
Just add parens to get your match groups.
Off the top of my head (there may be some errors but the principle should be good):
\d-\d|6-6 (\d+-\d+)
And like with any regexp, you can surround what you want to extract with parentheses for match groups:
(\d)-(\d)|(6)-(6) ((\d)+-(\d+))
In the 6-6 case, the first two parentheses should get the sixes, and the second two should get the multi-digit values that come afterwards.
Here is one that will match only the numbers you want and let you get each digit by name:
p = r'(?P<a>[0-4]|6|7)-(?P<b>[0-4]|6|5) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?'
To get each digit you could use:
values = re.search(p, string).group('a', 'b', 'c', 'd')
Which will return a four element tuple with the values you are looking for (or None if no match was found).
One problem with this pattern is that it will patch the stuff in the parenthesis whether or not there was a match to '6-6'. This one will only match the final parenthesis if 6-6 is matched:
p = r'(?P<a>[0-4]|(?P<tmp_a>6)|7)-(?P<b>(?(tmp_a)(?P<tmp_b>6)|([0-4]|5)))(?(tmp_b) *(\((?P<c>\d{1,2})-(?P<d>\d{1,2})\))?)'
I don't know of any way to look for a difference between the numbers in the parenthesis; regex only knows about strings, not numerical values . . .
(I am assuming python syntax here; the perl syntax is slightly different, though perl supports the python way of doing things.)