Why does this not match my example? - regex

As I go through the regex101 quiz/lessons, I am supposed to match an IP address (without leading zeros).
Now the following
^[^0]+[0-9]+\\.[^0]+[0-9]+\\.[^0]+[0-9]+\\.[^0]+[0-9]+$
matches 23.34.7433.33
but fails to match single digit numbers like 1.2.3.4
Why is this so, when my + is supposed to match "1 to infinite" times...?

You are in fact requiring at least 2 characters for each number in the IP address because you have:
[^0]+[0-9]+
[^0]+ matches at least one character, and [0-9]+ matches at least one character, so together they require at least 2 characters per number (characters that fit the respective character classes).
Also 23.34.7433.3 doesn't match your regex for the reason I stated above.
And you might try this regex for the purpose you stated:
^(?:[1-9][0-9]{0,2}\.){3}[1-9][0-9]{0,2}$
[1-9][0-9]{0,2} will match 1 to 3 digits with no leading 0.
EDIT: You mentioned in a comment that 0.0.0.0 (single digit zeroes) are to be accepted as well. The modified regex from above would be:
^(?:(?:[1-9][0-9]{0,2}|0)\.){3}(?:[1-9][0-9]{0,2}|0)$
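To see the difference, both patterns can be run against a few sample inputs. This is only a minimal sketch in Python (the question does not name a language, so that choice is an assumption); the names ORIGINAL and FIXED are made up for illustration, and the asker's pattern is written with a single backslash before each dot.

```python
import re

# The asker's pattern: every number needs [^0]+ AND [0-9]+, i.e. at least 2 characters.
ORIGINAL = re.compile(r"^[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+$")
# The pattern from the EDIT: 1 to 3 digits without a leading zero, or a single 0.
FIXED = re.compile(r"^(?:(?:[1-9][0-9]{0,2}|0)\.){3}(?:[1-9][0-9]{0,2}|0)$")

for s in ["23.34.7433.33", "1.2.3.4", "0.0.0.0", "01.2.3.4"]:
    print(s, bool(ORIGINAL.match(s)), bool(FIXED.match(s)))
```

The original pattern accepts 23.34.7433.33 and rejects 1.2.3.4, while the fixed one does the opposite and also accepts 0.0.0.0.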

Assuming you want to check an IPv4 address, I suggest this pattern:
^(?<nb>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])(?>\.\g<nb>){3}$
I have defined a named subpattern nb to make the pattern shorter, but if you prefer, you can write everything out and replace each \g<nb> with the subpattern itself:
^(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])(?>\.(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])){3}$
Numbers greater than 255 are not allowed.
Pattern details:
The goal is to describe what is allowed:
3-digit numbers that begin with "2" can continue with a digit in [0-4] and a digit in [0-9], OR with 5 and a digit in [0-5], because the number can't exceed 255.
3-digit numbers that begin with "1" can be followed by any two digits.
any 2-digit number that doesn't begin with "0"
any 1-digit number (zero included)
If I add these rules one by one, I obtain:
2(?>[0-4][0-9]|5[0-5])
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2}
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2} | [1-9][0-9]
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2} | [1-9][0-9] | [0-9]
Now I have a definition for allowed numbers. I can reduce the size of the pattern a little by replacing [1-9][0-9] | [0-9] with [1-9]?[0-9]
Then you only have to add the dots and repeat the subpattern four times: x.x.x.x
But since there are only three dots, I write the first number and then repeat 3 times a group that contains a dot and a number:
(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9]) # the first number
(?>\.(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])){3} # the group repeated 3 times
To be sure that the string doesn't contain anything other than the IP I described, I add anchors for the start of the string ^ and for the end of the string $, so that the string begins and ends with the IP.
To reduce the size of a pattern, you can define a named group, which lets you reuse the subpattern it contains.
Then you can rewrite the pattern like this:
^
(?<nb> 2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9] ) # named group definition
(?> \. \g<nb> ){3} # \g<nb> is the reference to the subpattern named nb
$
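If your regex flavor has no \g<nb> subroutine calls, the same reuse can be done by building the pattern from a string. Below is a rough sketch in Python, whose re module does not support subroutine calls; OCTET and IPV4 are names I made up, and the atomic groups are swapped for plain non-capturing groups since older Python versions do not accept (?>...).

```python
import re

# One number from 0 to 255 without leading zeros (same alternatives as above,
# written with non-capturing groups instead of atomic groups).
OCTET = r"(?:2(?:[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])"

# The first number, then 3 repetitions of "dot + number", anchored at both ends.
IPV4 = re.compile(rf"^{OCTET}(?:\.{OCTET}){{3}}$")

for s in ["192.168.1.10", "255.255.255.255", "256.1.1.1", "01.1.1.1", "1.2.3"]:
    print(s, bool(IPV4.match(s)))
```

Only the first two samples are accepted; 256.1.1.1 is out of range, 01.1.1.1 has a leading zero, and 1.2.3 has only three numbers.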

[0-9]+ should be [0-9]*
* matches 0 or more.
+ matches 1 or more.
You also use [^0], which is actually wrong because it will match letters as well.
Besides that, it matches the first character that's NOT zero and then requires at least one digit after it.
It should be written as
[1-9][0-9]*
This checks that the first character is a digit between 1 and 9, and that whatever follows (from zero digits to any number of digits) consists of digits 0-9.
The whole pattern then becomes:
^[1-9][0-9]*\.[1-9][0-9]*\.[1-9][0-9]*\.[1-9][0-9]*$
Cleaned up:
^(?:[1-9][0-9]*\.){3}[1-9][0-9]*$
To accept a single 0 as well, this should work:
^(?:[1-9][0-9]*|[0-9])\.(?:[1-9][0-9]*|[0-9])\.(?:[1-9][0-9]*|[0-9])\.(?:[1-9][0-9]*|[0-9])$
Cleaned up:
^(?:(?:[1-9][0-9]*|0)\.){3}(?:[1-9][0-9]*|0)$

Your regex would match ABCDEFG999.FOOBSR888 etc., because [^0] is any character other than a zero, and both character classes are required by the +.
I think you want this:
^[1-9]\d*(\.[1-9]\d*){3}$
Having replaced various verbose expressions with their equivalents, this is 4 groups of digits, each starting with a non-zero digit.
Actually the problem is far more complicated, because your approach (once corrected) allows 999.999.999.999, which is not a valid IP.
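One way to deal with the range problem is to let the regex check only the shape and do the numeric check in code. This is just a sketch in Python under that assumption; DIGIT_GROUPS and looks_like_ipv4 are names I invented, and the pattern is the one above with a single backslash before the dot (so a field that is just 0 is still rejected).

```python
import re

# Shape only: 4 dot-separated groups of digits, each starting with a non-zero digit.
DIGIT_GROUPS = re.compile(r"^[1-9]\d*(\.[1-9]\d*){3}$")

def looks_like_ipv4(text: str) -> bool:
    """Check the shape with the regex, then check that every number is at most 255."""
    if not DIGIT_GROUPS.fullmatch(text):
        return False
    return all(int(part) <= 255 for part in text.split("."))

for s in ["1.2.3.4", "192.168.1.10", "999.999.999.999"]:
    print(s, bool(DIGIT_GROUPS.fullmatch(s)), looks_like_ipv4(s))
```

999.999.999.999 passes the regex on its own but fails the combined check.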

It might be because you need at least two digits between two dots '.'
Try using this pattern: ^[^0]+[0-9]*\.[^0]+[0-9]*\.[^0]+[0-9]*\.[^0]+[0-9]*$

To match an IP address you should use this pattern:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Taken from here:
http://www.regular-expressions.info/examples.html

Related

Regex expression for numbers and leading zeros just with a dot and decimal

I'm trying to find a regex for numeric inputs. A leading 0 is allowed only if it is followed by a dot and 1 or 2 decimal digits. And of course only numbers are accepted.
These are the scenarios that we can accept:
0.01
1.1
1.02
120.01
We can't accept these values
0023
0100
.01
.12
Which regex is the best option for these cases?
So far we have tried the following regex for accepting just numbers and dots:
[A-Za-z,]
We also tried the following ones:
^[+-]?[0-9]{1,3}(?:[0-9]*(?:[.,][0-9]{1})?|(?:,[0-9]{3})*(?:\.[0-9]{1,2})?|(?:\.[0-9]{3})*(?:,[0-9]{1,2})?)$
"/^[-]?[$]\d{1,3}(?:,?\d{3})*\.\d{2}$/"
"/(^(\d{1})\.{0,1}([0-9]){0,2}$)|(^([1-9])\d{0,2}(\,\d{0,3})$)/g"
(?:0|[1-9][0-9]*)(?:\.[0-9]{1,2})?
And the next one for deleting the leading zeros, but it didn't work for cases like 0.10:
^0+
If a negative lookahead is supported, you can exclude matches that start with a zero and have no decimal part.
^(?!0\d*$)\d+(?:\.\d{1,2})?$
^ Start of string
(?!0\d*$) Negative lookahead, assert that what is to the right is not a zero followed by optional digits up to the end of the string
\d+ Match 1+ digits
(?:\.\d{1,2})? Match an optional decimal part with 1 or 2 digits
$ End of string
I would go with ^(0|[1-9]\d*|(0|[1-9]\d*)\.\d+)$
You can test here: https://regex101.com/r/oNMgR9/1
Explanation
^ means : match the beginning of the string (or line if the m flag is enabled).
$ means : match the end of the string (or line if the m flag is enabled).
(a|b) means match "a" or match "b" so I'll use this to match either "0" alone or any number not starting with a "0". It's the syntax for a logical or.
. alone is used to match any char. So you have to escape it if you want to match the dot character. This is why I wrote 0\. instead of 0..
[ ] is used to list some characters you want to match. It can be a range if you use the - char, so [1-9] means any digit char from "1" to "9".
\d matches a digit. For ASCII input it is equivalent to [0-9] (some Unicode-aware flavors also let \d match digits from other scripts).
* means : match the preceding pattern 0 or many times, so \d* means that it will match 0 or many times a digit, so it will match "8" or "465" or "09" but also an empty string "". If you want to match the preceding pattern at least once or many times then you use + instead of *. So \d+ won't match an empty string "" but \d* would match it.
A) Just a number not starting with 0
[1-9]\d* will match any digit from 1 to 9, optionally followed by other digits. This will match numbers without a decimal point.
B) Just 0
0 alone is a possibility. It is listed separately because the case above doesn't cover it.
C) A number with decimals
(0|[1-9]\d*)\.\d+ will match either a "0" alone or a number not starting with "0", followed by a point and some other digits (which have to be present because we don't want to match "45." without the digits after the dot).
Better alternative
The solution from @TheFourthBird is a bit cleaner thanks to the negative lookahead. It's just a bit harder to understand. He also read the question completely: you wanted 1 or 2 digits after the decimal point. I forgot about that, so \d+ should indeed be replaced by \d{1,2}, as you don't want more than 2 digits.
You can use
^(?![0.]+$)(?:[1-9]\d*|0)(?:\.\d{1,2})?$
Details:
^ - start of string
(?![0.]+$) - fail the match if there are just zeros or dots till end of string
(?:[1-9]\d*|0) - either a non-zero digit followed by zero or more digits, or a single zero
(?:\.\d{1,2})? - optionally followed by a . and one or two digits
$ - end of string.
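To compare the lookahead-based patterns on the inputs from the question, something like the following could be used. It is only a sketch in Python; the names LOOKAHEAD, STRICT, ACCEPT, REJECT and EDGE are mine, and the EDGE inputs are extra cases I added, not ones from the question.

```python
import re

# Rejects only strings that are a zero followed by digits (no decimal part).
LOOKAHEAD = re.compile(r"^(?!0\d*$)\d+(?:\.\d{1,2})?$")
# Allows an integer part of 0 or a number without leading zero; rejects all-zero values.
STRICT = re.compile(r"^(?![0.]+$)(?:[1-9]\d*|0)(?:\.\d{1,2})?$")

ACCEPT = ["0.01", "1.1", "1.02", "120.01"]   # should match (from the question)
REJECT = ["0023", "0100", ".01", ".12"]      # should not match (from the question)
EDGE = ["0", "00.01", "0.00", "1.234"]       # extra cases, my own assumption

for s in ACCEPT + REJECT + EDGE:
    print(f"{s!r:10} {bool(LOOKAHEAD.match(s))!s:6} {bool(STRICT.match(s))}")
```

Both patterns agree on the question's examples. They differ on 00.01 and 0.00, which the first pattern accepts and the second rejects, so which one fits depends on how you want those inputs treated.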

Regex to block more than 3 numbers in a string

I am trying to block any strings that contain more than 3 numbers and prevent special characters. I have the special characters part down. I'm just missing the number part.
For example:
"Hello 1234" - Not Allowed
"Hello 123" - Allowed
I've tried the following:
/^[!?., A-Za-z0-9]+$/
/((^[!?., A-Za-z]\d)([0-9]{3}+$))/
/^((\d){2}[a-zA-Z0-9,.!? ])*$/
The last one is the closest I got as it prevents any special characters and any numbers from being entered at all.
I've looked through previous posts, but am coming up short.
Edit for clarification
Essentially I'm trying to find a way to prevent customers from entering PII on a form. No submission should be allowed that contains more than 3 numbers in a string.
Hello1234 - Not allowed
12345 - Not allowed
1111 - not allowed
Nowhere in the comment section should the string entered by the user contain more than 3 numbers in total.
About the patterns that you tried
^[!?., A-Za-z0-9]+$ The pattern matches 1+ times any of the listed, including 1 or more digits
((^[!?., A-Za-z]\d)([0-9]{3}+$)) If {3}+ is supported, the pattern matches a single char from the character class, 1 digit followed by 3 digits
^((\d){2}[a-zA-Z0-9,.!? ])*$ The pattern repeats 0+ times matching 2 digits and 1 of the listed in the character class
If a negative lookahead is supported, you can use it to assert that there are no 4 digits in a row.
^(?!.*\d{4})[a-zA-Z0-9,.!? ]+$
If there cannot be 4 digits in total, only 0 to 3 occurrences:
^[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$
Explanation
^ Start of string
[a-zA-Z,.!? ]* Match 0+ times any of the listed (without a digit)
(?:\d[a-zA-Z,.!? ]*){0,3} Repeat 0 - 3 times matching a single digit followed by optional listed chars (Again without a digit)
$ End of string
If you don't want to match an empty string and a lookahead is supported:
^(?!$)[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$
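The difference between "4 digits in a row" and "4 digits in total" shows up as soon as the digits are spread out. A small sketch in Python (my choice of language; IN_A_ROW and IN_TOTAL are made-up names):

```python
import re

# Rejects only 4 consecutive digits.
IN_A_ROW = re.compile(r"^(?!.*\d{4})[a-zA-Z0-9,.!? ]+$")
# Allows at most 3 digits anywhere in the string.
IN_TOTAL = re.compile(r"^[a-zA-Z,.!? ]*(?:\d[a-zA-Z,.!? ]*){0,3}$")

for s in ["Hello 123", "Hello 1234", "12 Hello 34", "1111", ""]:
    print(repr(s), bool(IN_A_ROW.match(s)), bool(IN_TOTAL.match(s)))
```

"12 Hello 34" passes the first pattern but not the second, and the empty string passes only the second, which is what the (?!$) variant guards against.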
Here is my two cents:
^(?!(.*\d){4})[A-Za-z ,.!?\d]+$
^ - Start string anchor.
(?! - Open a negative lookahead.
( - Open capture group.
.*\d - Match anything other than newline up to a digit.
){4} - Close capture group and match it 4 times.
) - Close negative lookahead.
[A-Za-z ,.!?\d]+ - 1+ Characters from specified class.
$ - End string anchor.
I think it should cover what you described.
Assuming you mean <= 3 digits, this may be a naive one, but how about
^[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*[0-9]?[ALLOWED_CHARS]*$
Replace [ALLOWED_CHARS] with whatever you define as characters that are neither special characters nor digits.

Using regex to match numbers which have 5 increasing consecutive digits somewhere in them

First off, this has sort of been asked before. However I haven't been able to modify this to fit my requirement.
In short: I want a regex that matches an expression if and only if it only contains digits, and there are 5 (or more) increasing consecutive digits somewhere in the expression.
I understand the logic of
^(?=\d{5}$)1*2*3*4*5*6*7*8*9*0*$
however, this limits the expression to 5 digits. I want there to be able to be digits before and after the expression. So 1111345671111 should match, while 11111 shouldn't.
I thought this might work:
^[0-9]*(?=\d{5}0*1*2*3*4*5*6*7*8*9*)[0-9]*$
which I interpret as:
^$: The entire expression must only contain what's between these 2 symbols
[0-9]*: Any digits between 0-9, 0 or more times followed by:
(?=\d{5}0*1*2*3*4*5*6*7*8*9*): A part where at least 5 increasing digits are found followed by:
[0-9]*: Any digits between 0-9, 0 or more times.
However this regex is incorrect, as for example 11111 matches. How can I solve this problem using a regex? So examples of expressions to match:
00001459000
12345
This shouldn't match:
abc12345
9871234444
While this problem can be solved using pure regular expressions (the set of strictly ascending five-digit strings is finite, so you could just enumerate all of them), it's not a good fit for regexes.
That said, here's how I'd do it if I had to:
^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$
Core idea: 0?1?2?3?4?5?6?7?8?9? matches an ascending numeric substring, but it doesn't restrict its length. Every single part is optional, so it can match anything from "" (empty string) to the full "0123456789".
We can force it to match exactly 5 characters by combining a look-ahead for five digits plus an arbitrary suffix (which we capture) with a backreference \1 (which must match exactly the suffix captured by the look-ahead, ensuring we've now walked ahead 5 characters in the string).
Live demo: https://regex101.com/r/03rJET/3
(By the way, your explanation of (?=\d{5}0*1*2*3*4*5*6*7*8*9*) is incorrect: It looks ahead to match exactly 5 digits, followed by 0 or more occurrences of 0, followed by 0 or more occurrences of 1, etc.)
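The pattern can be checked against the examples from the question with a few lines of code. This is a sketch in Python (an assumption on my part, since no language is given); ASCENDING_5 is a made-up name.

```python
import re

# Look-ahead captures everything after the next 5 digits; \1 then forces the
# ascending part 0?1?...9? to cover exactly those 5 digits.
ASCENDING_5 = re.compile(r"^\d*(?=\d{5}(\d*)$)0?1?2?3?4?5?6?7?8?9?\1$")

for s in ["1111345671111", "00001459000", "12345", "11111", "abc12345", "9871234444"]:
    print(s, bool(ASCENDING_5.match(s)))
```

The first three inputs match and the last three do not.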
Because the starting position of the increasing digits isn't known in advance, and the consecutive increasing digits don't end at the end of the string, the linked answer's concise pattern won't work here. I don't think this is possible without being repetitive; alternate between all possibilities of increasing digits. A 0 must be followed by [1-9]. (0(?=[1-9])) A 1 must be followed by [2-9]. A 2 must be followed by [3-9], and so on. Alternate between these possibilities in a group, and repeat that group four times, and then match any digit after that (the lookahead in the last repeated digit in the previous group will ensure that this 5th digit is in sequence as well).
First lookahead for digits followed by the end of the string, then match the alternations described above, followed by one or more digits:
^(?=\d+$)\d*?(?:0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])|5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)){4}\d+
Separated out for better readability:
^(?=\d+$)\d*?
(?:
0(?=[1-9])|
1(?=[2-9])|
2(?=[3-9])|
3(?=[4-9])|
4(?=[5-9])|
5(?=[6-9])|
6(?=[7-9])|
7(?=[89])|
8(?=9)
){4}
\d+
The lazy quantifier in the first line there \d*? isn't necessary, but it makes the pattern a bit more efficient (otherwise it initially greedily matches the whole string, requiring lots of failing alternations and backtracking until at least 5 characters before the end of the string)
https://regex101.com/r/03rJET/2
It's ugly, but it works.
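Since this is awkward to express as a regex, a plain loop makes a handy cross-check. The sketch below is in Python and is not a regex solution: has_ascending_run is a hypothetical helper I wrote for comparison, and ALTERNATION is the pattern above copied into a Python string.

```python
import re

ALTERNATION = re.compile(
    r"^(?=\d+$)\d*?(?:0(?=[1-9])|1(?=[2-9])|2(?=[3-9])|3(?=[4-9])|4(?=[5-9])"
    r"|5(?=[6-9])|6(?=[7-9])|7(?=[89])|8(?=9)){4}\d+"
)

def has_ascending_run(text: str) -> bool:
    """True if text is all ASCII digits and has 5 strictly increasing digits in a row."""
    if not text or any(ch not in "0123456789" for ch in text):
        return False
    run = 1  # length of the increasing run ending at the current digit
    for prev, cur in zip(text, text[1:]):
        run = run + 1 if cur > prev else 1
        if run >= 5:
            return True
    return False

SAMPLES = ["1111345671111", "00001459000", "12345", "11111", "abc12345", "9871234444"]
for s in SAMPLES:
    print(s, bool(ALTERNATION.match(s)), has_ascending_run(s))
```

On these samples the regex and the loop give the same answers.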

Regex to match a 2-digit number or a 4-digit number

I need to be able to check if a string contains either a 2 digit or a 4 digit number before a . (period).
For example, 39. is good, and so is 3926., but 392. is not.
I originally had (^\\d{2,4}.$) but that allows anything between a 2 and a 4 digit number preceding a period.
I also tried (^\\d{2}.|\\d{4}.$) but that didn't work.
You can use this regex:
^\d{2}(?:\d{2})?\.$
This regex makes 2nd set of \d{2} optional thus allowing to match 12. or 1234. but not 123..
In the expression (^\d{2}.|\d{4}.$), the dots match any character.
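Try escaping them to make them match literal dots: (^\d{2}\.|\d{4}\.$). Note that ^ and $ each still apply to only one branch of the alternation; grouping it, as in ^(?:\d{2}|\d{4})\.$, anchors both branches at both ends.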
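The effect of the unescaped dot is easy to reproduce. A short sketch in Python (assumed language; UNESCAPED and TWO_OR_FOUR are names I made up), using search for the unanchored attempt:

```python
import re

# The attempt from the question: each dot matches ANY character.
UNESCAPED = re.compile(r"(^\d{2}.|\d{4}.$)")
# Two digits, optionally two more, then a literal dot.
TWO_OR_FOUR = re.compile(r"^\d{2}(?:\d{2})?\.$")

for s in ["39.", "3926.", "392.", "39256."]:
    print(s, bool(UNESCAPED.search(s)), bool(TWO_OR_FOUR.match(s)))
```

With the unescaped dot, 392. is accepted because the "any character" dot consumes the 2; the escaped, anchored pattern rejects it.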

Regex for 9-digit phone number dot-separated

I would like to check if a phone number contains exactly 3 digits - dot - 3 digits - dot - 3 digits. (e.g. 123.456.789)
So far I have this, but it doesn't work:
^(\d{3}\){2}\d{4}$
Note that an escaped bracket \) loses its special meaning in regex and the pattern becomes invalid since the capturing group is not closed.
If you want to match a dot with a regex, you need to include it in your pattern, and if you say 3 digits must be at the end, there is no point in declaring 4 digits with \d{4}.
^(\d{3}\.){2}\d{3}$
        ^       ^
or if we expand the first group:
^\d{3}\.\d{3}\.\d{3}$
So the whole fix consists of adding a dot after the second backslash and adjusting the final limiting quantifier.
Note that, mostly for stylistic reasons (since the efficiency gain is insignificant), I'd use a non-capturing group with the first regex variant:
^(?:\d{3}\.){2}\d{3}$
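Both points can be checked quickly: the original pattern does not even compile, and the fixed one accepts only the 3-3-3 layout. This is a sketch in Python, which is my assumption since the question names no language; PHONE and compiles are hypothetical names.

```python
import re

PHONE = re.compile(r"^(?:\d{3}\.){2}\d{3}$")

def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The pattern from the question: \) is a literal bracket, so the group never closes.
print(compiles(r"^(\d{3}\){2}\d{4}$"))

for s in ["123.456.789", "123.456.7890", "123-456-789"]:
    print(s, bool(PHONE.match(s)))
```

Only 123.456.789 is accepted by the fixed pattern.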