Simple regex validation


I want to implement the following validation: match at least 5 digits, with some other characters (for example letters and slashes) allowed in between. For example, 12345, 1A/2345, B22226 and 21113C are all valid combinations, but 1234 and AA1234 are not. I know that {5,} gives a minimum number of occurrences, but I don't know how to cope with the other characters. I mean, [0-9A-Z/]{5,} won't work. I just don't know where to put the other characters in the regex.
Thanks in advance!
Best regards,
Petar

Using the simplest regex features since you haven't specified which engine you're using, you can try:
.*([0-9].*){5}
.* zero or more of any character
( begin group
[0-9] any digit
.* zero or more of any character
) end group
{5} exactly five occurrences of the group
This gives you any number (including zero) of characters, followed by a group consisting of a single digit and any number of characters again. That group is repeated exactly five times.
That'll match any string with five or more digits in it, along with anything else.
If you want to limit what the other characters can be, use something other than the dot (.). For example, alphas only would be:
[A-Za-z]*([0-9][A-Za-z]*){5}
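As a quick sanity check, here is a minimal Python sketch that runs the first pattern against the examples from the question. The names AT_LEAST_FIVE_DIGITS and has_five_digits are made up for illustration, and Python is only an assumption, since the question doesn't say which engine is in use.

```python
import re

# The pattern from the answer, unchanged. With re.search() the leading ".*"
# is not strictly needed, but it is kept here for fidelity.
AT_LEAST_FIVE_DIGITS = re.compile(r".*([0-9].*){5}")


def has_five_digits(text: str) -> bool:
    """Return True if the text contains at least five digits anywhere."""
    return AT_LEAST_FIVE_DIGITS.search(text) is not None


if __name__ == "__main__":
    for sample in ["12345", "1A/2345", "B22226", "21113C", "1234", "AA1234"]:
        print(sample, has_five_digits(sample))
```

The first four samples should be accepted and the last two rejected, since they only contain four digits.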

EDIT: I'm picking up your suggestion from a comment to paxdiablo's answer: This regex now implements an upper bound of five for the number of "other" characters:
^(?=(?:[A-Z/]*\d){5})(?!(?:\d*[A-Z/]){6})[\dA-Z/]*$
It will match a string that has at least five digits and at most five of the "other" allowed characters (A-Z or /). No other characters are allowed.
Explanation:
^ # Start of string
(?= # Assert that it's possible to match the following:
(?: # Match this group:
[A-Z/]* # zero or more non-digits, but allowed characters
\d # exactly one digit
){5} # five times
) # End of lookahead assertion.
(?! # Now assert that it's impossible to match the following:
(?: # Match this group:
\d* # zero or more digits
[A-Z/] # exactly one "other" character
){6} # six times (change this number to "upper bound + 1")
) # End of assertion.
[\dA-Z/]* # Now match the actual string, allowing only these characters.
$ # Anchor the match at the end of the string.
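To see the two lookaheads at work, here is a small Python sketch under the assumption that the engine behaves like Python's re module. BOUNDED and is_valid are illustrative names only; re.ASCII is added so that \d only means 0-9.

```python
import re

# Same pattern as above: at least five digits, at most five "other"
# characters, and nothing outside of digits, A-Z and "/".
BOUNDED = re.compile(
    r"^(?=(?:[A-Z/]*\d){5})(?!(?:\d*[A-Z/]){6})[\dA-Z/]*$",
    re.ASCII,
)


def is_valid(code: str) -> bool:
    """Return True if the code satisfies both lookahead conditions."""
    return BOUNDED.match(code) is not None


if __name__ == "__main__":
    for sample in ["12345", "1A/2345", "ABCDE12345", "ABCDEF12345", "1234", "12345a"]:
        print(sample, is_valid(sample))
```

ABCDE12345 has exactly five "other" characters and passes, while ABCDEF12345 has six and is rejected by the negative lookahead.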

You may want to try counting the digits instead. I feel it's much cleaner than writing a complex regex.
>> "ABC12345".gsub(/[^0-9]/,"").size >= 5
=> true
The above removes everything that is not a digit and then checks the length of what remains. You can do the same thing in your own choice of language. The most fundamental way would be to iterate over the string, counting each character that is a digit until the count reaches 5 (or the string ends), and act accordingly.
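The same counting idea could look roughly like this in Python. The function names are hypothetical, and only the ASCII digits 0-9 are counted, which is an assumption about what "digit" should mean here.

```python
def count_ascii_digits(text: str) -> int:
    """Count the characters of text that are ASCII digits (0-9)."""
    return sum(1 for ch in text if "0" <= ch <= "9")


def has_at_least_five_digits(text: str) -> bool:
    """Return True if text contains five or more ASCII digits."""
    return count_ascii_digits(text) >= 5


if __name__ == "__main__":
    print(has_at_least_five_digits("ABC12345"))
    print(has_at_least_five_digits("AA1234"))
```

This only counts digits; it does not restrict which other characters may appear, so it would need an extra check if that matters.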

Related

How can I limit the total length of 2 adjacent strings in Regular Expression?

Example word: name.surname#exm.gov.xx.en
I want to limit the name + surname's total length to 12.
Ex: If the name's length is 5, then the surname's length cannot be bigger than 7.
My regex is here: ([a-z|çöşiğü]{0,12}.[a-z|çöşiğü]{0,12}){0,12}#exm.gov.xx.en
Thx in advance

If there should be a single dot present which should not be at the start or right before the #, you could assert 13 characters followed by an #
^(?=[a-zçöşğü.]{13}#)[a-zçöşğü]+\.[a-zçöşğü]+#exm\.gov\.xx\.en$
In parts
^ Start of string
(?= Positive lookahead, assert what is on the right is
[a-zçöşğü.]{13}# Match 13 times any of the listed followed by an #
) Close lookahead
[a-zçöşğü]+\.[a-zçöşğü]+ Match 1+ times any of the listed, then a dot, then again 1+ times any of the listed
#exm\.gov\.xx\.en Match #exm.gov.xx.en
$ End of string
Note that I have omitted the pipe | from the character class, as it would be matched literally there instead of meaning OR. If you meant to use it as a char, you could add it back. I have also removed the i, as it is already matched by a-z.
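One thing worth checking is that {13} in the lookahead requires exactly 13 characters before the # (12 letters plus the dot). If shorter names should be accepted too, a range such as {1,13} might be closer to what the question asks. The Python sketch below compares both; EXACTLY_12, AT_MOST_12 and the sample addresses are invented for illustration.

```python
import re

LETTERS = "a-zçöşğü"
DOMAIN = r"#exm\.gov\.xx\.en"

# Pattern from the answer: exactly 13 characters (letters and dot) before "#".
EXACTLY_12 = re.compile(
    rf"^(?=[{LETTERS}.]{{13}}#)[{LETTERS}]+\.[{LETTERS}]+{DOMAIN}$"
)

# Assumed variant: anywhere from 1 to 13 characters before "#".
AT_MOST_12 = re.compile(
    rf"^(?=[{LETTERS}.]{{1,13}}#)[{LETTERS}]+\.[{LETTERS}]+{DOMAIN}$"
)

if __name__ == "__main__":
    samples = [
        "abcde.fghijkl#exm.gov.xx.en",    # 5 + 7 letters
        "ali.veli#exm.gov.xx.en",         # 3 + 4 letters
        "abcdefg.hijklmn#exm.gov.xx.en",  # 7 + 7 letters
    ]
    for sample in samples:
        print(sample, bool(EXACTLY_12.match(sample)), bool(AT_MOST_12.match(sample)))
```

The short address is rejected by the original pattern but accepted by the variant, and the 14-letter one is rejected by both.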

Regex: match a string if the next 5 characters contain 3 occurrences of the character k

I would like to match all substrings in a continuous string which start with a specific character and are followed by 5 arbitrary characters that contain the character C exactly 3 times.
Every string I am looking at has some arbitrary characters (x) and the starting character I am looking for (M), and I only want to match if the next 5 characters contain C exactly 3 times.
e.g.:
...xxMxCxCCxCxC...
...xxMCCCxxCxxC...
...xxMCxxxCCxxC...
...xxMxCCCCxCxx...
returning:
MxCxCC
MCCCxx
Null --> only 2x C in 5 characters after M
Null --> 4x C in the 5 characters after M
Case is not important I just used upper case for better illustration
I tried several things but the closest I got was, to be able to match everything until 3x C were reached:
M(?:[^C]*C){3}
and I was wondering if I could somehow combine this with a lookahead.
I am fairly new to this so maybe you could point me in the right direction.

If a quantifier in the lookbehind is supported, you might also match M followed by 5 non-whitespace chars.
Then use a positive lookbehind to assert that what is on the left is M followed by exactly 3 occurrences of C, with the remaining chars being anything except C:
M\S{5}(?<=M(?:[^C\s]{0,3}C){3}[^C\s]{0,2})
Explanation
M Match M
\S{5} Match 5 times a non whitespace char
(?<= Positive lookbehind, assert what is on the left is
M Match M
(?: Non capture group
[^C\s]{0,3}C Match 0-3 times any char except C or whitespace char, then match C
){3} Close group and repeat 3 times
[^C\s]{0,2} Match 0-2 times any char except C or whitespace char
) Close lookbehind
Instead of using \S, you could also use [A-Za-z]{5} to match only chars a-z
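Not every engine accepts a quantifier inside a lookbehind; Python's re module, for example, only allows fixed-width lookbehinds. Where that is the case, a plain loop is one possible fallback. This is a minimal sketch, not the pattern above: find_windows and its parameters are hypothetical names, and upper-case input is assumed.

```python
def find_windows(text, start="M", target="C", width=5, wanted=3):
    """Return every start+window substring whose window holds exactly
    `wanted` occurrences of `target` in the `width` characters after `start`."""
    hits = []
    for index, char in enumerate(text):
        if char != start:
            continue
        window = text[index + 1 : index + 1 + width]
        # Skip windows that run past the end of the text.
        if len(window) == width and window.count(target) == wanted:
            hits.append(start + window)
    return hits


if __name__ == "__main__":
    for sample in ["xxMxCxCCxCxC", "xxMCCCxxCxxC", "xxMCxxxCCxxC", "xxMxCCCCxCxx"]:
        print(sample, find_windows(sample))
```

Unlike the regex, this also reports overlapping candidates, since every M is inspected independently.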

I believe you will need to enumerate the 10 possible ways C may appear exactly 3 times in a string of five characters:
M(?:C{3}[^C]{2}|C{2}[^C]C[^C]|C{2}[^C]{2}C|C[^C]C{2}[^C]|C[^C]C[^C]C|C[^C]{2}C{2}|[^C]C{3}[^C]|[^C]C{2}[^C]C|[^C]C[^C]C{2}|[^C]{2}C{3})
The case-insensitive flag needs to be set (/i, or add (?i) at the beginning).
The regex can be read more easily as follows.
M
(?:
C{3}[^C]{2}
|
C{2}[^C]C[^C]
|
C{2}[^C]{2}C
|
C[^C]C{2}[^C]
|
C[^C]C[^C]C
|
C[^C]{2}C{2}
|
[^C]C{3}[^C]
|
[^C]C{2}[^C]C
|
[^C]C[^C]C{2}
|
[^C]{2}C{3}
)
I used the following Ruby code to get the 10 combinations (to make sure I didn't miss any):
['C','[^C]'].repeated_permutation(5)
.select { |arr| arr.count('C') == 3 }
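For comparison, a Python counterpart of the Ruby snippet could use itertools.product to enumerate the arrangements and then assemble the alternation from them. The names arrangements, PATTERN and first_match are illustrative, not part of the answer.

```python
import re
from itertools import product

# All length-5 sequences over {C, [^C]} that contain C exactly three times.
arrangements = [
    "".join(combo)
    for combo in product(["C", "[^C]"], repeat=5)
    if combo.count("C") == 3
]

# M followed by any one of the ten arrangements, case-insensitively.
PATTERN = re.compile("M(?:" + "|".join(arrangements) + ")", re.IGNORECASE)


def first_match(text):
    """Return the first matching substring, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(len(arrangements))
    for sample in ["xxMxCxCCxCxC", "xxMCCCxxCxxC", "xxMCxxxCCxxC", "xxMxCCCCxCxx"]:
        print(sample, first_match(sample))
```

Generating the alternation this way also makes it easy to change the window width or the required count without retyping the pattern.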

Why does this not match my example?

As I go through the regex101 quiz/lessons, I am supposed to match an IP address (without leading zeros).
Now the following
^[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+$
matches 23.34.7433.33
but fails to match single digit numbers like 1.2.3.4
Why is this so, when my + is supposed to match "1 to infinite" times...?

You are in fact requiring at least 2 characters for each number in the IP address because you have:
[^0]+[0-9]+
[^0]+ matches at least one character, and [0-9]+ matches at least one character. Together they require at least 2 characters (characters being in scope of the character classes).
Also 23.34.7433.3 doesn't match your regex for the reason I stated above.
And you might try this regex for the purpose you stated:
^(?:[1-9][0-9]{0,2}\.){3}[1-9][0-9]{0,2}$
[1-9][0-9]{0,2} will match up to 3 digits, with no leading 0.
EDIT: You mentioned in a comment that 0.0.0.0 (single digit zeroes) are to be accepted as well. The modified regex from above would be:
^(?:(?:[1-9][0-9]{0,2}|0)\.){3}(?:[1-9][0-9]{0,2}|0)$
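Here is a rough Python comparison of the question's pattern (assuming its dots are escaped as \.) and the modified one. ORIGINAL and FIXED are just labels for this sketch.

```python
import re

# Pattern from the question: each part needs [^0]+ AND [0-9]+, so 2+ chars.
ORIGINAL = re.compile(r"^[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+\.[^0]+[0-9]+$")

# Modified pattern from this answer: 1-3 digits without a leading zero, or 0.
FIXED = re.compile(r"^(?:(?:[1-9][0-9]{0,2}|0)\.){3}(?:[1-9][0-9]{0,2}|0)$")

if __name__ == "__main__":
    samples = ["1.2.3.4", "0.0.0.0", "23.34.7433.33", "01.2.3.4", "999.999.999.999"]
    for sample in samples:
        print(sample, bool(ORIGINAL.match(sample)), bool(FIXED.match(sample)))
```

Note that the modified pattern still accepts 999.999.999.999, because it only limits the number of digits and not the numeric range.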

Assuming you want to check an IPv4 address, I suggest this pattern:
^(?<nb>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])(?>\.\g<nb>){3}$
I have defined a named subpattern nb to make the pattern shorter, but if you prefer, you can write everything out and replace \g<nb>:
^(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])(?>\.(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])){3}$
Numbers greater than 255 are not allowed.
Pattern details:
The goal is to describe what is allowed:
numbers with 3 digits that begin with "2", where the 2 is followed either by a digit in [0-4] and a digit in [0-9], or by 5 and a digit in [0-5], because the number cannot exceed 255
numbers with 3 digits that begin with "1", followed by any two digits
any number with 2 digits that doesn't begin with "0"
any number with 1 digit (zero included)
If I add these rules one by one, I obtain:
2(?>[0-4][0-9]|5[0-5])
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2}
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2} | [1-9][0-9]
2(?>[0-4][0-9]|5[0-5]) | 1[0-9]{2} | [1-9][0-9] | [0-9]
Now I have a definition for the allowed numbers. I can reduce the size of the pattern a little by replacing [1-9][0-9] | [0-9] with [1-9]?[0-9].
Then you only have to add the dot and repeat the subpattern four times: x.x.x.x
But since there are only three dots, I write the first number and then repeat 3 times a group that contains a dot and a number:
(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9]) # the first number
(?>\.(?>2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])){3} # the group repeated 3 times
To be sure that the string doesn't contain anything other than the IP I described, I add anchors for the start of the string ^ and for the end of the string $, so that the string begins and ends with the IP.
To reduce the size of a pattern, you can define a named group, which lets you reuse the subpattern it contains.
Then you can rewrite the pattern like this:
^
(?<nb> 2(?>[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9] ) # named group definition
(?> \. \g<nb> ){3} # \g<nb> is the reference to the subpattern named nb
$
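The \g<nb> subroutine call is not available in every engine; Python's re module, for instance, does not support it inside a pattern. One way to get the same reuse there is to build the pattern from a string, as in this sketch. OCTET, IPV4 and is_ipv4 are made-up names, and plain non-capturing groups stand in for the atomic ones.

```python
import re

# One number from 0 to 255 without leading zeros, following the rules above.
OCTET = r"(?:2(?:[0-4][0-9]|5[0-5])|1[0-9]{2}|[1-9]?[0-9])"

# First number, then three times "dot + number".
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")


def is_ipv4(text: str) -> bool:
    """Return True if the whole text is a dotted quad with parts in 0-255."""
    return IPV4.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["1.2.3.4", "0.0.0.0", "255.255.255.255", "256.1.1.1", "01.2.3.4", "1.2.3"]:
        print(sample, is_ipv4(sample))
```

fullmatch() takes the place of the ^ and $ anchors, so nothing may precede or follow the address.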

[0-9]+ should be [0-9]*
* matches 0 or more.
+ matches 1 or more.
You already have [^0], which is actually wrong because it will also match letters.
Besides that, it will match the first character that's NOT zero and then at least one digit after that.
It should be written as
[1-9][0-9]*
This checks that the first character is a digit between 1 and 9 and that the following characters (zero or more) are digits 0-9.
This then comes out to:
^[1-9][0-9]*\.[1-9][0-9]*\.[1-9][0-9]*\.[1-9][0-9]*$
Cleaning it up:
^(?:[1-9][0-9]*\.){3}[1-9][0-9]*$
To also accept single zeros (0.0.0.0), this should work:
^(?:[1-9][0-9]*|0)\.(?:[1-9][0-9]*|0)\.(?:[1-9][0-9]*|0)\.(?:[1-9][0-9]*|0)$
Cleaned up:
^(?:(?:[1-9][0-9]*|0)\.){3}(?:[1-9][0-9]*|0)$

Your regex would match ABCDEFG999.FOOBSR888 etc., because [^0] is any character other than a zero, and both character classes are required by the +.
I think you want this:
^[1-9]\d*(\.[1-9]\d*){3}$
Having replaced various verbose expressions with their equivalents, this is 4 groups of digits, each starting with a non-zero.
Actually the problem is far more complicated, because your approach (once corrected) allows 999.999.999.999, which is not a valid IP.

It might be because your pattern requires at least two characters between two dots '.'.
Try using this pattern: ^[^0]+[0-9]*\.[^0]+[0-9]*\.[^0]+[0-9]*\.[^0]+[0-9]*$

To match an IP address you should use this pattern:
\b(?:\d{1,3}\.){3}\d{1,3}\b
Taken from here:
http://www.regular-expressions.info/examples.html

limit expression length

I am using the following in a script of mine to verify minutes entered. It allows for numbers and a comma for thousands in the correct format only. However, I would like to add a length restriction as well. I can't seem to do it, or I'm just putting it in the wrong spot. Here is the code as is, with no limit:
(!preg_match("#^(\d{1,3}(\,\d{3})*|(\d+))$#",$values['minutes']))
I would like to make this at least one with a max of five... the entry is for minutes online per day... well there are only 1440 minutes in a day... if you entered 1,440 which is valid currently that is 5 characters and I want to limit the expression to that...
Anyone?

Two suggestions:
preg_match("#^(?:\d{1,3}|1,?\d{3})$#", $values['minutes'])
Explanation:
^ # Start of string
(?: # Either match...
\d{1,3} # a number with one to three digits
| # or
1 # a four digit number that starts with a 1
,? # and may have a thousands separator
\d{3} # (and three more digits)
)
$ # End of string
The problem is of course that this also allows 1,999, so you'd still need an extra sanity check. Even so, this is probably the better solution.
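The extra sanity check could be as simple as converting the input to an integer after the format check. Here is a minimal Python sketch of that two-step idea (the question uses PHP, so this is only an illustration; FORMAT, MAX_MINUTES and is_valid_minutes are hypothetical names).

```python
import re

# Format check from the first suggestion: 1-3 digits, or 1 + optional comma + 3 digits.
FORMAT = re.compile(r"(?:\d{1,3}|1,?\d{3})", re.ASCII)

MAX_MINUTES = 1440  # minutes in a day, as stated in the question


def is_valid_minutes(text: str) -> bool:
    """Check the format with the regex, then the range with plain arithmetic."""
    if FORMAT.fullmatch(text) is None:
        return False
    return int(text.replace(",", "")) <= MAX_MINUTES


if __name__ == "__main__":
    for sample in ["0", "999", "1440", "1,440", "1,999", "2000", "12,34"]:
        print(sample, is_valid_minutes(sample))
```

Keeping the range check outside the regex means the limit can change without touching the pattern.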
You can also do the range limitation in the regex itself, but that's cumbersome:
preg_match("#^(?:1,?440|1,?4[0-3]\d|1,?[0-3]\d{2}|[1-9]\d{1,2}|\d)$#", $values['minutes'])
Explanation:
^ # Start of string
(?: # Either match...
1,?440 # 1440
| # or
1,?4[0-3]\d # 1400-1439
| # or
1,?[0-3]\d{2} # 1000-1399
| # or
[1-9]\d{1,2} # 10-999
| # or
\d # 0-9
)
$ # End of string
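A range pattern like this is easy to get subtly wrong, so it may be worth brute-forcing it against every candidate value. The following Python sketch does that; IN_RANGE, in_range and the two result lists are names invented for the check.

```python
import re

# The range pattern from above, applied to the whole string via fullmatch().
IN_RANGE = re.compile(
    r"(?:1,?440|1,?4[0-3]\d|1,?[0-3]\d{2}|[1-9]\d{1,2}|\d)",
    re.ASCII,
)


def in_range(text: str) -> bool:
    """Return True if text is a number from 0 to 1440, comma optional."""
    return IN_RANGE.fullmatch(text) is not None


# Every integer below 3000 that the pattern accepts, written without
# and with a thousands separator.
accepted_plain = [n for n in range(3000) if in_range(str(n))]
accepted_commas = [n for n in range(3000) if in_range(f"{n:,}")]

if __name__ == "__main__":
    print(accepted_plain == list(range(1441)))
    print(accepted_commas == list(range(1441)))
```

Both lists should come out as exactly 0 through 1440.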

You're probably better off just testing the string's length or even the integer value. But just to show that it's possible:
preg_match("#^(\d,\d{3}|\d{1,4})$#", $values['minutes'])
Yes, it's very simple, since a four-digit number can only take one of the forms
one digit, comma, three digits
four digits

Regex expressions for matching comparisons

Is it possible to create a regular expression that matches a comparison such as less than or greater than? For example, match all dollar values less than $500.
One way I would use this would be on online stores that list many products on a single page but do not provide a way to sort by price. I found a search page by regex extension for Chrome and am trying to figure out if there is a way I can use a regex to match any strings on the page beginning with a dollar sign followed by any number less than a number that I specify.

This should work for you: \$[1-4]?\d?\d\b
Explanation:
r"""
\$ # Match the character “$” literally
[1-4] # Match a single character in the range between “1” and “4”
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
\d # Match a single digit 0..9
\b # Assert position at a word boundary
"""
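To get a feel for what this pattern picks up on a page, here is a small Python sketch with invented sample text. UNDER_500 is an illustrative name.

```python
import re

# The pattern from above: "$", an optional 1-4, an optional digit, one digit,
# and then a word boundary.
UNDER_500 = re.compile(r"\$[1-4]?\d?\d\b")

if __name__ == "__main__":
    print(UNDER_500.findall("$5 $49 $499 $500 $1000"))
    # Caveat: a thousands separator or a decimal point counts as a word
    # boundary, so only the leading part of these prices is matched.
    print(UNDER_500.findall("$5,000"))
    print(UNDER_500.findall("$499.99"))
```

The second call shows a likely false positive: $5,000 yields a match on $5, so prices with thousands separators would need extra handling.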

This could do what you need: ^(\$[1-4]?\d?\d)$. This will match any whole-dollar value between $0 and $499.
As mentioned above, if you would like to match even decimal values you could use something like so: ^(\$[1-4]?\d?\d(\.\d{2})?)$. That being said, numeric validation should ideally be done using actual mathematical operations, and not regular expressions.
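If a script is an option, the comparison itself can be left to arithmetic: use a regex only to find the prices, then compare the parsed values. This is a minimal Python sketch under that assumption; PRICE, prices_below and the sample text are made up for illustration.

```python
import re
from decimal import Decimal

# A dollar sign, digits with optional thousands separators, optional cents.
PRICE = re.compile(r"\$(\d+(?:,\d{3})*(?:\.\d{2})?)")


def prices_below(text: str, limit: Decimal) -> list:
    """Return every dollar amount found in text that is strictly below limit."""
    found = []
    for raw in PRICE.findall(text):
        value = Decimal(raw.replace(",", ""))
        if value < limit:
            found.append(value)
    return found


if __name__ == "__main__":
    page = "A $19.99, B $499, C $500.00, D $1,250.00"
    print(prices_below(page, Decimal("500")))
```

With this split, changing the threshold is a matter of passing a different limit rather than rewriting the pattern.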

Edit: this is overly complicated, but it will also match any value strictly less than 500
\$[1-4]\d{2}(\.\d{2})?$|\$\d{1,2}(\.\d{2})?$
if you need to match $500 as well, add another |\$500(\.00)?$
This matches:
\$ the dollar symbol
[1-4] followed by a digit between 1 and 4
\d{2} followed by exactly 2 digits
(\.\d{2})? optionally --> ()? followed by a dot --> \. and exactly 2 digits
$ followed by end of line (may be replaced with \b for word boundaries)
| or
\$\d{1,2} the dollar symbol followed by one or two digits
(\.\d{2})?$ again optionally followed by cents, followed by end of line
