Regex matching string containing at least x characters and x numbers

Regex matching string containing at least x characters and x numbers - regex

I have a requirement to test if string matches following rules:
Has at least 8 [a-zA-Z!##$%^&+=] characters and has at least 1 [0-9] number OR
Has at least 8 [0-9] numbers and has at least 1 [a-zA-Z!##$%^&+=] character
So far I tried this:
"^(?=(?=.*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=].*[a-zA-Z!##$%^&+=])(?=.*[0-9])|(?=.*[0-9].*[0-9].*[0-9].*[0-9].*[0-9].*[0-9].*[0-9].*[0-9])(?=.*[a-zA-Z!##$%^&+=])).{8,}\$")
It mostly works ok, but one scenario is failing:
"!abcdefgh1" --> matched (OK)
"{abcdefgh1" --> matched (NOT OK because character { shouldn't be allowed)
How to disallow any other characters except [a-zA-Z!##$%^&+=]?
Is it possible to write that regex in shorter way?
Thanks

The problem is that your .s are matching any character. To keep the convenience of using . to match a generic character but also require that the string doesn't contain any characters other than what's allowed, a simple tweak would be another lookahead at the beginning of the string to ensure that all characters before the end of the string are [a-zA-Z!##$%^&+=] or [0-9], nothing else.
Also note that [0-9] simplifies to \d, which is a bit nicer to look at:
^(?=[a-zA-Z!##$%^&+=\d]{9,}$) <rest of your regex>
You can also simplify your regex by repeating the big character set in a group, when possible, rather than writing it out manually 8 times. Also, as comment notes, when checking whether a string has enough digits, better to repeat (?:\D*\d) rather than using a dot, because you know that you want the antecedent to match non-digit characters.
Actually, because the initial lookahead above ensures that the string contains only digits and those certain non-digit characters, rather than repeating the long character set [a-zA-Z!##$%^&+=] again and again when matching a non-digit, you can just use \D, since the initial lookahead guarantees that non-digits will be within that character set.
For example:
^(?=[a-zA-Z!##$%^&+=\d]+$)(?:(?=\D*\d)(?=(?:\d*\D){8})|(?=(?:\D*\d){8})(?=\d*\D))
Explanation:
^(?=[a-zA-Z!##$%^&+=\d]{9,}$) - ensure string contains only desired characters (fail immediately if there are not at least 9 of them), then alternate between either:
(?=\D*\d)(?=(?:\d*\D){8}) - string contains at least one digit, and 8 other characters, or
(?=(?:\D*\d){8})(?=\d*\D) - string contains at least 8 digits, and at least one of the other characters
https://regex101.com/r/18xtBw/2 (to test, input only one line at a time - otherwise, the \Ds will match newline characters, which will cause problems)

Related

Am I implementing negative lookaheads correctly with my regex?

I'm a beginner with regex and stuck with creating regex with the following conditions:
Minimum of 8 characters
Maximum of 60 characters
Must contain 2 letters
Must contain 1 number
Must contain 1 special character
Special character cannot be the following: & ` ( ) = [ ] | ; " ' < >
So far I have the following...
(?=^.{8,60}$)(?=.*\d)(?=[a-zA-Z]{2,})(?!.*[&`()=[|;"''\]'<>]).*
But my last two tests are failing and I have no idea why...
!##$%^*+-_~?,.{}!HR12345
123456789AB!
If you'd like to see my test and expected results, visit here: https://regexr.com/73m2o
My tests contains acceptable number of characters, appropriate number of alphabetic characters, and supported special characters... I don't know why it's failing!

Using .* to verify a character in the string can be very inefficient and I would suggest using negated character classes for the principle of contast.
Apart from that, there is a point in the question Must contain 1 special character that is not addressed yet in the current answers.
You can use a positive lookahead for that to assert one of the characters that you consider special.
^(?=[^\d\n]*\d)(?=[^a-zA-Z\n]*[a-zA-Z][^a-zA-Z\n]*[a-zA-Z])(?=[^!##$%^\n]*[!##$%^])[^&`()=[|;"''\]'<>\n]{8,60}$
Explanation
^ Start of string (Outside of the lookahead)
(?=[^\d\n]*\d) Assert a digit
(?=[^a-zA-Z\n]*[a-zA-Z][^a-zA-Z\n]*[a-zA-Z]) Assert 2 chars a-zA-Z
(?=[^!##$%^\n]*[!##$%^]) Assert a "special" character
[^&`()=[|;"''\]'<>\n]{8,60} Match 8-60 characters except for the ones that you don't want to match
$ End of string
See a regex demo.

Part of the issue is that you're missing the .* in (?=[a-zA-Z]{2,}). However, your implementation of "two or more" letters is not correct unless the letters must be consecutive.
You'll see that the string 1234567B89A! fails to match, even with the correction. You can fix this like so:
(?=^.{8,60}$)(?=.*\d)(?=.*[a-zA-Z].*[a-zA-Z])(?!.*[&`()=[|;"''\]'<>]).*
The part I changed is (?=.*[a-zA-Z].*[a-zA-Z]) asserting that we can match a letter, zero or more other characters, and then another letter.
https://regex101.com/r/jEsK0S/1
Also, there's currently no assertion that there must be a special character, only an assertion of which ones shouldn't match. So I'd suggest adding another lookahead with a list of valid special characters.

Since the 2+ alphabetical characters can appear anywhere in the string, you need to prepend your check for them with .* (as you have with the other character classes you're checking for); otherwise the positive lookaheads will, in this scenario, try to assert their appearance at the beginning of the string (position 0):
(?=^.{8,60}$)(?=.*\d)(?=.*[a-zA-Z]{2,})(?!.*[&`()=[|;"''\]'<>]).*

Matching any password except one containing repeating characters [duplicate]

Edit: Thanks for the advice to make my question clearer :)
The Match is looking for 3 consecutive characters:
Regex Match =AaA653219
Regex Match = AA5556219
The code is ASP.NET 4.0. Here is the whole function:
public ValidationResult ApplyValidationRules()
{
ValidationResult result = new ValidationResult();
Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
bool valid = regEx.IsMatch(_Password);
if (!valid)
result.Errors.Add("Passwords must be 8-20 characters in length, contain at least one alpha character and one numeric character");
return result;
}
I've tried for over 3 hours to make this work, referencing the below with no luck =/
How can I find repeated characters with a regex in Java?
.net Regex for more than 2 consecutive letters
I have started with this for 8-20 characters a-Z 0-9 :
^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$
As Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
I've tried adding variations of the below with no luck:
/(.)\1{9,}/
.*([0-9A-Za-z])\\1+.*
((\\w)\\2+)+".
Any help would be much appreciated!

http://regexr.com?34vo9
The regular expression:
^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$
The first lookahead ((?=.{8,20}$)) checks the length of your string. The second portion does your double character and validity checking by:
(
([a-z0-9]) Matching a character and storing it in a back reference.
\2? Optionally match one more EXACT COPY of that character.
(?!\2) Make sure the upcoming character is NOT the same character.
)+ Do this ad nauseum.
$ End of string.
Okay. I see you've added some additional requirements. My basic forumla still works, but we have to give you more of a step by step approach. SO:
^...$
Your whole regular expression will be dropped into start and end characters, for obvious reasons.
(?=.{n,m}$)
Length checking. Put this at the beginning of your regular expression with n as your minimum length and m as your maximum length.
(?=(?:[^REQ]*[REQ]){n,m})
Required characters. Place this at the beginning of your regular expression with REQ as your required character to require N to M of your character. YOu may drop the (?: ..){n,m} to require just one of that character.
(?:([VALID])\1?(?!\1))+
The rest of your expression. Replace VALID with your valid Characters. So, your Password Regex is:
^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$
'Splained:
^
(?=.{8,20}$) 8 to 20 characters
(?=[^A-Za-z]*[A-Za-z]) At least one Alpha
(?=[^0-9]*[0-9]) At least one Numeric
(?:([\w\d*?!:;])\1?(?!\1))+ Valid Characters, not repeated thrice.
$
http://regexr.com?34vol Here's the new one in action.

Tightened up matching criteria as it was too broad; for example, "not A-Za-z" matches a lot more than is intended. The previous REGEX was matching on the string "ThiIsNot". For the most part, passwords are only going to contain alphanumeric and punctation characters, so I limited the scope, which made all matches more accurate. Used character classes for human readability. Added and exclusion list, and differentiated upper and lower case letters.
^(?=.{8,20}$)(?!(?:.*[01IiLlOo]))(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1})(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1})(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1})(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$
The breakdown:
^(?=.{8,20}$) - Positive lookahead that the string is between 8 and 20 chars
(?!(?:.*[01IiLlOo])) - Negative lookahead for any blacklisted chars
(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2}) - Verify that at least 2 alpha chars exist
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1}) - Verify that at least 1 lowercase alpha exists
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1}) - Verify that at least 1 uppercase alpha exists
(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1}) - Verify that at least 1 digit exists
(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1}) - Verify that at least 1 special/punctuation char exists
(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$ - Verify that no char is repeated more than twice in a row

Decoding a regex... I know what it's function is but I want to understand exactly what is happening

I have a regular expression that I'm going to be using to verify that an inputted number is in standard U.S. telephone format (i.e (###) ###-####). I am new to regex and still having some trouble figuring out the exact function of each character. If someone would go through this piece by piece/verify that I am understanding I would really appreciate it. Also if the regex is wrong I would obviously like to know that.
\D*?(\d\D*?){10}
What I think is happening:
\D*?( indicates an escape sequence for the parenthesis metacharacter... not sure why the \D*? is necessary
\d indicating digits
\D*? indicating there is a non-digit character (-) followed by the closing parenthesis.
{10} for the 10 digits
I feel very unsure explaining this, like my understanding is very vague in terms of why the regex is in the order that it is etc. Thanks in advance for help/explanations.
EDIT
It seems like this is not the best regex for what I want. Another possibility was [(][0-9]{3}[)] [0-9]{3}-[0-9]{4}, but I was told this would fail. I suppose I'll have to do a little more work with regular expressions to figure this out.

\D matches any non-digit character.
* means that the previous character is repeated 0 or more times.
*? means that the previous character is repeated 0 or more times, but until the match of the following character in the regex. It is a bit difficult perhaps at the start, but in your regex, the next character is \d, meaning \D*? will match the least amount of characters until the next \d character.
( ... ) is a capture group, and is also used to group things. For instance {10} means that the previous character or group is repeated 10 times exactly.
Now, \D*?(\d\D*?){10} will match exactly 10 numbers, starting with non-digit characters or not, with non-digit characters in between the digits if they are present.
[(][0-9]{3}[)] [0-9]{3}-[0-9]{4}
This regex is a bit better since it doesn't just accept anything (like the first regex does) and will match the format (###) ###-#### (notice the space is a character in regex!).
The new things introduced here are the square brackets. These represent character classes. [0-9] means any character between 0 to 9 inclusive, which means it will match 0, 1, 2, 3, 4, 5, 6, 7, 8 or 9. Adding {3} after it makes it match 3 similar character class, and since this character class contains only digits, it will match exactly 3 digits.
A character class can be used to escape certain characters, such as ( or ) (note I mentioned earlier they are for capturing groups, or grouping) and thus, [(] and [)] are literal ( and ) instead of being used for capturing/grouping.
You can also use backslashes (\) to escape characters. Thus:
\([0-9]{3}\) [0-9]{3}-[0-9]{4}
Will also work. I would also recommend the use of line anchors ^ and $ if you're only trying to see if a phone number matches the above format. This ensures that the string has only the phone number, and nothing else. ^ matches the beginning of a line and $ matches the end of a line. Thus, the regex will become:
^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$
However, I don't know all the combinations of the different formats of phone numbers in the US, so this regex might need some tweaking if you have different phone number formats.

\D is "not a digit"; \d is "digit". With that in mind:
This matches zero or more non-digits, then it matches a digit and any number of non-digit characters 10 times. This won't actually verify that the number if formatted properly, just that it contains 10 digits. I suspect that the regex isn't what you want in the first place.
For example, the following will match your regex:
this is some bad text 1 and some more 2 and more 34567890

\D matches a character that is not a digit
* repeats the previous item 0 or more times
? find the first occurrence
\d matches a digit
so your group is matches 10 digits or non digits

RegEx No more than 2 identical consecutive characters and a-Z and 0-9

Edit: Thanks for the advice to make my question clearer :)
The Match is looking for 3 consecutive characters:
Regex Match =AaA653219
Regex Match = AA5556219
The code is ASP.NET 4.0. Here is the whole function:
public ValidationResult ApplyValidationRules()
{
ValidationResult result = new ValidationResult();
Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
bool valid = regEx.IsMatch(_Password);
if (!valid)
result.Errors.Add("Passwords must be 8-20 characters in length, contain at least one alpha character and one numeric character");
return result;
}
I've tried for over 3 hours to make this work, referencing the below with no luck =/
How can I find repeated characters with a regex in Java?
.net Regex for more than 2 consecutive letters
I have started with this for 8-20 characters a-Z 0-9 :
^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$
As Regex regEx = new Regex(#"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
I've tried adding variations of the below with no luck:
/(.)\1{9,}/
.*([0-9A-Za-z])\\1+.*
((\\w)\\2+)+".
Any help would be much appreciated!

http://regexr.com?34vo9
The regular expression:
^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$
The first lookahead ((?=.{8,20}$)) checks the length of your string. The second portion does your double character and validity checking by:
(
([a-z0-9]) Matching a character and storing it in a back reference.
\2? Optionally match one more EXACT COPY of that character.
(?!\2) Make sure the upcoming character is NOT the same character.
)+ Do this ad nauseum.
$ End of string.
Okay. I see you've added some additional requirements. My basic forumla still works, but we have to give you more of a step by step approach. SO:
^...$
Your whole regular expression will be dropped into start and end characters, for obvious reasons.
(?=.{n,m}$)
Length checking. Put this at the beginning of your regular expression with n as your minimum length and m as your maximum length.
(?=(?:[^REQ]*[REQ]){n,m})
Required characters. Place this at the beginning of your regular expression with REQ as your required character to require N to M of your character. YOu may drop the (?: ..){n,m} to require just one of that character.
(?:([VALID])\1?(?!\1))+
The rest of your expression. Replace VALID with your valid Characters. So, your Password Regex is:
^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$
'Splained:
^
(?=.{8,20}$) 8 to 20 characters
(?=[^A-Za-z]*[A-Za-z]) At least one Alpha
(?=[^0-9]*[0-9]) At least one Numeric
(?:([\w\d*?!:;])\1?(?!\1))+ Valid Characters, not repeated thrice.
$
http://regexr.com?34vol Here's the new one in action.

Tightened up matching criteria as it was too broad; for example, "not A-Za-z" matches a lot more than is intended. The previous REGEX was matching on the string "ThiIsNot". For the most part, passwords are only going to contain alphanumeric and punctation characters, so I limited the scope, which made all matches more accurate. Used character classes for human readability. Added and exclusion list, and differentiated upper and lower case letters.
^(?=.{8,20}$)(?!(?:.*[01IiLlOo]))(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1})(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1})(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1})(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$
The breakdown:
^(?=.{8,20}$) - Positive lookahead that the string is between 8 and 20 chars
(?!(?:.*[01IiLlOo])) - Negative lookahead for any blacklisted chars
(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2}) - Verify that at least 2 alpha chars exist
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1}) - Verify that at least 1 lowercase alpha exists
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1}) - Verify that at least 1 uppercase alpha exists
(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1}) - Verify that at least 1 digit exists
(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1}) - Verify that at least 1 special/punctuation char exists
(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$ - Verify that no char is repeated more than twice in a row

How can I recognize a valid barcode using regex?

I have a barcode of the format 123456########. That is, the first 6 digits are always the same followed by 8 digits.
How would I check that a variable matches that format?

You haven't specified a language, but regexp. syntax is relatively uniform across implementations, so something like the following should work: 123456\d{8}
\d Indicates numeric characters and is typically equivalent to the set [0-9].
{8} indicates repetition of the preceding character set precisely eight times.
Depending on how the input is coming in, you may want to anchor the regexp. thusly:
^123456\d{8}$
Where ^ matches the beginning of the line or string and $ matches the end. Alternatively, you may wish to use word boundaries, to ensure that your bar-code strings are properly separated:
\b123456\d{8}\b
Where \b matches the empty string but only at the edges of a word (normally defined as a sequence consisting exclusively of alphanumeric characters plus the underscore, but this can be locale-dependent).

123456\d{8}
123456 # Literals
\d # Match a digit
{8} # 8 times
You can change the {8} to any number of digits depending on how many are after your static ones.
Regexr will let you try out the regex.

123456\d{8}
should do it. This breaks down to:
123456 - the fixed bit, obviously substitute this for what you're fixed bit is, remember to escape and regex special characters in here, although with just numbers you should be fine
\d - a digit
{8} - the number of times the previous element must be repeated, 8 in this case.
the {8} can take 2 digits if you have a minimum or maximum number in the range so you could do {6,8} if the previous element had to be repeated between 6 and 8 times.

The way you describe it, it's just
^123456[0-9]{8}$
...where you'd replace 123456 with your 6 known digits. I'm using [0-9] instead of \d because I don't know what flavor of regex you're using, and \d allows non-Arabic numerals in some flavors (if that concerns you).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Regex matching string containing at least x characters and x numbers - regex

Related

Am I implementing negative lookaheads correctly with my regex?

Matching any password except one containing repeating characters [duplicate]

Decoding a regex... I know what it's function is but I want to understand exactly what is happening

RegEx No more than 2 identical consecutive characters and a-Z and 0-9

How can I recognize a valid barcode using regex?

Categories

Resources