Regex to check string for unique characters and prohibited characters - regex

I'm writing a password validation regex. I've managed to get 80-90% of the way there, but I can't incorporate the last two pieces I need, and I'm sick of beating my head against the wall, so that's where you guys come in ;)
Here is my expression so far:
^(?!.*(.)\1{3}).*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[\Q~!##$%^&*()-_=+[]{}|;:,.<>/?\E]).*$
so in order I have the following rules:
(?!.*(.)\1{3}) - no more than 3 of the same character in sequence
.*(?=.{8,}) - string must be a minimum of 8 characters
(?=.*\d) - must contain at least one digit
(?=.*[a-z]) - must contain at least one lower case letter
(?=.*[A-Z]) - must contain at least one upper case letter
(?=.*[\Q~!##$%^&*()-_=+[]{}|;:,.<>/?\E]) - must contain at least one of these special characters
I need to add two more restrictions
1) no character other than an alphanumeric or one of my special characters may appear in the string. So I think I have the basic expression correct:
^([\w\Q~!##$%^&*()-_=+[]{}|;:,.<>/?\E]*)$
But when I try to add that to my overall expression, it either doesn't work or it breaks one of my other conditions, so I'm not sure what I'm doing wrong.
2) The string MUST contain 4 unique characters. I can't figure this one out at all.
Thanks in advance for any help you can provide.

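Try this one. (I removed a couple of .*s that aren't needed, and removed the separate minimum-of-8-characters check because it can be incorporated into the final piece. The pattern is split across lines for readability; join the lines into one before using it.)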
^
(?!.*(.)\1{3})
(?=.*\d)
(?=.*[a-z])
(?=.*[A-Z])
(?=.*[\Q~!##$%^&*()-_=+[]{}|;:,.<>/?\E])
[\w\Q~!##$%^&*()-_=+[]{}|;:,.<>/?\E]{8,}
$
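A rough Python port of the combined pattern, for trying it out. This sketch makes a few assumptions. Python's re has no \Q...\E, so the special characters are escaped with re.escape(). The class [A-Za-z0-9...] replaces \w because Python 3's \w also matches non-ASCII letters. fullmatch() stands in for ^ and $. The names PASSWORD_RE and is_valid are made up for this sketch.

```python
import re

# Special characters from the question; re.escape() makes them safe inside [...]
# (Python has no \Q...\E quoting, so the set is escaped explicitly instead).
SPECIALS = re.escape("~!#$%^&*()-_=+[]{}|;:,.<>/?")

PASSWORD_RE = re.compile(
    rf"""
    (?!.*(.)\1{{3}})            # reject 4+ identical characters in a row
    (?=.*\d)                    # at least one digit
    (?=.*[a-z])                 # at least one lowercase letter
    (?=.*[A-Z])                 # at least one uppercase letter
    (?=.*[{SPECIALS}])          # at least one of the listed special characters
    [A-Za-z0-9{SPECIALS}]{{8,}} # only allowed characters, 8 or more of them
    """,
    re.VERBOSE,
)

def is_valid(password: str) -> bool:
    # fullmatch() anchors at both ends, playing the role of ^ and $
    return PASSWORD_RE.fullmatch(password) is not None

print(is_valid("Abcdef1!"))   # True
print(is_valid("Aaaaab1!"))   # False: "aaaa" is four in a row
print(is_valid("Abcdef1!é"))  # False: é is not an allowed character
```

The whitelist at the end is what enforces "no other characters". Because the lookaheads only test and consume nothing, adding the whitelist does not interfere with the other rules.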
Also, your last rule:
the string MUST contain 4 unique characters.
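This is already covered, because you require one digit, one uppercase letter, one lowercase letter, and one special character. Those four classes don't overlap, so any string that passes contains at least four different characters.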
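The argument depends on the four classes being disjoint. If the class rules were ever relaxed, "at least 4 distinct characters" would need its own check. The Python sketch below shows two ways to do that. Neither comes from the thread, and the function names are hypothetical. The regex captures three characters and requires each later one to differ from the earlier captures, with backtracking trying every choice. The set version simply counts distinct characters.

```python
import re

# Zero-width check: pick a character, then a later one different from it,
# then one different from both, then one different from all three.
FOUR_DISTINCT = re.compile(
    r"(?=.*?(.).*?(?!\1)(.).*?(?!\1|\2)(.).*?(?!\1|\2|\3).)", re.DOTALL
)

def has_four_distinct_regex(s: str) -> bool:
    return FOUR_DISTINCT.match(s) is not None

def has_four_distinct_set(s: str) -> bool:
    # Same rule without a regex: count the distinct characters directly.
    return len(set(s)) >= 4

for sample in ["aabbccdd", "aabbcc", "Abcdef1!"]:
    print(sample, has_four_distinct_regex(sample), has_four_distinct_set(sample))
```

In practice, the set version is easier to read and maintain whenever the check can run in code rather than in a single regex.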

Related

Regex to have two out of three character types [duplicate]

My client has requested that passwords on their system must follow a specific set of validation rules, and I'm having great difficulty coming up with a "nice" regular expression.
The rules I have been given are...
Minimum of 8 characters
Allow any character
Must have at least one instance from three of the four following character types...
Upper case character
Lower case character
Numeric digit
"Special Character"
When I pressed more, "Special Characters" are literally everything else (including spaces).
I can easily check for at least one instance for all four, using the following...
^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?\d)(?=.*?[^a-zA-Z0-9]).{8,}$
The following works, but it's horrible and messy...
^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?\d)|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])|(?=.*?[A-Z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])).{8,}$
So you don't have to work it out yourself, the above is checking for (1,2,3|1,2,4|1,3,4|2,3,4), which are the 4 possible combinations of 3 out of the 4 groups (where the number relates to the "types" in the set of rules).
Is there a "nicer", cleaner or easier way of doing this?
(Please note, this is going to be used in an <asp:RegularExpressionValidator> control in an ASP.NET website, so it needs to be a valid regex for both .NET and JavaScript.)
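The four branches are just the combinations of 3 out of the 4 types. The Python sketch below builds the same alternation programmatically to show where they come from. Generating the pattern in code is an illustration, not something the question asked for. The output is a plain regex string that could be pasted into a validator. TYPES and three_of_four_pattern are hypothetical names.

```python
import re
from itertools import combinations

# The four character types from the question, as regex classes.
TYPES = [r"[A-Z]", r"[a-z]", r"\d", r"[^a-zA-Z0-9]"]

def three_of_four_pattern(min_len: int = 8) -> str:
    # One branch of lookaheads per choice of 3 types: (1,2,3|1,2,4|1,3,4|2,3,4).
    branches = [
        "".join(f"(?=.*?{t})" for t in combo)
        for combo in combinations(TYPES, 3)
    ]
    return "^(" + "|".join(branches) + ")" + ".{%d,}$" % min_len

pattern = three_of_four_pattern()
print(pattern)  # identical to the "horrible and messy" regex above
print(bool(re.match(pattern, "abcdef1!")))  # lower + digit + special
```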
It's not much of a better solution, but you can reduce [^a-zA-Z0-9] to [\W_], since a word character is all letters, digits and the underscore character (in JavaScript; in .NET, \w also matches non-ASCII letters and digits, so there the two classes are not exactly equivalent). I don't think you can avoid the alternation when trying to do this in a single regex. I think you pretty much have the best solution.
One slight optimization: the (digit, lowercase, special) and (digit, uppercase, special) alternatives can be merged into a single (digit, letter, special) alternative, so one of the alternation sets can be removed. If you allowed exactly 3 out of 4 this wouldn't work, but since a string containing all four types is accepted anyway, it does.
(?=.{8,})((?=.*\d)(?=.*[a-z])(?=.*[A-Z])|(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])).*
Extended version:
(?=.{8,})(
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])|
(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])|
(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])
).*
Because of the fourth condition specified by the OP ("special characters" are literally everything else), this regular expression will match even unprintable characters such as newlines. If this is unacceptable, replace the set that contains \W with a more specific set of special characters.
I'd like to improve the accepted solution with this one:
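To check the merge claim, the Python sketch below runs the question's four-branch regex and the three-branch version on one sample string for every combination of character types. It confirms that both accept exactly the combinations with at least 3 types. The sample characters and function name are arbitrary choices for the test. re.ASCII keeps \W ASCII-only, so that [\W_] means the same as [^a-zA-Z0-9] here.

```python
import re
from itertools import combinations

FOUR_BRANCH = re.compile(
    r"^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?\d)|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])"
    r"|(?=.*?[A-Z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?\d)(?=.*?[^a-zA-Z0-9])).{8,}$"
)
THREE_BRANCH = re.compile(
    r"(?=.{8,})((?=.*\d)(?=.*[a-z])(?=.*[A-Z])|(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W_])"
    r"|(?=.*[a-z])(?=.*[A-Z])(?=.*[\W_])).*",
    re.ASCII,  # make \W ASCII-only so [\W_] equals [^a-zA-Z0-9]
)

SAMPLES = {"upper": "A", "lower": "b", "digit": "3", "special": "#"}

def agree_on_all_type_subsets() -> bool:
    for k in range(1, 5):
        for kinds in combinations(SAMPLES, k):
            # An 8-character string containing exactly these types.
            s = ("".join(SAMPLES[t] for t in kinds) * 8)[:8]
            four = FOUR_BRANCH.fullmatch(s) is not None
            three = THREE_BRANCH.fullmatch(s) is not None
            if four != three or four != (k >= 3):
                return False
    return True

print(agree_on_all_type_subsets())  # True
```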
^(?=.{8,})(
(?=.*[^a-zA-Z\s])(?=.*[a-z])(?=.*[A-Z])|
(?=.*[^a-zA-Z0-9\s])(?=.*\d)(?=.*[a-zA-Z])
).*$
The above regex worked well for most scenarios, except for strings such as "AAAAAA1$" and "$$$$$$1a".
This may be an issue only on iOS (Objective-C and Swift), where "\d" seems to cause problems.
The following fix, which uses [0-9] instead of \d for digits, worked on iOS:
^((?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])|(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[^a-zA-Z0-9])|(?=.*?[A-Z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])|(?=.*?[a-z])(?=.*?[0-9])(?=.*?[^a-zA-Z0-9])).{8,}$
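The thread doesn't pin down what goes wrong with \d on iOS. One general way \d and [0-9] can differ is Unicode digits. In Python 3, for example, \d in a str pattern matches any Unicode decimal digit, while [0-9] matches only ASCII 0-9. The sketch below is Python, not Objective-C or Swift, so it only illustrates the general point.

```python
import re

ARABIC_INDIC_THREE = "\u0663"  # a decimal digit outside ASCII

# In Python 3, \d on str patterns matches any Unicode decimal digit...
print(bool(re.fullmatch(r"\d", ARABIC_INDIC_THREE)))            # True
# ...while [0-9], or \d with re.ASCII, matches only the ASCII digits.
print(bool(re.fullmatch(r"[0-9]", ARABIC_INDIC_THREE)))         # False
print(bool(re.fullmatch(r"\d", ARABIC_INDIC_THREE, re.ASCII)))  # False
```

Writing [0-9] makes the intent explicit regardless of engine.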
Password must meet at least 3 out of the following 4 complexity rules:
at least 1 uppercase character (A-Z)
at least 1 lowercase character (a-z)
at least 1 digit (0-9)
at least 1 special character (do not forget to treat space as a special character too)
Additionally:
at least 10 characters
at most 128 characters
not more than 2 identical characters in a row (e.g., 111 not allowed)
'^(?!.*(.)\1{2})((?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])|(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])|(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])|(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])).{10,128}$'
Broken down:
^
(?!.*(.)\1{2})
(
(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])|
(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])|
(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])|
(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])
)
.{10,128}
$
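The Python sketch below encodes the same rules. Each lookahead is written with .* in full, and the upper bound is set to 128 to match the stated "at most 128 characters" rule. COMPLEX_RE and is_complex are hypothetical names.

```python
import re

COMPLEX_RE = re.compile(
    r"(?!.*(.)\1{2})"                              # no 3 identical chars in a row
    r"((?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])"          # lower + upper + digit
    r"|(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])"   # lower + upper + special
    r"|(?=.*[A-Z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])"   # upper + digit + special
    r"|(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9]))"  # lower + digit + special
    r".{10,128}"                                   # 10 to 128 characters
)

def is_complex(pw: str) -> bool:
    # fullmatch() plays the role of the ^ and $ anchors
    return COMPLEX_RE.fullmatch(pw) is not None

print(is_complex("Abcdefgh12"))  # True
print(is_complex("abc def 12"))  # True: spaces count as special characters
print(is_complex("Abbbcdef12"))  # False: "bbb"
```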

How to include special chars in this regex

First of all I am a total noob to regular expressions, so this may be optimized further, and if so, please tell me what to do. Anyway, after reading several articles about regex, I wrote a little regex for my password matching needs:
(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])(^[A-Z]+[a-z0-9]).{8,20}
What I am trying to do is: it must start with an uppercase letter, must contain a lowercase letter, must contain at least one number, must contain at least one special character, and must be between 8 and 20 characters in length.
The above somehow works, but it doesn't force special characters (. seems to match any character, but I don't know how to use it with the positive lookahead), and the minimum length seems to be 10 instead of 8. What am I doing wrong?
PS: I am using http://gskinner.com/RegExr/ to test this.
Let's strip away the assertions and just look at your base pattern alone:
(^[A-Z]+[a-z0-9]).{8,20}
This will match one or more uppercase Latin letters, followed by a single lowercase Latin letter or decimal digit, followed by 8 to 20 of any character. So yes, at minimum this will require 10 characters, but there's no maximum number of characters it will match (e.g. it will allow 100 uppercase letters at the start of the string). Furthermore, since there's no end anchor ($), this pattern would allow any trailing characters after the matched substring.
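A quick Python check of that analysis of the base pattern, using re.search() so that, as in a typical tester, nothing is anchored except the pattern's own ^. The variable name BASE is made up for the sketch.

```python
import re

BASE = re.compile(r"(^[A-Z]+[a-z0-9]).{8,20}")

# Shortest possible match: 1 uppercase + 1 lowercase/digit + 8 more = 10 chars.
print(BASE.search("Aa12345678") is not None)  # True (10 chars)
print(BASE.search("Aa1234567") is not None)   # False (9 chars)
# Unbounded [A-Z]+ and no end anchor: a long prefix plus trailing junk still matches.
print(BASE.search("A" * 100 + "a" + "x" * 50) is not None)  # True
```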
I'd recommend a pattern like this:
^(?=.*[a-z])(?=.*[0-9])(?=.*[!##$])[A-Z]+[A-Za-z0-9!##$]{7,19}$
Here !##$ is a placeholder for whatever special characters you want to allow. Don't forget to escape characters that are special inside a character class where necessary (\, ], ^ at the beginning of the class, and - in the middle).
Using POSIX character classes, it might look like this:
^(?=.*[[:lower:]])(?=.*[[:digit:]])(?=.*[[:punct:]])[[:upper:]]+[[:alnum:][:punct:]]{7,19}$
Or using Unicode character classes, it might look like this:
^(?=.*[\p{Ll}])(?=.*\d)(?=.*[\p{P}\p{S}])[\p{Lu}]+[\p{L}\d\p{P}\p{S}]{7,19}$
Note: each of these considers a different set of 'special characters', so they aren't identical to the first pattern.
The following should work:
^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$
I removed the (?=.*[A-Z]) because the requirement that the string start with an uppercase character already covers that. I added (?=.*[^a-zA-Z0-9]) for the special characters; this only matches if there is at least one character that is not a letter or a digit. I also tweaked the length check a little: first I removed the + after [A-Z], so that we know exactly one character has been matched so far, and then I changed .{8,20} to .{7,19} (we can only match between 7 and 19 more characters if we already matched 1).
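Trying the suggested pattern in Python on a few inputs shows the 8 to 20 length window and the leading-uppercase rule. The helper name check is hypothetical.

```python
import re

SUGGESTED = re.compile(r"^(?=.*[a-z])(?=.*[0-9])(?=.*[^a-zA-Z0-9])[A-Z].{7,19}$")

def check(pw: str) -> bool:
    return SUGGESTED.match(pw) is not None

print(check("Abcdef1!"))       # True: 8 characters, starts uppercase
print(check("Abcde1!"))        # False: only 7 characters
print(check("abcdef1!"))       # False: must start with an uppercase letter
print(check("Abcdefgh1"))      # False: no special character
print(check("A" + "b1!" * 7))  # False: 22 characters, over the 20 limit
```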
Well, here is how I would write it, if I had such requirements - excepting situations where it's absolutely not possible or practical, I prefer to break up complex regular expressions. Note that this is English-specific, so a Unicode or POSIX character class (where supported) may make more sense:
/^[A-Z]/ && /[a-z]/ && /[0-9]/ && /[whatever special]/ && ofCorrectLength(x)
That is, I would avoid trying to incorporate all the rules at once.
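The split-up approach looks like this as a minimal Python sketch. The special-character set is an assumed placeholder, and is_valid is a hypothetical name. Each rule is a small, separate test, so it is easy to report which rule failed.

```python
import re

SPECIAL = "!@#$%^&*"  # assumption: placeholder set of allowed special characters

def is_valid(pw: str) -> bool:
    return (
        re.match(r"[A-Z]", pw) is not None       # starts with an uppercase letter
        and re.search(r"[a-z]", pw) is not None  # contains a lowercase letter
        and re.search(r"[0-9]", pw) is not None  # contains a digit
        and any(c in SPECIAL for c in pw)        # contains a special character
        and 8 <= len(pw) <= 20                   # correct length
    )

print(is_valid("Abcdef1!"))  # True
print(is_valid("abcdef1!"))  # False: does not start with an uppercase letter
```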

Regular expression for passwords with special characters

Here is the regular expression I found on Microsoft's website:
(?!^[0-9]*$)(?!^[a-zA-Z]*$)^([a-zA-Z0-9]{8,10})$
It validates a strong password. It must be between 8 and 10 characters, contain at least one digit and one alphabetic character, and must not contain special characters.
But now we have decided to allow users to use special characters in their passwords, so how do I modify this regular expression? I don't quite understand why ?! is put in front.
(?!^[0-9]*$) is a negative lookahead. This assertion fails if there are only digits from the start to the end. So, you have different possibilities:
I would rewrite those conditions to require at least one character of each kind, instead of forbidding strings made up of only that kind:
(?=.*\d) would require at least one digit
(?=.*[a-zA-Z]) would require at least one letter
Your regex would then look something like this:
^(?=.*[0-9])(?=.*[a-zA-Z]).{8,10}$
This requires at least one digit and at least one letter, with a total length of 8 to 10 characters. The . matches any character except a newline.
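A short Python comparison of Microsoft's original pattern with the rewrite shows that only the rewrite accepts special characters. The names STRICT and RELAXED are made up for the sketch.

```python
import re

# Original pattern: letters and digits only, not all digits, not all letters.
STRICT = re.compile(r"(?!^[0-9]*$)(?!^[a-zA-Z]*$)^([a-zA-Z0-9]{8,10})$")
# Rewrite: require a digit and a letter, allow any other characters too.
RELAXED = re.compile(r"^(?=.*[0-9])(?=.*[a-zA-Z]).{8,10}$")

for pw in ["abc12345", "abc1234!", "12345678", "abcdefgh"]:
    print(pw, bool(STRICT.match(pw)), bool(RELAXED.match(pw)))
# abc12345 True True
# abc1234! False True
# 12345678 False False
# abcdefgh False False
```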

Special way of forming regex?

I've come across this regex and I was wondering how this is used:
^.*(?=.{10,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$
I want to know what the individual section of the regex mean, not only what the regex in its whole does.
With the knowledge of regexes I have, I think it matches any input (at least 10 characters long) that contains a digit (0-9), a lowercase letter and an uppercase letter, but I'd like confirmation that this is correct.
Edit
I also don't know what it is meant to validate, but looking at what I think it does, is it right that the regex can be simplified to:
[\d|[a-zA-Z]]{10,}
Edit 2
I've noticed my replacement regex doesn't ensure that I have at least one of each requirement (at least one digit, one uppercase letter and one lowercase letter). Is there any way to adjust it so the regex does that as well, or is that only possible with the original regex?
I can explain what the parts of the regex do, but in general I find this quite odd:
^.*(?=.{10,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$
Basically what you said is true - there is no other magic in the regex.
^.* - match the beginning of the line and 0+ characters, then check the conditions below.
The following are only assertions; none of them matches or captures anything. This is called a positive lookahead, if you want to look it up. If all of them evaluate to true, the last part of the regex does the rest:
(?=.{10,}) - from where the first matching stops (could be after the beginning of the line) there is a string of 10+ chars (any chars)
(?=.*\d) - and there is at least one digit in the whole string ahead
(?=.*[a-z]) - and a lower case letter
(?=.*[A-Z]) - and an upper case letter
If all that is true, then:
.*$ - match everything till the end of the line
Note: if any of the asserts fail, nothing will be matched.
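To see that a lookahead only tests, here is a small Python sketch. A bare lookahead produces an empty match that ends where it started. Stacking several lookaheads tests all of them from the same position, which is why they combine like an AND. ALL is a made-up name.

```python
import re

# A lookahead consumes nothing: the match is empty and ends at position 0.
m = re.match(r"(?=.*\d)", "abc1")
print(m is not None, m.end())  # True 0

# Stacked lookaheads all test from the same position, so they act as an AND.
ALL = re.compile(r"^(?=.{10,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$")
print(bool(ALL.match("Password12")))  # True
print(bool(ALL.match("password12")))  # False: no uppercase letter
```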
To your edit
I don't think so - it's not the same to say that there is an upper and lower case letter and a digit somewhere in the string, and to say that the string consists of 10+ characters of which all are either digits or letters (upper or lower case) or both. Your regex would match a string that consists of only digits as well as only letters or a mix of both - the original regex ensures that each of these classes is represented at least once. It seems that someone might have used it to validate a user password or something like that.
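A Python sketch of that difference follows. One extra caveat, not raised in the thread: in Python's re, [\d|[a-zA-Z]]{10,} is not a nested class. The class ends at the first ], the | and [ inside it are literal members, and {10,} then applies to a literal ]. So the sketch uses [\da-zA-Z] as the presumably intended reading. The names are hypothetical.

```python
import re

LOOKAHEADS = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{10,}$")
# Presumed intent of the asker's class: any letter or digit, 10 or more times.
PLAIN_CLASS = re.compile(r"^[\da-zA-Z]{10,}$")

for s in ["abcdefghij", "0123456789", "Abcdefghi1"]:
    print(s, bool(LOOKAHEADS.match(s)), bool(PLAIN_CLASS.match(s)))
# abcdefghij False True
# 0123456789 False True
# Abcdefghi1 True True
```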
This is probably used to validate candidate passwords - it
Checks that it is at least 10 characters long
Checks that it contains at least one digit
Checks that it contains at least one lower case letter
Checks that it contains at least one upper case letter
Your replacement regex is not identical because it just ORs the above conditions - the long nasty regex ANDs them. Also there is no order to the above conditions; the letters or digits can occur anywhere in the string.
I don't see a way of simplifying it much further actually - you might perhaps remove the .* at the beginning and .*$ at the end since they don't really serve any purpose. But otherwise, that long regex does a good job of conjunctively imposing those conditions without imposing an order.
I think this is used for ensuring password strength: it has to be at least 10 chars long, with at least 1 digit, at least 1 lowercase letter, and at least 1 uppercase letter.
The most important part of the whole regex is the (?=...) operator, which matches but does NOT consume the part of the string it matches. Multiple (?=...) next to one another therefore act as an AND operator.
(?=.{10,}) matches any sequence of at least 10 chars.
(?=.*\d) matches a single digit that follows anything.
(?=.*[a-z]) matches a lowercase char that follows anything.
(?=.*[A-Z]) matches an uppercase char that follows anything.
So this regex will match any string that has a substring at least 10 characters long that contains at least one digit, one lowercase character, and one uppercase character.
You can see that it sounds more complicated than it needs to, especially the substring part. Indeed, the .* part right after ^ is not necessary, and we can simplify this to
^(?=.{10,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$
It's a password strength validation regex as others have said, but that .* at the beginning should not be there. As it is, the .* initially consumes the whole string, then backtracks until it reaches a position where all four lookaheads can match. It works, but why make the regex do so much work if it doesn't have to?
^(?=.{10,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$
With the leading .* removed, the regex never has to backtrack (unless you count returning to the starting position after each successful lookahead as backtracking). As for the .*$ at the end, it might not be necessary, but it does no harm either. I would leave it in, just in case someone tries to use the result of the match for something instead of the original string.
One more point: you could make the regex more concise by removing the first lookahead and putting the .{10,} in place of the .*:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{10,}$
It's written the way it is to work around a long-standing bug in Internet Explorer (ref). The bug was finally fixed in IE8 or IE9, but I would leave it the way it is, just in case.
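As a sanity check of the simplifications discussed in these answers, the Python sketch below runs the original pattern, the version without the leading .*, and the concise .{10,} form on a few single-line inputs. All three give the same results. The sample strings are arbitrary.

```python
import re

ORIGINAL = re.compile(r"^.*(?=.{10,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$")
NO_LEADING = re.compile(r"^(?=.{10,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$")
CONCISE = re.compile(r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{10,}$")

samples = ["Password12", "password12", "Pass12", "PASSWORD12", "xxPassword12xx"]
for s in samples:
    print(s, [bool(p.match(s)) for p in (ORIGINAL, NO_LEADING, CONCISE)])
```

The versions differ in how much work the engine does (the leading .* forces backtracking), not in which single-line strings they accept.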