OK so I have a mental block when it comes to regex - but I was told to come up with a regex expression that met these conditions:
1. must be at least 8 characters (easy!)
2. must have characters from at least 3 of the 4 different character types - upper case, lower case, digits, symbols (ok)
3. must have at least 5 different characters
4. must not have a long sequence of the same character type (e.g. asdnme would be considered bad as it's a long sequence of lower case)
(?=^.{8,255}$)((?=.*\d)(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[^A-Za-z0-9\s])(?=.*[a-z])|(?=.*[^A-Za-z0-9\s])(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9\s]))
This regex expression satisfies 1 and 2. But I am struggling to find examples for 3 and 4.
If any regex enthusiasts could help me - it would be appreciated. :)
Note: I would prefer not to use regex - I'm asking whether it's possible to check the 3rd and 4th conditions using regex at all. And please don't downvote me for believing regex is the only solution. I don't believe it is - our architect decided that using regex would involve the least effort.
Personally I think this level of password security is going to make the system unusable!!! But maybe I don't care enough about password security :)
Note: We're trying to make use of the Microsoft ASP.NET Membership regex setting, which is why I thought it needed to be a single expression. I get that it's horrible to read and understand.
If anyone can provide individual regex expressions for
- must have at least 5 different characters
- must not have a long sequence of the same character type (e.g. asdnme would be considered bad as it's a long sequence of lower case) - assume a run of 5 is too long.
Or C# code / JavaScript? This is specific to one particular client, though - we don't want it blanket-applied to all clients. Which is probably why the architect wanted a nice regex expression that you could just slot in at deployment time. :(
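For the "code option", here is a minimal sketch in Python rather than C# or JavaScript; the function names (`is_valid_password`, `char_type`) and the constants are hypothetical. It assumes ASCII-only letter and digit ranges, treats any other non-whitespace character as a symbol (as `[^A-Za-z0-9\s]` does in the regex above), and reads "long sequence" as 5 or more consecutive characters of one type, per the note in the list above.

```python
from itertools import groupby

# Thresholds from the question (assumed): a run of 5 is too long, so runs may be at most 4.
MIN_LENGTH = 8
MIN_TYPES = 3
MIN_DISTINCT = 5
MAX_RUN = 4


def char_type(c: str) -> str:
    """Classify one character using ASCII-only ranges, mirroring [A-Z], [a-z] and digits."""
    if "A" <= c <= "Z":
        return "upper"
    if "a" <= c <= "z":
        return "lower"
    if "0" <= c <= "9":
        return "digit"
    if c.isspace():
        return "space"  # not one of the four types, like the \s excluded by [^A-Za-z0-9\s]
    return "symbol"


def is_valid_password(pw: str) -> bool:
    """Hypothetical validator for the four rules in the question."""
    # Rule 1: minimum length.
    if len(pw) < MIN_LENGTH:
        return False
    # Rule 2: at least 3 of the 4 character types must appear.
    types_used = {char_type(c) for c in pw} - {"space"}
    if len(types_used) < MIN_TYPES:
        return False
    # Rule 3: at least 5 different characters.
    if len(set(pw)) < MIN_DISTINCT:
        return False
    # Rule 4: no run of more than MAX_RUN consecutive characters of the same type.
    longest_run = max(len(list(run)) for _, run in groupby(pw, key=char_type))
    return longest_run <= MAX_RUN


for candidate in ["!tt23yyy", "Asdnme1!", "Aa1!Aa1!", "Ab1!Cd2?"]:
    print(candidate, is_valid_password(candidate))
```

With these thresholds the loop prints True, False, False, True: "Asdnme1!" fails only on its five-letter lower-case run, and "Aa1!Aa1!" only on having 4 distinct characters. Keeping each rule as a separate check should also make it easier to enable them per client.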
Found someone else's example that works in .NET
^(?!.*(.)\1{2})((?<Upper>[A-Z])|(?<Lower>[a-z])|(?<Numeric>\d)|(?<NonAlphaNumeric>[^A-Za-z\d])){8,}(?(Upper)(?(Lower)(?(Numeric)|(?(NonAlphaNumeric)|(?!)))|(?(Numeric)(?(NonAlphaNumeric)|(?!))|(?!)))|(?(Lower)(?(Numeric)(?(NonAlphaNumeric)|(?!))|(?!))|(?!)))$
Unfortunately it only enforces these conditions:
Must have a minimum length of 8 characters
Must contain characters from three of the four following types:
English upper-case characters (A - Z)
English lower-case characters (a - z)
Numerical digits (0 - 9)
Non-alphanumeric characters
No character can be repeated 3 or more times in a row, e.g.
BB (letter B twice) is OK, but BBB (letter B 3 times) is NOT OK.
But it doesn't detect that at least 5 different characters are used :(
Never mind - the answer below seems to work. The only thing is that it appears to allow 4 different characters rather than requiring 5?
I have tweaked it to be:
^(?=.{8,})(?:(?=.*\d)(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[^A-Za-z0-9\s])(?=.*[a-z])|(?=.*[^A-Za-z0-9\s])(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9\s]))(?=(.)(?>.*?(?!\1})(.))(?>.*?(?!\1}|\2)(.))(?>.*?(?!\1|\2|\3)(.))(?>.*?(?!\1|\2|\3|\4)(.))(?>.*?(?!\1|\2|\3|\4|\5).))(?!.*?\d{4})(?!.*?[a-z]{4})(?!.*?[A-Z]{4})(?!.*?[^A-Za-z0-9\s]{4})
Here's hoping we never have to touch it again ;) With more time if this crops up again I'll push the code option I think :)
Edit: Discovered that the string isn't quite right. It's not passing "!tt23yyy" without having to add another digit or special character. So have canned the regex idea and am going with the code option. It's just too hard to debug regex issues if you don't comprehend regex :) (understandably so)
Here is a PCRE/Perl regex that would do all that:
/
^ # anchor it
# must be at least 8 characters
(?=.{8,})
# must have characters from at least 3 of the 4 different character types
(?:
(?=.*\d)(?=.*[A-Z])(?=.*[a-z])
| (?=.*\d)(?=.*[^A-Za-z0-9\s])(?=.*[a-z])
| (?=.*[^A-Za-z0-9\s])(?=.*[A-Z])(?=.*[a-z])
| (?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9\s])
)
# at least 5 different chars
(?=
(.)
(?>.*?(?!\1}) (.))
(?>.*?(?!\1}|\2) (.))
(?>.*?(?!\1|\2|\3) (.))
(?>.*?(?!\1|\2|\3|\4) . )
)
# no long sequence of the same character type (if long is 3)
(?!.*?\d{3})
(?!.*?[a-z]{3})
(?!.*?[A-Z]{3})
(?!.*?[^A-Za-z0-9\s]{3})
/xs
Not tested so could have missed something. Enjoy. ;-)
If you are really going to be using that (on longer strings), you might want to add some (more) atomic grouping (?>foo) (or the like) to prevent exponential backtracking.
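To check the "at least 5 different chars" part in isolation, here is a sketch with Python's `re`. It keeps the answer's chained-backreference idea but drops the atomic groups, since Python's `re` only supports `(?>...)` from 3.11 on; the names `FIVE_DISTINCT` and `AS_WRITTEN` are made up for illustration. `AS_WRITTEN` reproduces the `\1}` in the first two lines of the answer's lookahead: `(?!\1})` only rejects the first character when it is followed by a literal `}`, so the second and third captures may repeat the first one. That is a likely reason the pattern seemed to accept strings with only 4 distinct characters.

```python
import re

# Each capture must differ from every earlier one; the final "." is the 5th distinct char.
# Without atomic groups this may backtrack more, but the accepted strings are the same.
FIVE_DISTINCT = re.compile(
    r"^(?=(.).*?(?!\1)(.).*?(?!\1|\2)(.).*?(?!\1|\2|\3)(.).*?(?!\1|\2|\3|\4).)",
    re.DOTALL,
)

# Same lookahead with the stray "}" kept (escaped as \} here, same meaning):
# (?!\1\}) only forbids "\1 followed by }", so captures 2 and 3 may equal capture 1.
AS_WRITTEN = re.compile(
    r"^(?=(.).*?(?!\1\})(.).*?(?!\1\}|\2)(.).*?(?!\1|\2|\3)(.).*?(?!\1|\2|\3|\4).)",
    re.DOTALL,
)

for text in ["abcde", "abcdabcd"]:
    print(text, bool(FIVE_DISTINCT.match(text)), bool(AS_WRITTEN.match(text)))
```

"abcdabcd" has only 4 distinct characters, yet `AS_WRITTEN` accepts it (capturing a, a, b, c, d); removing the `}` as in `FIVE_DISTINCT` rejects it.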
Related
We currently have a content compliance rule in place whereby we monitor anything that contains a credit card number with no spaces (e.g. 5100080000000000).
What we need is a regex to pick up credit card numbers that are entered with spaces every 4 digits (e.g. 5100 0800 0000 0000).
We've been looking at alternative regexes but have not yet found one that works for both scenarios mentioned above.
The current regex we use is below:
^((4\d{3})|(5[1-5]\d{2})|(6011)|(34\d{1})|(37\d{1}))-?\d{4}-?\d{4}-?\d{4}|3[4,7][\d\s-]{15}$
Just add an optional \s? wherever you already have the optional -?
So your regex becomes
^((4\d{3})|(5[1-5]\d{2})|(6011)|(34\d{1})|(37\d{1}))-?\s?\d{4}-?\s?\d{4}-?\s?\d{4}|3[4,7][\d\s-]{15}$
It seems that you already accept a dash every four characters. Thus you can simply replace -? with [- ]? everywhere.
If you require the dashes or spaces to be consistent - that is, allow no grouping at all, or a dash every four characters, or a space every four characters, you can use a back reference to force the repetitions to be identical to the first match:
^(?:4\d{3}|5[1-5]\d{2}|6011|3[47]\d{2})([- ]?)\d{4}\1\d{4}\1\d{4}$
You will notice I removed the final 3[4,7]... which looked like an erroneous addition, apparently made when attempting to solve this problem partially. Also I changed the parentheses to non-grouping ones (?:...) or simply removed them where no grouping seemed necessary or useful, mainly because this makes it easier to see what the backreference \1 refers to. Finally, the 34.. and 37.. patterns had \d{1} where apparently \d{2} was intended (or if those particular series are only three digits before the first dash, the repetition {1} was just superfluous, but then the 3[4,7]... part would have been even more wrong!)
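A quick way to see the backreference at work, sketched with Python's `re` (the variable name `CARD` is only for illustration): the separator captured after the first block must repeat between every later block, so mixing a space and a dash is rejected.

```python
import re

# The answer's pattern: \1 forces every separator to equal the first one ("", "-" or " ").
CARD = re.compile(r"^(?:4\d{3}|5[1-5]\d{2}|6011|3[47]\d{2})([- ]?)\d{4}\1\d{4}\1\d{4}$")

for candidate in [
    "5100080000000000",     # no grouping
    "5100 0800 0000 0000",  # a space every four digits
    "5100-0800-0000-0000",  # a dash every four digits
    "5100 0800-0000 0000",  # mixed separators
]:
    print(repr(candidate), bool(CARD.match(candidate)))
```

This prints True for the first three strings and False for the mixed one.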
Won't all these ideas blow up on you as soon as someone uses an AMEX card and enters 3 or 5 numbers instead of 4 in any one 'block'?
((\d+) *(\d+) *(\d+) *(\d+))
That would be the general idea (and it even works!); you can polish it if you want. There is a great page for testing your regexp live: http://rubular.com/
Try this:
(\d{4} *\d{4} *\d{4} *\d{4})
I am having a bit of a hard time with a password requirement regular expression for an ASP.NET project
Our requirements are the following:
Must be at least 8 characters
Must have at least 3 of the 4 following:
Have at least 1 UPPERCASE letter
Have at least 1 lowercase letter
Have at least 1 special character
Have at least 1 number
The regular expression I am using is as follows (this is escaped and encoded for use in the web.config XML file):
passwordStrengthRegularExpression="^.*(?=.{8,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!##$%^&*()\?\+\,\-\.\/\:\:\;\<\=\>\[\]\\_\`\{\|\}\~\"\']).*$"
I can't figure out how to allow for one of the requirements to be optional.
The password Reaction7 should be sufficient, but it is rejected because it doesn't have a special character.
Anyone know what I can do to evaluate the 3 out of 4 requirements other than length?
Not sure I like this solution, but if you're limited to using only a single regex (which looks like the case), you could enumerate all possibilities with a pipe-or group:
passwordStrengthRegularExpression="^.*(?=.{8,})((?=.*[A-Z])(?=.*\d)(?=.*[!##$%^&*()\?\+\,\-\.\/\:\:\;\<\=\>\[\]\\_\`\{\|\}\~\"\'])|(?=.*[a-z])(?=.*\d)(?=.*[!##$%^&*()\?\+\,\-\.\/\:\:\;\<\=\>\[\]\\_\`\{\|\}\~\"\'])|(?=.*[a-z])(?=.*[A-Z])(?=.*[!##$%^&*()\?\+\,\-\.\/\:\:\;\<\=\>\[\]\\_\`\{\|\}\~\"\'])|(?=.*[a-z])(?=.*[A-Z])(?=.*\d)).*$"
It is rather long but does get the job done. Adding a fifth requirement will make this string explode in size though, so it's not exactly "extendable".
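One way to soften the "explode in size" problem is to generate the alternation instead of writing it by hand. This Python sketch builds one branch per combination of 3 lookaheads with `itertools.combinations`; `build_pattern` and `LOOKAHEADS` are hypothetical names, and the special-character class is simplified to `[^A-Za-z0-9]` rather than the explicit symbol list from the web.config pattern.

```python
import re
from itertools import combinations

# One lookahead per character type (special characters simplified, an assumption).
LOOKAHEADS = [
    r"(?=.*[A-Z])",
    r"(?=.*[a-z])",
    r"(?=.*\d)",
    r"(?=.*[^A-Za-z0-9])",
]


def build_pattern(min_length: int = 8, required: int = 3) -> str:
    """Build an 'at least `required` of the types' regex by enumerating combinations."""
    branches = ["".join(combo) for combo in combinations(LOOKAHEADS, required)]
    alternatives = "|".join(branches)
    # {{ and }} are literal braces in the f-string, giving ^(?=.{8,}) for min_length=8.
    return rf"^(?=.{{{min_length},}})(?:{alternatives}).*$"


STRONG = re.compile(build_pattern())
print(bool(STRONG.match("Reaction7")))  # upper + lower + digit: 3 of 4
```

Adding a fifth type is then one more list entry. The generated string still grows combinatorially, and it would still need XML escaping before going into web.config.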
I am using this regex to validate my password.
My password:
should be alphanumeric ONLY,
contains at least 8 characters,
at least 2 numbers
and at least 2 letters.
My regex is
^.*(?=.{8,})(?=.*\d*\d)(?=.*[a-zA-Z]*[a-zA-Z])(?!.*\W).*$
but unfortunately it still matches if I try to put special characters at the beginning.
For example #password12, !password12.
Because your pattern begins and ends with .*, it will match anything at the beginning or end of the string, including special characters.
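A small Python check shows the effect: the leading `^.*` lets the lookaheads start after the `#`, so they only ever inspect `password12`. `FIXED` is one possible correction (an illustration, not from the answer): drop the leading `.*` and make the body itself alphanumeric-only, so the whole string is checked.

```python
import re

# The question's pattern: ^.* can skip the leading special character.
ORIGINAL = re.compile(r"^.*(?=.{8,})(?=.*\d*\d)(?=.*[a-zA-Z]*[a-zA-Z])(?!.*\W).*$")

# Lookaheads anchored at ^, and the body must consume the whole string as [a-zA-Z0-9].
FIXED = re.compile(r"^(?=.*\d.*\d)(?=.*[a-zA-Z].*[a-zA-Z])[a-zA-Z0-9]{8,}$")

for pw in ["#password12", "!password12", "password12"]:
    print(pw, bool(ORIGINAL.match(pw)), bool(FIXED.match(pw)))
```

`ORIGINAL` accepts all three strings; `FIXED` accepts only `password12`.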
You shouldn't be solving this problem with a single regular expression, it makes the code hard to read and hard to modify. Write one function for each rule using whatever makes sense for that rule, then your validation script becomes crystal clear:
if is_alpha_only(password) &&
len(password) >= 8 &&
has_2_or_more_numbers(password) &&
has_2_or_more_alpha(password) ...
Seriously, what's the point of cramming all of that into a single regular expression?
And why disallow special characters? There's simply no reason for that.
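Here is one possible Python version of the helper functions named in the pseudocode above. The function bodies are assumptions: "alpha only" is read as "ASCII letters and digits only", which is what the question asks for, and `is_valid` is a hypothetical wrapper.

```python
import string

ALNUM = set(string.ascii_letters + string.digits)


def is_alpha_only(password: str) -> bool:
    # ASCII letters and digits only (the question's "alphanumeric ONLY").
    return all(c in ALNUM for c in password)


def has_2_or_more_numbers(password: str) -> bool:
    return sum(c in string.digits for c in password) >= 2


def has_2_or_more_alpha(password: str) -> bool:
    return sum(c in string.ascii_letters for c in password) >= 2


def is_valid(password: str) -> bool:
    # Each rule stays a separate, readable check.
    return (
        is_alpha_only(password)
        and len(password) >= 8
        and has_2_or_more_numbers(password)
        and has_2_or_more_alpha(password)
    )


print(is_valid("password12"), is_valid("#password12"))
```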
You can use the following regex in case insensitive mode:
^(?=[a-z]*[0-9][a-z]*[0-9])^(?=[0-9]*[a-z][0-9]*[a-z])[a-z0-9]{8,}$
I had a similar situation in which the client needed 4 alpha, 1 number, and between 8 and 20 characters. I've adapted my solution to your problem:
^(?=(?:[a-zA-Z0-9]*[a-zA-Z]){2})(?=(?:[a-zA-Z0-9]*\d){2})[a-zA-Z0-9]{8,}$
I understand the other answers dissuading you from this route, but sometimes the client wants what the client wants, regardless of your arguments to the contrary.
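Both single-regex answers can be checked against the question's examples with Python's `re` (the names `CASE_INSENSITIVE` and `COUNTED` are only labels). Both reject a leading special character because the body, not just a lookahead, has to match from `^` to `$` using only letters and digits.

```python
import re

# Answer using case-insensitive mode: digits counted through letters, letters through digits.
CASE_INSENSITIVE = re.compile(
    r"^(?=[a-z]*[0-9][a-z]*[0-9])^(?=[0-9]*[a-z][0-9]*[a-z])[a-z0-9]{8,}$",
    re.IGNORECASE,
)

# Answer counting letters and digits with {2} over a non-capturing group.
COUNTED = re.compile(r"^(?=(?:[a-zA-Z0-9]*[a-zA-Z]){2})(?=(?:[a-zA-Z0-9]*\d){2})[a-zA-Z0-9]{8,}$")

for pw in ["#password12", "!password12", "password12", "password", "12345678"]:
    print(pw, bool(CASE_INSENSITIVE.match(pw)), bool(COUNTED.match(pw)))
```

Only `password12` should be accepted by both patterns.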
One of my homework questions asked me to develop a regex for all strings over x, y, z that do not contain xxx.
After doing some reading I found out about negative lookahead and made this which works great:
(x(?!xx)|y|z)*
Still, in the spirit of completeness, is there any way to write this without negative lookahead?
Reading I have done makes me think it can be done with some combination of carets (^), but I cannot get the right combination so I am not sure.
Taking it a step further, is it possible to exclude a string like xxx using only the or (|) operator, but still check the strings in a recursive fashion?
EDIT 9/6/2010:
Think I answered my own question. I messed with this some more, trying to make this regex with only or (|) statements, and I am pretty sure I figured it out... and it isn't nearly as messy as I thought it would be. If someone else has time to verify this with a human eye I would appreciate it.
(xxy|xxz|xy|xz|y|z)*(xxy|xxz|xx|xy|xz|x|y|z)
Try this:
^(x{0,2}(y|z|$))*$
The basic idea is this: match at most 2 x's, followed by another letter (y or z) or the end of the string.
When you reach a point where you have 3 x's, the regex has no rule that allows it to keep matching, and it fails.
Working example: http://rubular.com/r/ePH0fHlZxL
A less compact way to write the same is (with free spaces, usually the /x flag):
^(
y| # y is ok
z| # so is z
x(y|z|$)| # a single x, not followed by x
xx(y|z|$) # 2 x's, not followed by x
)*$
Based on the latest edit, here's an even flatter version of the pattern. I'm not entirely sure I understand your fascination with the pipe, but you can eliminate some more options: by allowing an empty match in the second group, you don't need to repeat permutations from the first group. This version also allows ε (the empty string), which I think is included in your language.
^(xxy|xxz|xy|xz|y|z)*(xx|x|)$
Basically you have the right answer already - well done you. :)
A caret (^) in a set, as in [^abc], only matches a single character that is not in that set, so its application to matching sequences of characters (i.e. strings) is limited and weak.
Regex has numeric quantifiers {n} and {a,b}, which allow you to match a defined number of repetitions of a pattern. That would work for this specific pattern (because it's 'x' repeated), but it's not particularly expressive of the problem you're trying to solve (even for regex!) and is a bit brittle (it wouldn't be appropriate for a negative match on 'xyx', for example).
An or pattern again would be verbose and rather unexpressive but it could be done as the fragment:
(x|xx)[^x] // x OR xx followed by NOT x
Obviously you can do this with an iterative algorithm but that's highly inefficient compared to a regex.
Well done for thinking beyond the solution though.
I know you don't want to use lookahead, but here's another way to solve this:
^(?:(?!xxx)[xyz])*$
will match any line of characters x, y or z as long as it doesn't contain the string xxx.
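Since the question asks for a human-eye check, a brute-force comparison may be more convincing. This Python sketch enumerates every string over {x, y, z} up to length 7 and compares each pattern from this thread with the plain test `"xxx" not in s`; `re.fullmatch` supplies the anchoring for patterns written without `^...$`. The dictionary labels and `disagreements` are made-up names.

```python
import re
from itertools import product

# Patterns quoted from this thread.
PATTERNS = {
    "negative lookahead (question)": r"(x(?!xx)|y|z)*",
    "pipes only (question edit)": r"(xxy|xxz|xy|xz|y|z)*(xxy|xxz|xx|xy|xz|x|y|z)",
    "x{0,2} then y, z or end": r"^(x{0,2}(y|z|$))*$",
    "flattened pipes": r"^(xxy|xxz|xy|xz|y|z)*(xx|x|)$",
    "tempered lookahead": r"^(?:(?!xxx)[xyz])*$",
}


def disagreements(max_len: int = 7) -> dict:
    """Per pattern, list the strings where it disagrees with "xxx" not in s."""
    result = {}
    for name, pattern in PATTERNS.items():
        compiled = re.compile(pattern)
        bad = []
        for n in range(max_len + 1):
            for chars in product("xyz", repeat=n):
                s = "".join(chars)
                if bool(compiled.fullmatch(s)) != ("xxx" not in s):
                    bad.append(s)
        result[name] = bad
    return result


for name, bad in disagreements().items():
    print(name, bad if bad else "agrees")
```

If the reasoning in the answers is right, the only disagreement reported is the empty string for the pipes-only version, which requires at least one character; that matches the remark that the flatter version also allows ε.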
I have a very basic regular expression and I just can't figure out why it's not working, so the question has two parts: why does my current version not work, and what is the correct expression?
Rules are pretty simple:
Must have a minimum of 3 characters.
If a % character is the first character, there must be a minimum of 4 characters.
So the following cases should work out as follows:
AB - fail
ABC - pass
ABCDEFG - pass
% - fail
%AB - fail
%ABC - pass
%ABCDEFG - pass
%%AB - pass
The expression I am using is:
^%?\S{3}
Which to me means:
^ - Start of string
%? - Greedy check for 0 or 1 % character
\S{3} - 3 other characters that are not white space
The problem is, the %? for some reason is not doing a greedy check. It's not eating the % character if it exists so the '%AB' case is passing which I think should be failing. Why is the %? not eating the % character?
Someone please show me the light :)
Edit: The answer I used was Dav's below: ^(%\S{3}|[^%\s]\S{2})
It was really a two-part answer, though, and Alan's made me understand why. I didn't use his version, ^(?>%?)\S{3}, because although it worked, it didn't work in the JavaScript implementation. Both were great answers and a lot of help.
The word for the behavior you described isn't greedy, it's possessive. Normal, greedy quantifiers match as much as they can originally, but back off if necessary to allow the whole regex to match (I like to think of them as greedy but accommodating). That's what's happening to you: the %? originally matches the leading percent sign, but if there aren't enough characters left for an overall match, it gives up the percent sign and lets \S{3} match it instead.
Some regex flavors (including Java and PHP) support possessive quantifiers, which never back off, even if that causes the overall match to fail. .NET doesn't have those, but it has the next best thing: atomic groups. Whatever you put inside an atomic group acts like a separate regex--it either matches at the position where it's applied or it doesn't, but it never goes back and tries to match more or less than it originally did just because the rest of the regex is failing (that is, the regex engine never backtracks into the atomic group). Here's how you would use it for your problem:
^(?>%?)\S{3}
If the string starts with a percent sign, the (?>%?) matches it, and if there aren't enough characters left for \S{3} to match, the regex fails.
Note that atomic groups (or possessive quantifiers) are not necessary to solve this problem, as Dav demonstrated. But they're very powerful tools which can easily make the difference between impossible and possible, or between too damn slow and slick as can be.
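JavaScript has no atomic groups, which is why the question's edit says this version couldn't be used there. A commonly used workaround is to emulate `(?>%?)` with a capturing lookahead followed by a backreference, `(?=(%?))\1`: a lookahead is not re-entered once it has succeeded, so the `%` cannot be given back, and the same pattern should behave identically in JavaScript. The sketch uses Python's `re` (Python 3.11+ also accepts `(?>%?)` directly); the name `ATOMIC_EMULATED` is illustrative.

```python
import re

# (?=(%?)) records what %? matches; \1 then consumes exactly that text.
# Backtracking cannot re-enter the lookahead, so a leading "%" is never handed to \S{3}.
ATOMIC_EMULATED = re.compile(r"^(?=(%?))\1\S{3}")

for text in ["AB", "ABC", "%AB", "%ABC", "%%AB"]:
    print(text, bool(ATOMIC_EMULATED.match(text)))
```

Unlike the plain `^%?\S{3}`, this rejects `%AB` while still accepting `%%AB`.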
Regex will always try to match the whole pattern if it can - "greedy" doesn't mean "will always grab the character if it exists", but instead means "will always grab the character if it exists and a match can be made with it grabbed".
Instead, what you probably want is something like this:
^(%\S{3}|[^%\s]\S{2})
Which will match either a % followed by 3 characters, or a non-%, non-whitespace followed by 2 more.
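Running the question's case list through both patterns with Python's `re` (whose `match` anchors at the start, like `^`) makes the difference visible; `CASES`, `ORIGINAL` and `DAV` are illustrative names.

```python
import re

# Expected outcomes copied from the case list in the question.
CASES = {
    "AB": False,
    "ABC": True,
    "ABCDEFG": True,
    "%": False,
    "%AB": False,
    "%ABC": True,
    "%ABCDEFG": True,
    "%%AB": True,
}

ORIGINAL = re.compile(r"^%?\S{3}")          # the question's pattern
DAV = re.compile(r"^(%\S{3}|[^%\s]\S{2})")  # this answer's pattern

for text, expected in CASES.items():
    print(text, expected, bool(ORIGINAL.match(text)), bool(DAV.match(text)))
```

Only the `%AB` row differs for the original pattern, where `%?` gives up the `%` so that `\S{3}` can match `%AB`.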
I always love to look at RE questions to see how much time people spend on them to "Save time"
str.len() >= (str[0] == '%' ? 4 : 3)
Although in real life I'd be more explicit, I just wrote it that way because for some reason some people consider code brevity an advantage (I'd call it an anti-advantage, but that's not a popular opinion right now)
Try this regex, modified slightly from Dav's original:
^(%\S{3,}|[^%\s]\S{2,})
with the regex option "^ and $ match at line breaks" on.