Regular Expression for strong password in MVC model - regex

I need to create a regular expression to validate a strong password in my MVC Model. Here are the rules that I need to apply:
Min 7 characters
Max 15 characters
At least 3 out of 4 different types of characters.
Numbers
Lowercase
Uppercase
Special (viz. !##$%&/=?_.)
Here's what I have tried so far:
[DataType(DataType.Password)]
[RegularExpression("([a-z]|[A-Z]|[0-9]|[\\W]){4}[a-zA-Z0-9\\W]{3,11}", ErrorMessage = "Invalid password format")]
public string Password { get; set; }

Creating a regex that will find any number of different kinds of character classes in any order is a little challenging because as soon as you match a character, it has been consumed and you can't go back to it. However, the .NET regex engine supports lookahead expressions. Therefore, you can check that a string contains something without actually consuming any of the string. For instance, let's say that you want to find any 10-character string that contains at least one instance of the letter "J". You could do that easily with a lookahead expression, like this:
(?=.*J).{10}
The (?=) construct declares a lookahead pattern. The pattern it looks for is .*J, which means that, starting at the current position, there can be any number of any character followed by a letter "J". If anything comes after the J, that's fine, it will still match. However, since it's a lookahead, none of those characters are actually consumed, so the .{10} part of the pattern picks up from the original position and matches from there. Since lookaheads don't move the position at all, you can put several of them in a row without consequence, so you can do something like this:
^(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*\W).{7,15}$
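One way to try out the stacked lookaheads is to run the pattern against a few sample strings. This is only a minimal Python sketch (Python's re module accepts the same lookahead syntax used here); the name ALL_FOUR, the helper has_all_four and the sample passwords are made up for illustration.

```python
import re

# Pattern from the answer: one lookahead per required character class,
# followed by the length check that actually consumes the string.
ALL_FOUR = re.compile(r"^(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*\W).{7,15}$")


def has_all_four(password: str) -> bool:
    """Return True if every lookahead succeeds and the length is 7-15."""
    return ALL_FOUR.match(password) is not None


if __name__ == "__main__":
    # Hypothetical sample passwords, not taken from the question.
    for candidate in ["Abcdef1!", "abcdef1!", "Ab1!", "Abcdefghijklmn1!"]:
        print(candidate, has_all_four(candidate))
```

Only the first sample passes: the second has no uppercase letter, the third is too short and the fourth is 16 characters long.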
As far as applying only three of the four character class rules, the only way I can think of to do that would be to list all of the combinations (e.g. for two of the three rules A, B, and C, you could match AB|AC|BC). So if you were only concerned with two of three rules (uppercase, lowercase, and digits, for instance), you could structure your lookaheads like this:
(?:(?=.*[A-Z])(?=.*\d)|(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[a-z]))
Making it support three out of four would just be a matter of making the list of options even longer...

A development of Steven's answer would be to test for all possible combinations of the types:
^((?=.*[A-Z])(?=.*\d)(?=.*[a-z])|(?=.*[A-Z])(?=.*\d)(?=.*[!##$%&\/=?_.-])|(?=.*[A-Z])(?=.*[a-z])(?=.*[!##$%&\/=?_.-])|(?=.*\d)(?=.*[a-z])(?=.*[!##$%&\/=?_.-])).{7,15}$
Also using the given special characters.
In a more readable form
^((?=.*[A-Z])(?=.*\d)(?=.*[a-z])|
(?=.*[A-Z])(?=.*\d)(?=.*[!##$%&\/=?_.-])|
(?=.*[A-Z])(?=.*[a-z])(?=.*[!##$%&\/=?_.-])|
(?=.*\d)(?=.*[a-z])(?=.*[!##$%&\/=?_.-])).{7,15}$
This tests for any of the four possible combinations of three types and makes sure the password is 7-15 characters long.
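Rather than typing the four combinations by hand, the alternation can be generated. The following is a rough Python sketch, not the answerer's code: it reuses the four lookaheads and the special-character class exactly as written above, and the names LOOKAHEADS, THREE_OF_FOUR and is_strong are invented for the example.

```python
import re
from itertools import combinations

# One lookahead per character type; the special-character class is copied
# verbatim from the answer above.
LOOKAHEADS = [
    r"(?=.*[A-Z])",
    r"(?=.*\d)",
    r"(?=.*[a-z])",
    r"(?=.*[!##$%&\/=?_.-])",
]

# Join every 3-of-4 combination with "|", then apply the 7-15 length check.
alternatives = "|".join("".join(combo) for combo in combinations(LOOKAHEADS, 3))
THREE_OF_FOUR = re.compile(rf"^(?:{alternatives}).{{7,15}}$")


def is_strong(password: str) -> bool:
    """Return True if at least three of the four types occur and length is 7-15."""
    return THREE_OF_FOUR.match(password) is not None


if __name__ == "__main__":
    for candidate in ["abcdef1A", "abcdef1!", "abcdefgh", "ABCDEFG1"]:
        print(candidate, is_strong(candidate))
```

Generating the alternation keeps the four character classes in one place, so changing the special-character set only needs one edit.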
Regards

Related

Regular expression to check strings containing a set of words separated by a delimiter

As the title says, I'm trying to build up a regular expression that can recognize strings with this format:
word!!cat!!DOG!! ... Phone!!home!!
where !! is used as a delimiter. Each word must have a length between 1 and 5 characters. Empty words are not allowed, i.e. no strings like !!,!!!! etc.
A word can only contain alphabetical characters between a and z (case insensitive). After each word I expect to find the special delimiter !!.
I came up with the solution below, but since I need to add other checks (e.g. words can contain spaces) I would like to know if I'm on the right track.
(([a-zA-Z]{1,5})([!]{2}))+
Also note that empty strings are not allowed, hence the use of +
Help and advice are very welcome since I just started learning how to build regular expressions. I ran some tests using http://regexr.com/ and it seems to be okay, but I want to be sure. Thank you!
Examples that shouldn't match:
a!!b!!aaaaaa!!
a123!!b!!c!!
aAaa!!bbb
aAaa!!bbb!
Splitting the string and using the values between the !!
It depends on what you want to do with the regular expression. If you want to match the values between the !!, here are two ways:
Matching with groups
([^!]+)!!
[^!]+ requires at least 1 character other than !
!! instead of [!]{2} because it is the same but much more readable
Matching with lookahead
If you only want to match the actual word (and not the two !), you can do this by using a positive lookahead:
[^!]+(?=!!)
(?=) is a positive lookahead. It requires everything inside, i.e. here !!, to be directly after the previous match. It however won't be in the resulting match.
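To compare the two extraction approaches side by side, here is a small Python sketch. The sample string is shortened from the question; the variable names are illustrative only.

```python
import re

sample = "word!!cat!!DOG!!"

# Capturing group: each match includes the "!!", the group holds only the word.
with_group = re.findall(r"([^!]+)!!", sample)

# Lookahead: "!!" must follow the word but is not part of the match.
with_lookahead = re.findall(r"[^!]+(?=!!)", sample)

print(with_group)      # ['word', 'cat', 'DOG']
print(with_lookahead)  # ['word', 'cat', 'DOG']
```

Both calls return the same list here; the difference only shows up if you look at the full match rather than the group.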
Validating the string
If you however want to check the validity of the whole string, then you need something like this:
^([^!]+!!)+$
^ start of the string
$ end of the string
It requires the whole string to consist of ([^!]+!!) repeated one or more times.
If [^!] does not fit your requirements, you can of course replace it with [a-zA-Z] or similar.
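As a sketch of that substitution, the Python snippet below swaps [^!]+ for [a-zA-Z]{1,5} to reflect the 1-5 letter rule from the question and checks it against the question's own examples. The names WORD_LIST and is_valid are hypothetical, and fullmatch is used so that a trailing newline cannot slip past the $ anchor.

```python
import re

# Whole-string validation: one or more words of 1-5 letters, each followed by "!!".
WORD_LIST = re.compile(r"^(?:[a-zA-Z]{1,5}!!)+$")


def is_valid(text: str) -> bool:
    """Return True if the entire string is a sequence of word!! units."""
    return WORD_LIST.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "word!!cat!!DOG!!",
        "a!!b!!aaaaaa!!",   # word longer than 5 letters
        "a123!!b!!c!!",     # digits are not allowed
        "aAaa!!bbb",        # missing final delimiter
        "aAaa!!bbb!",       # incomplete delimiter
    ]
    for text in samples:
        print(text, is_valid(text))
```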

Which regex could match those values?

I'm developing new specific syntax. Within it there are two kinds of code:
I: = or + or - (one or several plus, minus or equal signs in a row);
Regex for that is /[+=-]+/.
II: 6:+ or 15:- or 999:= (any integer, followed by one plus, minus or equal sign);
Regex for that is /\d+:[+=-]/.
In one entry there may be any amount of any of these tokens.
Each new entry has to be surrounded by brackets: [code here].
Kinds of code in brackets may stand next to each other: [=6:+-] or [15:-++=3:+] etc.
Empty entries are not allowed.
So, I can't make a regex to match proper entries!
I've tried this one /\[([=+-]*(\d+:[=+-])?[=+-]*)\]/, but it matches [] as well, while it is an error.
MATCH any of those
[=] [---] [+=-] [=+-] [17:=] [==+-] [6:=-] [+5:=-]
[==-=+] [+=====-] [15:-++=3:+] [=======] [+=-+==-] [---==--] [==-=+==] [=--==--]
NO MATCH
[] [=:1] [:2+] [3-:]
I don't know which flavor of regex you are using, but this should work for pretty much all of them:
\[((?:[+=-]+|[+=-]?\d+:[+=-]+)+)\]
It makes use of the | (alternation) operator, so it captures either kind of token: a run of +, = and - signs, or a number followed by a colon and signs.
Also, since you want [+5:=-] to match, I added a [+=-]? to allow for that.
EDIT:
This allows for multiple occurrences of the language. This, however, may be trivial as there is nothing to distinguish between separate parts of code.
OMG, it could be way more simple:
\[(?:(?:\d+:)?[+=-])+\]
I can't believe I was so stupid.
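The simplified pattern can be checked against the MATCH and NO MATCH lists from the question. This is just a quick Python sketch; ENTRY and is_entry are names chosen for the example, and fullmatch is used so that each bracketed entry is tested as a whole.

```python
import re

# Simplified pattern: one or more tokens, each an optional "digits:" prefix
# followed by exactly one sign.
ENTRY = re.compile(r"\[(?:(?:\d+:)?[+=-])+\]")

SHOULD_MATCH = (
    "[=] [---] [+=-] [=+-] [17:=] [==+-] [6:=-] [+5:=-] "
    "[==-=+] [+=====-] [15:-++=3:+] [=======] [+=-+==-] [---==--] "
    "[==-=+==] [=--==--]"
).split()
SHOULD_NOT_MATCH = "[] [=:1] [:2+] [3-:]".split()


def is_entry(text: str) -> bool:
    """Return True if the whole string is one valid bracketed entry."""
    return ENTRY.fullmatch(text) is not None


if __name__ == "__main__":
    print(all(is_entry(e) for e in SHOULD_MATCH))
    print(any(is_entry(e) for e in SHOULD_NOT_MATCH))
```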

Multiple spaces, multiple commas and multiple hyphens in alphanumeric regex

I am very new to regex and regular expressions, and I am stuck in a situation where I want to apply a regex to a JSF input field.
Where
alphanumeric
multiple spaces
multiple dot(.)
multiple hyphen (‐)
are allowed, and Minimum limit is 1 and Maximum limit is 5.
And for multiple values - they must be separated by comma (,)
So a Single value can be:
3kd-R
or
k3
or
-4
And multiple values (must be comma separated):
kdk30,3.K-4,ER--U,2,.I3,
By the help of stackoverflow, so far I am able to achieve only this:
(^[a-zA-Z0-9 ]{5}(,[a-zA-Z0-9 ]{5})*$)
Something like
^[-.a-zA-Z0-9 ]{1,5}(,[-.a-zA-Z0-9 ]{1,5})*$
Changes made
[-.a-zA-Z0-9 ] Added - and . to the character class so that those are matched as well.
{1,5} Quantifier, ensures that it is matched minimum 1 and maximum 5 characters
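A quick way to sanity-check this pattern is sketched below in Python (the question targets JSF, but the pattern itself uses nothing engine-specific). The names VALUES and is_valid_list are made up, and the trailing comma in the question's multi-value example is assumed to be sentence punctuation, so it is left out of the sample.

```python
import re

# Pattern from the answer: 1-5 allowed characters, then any number of
# ",segment" repetitions.
VALUES = re.compile(r"^[-.a-zA-Z0-9 ]{1,5}(,[-.a-zA-Z0-9 ]{1,5})*$")


def is_valid_list(text: str) -> bool:
    """Return True if text is one or more comma-separated 1-5 character values."""
    return VALUES.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["3kd-R", "k3", "-4", "kdk30,3.K-4,ER--U,2,.I3", "toolong", "a,,b"]:
        print(repr(text), is_valid_list(text))
```

Note that a value list ending in a comma is rejected by this pattern, because every comma must be followed by another segment.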
You've done pretty well. You need to add the hyphen and dot to that first character class. Note: since the hyphen denotes ranges within a character class, you need to position it where it cannot be read as specifying a range. That does not mean putting it where the range would merely look invalid (e.g., 7-.), but where it positionally cannot form a range, i.e., first or last. So your first character class would look something like this:
[a-zA-Z 0-9.-]{1,5} or [-a-zA-Z0-9 .]{1,5}
So, we've just defined what one segment looks like. That pattern can reoccur zero or more times. Of course, there are many ways to do that, but I would favor a regex subroutine because this allows code reuse. Now if the specs change or you're testing and realize you have to tweak that segment pattern, you only need to change it in one place.
Subroutines are not supported in BRE or ERE, but most widely-used modern regex engines support them (Perl, PCRE, Ruby, Delphi, R, PHP). They are very simple to use and understand. Basically, you just need to be able to refer to it (sound familiar? refer-back? back-reference?), so this means we need to capture the regex we wish to repeat. Then it's as simple as referring back to it, but instead of \1 which refers to the captured value (data), we want to refer to it as (?1), the capturing expression. In doing so, we've logically defined a subroutine:
([a-zA-Z 0-9.-]{1,5})(,(?1))*
So, the first group basically defines our subroutine and the second group consists of a comma followed by the same segment-definition expression we used for the first group, and that is optional ('*' is the zero-or-more quantifier).
If you operate on large quantities of data where efficiency is a consideration, don't capture when you don't have to. If your sole purpose for using parentheses is to alternate (e.g., \b[bB](asset|eagle)\b hound) or to quantify, as in our second group, use the (?: ... ) notation, which tells the regex engine that this is a non-capturing group. Without going into great detail, there is a lot of overhead in maintaining the match locations--not that it's complex, per se, just potentially highly repetitive. Regex engines will match, store the information, then when the match fails, they "give up" the match and try again starting with the next matching substring. Each time they match your capture group, they're storing that information again. Okay, I'm off the soapbox now. :-)
So, we're almost there. I say "almost" because I don't have all the information. But if this should be the sole occupant of the "subject" (line, field, etc.--the data sample you're evaluating), you should anchor it to "assert" that requirement. The caret '^' is the beginning of the subject, and the dollar '$' is the end of the subject, so by encapsulating our expression in ^ ... $ we are asserting that the subject matches in its entirety, front-to-back. These assertions have zero length; they consume no data, only assert a relative position. You can operate on them, e.g., s/^/  / would indent your entire document two spaces. You haven't really substituted the beginning of the line with two spaces, but you're able to operate on that imaginary, zero-length location. (Do some research on zero-length assertions [aka zero-width assertions, or look-arounds] to uncover a powerful feature of modern regex. For example, in the previous regex, if I wanted to make sure I did not insert two spaces on blank lines: s/^(?!$)/  /)
Also, you didn't say if you need to capture the results to do something with them. My impression was it's validation only, so that's not necessary. However, if it is needed, you can wrap the entire expression in capturing parentheses: ^( ... )$.
I'm going to provide a final solution that does not assume you need to capture but does assume the entire subject should consist of this value:
^([a-zA-Z 0-9.-]{1,5})(?:,(?1))*$
I know I went on a bit, but you said you were new to regex, so wanted to provide some detail. I hope it wasn't too much detail.
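Not every engine understands (?1); Python's built-in re module, for one, does not. In such engines the same "define once, reuse" idea can be approximated by building the pattern from a string, as in this rough sketch. SEGMENT, VALUE_LIST and is_valid_list are names invented for the example.

```python
import re

# Define the segment once and interpolate it twice, which mimics what the
# (?1) subroutine call does in engines that support it.
SEGMENT = r"[a-zA-Z 0-9.-]{1,5}"
VALUE_LIST = re.compile(rf"^{SEGMENT}(?:,{SEGMENT})*$")


def is_valid_list(text: str) -> bool:
    """Return True if text is a comma-separated list of 1-5 character segments."""
    return VALUE_LIST.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["3kd-R", "kdk30,3.K-4,ER--U,2,.I3", "abcdef", ",k3"]:
        print(repr(text), is_valid_list(text))
```

If the segment definition changes, only the SEGMENT string needs editing, which is the same maintenance benefit the subroutine gives.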
By the way, an excellent resource with tutorials is regular-expressions dot info, and a wonderful regex development and testing tool is regex101 dot com. And I can never say enough about stack overflow!

Regex for string that contains characters in unspecified order

When having strings like
helloworld
worldhello
ollehdlrow
Is there a regex that can match all those cases? So, basically a pattern that will match all strings that contain all characters, in unspecified order.
I tried using
/[helloworld]{10}/
but this doesn't work for obvious reasons, as it will also match eeeeeeeeee.
You definitely don't want to use regular expressions for this.
In order to check if a character exists in the string, in your case, you would have to use a positive lookahead. It would look something like this (?=a) to check for the character a. That's fine. If we want to check for a string containing the characters a and b, we can do /^(?=.*a)(?=.*b)/. Problems arise if we want to check for multiple a's.
View this example: http://regex101.com/r/iV2jC8
As you can see, the regex has been "told" to look two times for the letter 'a'. However, the first case still matches. This is because the engine does not save the position where it initially found the first 'a', and thus the next assertion finds the very same a. This is the case in all three of the examples. So in reality, none of them are really being validated.
You would have to do something like this: http://regex101.com/r/cR8eR4
Which as you probably can imagine will quickly get out of hand with larger patterns.
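The effect is easy to reproduce. The Python sketch below first shows two identical lookaheads being satisfied by a single 'a', then one possible workaround (an assumed formulation; the linked example may be written differently), and finally a non-regex alternative that compares character counts with collections.Counter. The Counter approach is a swapped-in technique, not something from the answer, and all names are illustrative.

```python
import re
from collections import Counter

# Both lookaheads start at position 0, so the same "a" satisfies each of them.
two_lookaheads = re.compile(r"^(?=.*a)(?=.*a)")

# Assumed workaround: require both occurrences inside a single lookahead.
chained = re.compile(r"^(?=.*a.*a)")


def is_permutation(candidate: str, letters: str = "helloworld") -> bool:
    """Non-regex check: same characters with the same multiplicities."""
    return Counter(candidate) == Counter(letters)


if __name__ == "__main__":
    print(bool(two_lookaheads.match("xay")))  # matches with only one "a"
    print(bool(chained.match("xay")))         # needs two distinct "a"s
    for word in ["helloworld", "worldhello", "ollehdlrow", "eeeeeeeeee"]:
        print(word, is_permutation(word))
```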
I hope this helps, best of luck.

Regex for username that allows numbers, letters and spaces

I'm looking for some regex code that I can use to check for a valid username.
I would like for the username to have letters (both upper case and lower case), numbers, spaces, underscores, dashes and dots, but the username must start and end with either a letter or number.
Ideally, it should also not allow for any of the special characters listed above to be repeated more than once in succession, i.e. they can have as many spaces/dots/dashes/underscores as they want, but there must be at least one number or letter between them.
I'm also interested to find out if you think this is a good system for a username? I've had a look for some regex that could do this, but none of them seem to allow spaces, and I would like for the usernames to have some spaces in them.
Thank you :)
So it looks like you want your username to have a "word" part (sequence of letters or numbers), interspersed with some "separator" part.
The regex will look something like this:
^[a-z0-9]+(?:[ _.-][a-z0-9]+)*$
Here's a schematic breakdown:
           _____sep-word…____
          /                  \
^[a-z0-9]+(?:[ _.-][a-z0-9]+)*$       i.e. "word ( sep word )*"
|\_______/   \____/\_______/  |
| "word"     "sep"   "word"   |
|                             |
from beginning of string...   till the end of string
So essentially we want to match things like word, word-sep-word, word-sep-word-sep-word, etc.
There will be no consecutive sep without a word in between
The first and last char will always be part of a word (i.e. not a sep char)
Note that for [ _.-], - is last so that it's not a range definition metacharacter. The (?:…) is what is called a non-capturing group. We need the brackets for grouping for the repetition (i.e. (…)*), but since we don't need the capture, we can use (?:…)* instead.
To allow uppercase/various Unicode letters etc, just expand the character class/use more flags as necessary.
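As an illustration of the "use more flags" route, this Python sketch compiles the same pattern with re.IGNORECASE so that uppercase letters are accepted too. USERNAME, is_valid_username and the sample names are hypothetical.

```python
import re

# "word ( sep word )*" with case-insensitive matching for the letters.
USERNAME = re.compile(r"^[a-z0-9]+(?:[ _.-][a-z0-9]+)*$", re.IGNORECASE)


def is_valid_username(name: str) -> bool:
    """Return True if name is words joined by single separator characters."""
    return USERNAME.fullmatch(name) is not None


if __name__ == "__main__":
    for name in ["John Smith", "john_doe-99", "a.b.c", "john__doe", " john", "john."]:
        print(repr(name), is_valid_username(name))
```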
References
regular-expressions.info/Anchors, Character Class, Repetition, Grouping
Although I'm sure someone will shortly post a 1 million lines regex to do exactly what you want, I don't think in this case a regex is a good solution.
Why don't you write a good old fashioned parser? It will take about as long as writing the regex that does everything you mentioned, but it's going to be much easier to maintain and read.
In particular, this is the tricky part:
it should also not allow for any of
the special characters listed above to
be repeated more than once in
succession
Alternatively you can always do a hybrid of the two: a regex for the other checks ([a-zA-Z0-9][a-zA-Z0-9 _.-]*[a-zA-Z0-9]) and a non-regex method for the no-repeat requirement.
You don't have to use a regex for everything. I find that requirements like the "no two consecutive characters" usually make the regexes so ugly that it's better to do that bit with a simple procedural loop.
I'd just use something like ^[A-Za-z0-9][A-Za-z0-9 \.\-_]*[A-Za-z0-9]$ (or the equivalents like [[:alnum:]] if your regex engine supports POSIX character classes) and then just check every character in a loop to make sure the next character isn't the same.
By doing it procedurally, you can check all the other rules you're likely to want at some point without resorting to what I call "regex gymnastics", things like:
not allowed to contain your first or last name.
no more than two consecutive digits.
and so forth.
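A hybrid along these lines might look like the following Python sketch. It is only one possible reading: the regex covers the allowed characters and the start/end rule, and the loop enforces the question's wording (no two separator characters in a row) rather than only rejecting identical neighbours. The names ALLOWED, SEPARATORS and is_valid_username are made up for the example.

```python
import re

# Regex part: allowed characters, and a letter or digit at both ends.
ALLOWED = re.compile(r"[A-Za-z0-9][A-Za-z0-9 ._-]*[A-Za-z0-9]")
SEPARATORS = set(" ._-")


def is_valid_username(name: str) -> bool:
    """Regex for the character rules, plain loop for the no-repeat rule."""
    if ALLOWED.fullmatch(name) is None:
        return False
    # Procedural part: reject two separator characters in succession.
    for current, following in zip(name, name[1:]):
        if current in SEPARATORS and following in SEPARATORS:
            return False
    return True


if __name__ == "__main__":
    for name in ["john doe", "john  doe", "john._doe", "-john", "jd", "a"]:
        print(repr(name), is_valid_username(name))
```

As written, the regex needs at least two characters, so a one-character username is rejected; further rules such as "no more than two consecutive digits" could be added as extra checks in the same function.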