Regex password validation needs just one more adjustment - regex

I have an expression that is close to what I need; it's just missing my "no adjacent numbers" rule:
^.(.).\1.*$
abcdef1 is allowed
abcdef1g2 is allowed
abcdef12 is NOT allowed (but my current expression allows this)
The password rules are:
Cannot have adjacent numbers
The same number cannot be repeated anywhere in the password
No repeating characters anywhere in the password
[edit] I am not sure what language it is using. I am testing it with what looks like JavaScript (http://gskinner.com/RegExr/), and I am using it in a Windows application (Tools4Ever - E-SSOM) for single sign-on.

You can check that the password does not match this expression:
\d\d|(.).*(\1)
It may be better/easier to not use regex to do this validation though, as checking a unique character list is pretty easy to do. I'm also of the philosophy that you shouldn't put restrictions on what users want for their passwords.
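A minimal Python sketch of both approaches, assuming "numbers" means the ASCII digits 0-9. The function names are made up for illustration, and Python's `re` module stands in for whatever regex engine the target application actually uses. A password is accepted only when the reject pattern finds nothing.

```python
import re
import string

# Reject pattern from the answer: two adjacent digits, or any character
# that shows up again later in the string.
REJECT = re.compile(r"\d\d|(.).*\1", re.ASCII | re.DOTALL)


def is_valid_regex(password: str) -> bool:
    """Valid when the reject pattern does NOT match anywhere."""
    return REJECT.search(password) is None


def is_valid_plain(password: str) -> bool:
    """Same rules without regex: unique characters, no adjacent digits."""
    if len(set(password)) != len(password):
        return False  # some character is repeated
    digits = set(string.digits)
    return not any(a in digits and b in digits
                   for a, b in zip(password, password[1:]))


if __name__ == "__main__":
    # The three sample passwords from the question.
    for candidate in ("abcdef1", "abcdef1g2", "abcdef12"):
        print(candidate, is_valid_regex(candidate), is_valid_plain(candidate))
```

The plain version is arguably easier to maintain, since each rule is a separate, readable check.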

Related

RegEx equivalent for C# data annotation [DataType(DataType.Password)]

I have an iOS native login that works with a custom API for a site with .Net's Identity.
I need a regEx expression (for setting the password when signing up) that matches the requirements for the data annotation [DataType(DataType.Password)] in C#.
Does anyone know where to look?
DataType.Password doesn't trigger any specific (regex) validation. If you use Html.EditorFor with a password-type input field, it generates HTML that renders the value masked as *****.
Otherwise, password strength is validated by the membership provider (or whatever you are using to store your users). Even then, it often can't easily be captured in a regex, since it contains requirements such as:
- at least 1 digit
- at least 1 lower & uppercase letter
- at least 6 characters long.
- etc
Those kinds of requirements often turn into very nasty regular expressions:
([a-z]+[A-Z]+[a-zA-Z])|([A-Z]+[a-z]+[a-zA-Z]) ....
It becomes easier if you split each requirement into its own regular expression.
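As a rough sketch of the one-regex-per-requirement idea, here it is in Python. The three example requirements come from the list above; the dictionary, the function name, and the choice of ASCII-only letter classes are illustrative assumptions, not anything the membership provider prescribes.

```python
import re

# One small pattern per requirement; the key doubles as the error message.
REQUIREMENTS = {
    "at least one digit": re.compile(r"[0-9]"),
    "at least one lowercase letter": re.compile(r"[a-z]"),
    "at least one uppercase letter": re.compile(r"[A-Z]"),
    "at least 6 characters": re.compile(r".{6,}", re.DOTALL),
}


def failed_requirements(password: str) -> list:
    """Return the descriptions of every requirement the password misses."""
    return [description
            for description, pattern in REQUIREMENTS.items()
            if not pattern.search(password)]


if __name__ == "__main__":
    print(failed_requirements("Abcdef1"))
    print(failed_requirements("abc"))
```

Besides being easier to read, this lets you tell the user exactly which rule failed instead of reporting a single pass/fail result.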

Why does MVC validation not work for "<" and ">"?

I have this MVC application, and I want to be able to allow a user to be able to enter a username that is 6 to 255 characters long, including special characters that I deem fit. I have a simple regex for this:
[RegularExpression(@"^([a-zA-Z0-9!\@#\$%\^&\(\)-_\+\.'`~/=\?\{\}\|]){6,255}$", ErrorMessageResourceType = typeof(AdminResource), ErrorMessageResourceName = "UserNameFormatError")]
The validation works to a certain extent. It rejects usernames shorter than 6 characters or longer than 255, and it accepts all of the special characters I have listed. Interestingly, though, it also accepts "<" and ">", which I don't want, because they cause errors on the backend when the security layer thinks you are trying to inject malicious code. That's beside the point, though: why does the validation allow those characters when they are not included in the regex?
The dash seems to be the culprit. Except at the beginning or end of a character class, it denotes a range, so )-_ allows every character between ) and _, which includes < and >. You can escape the dash or move it.
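To see the range in action, here is a small Python sketch using a simplified copy of the character class from the question (most escapes dropped for readability). The names BUGGY and FIXED are illustrative; Python's `re` module treats a dash between two class members as a range in the same way.

```python
import re

# Dash between ")" and "_" forms the range U+0029..U+005F,
# which contains "<" (U+003C) and ">" (U+003E).
BUGGY = re.compile(r"^[a-zA-Z0-9!@#$%^&()-_+.'`~/=?{}|]{6,255}$")

# Dash moved to the end of the class, where it is a literal character.
FIXED = re.compile(r"^[a-zA-Z0-9!@#$%^&()_+.'`~/=?{}|-]{6,255}$")

if __name__ == "__main__":
    for name in ("user<name>", "user-name_1"):
        print(name, bool(BUGGY.match(name)), bool(FIXED.match(name)))
```

Note that the accidental range also lets through other characters such as `*`, `,`, `:`, `;`, `[` and `]`, so fixing the dash tightens the rule in more ways than just removing the angle brackets.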

Regular expression - for email spam filtering, match email address variants other than the original

I am an email spam quarantine administrator and I can write regular expression rules to block email messages. A common class of spam hitting our domain spoofs the username of one of our email addresses in front of some other domain.
For example, suppose my email address is jwclark@domain.com. In that case, spammers are writing to me from all kinds of other domains that start with my username such as:
jwclark1234@whatever.com
jwclark@wrongdomain.com
jwclark@a.domain.com
How can I write a regular expression rule to match everything including jwclark and any wildcards, but not match the original jwclark@domain.com? I would like a regex that matches everything above except for my actual example email address jwclark@domain.com.
I've made this regexp here
^jwclark.*[@](?!domain\.com).*$
It's in JavaScript format, but it should be easy to adapt to PHP or something else.
Given the nature of your problem, you might be better off making a regex builder function that makes the proper regexp for you, given the parameters.
Or, actually use a different approach. I recently found out how to parse ranges of floating point numbers with regexp, but that doesn't make it the proper solution to finding numbers within ranges. :P
edit - fixed silly redundancy thanks to zx81
edit - change to comply with strange limitations:
^jwclark.{0,25}[@][^d][^o][^m][^a][^i][^n].{0,25}\.com.{0,25}$
demo for the strange one
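The regex builder function suggested above could look roughly like this in Python, assuming the separator in the addresses is `@`. The function name and parameters are hypothetical. The pattern is a variation on the first regex in this answer, not a copy: the lookahead is anchored with `$` so that an address like jwclark@domain.com.evil.net is still caught, and `[^@]*` replaces `.*` before the separator.

```python
import re


def build_spoof_pattern(username: str, domain: str) -> re.Pattern:
    """Match addresses that start with `username` but are not exactly
    username@domain. Both parts are escaped so dots stay literal."""
    user = re.escape(username)
    dom = re.escape(domain)
    # (?!dom$) rejects only the genuine domain, not longer look-alikes.
    return re.compile(rf"^{user}[^@]*@(?!{dom}$).+$", re.IGNORECASE)


if __name__ == "__main__":
    pattern = build_spoof_pattern("jwclark", "domain.com")
    for address in ("jwclark1234@whatever.com",
                    "jwclark@wrongdomain.com",
                    "jwclark@a.domain.com",
                    "jwclark@domain.com"):
        print(address, bool(pattern.match(address)))
```

This sketch relies on lookahead support, so it would not help with the restricted engine that the second regex above was written for.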

HTML5 Input Pattern vs. Non-Latin Letters

I want to pre-validate an input form with the new HTML5 pattern attribute. My data is a "Domain Name", so the <input type="url"> regex preset doesn't apply.
But there is a problem: I can't use A-Za-z because of those damned IDNs (internationalized domain names).
So the question is: is there any way to use <input pattern=""> to validate arbitrary non-English letters?
I tried \w, of course, but it only works for Latin letters.
Maybe someone has a set of \xNN-\xNN ranges that is guaranteed to accept ALL Unicode alphabetic characters, or some other way?
edit: "This question may already have an answer here:" - no, there is no answer.
Based on my testing, the HTML5 pattern attribute supports Unicode code points in exactly the same way that JavaScript does and does not:
It only supports \u notation for Unicode code points, so \u00a1 will match '¡'.
Because these define characters, you can use them in character ranges like [\u00a1-\uffff]
. will match Unicode characters as well.
You don't really specify how you want to pre-validate so I can't really help you more than that, but by looking up the unicode character values, you should be able to work out what you need in your regex.
Keep in mind that the pattern regex execution is rather dumb overall and isn't universally supported. I recommend progressive enhancement with some javascript on top of the pattern value (you can even re-use the regex more or less).
As always, never trust user input - It doesn't take a genius to make a request to your form endpoint and pass more or less whatever data they like. Your server-side validation should necessarily be more explicit. Your client-side validation can be more generous, depending upon whether false positives or false negatives are more problematic to your use case.
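For the server-side half, the same code-point range can be reused. Below is a deliberately loose Python sketch; the label pattern and function name are assumptions made for illustration, and it does not implement real IDN or DNS rules. Python's `re` module accepts the same `\uXXXX` notation inside a character class.

```python
import re

# One domain label: ASCII letters, digits, hyphen, plus every code point
# from U+00A1 to U+FFFF (the loose range mentioned above).
LABEL = r"[A-Za-z0-9\u00a1-\uffff-]+"

# At least two labels separated by dots.
LOOSE_DOMAIN = re.compile(rf"{LABEL}(?:\.{LABEL})+")


def looks_like_domain(value: str) -> bool:
    """Loose check: allows some false positives, rejects obvious junk."""
    return LOOSE_DOMAIN.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ("example.com", "пример.рф", "bad domain.com", "<b>.com"):
        print(value, looks_like_domain(value))
```

The range stops at U+FFFF, so characters outside the Basic Multilingual Plane are rejected by this sketch.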
I know this isn't what you want to hear, but...
The HTML5 pattern attribute isn't really for the programmer so much as it's for the user. So, considering the unfortunate limitations of pattern, you are best off providing a "loose" pattern--one that doesn't give false negatives but allows for a few false positives. When I've run into this problem, I found that the best thing to do was a pattern consisting of a blacklist + a couple minimum requirements. Hopefully, that can be done in your case.

How to reject names (people and companies) using whitelists with C# regex's?

I've run into a few problems using a C# regex to implement a whitelist of allowed characters on web inputs. I am trying to avoid SQL injection and XSS attacks. I've read that whitelists of the allowable characters are the way to go.
The inputs are people names and company names.
Some of the problems are:
Company names that have ampersands. Like "Jim & Sons". The ampersand is important, but it is risky.
Unicode characters in names (we have Asian customers, for example, who enter their names using their own character sets). I need to whitelist all of these.
Company names can have all kinds of slashes, like "S/A" and "S\A". Are those risky?
I find myself wanting to allow almost every character after seeing all the data that is in the DB already (and being entered by new users).
Any suggestions for a good whitelist that will handle these (and other) issues?
NOTE: It's a legacy system, so I don't have control of all the code. I was hoping to reduce the number of attacks by preventing bad data from getting into the system in the first place.
This SO thread has a lot of good discussion on protecting yourself from injection attacks.
In short:
Filter your input as best as you can
Escape your strings using framework based methods
Parameterize your sql statements
In your case, you can limit the name field to a small character set. The company field will be more difficult, and you need to consider and balance your users need for freedom of entry with your need for site security. As others have said, trying to write your own custom sanitation methods is tricky and risky. Keep it simple and protect yourself through your architecture - don't simply rely on strings being "safe", even after sanitization.
EDIT:
To clarify - if you're trying to develop a whitelist, it's not something that the community can hand out, since it's entirely dependent on the data you want. But let's look at an example of a regex whitelist, perhaps for names. Say I've whitelisted A-Z, a-z, and space.
Regex reWhiteList = new Regex("^[A-Za-z ]+$");
That checks to see if the entire string is composed of those characters. Note that a string with a number, a period, a quote, or anything else would NOT match this regex and thus would fail the whitelist.
if (reWhiteList.IsMatch(strInput))
{
    // it's ok, proceed to step 2
}
else
{
    // it's not ok, inform user they've entered invalid characters and try again
}
Hopefully this helps some more! With names and company names you'll have a tough-to-impossible time developing a rigorous pattern to check against, but you can do a simple allowable character list, as I showed here.
Do not try to sanitize names, especially with regex!
Just make sure that you are properly escaping the values and saving them safely in your DB, and then escaping them again when presenting them in HTML.
Company names might have almost any kind of symbol in them, so I don't know how well this is going to work for you. I'd concentrate on shielding yourself directly from various attacks, not hoping that your strings are "naturally" safe.
(Certainly they can have ampersands, colons, semicolons, exclamation points, hyphens, percent signs, and all kinds of other things that could be "unsafe" in a host of contexts.)
Why filter or regex the data at all, or even escape it? You should be using bind variables to access the database.
This way, the customer could enter something like: anything' OR 'x'='x
And your application doesn't care because your SQL code doesn't parse the variable because it's not set when you prepare the statement. I.e.
'SELECT count(username) FROM usertable WHERE username = ? and password = ?'
then you execute that code with those variables set.
This works in PHP, Perl, J2EE applications, and so on.
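Here is a self-contained Python sketch of the bind-variable approach using the standard `sqlite3` module and an in-memory database. The table and query mirror the example above; the helper name and sample row are made up, and storing a plain-text password is only done to keep the sketch close to that query.

```python
import sqlite3


def count_matching_users(conn, username, password):
    """Run the query with bound parameters; the values are never
    spliced into the SQL text, so quotes in them are harmless."""
    cursor = conn.execute(
        "SELECT count(username) FROM usertable "
        "WHERE username = ? AND password = ?",
        (username, password),
    )
    return cursor.fetchone()[0]


# Throwaway in-memory database with one sample user (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usertable (username TEXT, password TEXT)")
conn.execute("INSERT INTO usertable VALUES (?, ?)", ("alice", "s3cret"))

if __name__ == "__main__":
    print(count_matching_users(conn, "alice", "s3cret"))
    print(count_matching_users(conn, "alice", "anything' OR 'x'='x"))
```

The injection string is compared as an ordinary value, so it simply fails to match any row.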
I think writing your own regexp is not a good idea: it would be very hard. Try leveraging the existing functions of your web framework; there are lots of resources on the net. Since you say C#, I assume you are using ASP.NET, so try the following article:
How To: Protect From Injection Attacks in ASP.NET
This is my current regex WHITELIST for a company name. Any input outside of these characters is rejected:
"^[0-9\p{L} '\-\.,\/\&]{0,50}$"
The \p{L} matches any Unicode "letter", so accented and Asian characters are whitelisted.
The \& is a bit problematic because it potentially allows javascript special characters.
The \' is problematic if not using parameterized queries, because of SQL injection.
The \- could allow "--", also a potential for SQL injection if not using parameterized queries.
Also, the \p{L} won't work client-side, so you can't use it in the ASP.NET regular expression validator without disabling clientside validation:
EnableClientScript="False"
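For comparison, here is a rough Python equivalent of that whitelist. Python's built-in `re` module has no `\p{L}`, so this sketch uses `str.isalpha()`, which covers the Unicode letter categories, as a stand-in. The function name is hypothetical, and the 50-character limit here counts code points, which can differ from how .NET counts for characters outside the Basic Multilingual Plane.

```python
ASCII_DIGITS = set("0123456789")
ALLOWED_PUNCTUATION = set(" '-.,/&")
MAX_LENGTH = 50


def is_allowed_company_name(name: str) -> bool:
    """Accept up to 50 characters made of letters (any script),
    ASCII digits, and the whitelisted punctuation."""
    if len(name) > MAX_LENGTH:
        return False
    return all(ch.isalpha() or ch in ASCII_DIGITS or ch in ALLOWED_PUNCTUATION
               for ch in name)


if __name__ == "__main__":
    for name in ("Jim & Sons", "S/A", "S\\A", "株式会社テスト", "<script>"):
        print(name, is_allowed_company_name(name))
```

As with the regex, passing this check says nothing about SQL or HTML safety; parameterized queries and output escaping are still needed.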