Why does MVC validation not work for "<" and ">"? - regex

I have this MVC application, and I want to allow a user to enter a username that is 6 to 255 characters long, including special characters that I deem fit. I have a simple regex for this:
[RegularExpression(@"^([a-zA-Z0-9!\@#\$%\^&\(\)-_\+\.'`~/=\?\{\}\|]){6,255}$", ErrorMessageResourceType = typeof(AdminResource), ErrorMessageResourceName = "UserNameFormatError")]
The validation works to a certain extent. It will not let you enter a username shorter than 6 characters or longer than 255, and it lets you use all of the special characters I have listed. Interestingly, though, it also lets you use "<" and ">", which I don't want to allow, because then you start getting errors on the backend: the security checks think you are trying to inject malicious code or whatever. That's beside the point, though. How come the validation allows those characters when they are not included in the regex?

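The dash seems to be the culprit. Inside a character class, an unescaped dash denotes a range unless it is the first or last character. Here \)-_ is read as the range from ) to _, which in ASCII (0x29 to 0x5F) includes < and >, along with several other characters you did not list. You can escape the dash or move it to the beginning or end of the class.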
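A quick way to see the effect is to list what the accidental range covers. The sketch below uses Python's re module and a simplified version of the character class, not the exact attribute from the question. It assumes that .NET treats an unescaped dash inside a class the same way, and the names BROKEN, FIXED and accidental are made up for the example.

```python
import re

# Simplified class from the question: the unescaped "-" between ")" and "_"
# forms a range from U+0029 to U+005F.
BROKEN = re.compile(r"^[a-zA-Z0-9!@#$%^&()-_+.'`~/=?{}|]{6,255}$")

# Same class with the dash moved to the end, where it is a literal dash.
FIXED = re.compile(r"^[a-zA-Z0-9!@#$%^&()_+.'`~/=?{}|-]{6,255}$")

# Every character covered by the accidental range ")" .. "_".
accidental = "".join(chr(c) for c in range(ord(")"), ord("_") + 1))

print(accidental)                           # includes "<" and ">"
print(bool(BROKEN.match("user<name>")))     # accepted by the broken class
print(bool(FIXED.match("user<name>")))      # rejected once the dash is literal
print(bool(FIXED.match("user-name")))       # a literal dash is still allowed
```

Besides < and >, the range also pulls in characters such as *, comma, :, ;, [ and \, so moving or escaping the dash tightens the whitelist in more ways than one.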

Related

Regex password validation needs just one more adjustment

I have an expression that is close to what I need; it's just missing my "no adjacent numbers" rule:
^.*(.).*\1.*$
abcdef1 is allowed
abcdef1g2 is allowed
abcdef12 is NOT allowed (but my current expression allows this)
The password rules are:
Cannot have adjacent numbers
The same number cannot be repeated anywhere in the password
No repeating characters anywhere in the password
[edit] I am not sure what language it is using. I can tell you I am testing it with what looks like JavaScript (http://gskinner.com/RegExr/). I am using it in a Windows application (Tools4Ever - E-SSOM) that is for single sign-on.
You can confirm that the password does not match this expression:
\d\d|(.).*(\1)
It may be better/easier to not use regex to do this validation though, as checking a unique character list is pretty easy to do. I'm also of the philosophy that you shouldn't put restrictions on what users want for their passwords.
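Both approaches can be sketched side by side. The example below is Python, not the E-SSOM tool's regex engine, and the function names are hypothetical. It treats a password as valid when the rejecting expression finds nothing, and compares that with a plain check for unique characters and adjacent digits.

```python
import re

# Matches passwords that BREAK the rules: two adjacent digits, or any
# character that appears again later in the string.
REJECT = re.compile(r"\d\d|(.).*\1")


def is_allowed_regex(password):
    """Valid when the rejecting expression matches nowhere."""
    return REJECT.search(password) is None


def is_allowed_plain(password):
    """Same rules without a regex."""
    # No character (digit or otherwise) may occur twice.
    no_repeats = len(set(password)) == len(password)
    # No two neighbouring characters may both be digits.
    no_adjacent_digits = not any(
        a.isdecimal() and b.isdecimal()
        for a, b in zip(password, password[1:])
    )
    return no_repeats and no_adjacent_digits


for candidate in ("abcdef1", "abcdef1g2", "abcdef12"):
    print(candidate, is_allowed_regex(candidate), is_allowed_plain(candidate))
```

The first two sample passwords from the question pass both checks and abcdef12 fails both, because of the adjacent digits.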

Regex for URL routing - match alphanumeric and dashes except words in this list

I'm using CodeIgniter to write an app where a user will be allowed to register an account and is assigned a URL (URL slug) of their choosing (ex. domain.com/user-name). CodeIgniter has a URL routing feature that allows the utilization of regular expressions (link).
Users are only allowed to register URLs that contain alphanumeric characters, dashes (-), and underscores (_). This is the regex I'm using to verify the validity of the URL slug: ^[A-Za-z0-9][A-Za-z0-9_-]{2,254}$
I am using the URL routing feature to route a few URLs to features on my site (ex. /home -> /pages/index, /activity -> /user/activity), so those particular URLs obviously cannot be registered by a user.
I'm largely inexperienced with regular expressions but have attempted to write an expression that would match any URL slugs with alphanumerics/dash/underscore except if they are any of the following:
default_controller
404_override
home
activity
Here is the code I'm using to try to match the words with that specific criteria:
$route['(?!default_controller|404_override|home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
but it isn't routing properly. Can someone help? (Side question: is it necessary to have ^ or $ in the regex when trying to match against URLs?)
Alright, let's pick this apart.
Ignore CodeIgniter's reserved routes.
The default_controller and 404_override portions of your route are unnecessary. Routes are compared to the requested URI to see if there's a match. It is highly unlikely that those two items will ever be in your URI, since they are special reserved routes for CodeIgniter. So let's forget about them.
$route['(?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254}'] = 'view/slug/$1';
Capture everything!
With regular expressions, a group is created using parentheses (). This group can then be retrieved with a back reference - in our case, the $1, $2, etc. located in the second part of the route. You only had a group around the first set of items you were trying to exclude, so it would not properly capture the entire wild card. You found this out yourself already, and added a group around the entire item (good!).
$route['((?!home|activity)[A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Look-ahead?!
On that subject, the first group around home|activity is not actually a traditional group, due to the use of ?! at the beginning. This is called a negative look-ahead, and it's a complicated regular expression feature. And it's being used incorrectly:
Negative lookahead is indispensable if you want to match something not followed by something else.
There's a LOT more I could go into with this, but basically we don't really want or need it in the first place, so I'll let you explore if you'd like.
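For anyone who does want to explore, here is a small sketch of how the look-ahead behaves. It is written in Python, with re.fullmatch standing in for the whole-URI match that the routing performs; the pattern names are made up for the example. Without an end anchor inside the look-ahead, the exclusion applies to anything that merely starts with a reserved word.

```python
import re

SLUG = r"[A-Za-z0-9][A-Za-z0-9_-]{2,254}"

# Rejects any slug that STARTS with "home" or "activity".
PREFIX_EXCLUDE = re.compile(r"(?!home|activity)" + SLUG)

# Rejects only the exact words, because "$" ties the alternatives
# to the end of the string.
EXACT_EXCLUDE = re.compile(r"(?!(?:home|activity)$)" + SLUG)

for slug in ("home", "homer", "activity", "john-doe"):
    print(
        slug,
        PREFIX_EXCLUDE.fullmatch(slug) is not None,
        EXACT_EXCLUDE.fullmatch(slug) is not None,
    )
```

With the first pattern a user could never register a slug like homer, which is one reason keeping the reserved words out of the regex entirely is simpler.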
In order to make your life easier, I'd suggest separating the home, activity, and other existing controllers in the routes. CodeIgniter will look through the list of routes from top to bottom, and once something matches, it stops checking. So if you specify your existing controllers before the wild card, they will match, and your wild card regular expression can be greatly simplified.
$route['home'] = 'pages';
$route['activity'] = 'user/activity';
$route['([A-Za-z0-9][A-Za-z0-9_-]{2,254})'] = 'view/slug/$1';
Remember to list your routes in order from most specific to least. Wild card matches are less specific than exact matches (like home and activity), so they should come after (below).
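The first-match-wins behaviour can be modelled in a few lines. This is a Python sketch of the idea only, not CodeIgniter's actual router: ROUTES and resolve are hypothetical names, re.fullmatch stands in for the framework anchoring each pattern, and \1 plays the role of $1.

```python
import re

# Same order as the routes above: exact matches first, wild card last.
ROUTES = [
    (r"home", r"pages"),
    (r"activity", r"user/activity"),
    (r"([A-Za-z0-9][A-Za-z0-9_-]{2,254})", r"view/slug/\1"),
]


def resolve(uri):
    """Return the target of the first route matching the whole URI."""
    for pattern, target in ROUTES:
        match = re.fullmatch(pattern, uri)
        if match:
            # Substitute captured groups into the target.
            return match.expand(target)
    return None  # nothing matched; the framework would serve a 404


for uri in ("home", "activity", "homer", "john-doe", "ab"):
    print(uri, "->", resolve(uri))
```

Because home and activity are listed first, the wild card never sees them, while homer and john-doe fall through to the slug route. A two-character URI such as ab matches nothing, since the slug pattern needs at least three characters.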
Now, that's all the complicated stuff. A little more FYI.
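Remember that a dash - has a special meaning between [] brackets: it denotes a range unless it is the first or last character in the class. Yours is last, so it already matches a literal dash, but escaping it makes the intent explicit and protects you if the class is edited later.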
$route['([A-Za-z0-9][A-Za-z0-9_\-]{2,254})'] = 'view/slug/$1';
Note that your character repetition min/max {2,254} only applies to the second set of characters, so your user names must be 3 characters at minimum, and 255 at maximum. Just an FYI if you didn't realize that already.
I saw your own answer to this problem, and it's just ugly. Sorry. The ^ and $ symbols are used improperly throughout the lookahead (which still shouldn't be there in the first place). It may "work" for a few use cases that you're testing it with, but it will just give you problems and headaches in the future.
Hopefully now you know more about regular expressions and how they're matched in the routing process.
And to answer your question, no, you should not use ^ and $ at the beginning and end of your regex -- CodeIgniter will add that for you.
Use the 404, Luke...
At this point your routes are improved and should be functional. I will throw it out there, though, that you might want to consider using the controller/method defined as the 404_override to handle your wild cards. The main benefit of this is that you don't need ANY routes to direct a wild card, or to prevent your wild card from goofing up existing controllers. You only need:
$route['404_override'] = 'view/slug';
Then, your View::slug() method would check the URI, and see if it's a valid pattern, then check if it exists as a user (same as your slug method does now, no doubt). If it does, then you're good to go. If it doesn't, then you throw a 404 error.
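The logic of such a method might look roughly like this. It is a Python sketch with made-up names (handle_unmatched, known_slugs); a real View::slug() would look the user up in the database and render a view instead of returning a tuple.

```python
import re

SLUG = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{2,254}")


def handle_unmatched(uri, known_slugs):
    """Decide what to do with a URI that no controller claimed.

    known_slugs is a stand-in for a database lookup of registered users.
    """
    # Step 1: is the URI a syntactically valid slug at all?
    if SLUG.fullmatch(uri) is None:
        return ("404", None)
    # Step 2: does a user with that slug exist?
    if uri not in known_slugs:
        return ("404", None)
    return ("profile", uri)


users = {"john-doe", "jane_42"}
print(handle_unmatched("john-doe", users))
print(handle_unmatched("nobody-here", users))
print(handle_unmatched("bad/uri", users))
```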
It may not seem that graceful, but it works great. Give it a shot if it sounds better for you.
I'm not familiar with CodeIgniter specifically, but most frameworks' routing operates on precedence. In other words, the default controller, 404, and similar routes should be defined first. Then you can simplify your regex to match only the slugs.
Ok answering my own question
I seem to have come up with a different expression that works:
$route['(^(?!default_controller$|404_override$|home$|activity$)[A-Za-z0-9][A-Za-z0-9_-]{2,254}$)'] = 'view/slug/$1';
I added parentheses around the whole expression (I think that's what CodeIgniter matches with $1 on the right) and added a start-of-line anchor (^) and a bunch of end-of-line anchors ($).
Hope this helps someone who may run into this problem later.

RegEx expression for form validation

I am trying to add form validation to my html site in order to prevent xss injection attacks.
I am using a simple JavaScript form validator, genvalidator_v4.js, that allows me to use regular expressions to determine what is allowed in a text box. I am trying to write one that would prevent "<" or ">" or any other characters that could be used in this kind of attack, but still allow alphanumerics, punctuation, and other special characters.
Any ideas? Also open to other methods of preventing xss attacks but I am very inexperienced in this area so please keep it as simple as possible.
You are trying to blacklist dangerous input. That's very tricky, it's very easy to get it wrong because of the sheer number of tokens that could be dangerous.
Thus, the following two practices are recommended instead:
Escape everything read from the database before outputting it on a web page. If you correctly HtmlEncode everything (your language of choice surely has a library method for that), it doesn't matter if a user entered <script>/* do something evil */</script> and that code got stored in your database. Correctly encoded, this will just be printed verbatim and do no harm.
If you still want to filter input (which might be useful as an additional layer of security), whitelists are generally safer than blacklists. So, instead of saying that < is harmful, you say that letters, digits, punctuation, etc. are safe. What exactly is safe depends on what type of field you are filtering.
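Both practices can be sketched briefly. The example is Python, so html.escape stands in for whatever HtmlEncode function your platform provides. The whitelist is an assumed one for a hypothetical display-name field; the right set of characters depends on the field.

```python
import html
import re

# Assumed whitelist for a hypothetical "display name" field: letters,
# digits, spaces and a little punctuation.
SAFE_NAME = re.compile(r"[A-Za-z0-9 .,'_-]{1,100}")


def is_safe_name(value):
    """Input filter: accept only values made of whitelisted characters."""
    return SAFE_NAME.fullmatch(value) is not None


def render_comment(stored_text):
    """Output encoding: escape stored text just before it goes into HTML."""
    return "<p>" + html.escape(stored_text) + "</p>"


print(is_safe_name("O'Brien, J."))
print(is_safe_name("<b>bob</b>"))
print(render_comment("<script>/* do something evil */</script>"))
```

The encoded comment reaches the browser as &lt;script&gt;... and is displayed as text instead of being executed, whether or not the input filter caught it.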

Quick regexp question for Railo

I guess the following line of code looks familiar to many...
[^A-Za-z0-9]
And what I'd like to do is keep a block of "text": alphanumeric characters as stated above, plus punctuation and other special characters that SQL for MS Access can handle. Also, the # sign would be replaced with ## (doubled to escape a single #) for the underlying scripting language I'm using (Railo). So, to sum up, I'd like to remove any character that Access and Railo can't handle prior to writing the string into a db table.
The above alphanumeric class is a start. Can you help make it complete?
Thanks.
Railo can handle every character that I'm aware of, so I'm not sure what the author of the question is implying. The pound sign (#) is the only character I'm aware of that needs to be escaped. If the user uses <cfqueryparam> correctly, he should be able to insert just about anything he wants into Access.
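The idea behind <cfqueryparam> is a parameterized query: the value travels separately from the SQL text, so no characters need to be stripped. The sketch below shows the same principle in Python with an in-memory SQLite database as a stand-in. It is not Railo or Access code, and the table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

# A string full of characters one might be tempted to filter out.
raw = "O'Reilly & Sons #1 <draft>; DROP TABLE notes"

# The "?" placeholder binds the value as data, never as SQL.
conn.execute("INSERT INTO notes (body) VALUES (?)", (raw,))

stored = conn.execute("SELECT body FROM notes").fetchone()[0]
print(stored == raw)
```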

negative look ahead to exclude html tags

I'm trying to come up with a validation expression to prevent users from entering html or javascript tags into a comment box on a web page.
The following works fine for a single line of text:
^(?!.*(<|>)).*$
..but it won't allow any newline characters because of the dot(.). If I go with something like this:
^(?!.*(<|>))(.|\s)*$
it will allow multiple lines but the expression only matches '<' and '>' on the first line. I need it to match any line.
This works fine:
^[-_\s\d\w"'\.,:;#/&\$\%\?!#\+\*\\(\)]{0,4000}$
but it's ugly and I'm concerned that it's going to break for some users because it's a multi-lingual application.
Any ideas? Thanks!
Note that your RE prevents users from entering < and >, in any context. "2 > 1", for example. This is very undesirable.
Rather than trying to use regular expressions to match HTML (which they aren't well suited to do), simply escape < and > by transforming them to &lt; and &gt;. Alternatively, find a package for your language-of-choice that implements whitelisting to allow a limited subset of HTML, or that supports its own markup language (I hear markdown is nice).
As for "." not matching newline characters, some regexp implementations support a flag (usually "m" for "multi-line" and "s" for "single line"; the latter causes "." to match newlines) to control this behavior.
The first two are basically equivalent to /^[^<>]*$/, except this one works on multiline strings. Any reason why you didn't write the RE that way?
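The difference between these variants is easy to check on a two-line string. The sketch uses Python's re module, where the re.DOTALL flag plays the role of the "s" (single line) option mentioned above. Other engines, including the ASP.NET validator, may expose the flag differently or not at all.

```python
import re

text_ok = "line one\nline two"
text_bad = "line one\nline <b>two</b>"

# Negated class: "[^<>]" matches newlines too, so no flag is needed.
NEGATED = re.compile(r"^[^<>]*$")

# Dot-based versions from the question.
DOT_DEFAULT = re.compile(r"^(?!.*(<|>)).*$")
DOT_OR_SPACE = re.compile(r"^(?!.*(<|>))(.|\s)*$")
DOT_ALL = re.compile(r"^(?!.*(<|>)).*$", re.DOTALL)

for name, pattern in [
    ("negated", NEGATED),
    ("dot default", DOT_DEFAULT),
    ("dot or space", DOT_OR_SPACE),
    ("dot + DOTALL", DOT_ALL),
]:
    print(name, bool(pattern.match(text_ok)), bool(pattern.match(text_bad)))
```

Only the negated class and the DOTALL variant both accept the clean two-line text and reject the one containing a tag. The default dot version rejects any multi-line input, and the (.|\s)* version lets the tag on the second line through because its look-ahead only scans the first line.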
So, I looked into it and there is a .NET 'SingleLine' option for regular expressions that causes "." to also match on the new line character. Unfortunately, this isn't available in the ASP.NET RegularExpressionValidator. As far as I can see, there's no way to make something like ^(?!.*(<\w+>)).*$ work on a multi-line textbox without doing server-side validation.
I took your advice and went the route of escaping the tags on the server side. This requires setting the validation page directive to 'false' but in this particular instance that isn't a big deal because the comment box is really the only thing to worry about.