ASP.NET Validation Controls VS C# regular expression Validation - regex

To me regular expression validation seems straight forward and meaningfull rather than validating everything with asp.net validation controls. I am learning asp.net and do not want to memorize all asp.net validation controls, when any form input can be simply validated with reqular expression. Am I thinking right or should I use validation controls?
Example:`RequiredFieldValidator vs Regex Solution C#
if(TextBox1.Text == ""){
Label1.text = "Name Field is required, Please try again";
return;
}
CompareValidator vs Regex Solution
if(Regex.IsMatch(TextBox1.Text, #"^[0-9]")){
if(Convert.ToInt32(TextBox1.Text) > 18){
output.InnerHtml = #"some code";
} else{
Label1.Text = "You should be old enough to express out your political views ";
return;
}
} else{
Label1.Text = "You should be old enough to express out your political views";
return;
}
}
`
Thinking would not be better to do everything in C#, rather than remembring all those validation controls

The major advantage to the validation controls is that in most cases they will output JavaScript validation for the client side that matches the server-side validation. This can reduce round trips to the server which is always a benefit. However, if you're good with JavaScript, you can probably code the client side piece more efficiently than the control would output anyway.
One other thing to consider, when using the control you can turn the validation on/off on both client and server using one flag on the control, if using your own code, you have to handle those separately.

You are right, regular expression validator can replace a lot of other validators, provided you can write a validation expression that works well on client and server side.

You can do a lot of validation work in regular expressions, but there are some areas where regexes are not ideal:
date validation: Either you get a terribly unwieldy regex, or you'll miss lots of plausible but illegal dates (like Feb 29, 2000).
email validation. Same thing here - you either reject some valid addresses, or you allow invalid addresses (and in either case, you'll allow addresses that are syntactically OK but don't correspond to an actual mailbox).
number validation in general - regular expressions are good for matching textual data. Using them to validate numbers is cumbersome and error-prone. Have you thought of exponential notation, locale-dependent decimal separators, thousands separators, leading zeroes, etc...?
Apart from that, the JavaScript regex engine has some limitations (e. g., lack of lookbehind assertions) that you need to know about when trying to write regexes that have to work both on the client and the server side.
And finally, do you realize that there's an error in your example regex? Maybe using a validator is safer unless you really know how to build a regex that does exactly what you intend it to do...

Related

Is there a way to test if my regex is vulnerable to catastrophic backtracking?

There are many questions on this topic, but I'm not sure if my regex is vulnerable or not.
The following regex is what I use for email validation:
/^\w+([\.-]?\w+)*#\w+([\.-]?\w+)*(\.\w{2,3})+$/.test(email)
Because I'm using a * in a few places, I suspect it could be.
I'd like to be able to test any number of occurrences in my code for problems.
I'm using Node.js so this could shut down my server entirely given the single threaded nature of the event loop.
Good question. Yes, given the right input, it's vulnerable and a runaway regex is able to block the entire node process, making the service unavailable.
The basic example of a regex prone to catastrophic backtracking looks like
^(\w+)*$
a pattern which can be found multiple times in the given regex.
When the regex contains optional quantifiers and the input contains long sequences that cannot be matched in the end the JS regex engine has to backtrack a lot and burns CPU. Potentially ad infinitum if the input is long enough. (You can play with this on regex101 as well using your regex by adjusting the timeout value in the settings.)
In general,
avoid monstrosities,
use HTML5 input validation whenever possible (in the front-end),
use established validation libraries for common input, e.g. validator.js,
try to detect potentially catastrophic exponential-time regular expressions ahead of time using tools like safe-regex & vuln-regex-detector (those offer pretty much what you had in mind),
and know your stuff 1, 2, 3 (I think the third link explains the issue best).
More drastic approaches to mitigate catastrophic backtracking in node.js are wrapping your regex efforts in a child process or vm context and set a meaningful timeout. (In a perfect world JavaScript's RegExp constructor would have a timeout param, maybe someday.)
The approach of using a child process is described here on SO.
The VM context (sandboxing) approach is described here.
const Joi = require('#hapi/joi');
function isEmail(emailAsStr) {
const schema = Joi.object({ email: Joi.string().email() });
const result = schema.validate({ email: emailAsStr });
const validated = result.error ? false : true;
if (validated) return true;
return [false, result.error.details[0].message];
}
Here's another way to do it - use a library! :)
I tested it against common catastrophic backtrack regex.
The answer to my original question is to use the npm lib. safe-regex, but I thought I'd share another example of how to resolve this problem w/o regex.

US State Regular expression with case sensitive

I'm using ASP.NET MVC application and model has the following regular expression to validate US states.
This one works fine if user enter all upper case, but not working for lower case/camel case scenarios.
[RegularExpression(#"^((A[ELKSZR])|(C[AOT])|(D[EC])|(F[ML])|(G[AU])|(HI)|(I[DLNA])|(K[SY])|(LA)|(M[EHDAINSOT])|(N[EVHJMYCD])|(MP)|(O[HKR])|(P[WAR])|(RI)|(S[CD])|(T[NX])|(UT)|(V[TIA])|(W[AVIY]))$", ErrorMessage = "Invalid State")]
public string State { get; set; }
I tried this one, but no luck.
// [RegularExpression(#"^(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$", ErrorMessage = "Invalid State")]
thank you.
Since this expression can be used for client side validation (and thus requires ECMA regex syntax, that is, JavaScript-compatible regular expression) you cannot use an inline modifier like (?i) let alone the toggled version (?i:...).
You have to double each letter with the lowercase counterpart:
^(([Aa][EeLlKkSsZzRr])|([Cc][AaOoTt])|([Dd][EeCc])|([Ff][MmLl])|([Gg][AaUu])|([Hh][Ii])|([Ii][DdLlNnAa])|([Kk][SsYy])|([Ll][Aa])|([Mm][EeHhDdAaIiNnSsOoTt])|([Nn][EeVvHhJjMmYyCcDd])|([Mm][Pp])|([Oo][HhKkRr])|([Pp][WwAaRr])|([Rr][Ii])|([Ss][CcDd])|([Tt][NnXx])|([Uu][Tt])|([Vv][TtIiAa])|([Ww][AaVvIiYy]))$
See demo
The list above is not as exhaustive - it is missing some military abbreviations. Trust me - you do not want to receive the ire of patriotic families trying to send stuff to their loved ones in the military.
Same technique - I added a few more.
^(([Aa][EeLlKkSsZzRr])|([Cc][AaOoTt])|([Dd][EeCc])|([Ff][MmLl])|([Gg][AaUu])|([Hh][Ii])|([Ii][DdLlNnAa])|([Kk][SsYy])|([Ll][Aa])|([Mm][EeHhDdAaIiNnSsOoTt])|([Nn][EeVvHhJjMmYyCcDd])|([Mm][Pp])|([Oo][HhKkRr])|([Pp][WwAaRr])|([Rr][Ii])|([Ss][CcDd])|([Tt][NnXx])|([Uu][Tt])|([Vv][TtIiAa])|([Ww][AaVvIiYy]))$
I have used
[^,]*[A-Z]{2}
hopefully, it works for you.

How to create Gmail filter searching for text only at start of subject line?

We receive regular automated build messages from Jenkins build servers at work.
It'd be nice to ferret these away into a label, skipping the inbox.
Using a filter is of course the right choice.
The desired identifier is the string [RELEASE] at the beginning of a subject line.
Attempting to specify any of the following regexes causes emails with the string release in any case anywhere in the subject line to be matched:
\[RELEASE\]*
^\[RELEASE\]
^\[RELEASE\]*
^\[RELEASE\].*
From what I've read subsequently, Gmail doesn't have standard regex support, and from experimentation it seems, as with google search, special characters are simply ignored.
I'm therefore looking for a search parameter which can be used, maybe something like atstart:mystring in keeping with their has:, in: notations.
Is there a way to force the match only if it occurs at the start of the line, and only in the case where square brackets are included?
Sincere thanks.
Regex is not on the list of search features, and it was on (more or less, as Better message search functionality (i.e. Wildcard and partial word search)) the list of pre-canned feature requests, so the answer is "you cannot do this via the Gmail web UI" :-(
There are no current Labs features which offer this. SIEVE filters would be another way to do this, that too was not supported, there seems to no longer be any definitive statement on SIEVE support in the Gmail help.
Updated for link rot The pre-canned list of feature requests was, er canned, the original is on archive.org dated 2012, now you just get redirected to a dumbed down page telling you how to give feedback. Lack of SIEVE support was covered in answer 78761 Does Gmail support all IMAP features?, since some time in 2015 that answer silently redirects to the answer about IMAP client configuration, archive.org has a copy dated 2014.
With the current search facility brackets of any form () {} [] are used for grouping, they have no observable effect if there's just one term within. Using (aaa|bbb) and [aaa|bbb] are equivalent and will both find words aaa or bbb. Most other punctuation characters, including \, are treated as a space or a word-separator, + - : and " do have special meaning though, see the help.
As of 2016, only the form "{term1 term2}" is documented for this, and is equivalent to the search "term1 OR term2".
You can do regex searches on your mailbox (within limits) programmatically via Google docs: http://www.labnol.org/internet/advanced-gmail-search/21623/ has source showing how it can be done (copy the document, then Tools > Script Editor to get the complete source).
You could also do this via IMAP as described here:
Python IMAP search for partial subject
and script something to move messages to different folder. The IMAP SEARCH verb only supports substrings, not regex (Gmail search is further limited to complete words, not substrings), further processing of the matches to apply a regex would be needed.
For completeness, one last workaround is: Gmail supports plus addressing, if you can change the destination address to youraddress+jenkinsrelease#gmail.com it will still be sent to your mailbox where you can filter by recipient address. Make sure to filter using the full email address to:youraddress+jenkinsrelease#gmail.com. This is of course more or less the same thing as setting up a dedicated Gmail address for this purpose :-)
Using Google Apps Script, you can use this function to filter email threads by a given regex:
function processInboxEmailSubjects() {
var threads = GmailApp.getInboxThreads();
for (var i = 0; i < threads.length; i++) {
var subject = threads[i].getFirstMessageSubject();
const regex = /^\[RELEASE\]/; //change this to whatever regex you want, this one should cover OP's scenario
let isAtLeast40 = regex.test(subject)
if (isAtLeast40) {
Logger.log(subject);
// Now do what you want to do with the email thread. For example, skip inbox and add an already existing label, like so:
threads[i].moveToArchive().addLabel("customLabel")
}
}
}
As far as I know, unfortunately there isn't a way to trigger this with every new incoming email, so you have to create a time trigger like so (feel free to change it to whatever interval you think best):
function createTrigger(){ //you only need to run this once, then the trigger executes the function every hour in perpetuity
ScriptApp.newTrigger('processInboxEmailSubjects').timeBased().everyHours(1).create();
}
The only option I have found to do this is find some exact wording and put that under the "Has the words" option. Its not the best option, but it works.
I was wondering how to do this myself; it seems Gmail has since silently implemented this feature. I created the following filter:
Matches: subject:([test])
Do this: Skip Inbox
And then I sent a message with the subject
[test] foo
And the message was archived! So it seems all that is necessary is to create a filter for the subject prefix you wish to handle.

Regular expression for validating names and surnames?

Although this seems like a trivial question, I am quite sure it is not :)
I need to validate names and surnames of people from all over the world. Imagine a huge list of miilions of names and surnames where I need to remove as well as possible any cruft I identify. How can I do that with a regular expression? If it were only English ones I think that this would cut it:
^[a-z -']+$
However, I need to support also these cases:
other punctuation symbols as they might be used in different countries (no idea which, but maybe you do!)
different Unicode letter sets (accented letter, greek, japanese, chinese, and so on)
no numbers or symbols or unnecessary punctuation or runes, etc..
titles, middle initials, suffixes are not part of this data
names are already separated by surnames.
we are prepared to force ultra rare names to be simplified (there's a person named '#' in existence, but it doesn't make sense to allow that character everywhere. Use pragmatism and good sense.)
note that many countries have laws about names so there are standards to follow
Is there a standard way of validating these fields I can implement to make sure that our website users have a great experience and can actually use their name when registering in the list?
I would be looking for something similar to the many "email address" regexes that you can find on google.
I sympathize with the need to constrain input in this situation, but I don't believe it is possible - Unicode is vast, expanding, and so is the subset used in names throughout the world.
Unlike email, there's no universally agreed-upon standard for the names people may use, or even which representations they may register as official with their respective governments. I suspect that any regex will eventually fail to pass a name considered valid by someone, somewhere in the world.
Of course, you do need to sanitize or escape input, to avoid the Little Bobby Tables problem. And there may be other constraints on which input you allow as well, such as the underlying systems used to store, render or manipulate names. As such, I recommend that you determine first the restrictions necessitated by the system your validation belongs to, and create a validation expression based on those alone. This may still cause inconvenience in some scenarios, but they should be rare.
I'll try to give a proper answer myself:
The only punctuations that should be allowed in a name are full stop, apostrophe and hyphen. I haven't seen any other case in the list of corner cases.
Regarding numbers, there's only one case with an 8. I think I can safely disallow that.
Regarding letters, any letter is valid.
I also want to include space.
This would sum up to this regex:
^[\p{L} \.'\-]+$
This presents one problem, i.e. the apostrophe can be used as an attack vector. It should be encoded.
So the validation code should be something like this (untested):
var name = nameParam.Trim();
if (!Regex.IsMatch(name, "^[\p{L} \.\-]+$"))
throw new ArgumentException("nameParam");
name = name.Replace("'", "'"); //&apos; does not work in IE
Can anyone think of a reason why a name should not pass this test or a XSS or SQL Injection that could pass?
complete tested solution
using System;
using System.Text.RegularExpressions;
namespace test
{
class MainClass
{
public static void Main(string[] args)
{
var names = new string[]{"Hello World",
"John",
"João",
"タロウ",
"やまだ",
"山田",
"先生",
"мыхаыл",
"Θεοκλεια",
"आकाङ्क्षा",
"علاء الدين",
"אַבְרָהָם",
"മലയാളം",
"상",
"D'Addario",
"John-Doe",
"P.A.M.",
"' --",
"<xss>",
"\""
};
foreach (var nameParam in names)
{
Console.Write(nameParam+" ");
var name = nameParam.Trim();
if (!Regex.IsMatch(name, #"^[\p{L}\p{M}' \.\-]+$"))
{
Console.WriteLine("fail");
continue;
}
name = name.Replace("'", "'");
Console.WriteLine(name);
}
}
}
}
I would just allow everything (except an empty string) and assume the user knows what his name is.
There are 2 common cases:
You care that the name is accurate and are validating against a real paper passport or other identity document, or against a credit card.
You don't care that much and the user will be able to register as "Fred Smith" (or "Jane Doe") anyway.
In case (1), you can allow all characters because you're checking against a paper document.
In case (2), you may as well allow all characters because "123 456" is really no worse a pseudonym than "Abc Def".
I would think you would be better off excluding the characters you don't want with a regex. Trying to get every umlaut, accented e, hyphen, etc. will be pretty insane. Just exclude digits (but then what about a guy named "George Forman the 4th") and symbols you know you don't want like ##$%^ or what have you. But even then, using a regex will only guarantee that the input matches the regex, it will not tell you that it is a valid name.
EDIT after clarifying that this is trying to prevent XSS: A regex on a name field is obviously not going to stop XSS on its own. However, this article has a section on filtering that is a starting point if you want to go that route:
s/[\<\>\"\'\%\;\(\)\&\+]//g;
"Secure Programming for Linux and Unix HOWTO" by David A. Wheeler, v3.010 Edition (2003)
v3.72, 2015-09-19 is a more recent version.
BTW, do you plan to only permit the Latin alphabet, or do you also plan to try to validate Chinese, Arabic, Hindi, etc.?
As others have said, don't even try to do this. Step back and ask yourself what you are actually trying to accomplish. Then try to accomplish it without making any assumptions about what people's names are, or what they mean.
I don’t think that’s a good idea. Even if you find an appropriate regular expression (maybe using Unicode character properties), this wouldn’t prevent users from entering pseudo-names like John Doe, Max Mustermann (there even is a person with that name), Abcde Fghijk or Ababa Bebebe.
You could use the following regex code to validate 2 names separeted by a space with the following regex code:
^[A-Za-zÀ-ú]+ [A-Za-zÀ-ú]+$
or just use:
[[:lower:]] = [a-zà-ú]
[[:upper:]] =[A-ZÀ-Ú]
[[:alpha:]] = [A-Za-zÀ-ú]
[[:alnum:]] = [A-Za-zÀ-ú0-9]
It's a very difficult problem to validate something like a name due to all the corner cases possible.
Corner Cases
Anything anything here
Sanitize the inputs and let them enter whatever they want for a name, because deciding what is a valid name and what is not is probably way outside the scope of whatever you're doing; given the range of potential strange - and legal names is nearly infinite.
If they want to call themselves Tricyclopltz^2-Glockenschpiel, that's their problem, not yours.
A very contentious subject that I seem to have stumbled along here. However sometimes it's nice to head dear little-bobby tables off at the pass and send little Robert to the headmasters office along with his semi-colons and SQL comment lines --.
This REGEX in VB.NET includes regular alphabetic characters and various circumflexed european characters. However poor old James Mc'Tristan-Smythe the 3rd will have to input his pedigree in as the Jim the Third.
<asp:RegularExpressionValidator ID="RegExValid1" Runat="server"
ErrorMessage="ERROR: Please enter a valid surname<br/>" SetFocusOnError="true" Display="Dynamic"
ControlToValidate="txtSurname" ValidationGroup="MandatoryContent"
ValidationExpression="^[A-Za-z'\-\p{L}\p{Zs}\p{Lu}\p{Ll}\']+$">
This one worked perfectly for me in JavaScript:
^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$
Here is the method:
function isValidName(name) {
var found = name.search(/^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$/);
return found > -1;
}
Steps:
first remove all accents
apply the regular expression
To strip the accents:
private static string RemoveAccents(string s)
{
s = s.Normalize(NormalizationForm.FormD);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.Length; i++)
{
if (CharUnicodeInfo.GetUnicodeCategory(s[i]) != UnicodeCategory.NonSpacingMark) sb.Append(s[i]);
}
return sb.ToString();
}
This somewhat helps:
^[a-zA-Z]'?([a-zA-Z]|\.| |-)+$
This one should work
^([A-Z]{1}+[a-z\-\.\']*+[\s]?)*
Add some special characters if you need them.

US Phone Number Verification

I have a website form that requires a US phone number input for follow up purposes, and this is very necessary in this case. I want try to eliminate users entering junk data 330-000-0000. I have seen some options of third parties that validate phone numbers for you, however idk if that is the best option for this situation. However if you have every used one of these third parties and can make a recommendation that would also be greatly appreciated here.
However I am considering checking the number against a set of rules to just try to narrow down the junk phone numbers received.
not a 555 number
does not contain 7 identical digits
valid area code (this is readily available)
not a 123-1234 or 123-4567
I guess I could also count out 867-5309 (heh*)
Would this result in any situations that you can think of that would not allow a user to enter their phone number? Could you think of any other rules that a phone number should not contain? Any other thoughts?
It seems to me that you're putting more effort into this than it warrants. Consider:
If your purpose is to guard against mis-entered phone numbers, then you can probably catch well over 90% of them with just a very simple check.
If your purpose is to try to force users to provide a valid number whether they want to give that information out or not, then you've taken on a hopeless task - even if you were able to access 100% accurate, up-to-the-second telco databases to verify that the exact number entered is currently live, you still don't gain any assurance that the number they gave you is their own. Once again, a simple check will foil the majority of people entering bogus numbers, but those who are willing to try more than two or three times will find a way to defeat your attempts to gain their numbers.
Either way, a simple test is going to get you good results and going into more complex rule sets will take up increasingly more time while providing increasingly little benefit to you (while also potentially adding false positives, as already shown with the "seven of the same digit" and 867-5309 cases).
You can do phone number validation internally in your app using regular expressions. Depending on your language you can call a function that will return true if a supplied phone number matches the expression.
In PHP:
function phone_number_is_valid($phone) {
return (eregi('^(?:\([2-9]\d{2}\)\ ?|[2-9]\d{2}(?:\-?|\ ?))[2-9]\d{2}[- ]?\d{4}$', $phone));
}
You can look up different regular expressions online. I found the one above one at http://regexlib.com/DisplayPatterns.aspx?categoryId=7&cattabindex=2
Edit: Some language specific sites for regular expressions:
PHP at php.net: http://php.net/regex
C# at MSDN
Java: http://java.sun.com/developer/technicalArticles/releases/1.4regex/
867-5309 is a valid phone number that is assigned to people in different area codes.
If you can verify the area code then unless you really, really need to know their phone number you're probably doing as much as is reasonable.
In Django there is a nice little contrib package called localflavor wich has a lot of country specific validation code, for example postal codes or phone numbers. You can look in the source too see how django handles these for the country you would like to use; For example: US Form validation. This can be a great recourse for information about countries you know little of as well.
Your customers can still do what I do, which is give out the local moviefone number.
Also, 123-1234 or 123-4567 are only invalid numbers because the prefix begins with a 1, but 234-5678 or 234-1234 would actually be valid (though it looks fake).
Maybe take a look at the answers to this question.
If you're sticking with just US- and Canada-format numbers, I think the following regex might work:
[2-9][0-9][0-9]-[2-9][0-9][0-9]-[0-9][0-9][0-9][0-9] & ![2-9][0-9][0-9]-555-[0-9][0-9][0-9][0-9]
You also need to take into account ten-digit dialing, which is used in some areas now: this is different from long-distance dialing (ie, 303-555-1234, as opposed to 1-303-555-1234). In some places, a valid phone number is ten digits long; in others, it is seven.
This is a quick function that I use (below). I do have access to a zipcode database that contains areacode and prefix data which is updated monthly. I have often thought about doing a data dip to confirm that the prefix exists for the area code.
public static bool isPhone(string phoneNum)
{
Regex rxPhone1, rxPhone2;
rxPhone1 = new Regex(#"^\d{10,}$");
rxPhone2 = new Regex(#"(\d)\1\1\1\1\1\1\1\1\1");
if(phoneNum.Trim() == string.Empty)
return false;
if(phoneNum.Length != 10)
return false;
//Check to make sure the phone number has at least 10 digits
if (!rxPhone1.IsMatch(phoneNum))
return false;
//Check for repeating characters (ex. 9999999999)
if (rxPhone2.IsMatch(phoneNum))
return false;
//Make sure first digit is not 1 or zero
if(phoneNum.Substring(0,1) == "1" || phoneNum.Substring(0,1) == "0")
return false;
return true;
}
I don't nkow if this is the right place, it's a formatting function rather than a validation function, I thought let's share it with the community, maybe one day it will be helpful..
Private Sub OnNumberChanged()
Dim sep = "-"
Dim num As String = Number.ToCharArray.Where(Function(c) Char.IsDigit(c)) _
.ToArray
Dim ext As String = Nothing
If num.Length > 10 Then ext = num.Substring(10)
ext = If(IsNullOrEmpty(ext), "", " x" & ext)
_Number = Left(num, 3) & sep & Mid(num, 4, 3) & sep & Mid(num, 7, 4) & ext
End Sub
My validation function is like so:
Public Shared Function ValidatePhoneNumber(ByVal number As String)
Return number IsNot Nothing AndAlso number.ToCharArray. _
Where(Function(c) Char.IsNumber(c)).Count >= 10
End Function
I call this last function # the OnNumberChanging(number As String) method of the entity.
For US and International Phone validation I found this code the most suitable:
((\+[1-9]{1,4}[ \-]*)|(\([0-9]{2,3}\)[ \-]*)|([0-9]{2,4})[ \-]*)*?[0-9]{3,4}?[ \-]*[0-9]{3,4}?$
You can find an (albeit somewhat dated) discussion here.
Those parameters look pretty good to me, I might also avoid numbers starting with 911 just to be safe.