US State Regular expression with case sensitive - regex

I'm working on an ASP.NET MVC application whose model uses the following regular expression to validate US states.
It works fine if the user enters all upper case, but it does not work for lower-case or mixed-case input.
[RegularExpression(@"^((A[ELKSZR])|(C[AOT])|(D[EC])|(F[ML])|(G[AU])|(HI)|(I[DLNA])|(K[SY])|(LA)|(M[EHDAINSOT])|(N[EVHJMYCD])|(MP)|(O[HKR])|(P[WAR])|(RI)|(S[CD])|(T[NX])|(UT)|(V[TIA])|(W[AVIY]))$", ErrorMessage = "Invalid State")]
public string State { get; set; }
I tried this one, but no luck.
// [RegularExpression(@"^(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])$", ErrorMessage = "Invalid State")]
thank you.

Since this expression can also be used for client-side validation (and thus has to use ECMAScript regex syntax, that is, a JavaScript-compatible regular expression), you cannot use an inline modifier like (?i), let alone the scoped version (?i:...). Note also that the (?-i:...) you tried switches case-insensitive matching off rather than on.
You have to pair each letter with its lowercase counterpart:
^(([Aa][EeLlKkSsZzRr])|([Cc][AaOoTt])|([Dd][EeCc])|([Ff][MmLl])|([Gg][AaUu])|([Hh][Ii])|([Ii][DdLlNnAa])|([Kk][SsYy])|([Ll][Aa])|([Mm][EeHhDdAaIiNnSsOoTt])|([Nn][EeVvHhJjMmYyCcDd])|([Mm][Pp])|([Oo][HhKkRr])|([Pp][WwAaRr])|([Rr][Ii])|([Ss][CcDd])|([Tt][NnXx])|([Uu][Tt])|([Vv][TtIiAa])|([Ww][AaVvIiYy]))$
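Typing the doubled character classes by hand is easy to get wrong, so one option is to generate them. The following is only a minimal JavaScript sketch: the short `codes` list and the names `bothCases`, `statePattern` and `stateRegex` are made up for illustration and are not part of the original model.

```javascript
// Hypothetical, deliberately short list; a real list would hold every code you accept.
const codes = ["CA", "NY", "TX", "WA"];

// Turn "CA" into "[Cc][Aa]" so the pattern needs no case-insensitive flag.
function bothCases(code) {
  return [...code]
    .map((ch) => `[${ch.toUpperCase()}${ch.toLowerCase()}]`)
    .join("");
}

// Anchor the alternation so only a complete two-letter code matches.
const statePattern = "^(" + codes.map(bothCases).join("|") + ")$";
const stateRegex = new RegExp(statePattern);

console.log(statePattern);
console.log(stateRegex.test("ca"), stateRegex.test("Ca"), stateRegex.test("ZZ"));
```

The printed pattern string is what would be pasted into the RegularExpression attribute; it uses only plain character classes, so it should behave the same in .NET and in JavaScript.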
See demo

The list above is not exhaustive: it is missing some military abbreviations. Trust me - you do not want to incur the ire of patriotic families trying to send stuff to their loved ones in the military.
Same technique - I added a few more.
^(([Aa][EeLlKkSsZzRr])|([Cc][AaOoTt])|([Dd][EeCc])|([Ff][MmLl])|([Gg][AaUu])|([Hh][Ii])|([Ii][DdLlNnAa])|([Kk][SsYy])|([Ll][Aa])|([Mm][EeHhDdAaIiNnSsOoTt])|([Nn][EeVvHhJjMmYyCcDd])|([Mm][Pp])|([Oo][HhKkRr])|([Pp][WwAaRr])|([Rr][Ii])|([Ss][CcDd])|([Tt][NnXx])|([Uu][Tt])|([Vv][TtIiAa])|([Ww][AaVvIiYy]))$

I have used
[^,]*[A-Z]{2}
Hopefully it works for you.

Related

Edit regular expression to support + (plus) notation

I'm using a regex to 'quick and dirty' validate an email address client side and I just found out it doesn't support the + (plus) notation (user+anything@gmail.com) that Google provides to its users. I'm sure it fails in other ways as well. How can I edit this to support + notation and ensure I'm dealing with an email address, while not pissing off anyone with an oddly formed email address?
var emailReg = new RegExp(/^(("[\w-\s]+")|([\w-]+(?:\.[\w-]+)*)|("[\w-\s]+")([\w-]+(?:\.[\w-]+)*))(@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$)|(@\[?((25[0-5]\.|2[0-4][0-9]\.|1[0-9]{2}\.|[0-9]{1,2}\.))((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\.){2}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\]?$)/i);
Thank you,
You could use this:
^(("[\w-\s]+")|([\w-]+(?:[.+][\w-]+)*)|("[\w-\s]+")([\w-]+(?:[.+][\w-]+)*))(@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$)|(@\[?((25[0-5]\.|2[0-4][0-9]\.|1[0-9]{2}\.|[0-9]{1,2}\.))((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\.){2}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\]?$)
It's still quick and dirty. It will allow your example user+anything@gmail.com, but will also allow user+anything+else@gmail.com. It won't allow user++anything@gmail.com or user.+anything@gmail.com.
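To see why those four addresses behave that way, it may help to look at the local part in isolation. The sketch below is a simplification, not the full expression: `localPart` is a made-up name, and it keeps only the unquoted branch of the pattern, where the separator class was widened from a dot to [.+].

```javascript
// Simplified local-part check: word characters or hyphens,
// optionally followed by groups of one separator (. or +) plus more characters.
const localPart = /^[\w-]+(?:[.+][\w-]+)*$/;

const samples = [
  "user+anything",      // single plus tag
  "user+anything+else", // several plus tags
  "user++anything",     // two separators in a row
  "user.+anything",     // dot directly followed by plus
];

for (const sample of samples) {
  console.log(sample, localPart.test(sample));
}
```

Each separator has to be followed by at least one ordinary character, which is why the doubled separators in the last two samples are rejected.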
I just copied your regular expression and removed an extra parenthesis, and it is working fine for me:
^(("[\w-\s]+")|([\w-]+(?:.[\w-]+))|("[\w-\s]+")([\w-]+(?:.[\w-]+)))(@((?:[\w-]+.)*\w[\w-]{0,66}).([a-z]{2,6}(?:.[a-z]{2})?)$)|(@[?((25[0-5].|2[0-4][0-9].|1[0-9]{2}.|[0-9]{1,2}.)((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2}).){2}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})]?$
Live demo:
https://regex101.com/r/kX3jW0/1

What kind of "regular expression" system does SAP GUI use in the user interface?

I'm trying to search the documentation for a data element whose description contains the string '*hh:mm*' but not '*mm:ss*' (where '*' is a wildcard for any number of characters).
I don't know how to do it, so I was wondering if any of you know the regex system SAP GUI uses, so I can have a look at what I can do with it.
Thx, you guys rule!
The GUI does not give you the opportunity to use regular expressions. You're limited to simple pattern matching using * and ?. Furthermore, it's a bad idea to search using the description text because both the text and the search are case sensitive - you'd find "hh:mm", but not "HH:MM". In the special case you mention, you could use the repository infosystem to search for domains based on the data type TIMS but with an output length of 5, and then use the where-used index to find the corresponding data element. (It might even be possible to search for a data element based on a certain data type, I'm not entirely sure.)
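For readers who think in regexes, the * and ? style of matching can be pictured as a very small subset of them. The JavaScript sketch below only illustrates that idea; it is not how SAP GUI is implemented, and `wildcardToRegExp` is a made-up helper name.

```javascript
// Translate a simple wildcard pattern into an anchored, case-sensitive RegExp:
// "*" stands for any run of characters, "?" for exactly one character.
function wildcardToRegExp(pattern) {
  // Escape every regex metacharacter except the two wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

const wanted = wildcardToRegExp("*hh:mm*");
const unwanted = wildcardToRegExp("*mm:ss*");

// "Contains hh:mm but not mm:ss" needs two separate checks.
function matchesQuery(description) {
  return wanted.test(description) && !unwanted.test(description);
}

console.log(matchesQuery("Time in format hh:mm"));    // contains hh:mm only
console.log(matchesQuery("Time in format hh:mm:ss")); // also contains mm:ss
console.log(matchesQuery("Time in format HH:MM"));    // different case
```

A single wildcard pattern cannot express the "but not" part, which is why the sketch combines two tests.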
As of release 7.0, ABAP supports extended regular expressions in accordance with POSIX standard 1003.2.
The classes CL_ABAP_REGEX and CL_ABAP_MATCHER permit object-oriented use of regular expressions.
More detail here

Regular Expression in Perl to check fit/match

I'm in great trouble.
I must check if a string fits (matches) another string with RegEx.
For example, given the following string:
Apr 2 13:42:32 sandbox izxp[12000]: Received disconnect from 10.11.106.14: 10: disconnected by user
In the editable input field I give the program the following shortened string:
Received disconnect from 10.11.106.14: 10
If it fits the existing string (as you can see above), it is OK.
If any part of the new edited string doesn't fit the original string, I must warn the user with a message.
Could you help me solve this with a regex, or with another method?
I would appreciate it!
You must get the original string in a variable, let's call it $original (this is perl). Then you must get the input from the "editable input field", let's call it $input.
Then it is a simple
if ($original =~ /$input/)
{
    # Your code for a message to the user here
}
Your solution would be less regex and more escaping. Assuming you're going to use no regex patterns and just search for the input string literal, you should write your function so that it turns this
Received disconnect from 10.11.106.14: 10
into this
Received disconnect from 10\.11\.106\.14: 10
This can be achieved with many different libraries depending on which language you are using.
That will then allow you to check for a match.
Regular Expressions are more designed for common patterns in strings, rather than finding exact literals.
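The escaping step described above can be sketched as follows. This is a JavaScript illustration rather than Perl, and `escapeRegExp` is a hand-written helper (a commonly used snippet, not a built-in that can be assumed to exist everywhere).

```javascript
// Put a backslash in front of every character that is special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const original =
  "Apr 2 13:42:32 sandbox izxp[12000]: Received disconnect from 10.11.106.14: 10: disconnected by user";
const input = "Received disconnect from 10.11.106.14: 10";

const escaped = escapeRegExp(input);
console.log(escaped); // the dots are now literal dots

// The escaped input can be used safely as a pattern.
console.log(new RegExp(escaped).test(original));

// Without escaping, "." matches any character, so a wrong address slips through.
const wrongInput = "10x11x106x14";
console.log(new RegExp("10.11.106.14").test(wrongInput));
console.log(new RegExp(escapeRegExp("10.11.106.14")).test(wrongInput));
```

When no pattern features are needed at all, a plain substring test such as `original.includes(input)` gives the same answer without any escaping.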

ASP.NET Validation Controls VS C# regular expression Validation

To me, regular expression validation seems more straightforward and meaningful than validating everything with ASP.NET validation controls. I am learning ASP.NET and do not want to memorize all the validation controls when any form input can simply be validated with a regular expression. Am I thinking right, or should I use validation controls?
Example: RequiredFieldValidator vs Regex Solution C#
if (TextBox1.Text == "") {
    Label1.Text = "Name Field is required, Please try again";
    return;
}
CompareValidator vs Regex Solution
if (Regex.IsMatch(TextBox1.Text, @"^[0-9]")) {
    if (Convert.ToInt32(TextBox1.Text) > 18) {
        output.InnerHtml = @"some code";
    } else {
        Label1.Text = "You should be old enough to express out your political views ";
        return;
    }
} else {
    Label1.Text = "You should be old enough to express out your political views";
    return;
}
Wouldn't it be better to do everything in C#, rather than remembering all those validation controls?
The major advantage to the validation controls is that in most cases they will output JavaScript validation for the client side that matches the server-side validation. This can reduce round trips to the server which is always a benefit. However, if you're good with JavaScript, you can probably code the client side piece more efficiently than the control would output anyway.
One other thing to consider: when using the control, you can turn the validation on or off on both client and server with one flag on the control. If you use your own code, you have to handle those separately.
You are right, regular expression validator can replace a lot of other validators, provided you can write a validation expression that works well on client and server side.
You can do a lot of validation work in regular expressions, but there are some areas where regexes are not ideal:
date validation: Either you get a terribly unwieldy regex, or you'll miss lots of plausible but illegal dates (like Feb 29, 1900).
email validation. Same thing here - you either reject some valid addresses, or you allow invalid addresses (and in either case, you'll allow addresses that are syntactically OK but don't correspond to an actual mailbox).
number validation in general - regular expressions are good for matching textual data. Using them to validate numbers is cumbersome and error-prone. Have you thought of exponential notation, locale-dependent decimal separators, thousands separators, leading zeroes, etc...?
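The date case is easy to demonstrate. In the JavaScript sketch below, `looksLikeDate` and `isRealDate` are made-up names and the YYYY-MM-DD format is an assumption chosen for the example: the regex checks only the shape of the input, and a calendar round trip does the real validation.

```javascript
// Shape check only: four digits, month 01-12, day 01-31.
const looksLikeDate = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Real check: build the date and see whether the calendar kept it unchanged.
function isRealDate(text) {
  if (!looksLikeDate.test(text)) return false;
  const [year, month, day] = text.split("-").map(Number);
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

for (const text of ["2000-02-29", "1900-02-29", "2023-04-31"]) {
  console.log(text, looksLikeDate.test(text), isRealDate(text));
}
```

All three strings pass the regex, but only the first is a real date; teaching the regex about month lengths and leap years is what makes it unwieldy.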
Apart from that, the JavaScript regex engine has some limitations (e.g., lookbehind assertions are missing in older engines) that you need to know about when trying to write regexes that have to work on both the client and the server side.
And finally, do you realize that there's an error in your example regex? Maybe using a validator is safer unless you really know how to build a regex that does exactly what you intend it to do...
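The answer does not say which error it means. One likely candidate, and this is an assumption, is that ^[0-9] only checks the first character, so input such as "1abc" passes the regex and would then make the numeric conversion fail. A small JavaScript sketch of the difference (the variable names are made up for illustration):

```javascript
// Anchored only at the start: any string that begins with a digit passes.
const startsWithDigit = /^[0-9]/;

// Anchored at both ends with a quantifier: the whole string must be digits.
const allDigits = /^[0-9]+$/;

for (const text of ["19", "1abc", "abc"]) {
  console.log(text, startsWithDigit.test(text), allDigits.test(text));
}
```

Only the second pattern rejects "1abc".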

Regular expression for validating names and surnames?

Although this seems like a trivial question, I am quite sure it is not :)
I need to validate names and surnames of people from all over the world. Imagine a huge list of millions of names and surnames from which I need to remove, as well as possible, any cruft I identify. How can I do that with a regular expression? If it were only English names, I think that this would cut it:
^[a-z -']+$
However, I need to support also these cases:
other punctuation symbols as they might be used in different countries (no idea which, but maybe you do!)
different Unicode letter sets (accented letter, greek, japanese, chinese, and so on)
no numbers or symbols or unnecessary punctuation or runes, etc..
titles, middle initials, suffixes are not part of this data
names are already separated by surnames.
we are prepared to force ultra rare names to be simplified (there's a person named '@' in existence, but it doesn't make sense to allow that character everywhere. Use pragmatism and good sense.)
note that many countries have laws about names so there are standards to follow
Is there a standard way of validating these fields I can implement to make sure that our website users have a great experience and can actually use their name when registering in the list?
I would be looking for something similar to the many "email address" regexes that you can find on google.
I sympathize with the need to constrain input in this situation, but I don't believe it is possible - Unicode is vast, expanding, and so is the subset used in names throughout the world.
Unlike email, there's no universally agreed-upon standard for the names people may use, or even which representations they may register as official with their respective governments. I suspect that any regex will eventually fail to pass a name considered valid by someone, somewhere in the world.
Of course, you do need to sanitize or escape input, to avoid the Little Bobby Tables problem. And there may be other constraints on which input you allow as well, such as the underlying systems used to store, render or manipulate names. As such, I recommend that you determine first the restrictions necessitated by the system your validation belongs to, and create a validation expression based on those alone. This may still cause inconvenience in some scenarios, but they should be rare.
I'll try to give a proper answer myself:
The only punctuation marks that should be allowed in a name are the full stop, apostrophe and hyphen. I haven't seen any other case in the list of corner cases.
Regarding numbers, there's only one case with an 8. I think I can safely disallow that.
Regarding letters, any letter is valid.
I also want to include space.
This would sum up to this regex:
^[\p{L} \.'\-]+$
This presents one problem, i.e. the apostrophe can be used as an attack vector. It should be encoded.
So the validation code should be something like this (untested):
var name = nameParam.Trim();
if (!Regex.IsMatch(name, @"^[\p{L} \.'\-]+$"))
    throw new ArgumentException("nameParam");
name = name.Replace("'", "&#39;"); // &apos; does not work in IE
Can anyone think of a reason why a name should not pass this test or a XSS or SQL Injection that could pass?
Complete tested solution:
using System;
using System.Text.RegularExpressions;

namespace test
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            var names = new string[]{"Hello World",
                "John",
                "João",
                "タロウ",
                "やまだ",
                "山田",
                "先生",
                "мыхаыл",
                "Θεοκλεια",
                "आकाङ्क्षा",
                "علاء الدين",
                "אַבְרָהָם",
                "മലയാളം",
                "상",
                "D'Addario",
                "John-Doe",
                "P.A.M.",
                "' --",
                "<xss>",
                "\""
            };
            foreach (var nameParam in names)
            {
                Console.Write(nameParam + " ");
                var name = nameParam.Trim();
                // Letters, combining marks, apostrophe, space, full stop and hyphen only
                if (!Regex.IsMatch(name, @"^[\p{L}\p{M}' \.\-]+$"))
                {
                    Console.WriteLine("fail");
                    continue;
                }
                // Encode the apostrophe before output
                name = name.Replace("'", "&#39;");
                Console.WriteLine(name);
            }
        }
    }
}
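If the same check has to run in the browser as well, a roughly equivalent JavaScript pattern can be written with Unicode property escapes. This is a sketch that assumes an engine supporting the u flag with \p{...}; the names `nameWithMarks`, `lettersOnly` and `decomposed` are made up for illustration.

```javascript
// Letters, combining marks, apostrophe, space, full stop and hyphen.
const nameWithMarks = /^[\p{L}\p{M}' .\-]+$/u;

// Same pattern without \p{M}, to show why the marks matter.
const lettersOnly = /^[\p{L}' .\-]+$/u;

// "José" written as a plain "e" followed by a combining acute accent (U+0301).
const decomposed = "Jose\u0301";

console.log(nameWithMarks.test("D'Addario"));
console.log(nameWithMarks.test(decomposed), lettersOnly.test(decomposed));
console.log(nameWithMarks.test("<xss>"));
console.log(nameWithMarks.test("' --"));
```

Note that "' --" passes, because every character in it is on the allowed list. The pattern limits which characters appear, but escaping or encoding on output is still needed.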
I would just allow everything (except an empty string) and assume the user knows what his name is.
There are 2 common cases:
You care that the name is accurate and are validating against a real paper passport or other identity document, or against a credit card.
You don't care that much and the user will be able to register as "Fred Smith" (or "Jane Doe") anyway.
In case (1), you can allow all characters because you're checking against a paper document.
In case (2), you may as well allow all characters because "123 456" is really no worse a pseudonym than "Abc Def".
I would think you would be better off excluding the characters you don't want with a regex. Trying to get every umlaut, accented e, hyphen, etc. will be pretty insane. Just exclude digits (but then what about a guy named "George Forman the 4th"?) and symbols you know you don't want, like @#$%^ or what have you. But even then, using a regex will only guarantee that the input matches the regex; it will not tell you that it is a valid name.
EDIT after clarifying that this is trying to prevent XSS: A regex on a name field is obviously not going to stop XSS on its own. However, this article has a section on filtering that is a starting point if you want to go that route:
s/[\<\>\"\'\%\;\(\)\&\+]//g;
"Secure Programming for Linux and Unix HOWTO" by David A. Wheeler, v3.010 Edition (2003)
v3.72, 2015-09-19 is a more recent version.
BTW, do you plan to only permit the Latin alphabet, or do you also plan to try to validate Chinese, Arabic, Hindi, etc.?
As others have said, don't even try to do this. Step back and ask yourself what you are actually trying to accomplish. Then try to accomplish it without making any assumptions about what people's names are, or what they mean.
I don’t think that’s a good idea. Even if you find an appropriate regular expression (maybe using Unicode character properties), this wouldn’t prevent users from entering pseudo-names like John Doe, Max Mustermann (there even is a person with that name), Abcde Fghijk or Ababa Bebebe.
You could use the following regex to validate two names separated by a space:
^[A-Za-zÀ-ú]+ [A-Za-zÀ-ú]+$
or just use:
[[:lower:]] = [a-zà-ú]
[[:upper:]] =[A-ZÀ-Ú]
[[:alpha:]] = [A-Za-zÀ-ú]
[[:alnum:]] = [A-Za-zÀ-ú0-9]
It's a very difficult problem to validate something like a name due to all the corner cases possible.
Corner Cases
Anything anything here
Sanitize the inputs and let them enter whatever they want for a name, because deciding what is a valid name and what is not is probably way outside the scope of whatever you're doing, given that the range of potential strange (and legal) names is nearly infinite.
If they want to call themselves Tricyclopltz^2-Glockenschpiel, that's their problem, not yours.
A very contentious subject that I seem to have stumbled upon here. However, sometimes it's nice to head dear Little Bobby Tables off at the pass and send little Robert to the headmaster's office along with his semicolons and SQL comment lines --.
This regex, used here in an ASP.NET validator, allows regular alphabetic characters and various accented European characters. However, poor old James Mc'Tristan-Smythe the 3rd will have to enter his pedigree as Jim the Third.
<asp:RegularExpressionValidator ID="RegExValid1" Runat="server"
ErrorMessage="ERROR: Please enter a valid surname<br/>" SetFocusOnError="true" Display="Dynamic"
ControlToValidate="txtSurname" ValidationGroup="MandatoryContent"
ValidationExpression="^[A-Za-z'\-\p{L}\p{Zs}\p{Lu}\p{Ll}\']+$">
This one worked perfectly for me in JavaScript:
^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$
Here is the method:
function isValidName(name) {
    var found = name.search(/^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$/);
    return found > -1;
}
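Before adopting this pattern, it may be worth checking a few inputs, since it has some properties that are easy to miss. The sketch below repeats the same expression (under the made-up name `namePattern`) and probes it with sample names chosen for illustration.

```javascript
// Same expression as above, repeated so the sketch is self-contained.
const namePattern = /^[a-zA-Z]+[\s|-]?[a-zA-Z]+[\s|-]?[a-zA-Z]+$/;

const probes = [
  "Anna Maria",      // two parts separated by a space
  "Jean-Luc Picard", // hyphen and space
  "Li",              // fewer than three letters
  "Mary|Ann",        // "|" inside [...] is a literal pipe, not "or"
  "Jo\u00e3o",       // "João": only ASCII letters are allowed
];

for (const probe of probes) {
  console.log(probe, namePattern.test(probe));
}
```

The three mandatory letter groups mean that any name shorter than three letters is rejected, and the [\s|-] class accepts a pipe character as a separator.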
Steps:
first remove all accents
apply the regular expression
To strip the accents:
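// Requires: using System.Globalization; using System.Text;
private static string RemoveAccents(string s)
{
    // Decompose accented characters into base letter + combining marks
    s = s.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < s.Length; i++)
    {
        // Keep everything except the non-spacing (combining) marks
        if (CharUnicodeInfo.GetUnicodeCategory(s[i]) != UnicodeCategory.NonSpacingMark) sb.Append(s[i]);
    }
    return sb.ToString();
}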
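The same two steps can be sketched in JavaScript for the client side. This assumes an engine that supports `String.prototype.normalize` and Unicode property escapes; `removeAccents` and `asciiName` are made-up names, and the ASCII-only pattern is just a stand-in for whichever expression is applied afterwards.

```javascript
// Step 1: decompose to NFD, then drop the non-spacing marks (category Mn).
function removeAccents(text) {
  return text.normalize("NFD").replace(/\p{Mn}/gu, "");
}

// Step 2: apply an ASCII-only pattern to the stripped text (illustrative pattern).
const asciiName = /^[a-zA-Z]+$/;

const input = "Jo\u00e3o"; // "João"
const stripped = removeAccents(input);

console.log(stripped);
console.log(asciiName.test(input), asciiName.test(stripped));
```

Letters that do not decompose into a base letter plus a mark are left untouched by this approach, so they would still fail an ASCII-only pattern.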
This somewhat helps:
^[a-zA-Z]'?([a-zA-Z]|\.| |-)+$
This one should work
^([A-Z]{1}+[a-z\-\.\']*+[\s]?)*
Add some special characters if you need them.