I am weak in regex but I am learning. Currently I have a requirement to validate name and I am not able to write a valid regex for it. A valid name would contain alphabet only or alphabet with hyphens or spaces.
Example of valid name would be
jones
jones-smiht
a loreal jones
But if the name contains digits, it is an invalid name. The following regex
^[-\\sa-zA-Z]+$ works fine, except that a lone - is also accepted as a valid name.
How do I modify it so that a valid name must contain letters, regardless of whether it contains hyphens and spaces?
I think you're looking for this regex:
^[a-zA-Z][-\\sa-zA-Z]*$
This will make sure your name always starts with a letter instead of starting with hyphen or space.
Note: In Java you can also make use of (?i) for ignore case and shorten your regex as follows:
(?i)^[a-z][-\\sa-z]*$
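As a quick check, here is a minimal JavaScript sketch of the same pattern. The regex syntax is the same for this case; the isValidName helper is a hypothetical name used only for illustration, and the i flag plays the role of (?i). In a JavaScript regex literal the backslash is written once.

```javascript
// Hypothetical helper: a name must start with a letter, then may contain
// letters, hyphens, or whitespace.
const NAME_PATTERN = /^[a-z][-\sa-z]*$/i;

function isValidName(name) {
  return NAME_PATTERN.test(name);
}

console.log(isValidName("jones"));          // true
console.log(isValidName("jones-smiht"));    // true
console.log(isValidName("a loreal jones")); // true
console.log(isValidName("-"));              // false: must start with a letter
console.log(isValidName("jones2"));         // false: digits are not allowed
```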
The literal answer for you would be ^[a-zA-Z][-\sa-zA-Z]*$.
There are better answers: for instance,
([a-zA-Z]+)([-\s][a-zA-Z]+)*
will allow any number of words separated by single space or dash, allowing for simon peyton-jones, but disallowing silliness like --jumbo-spaz--.
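A small JavaScript sketch of that stricter pattern follows. The ^ and $ anchors are an addition here: Java's matches() checks the whole string implicitly, while JavaScript's test() does not, so the anchors are needed to get the same effect.

```javascript
// Sketch: words of letters, joined by exactly one whitespace character or hyphen.
// The ^ and $ anchors are added so the whole string must match.
const WORDS_PATTERN = /^[a-zA-Z]+(?:[-\s][a-zA-Z]+)*$/;

console.log(WORDS_PATTERN.test("simon peyton-jones")); // true
console.log(WORDS_PATTERN.test("--jumbo-spaz--"));     // false
console.log(WORDS_PATTERN.test("jones  smith"));       // false: two spaces in a row
console.log(WORDS_PATTERN.test("jones-"));             // false: trailing hyphen
```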
And copied from the response I tried to publish on the deleted answer:
The regexp itself uses a single backslash. However, since regexps are built from strings in Java, you need to escape the backslash; that is a feature of string literals, not of regexps. So the regexp is \s, but in Java you write Pattern.compile("\\s"). Not all languages have this twist, so it helps to keep the rules of strings separate from the rules of regexps.
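JavaScript happens to show both sides of this in one place, so it makes a convenient illustration (this is a sketch of the idea, not Java code): a regex literal takes the pattern as written, while a pattern built from a string needs the backslash doubled.

```javascript
// A regex literal takes the pattern as-is: one backslash.
const fromLiteral = /^\s+$/;

// Built from a string, the backslash must first be escaped for the string.
const fromString = new RegExp("^\\s+$");

// Without the extra backslash, "\s" in a string is just the letter "s".
const unescaped = new RegExp("^\s+$");

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.test("   "));                   // true
console.log(unescaped.test("   "));                    // false
console.log(unescaped.test("sss"));                    // true
```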
Related
I have this regular expression: ^[a-zA-Z]\s{3,16}$
What I want is to match any name that may contain spaces, for example John Smith, and that is 3 to 16 characters long.
What am I doing wrong?
Background
There are a couple of things to note here. First, a quantifier (in this case, {3,16}) only applies to the last regex token. So what your current regex really is saying is to "Match any string that has a single alphabetical character (case-insensitive) followed by 3 to 16 whitespace characters (e.g. spaces, tabs, etc.)."
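To make the quantifier point concrete, here is a small JavaScript sketch. The second pattern is only an illustration of widening what the quantifier applies to; it is not the answer proposed further down, and it would also accept a string made entirely of spaces.

```javascript
// The original pattern: one letter, then 3 to 16 whitespace characters.
const original = /^[a-zA-Z]\s{3,16}$/;

console.log(original.test("John Smith")); // false
console.log(original.test("J   "));       // true: one letter plus three spaces

// Putting letters and whitespace in the same class makes {3,16}
// count both kinds of character.
const widened = /^[a-zA-Z\s]{3,16}$/;

console.log(widened.test("John Smith"));  // true: 10 characters
```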
Second, a name can have more than 2 parts (a middle name, certain ethnic names like "De La Cruz") or include special characters such as accented vowels. You should consider if this is something you need to account for in your program. These things are important and should be considered for any real application.
Assumptions and Answer
Now, let's just assume you only want a certain format for names that consists of a first name, a last name, and a space. Let's also assume you only want simple ASCII characters (i.e. no special characters or accented characters). Furthermore, both the first and last names should start with a capital character followed by only lower-case characters. Other than that, there are no restrictions on the length of the individual parts of the name. In this case, the following regex would do the trick:
^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$
Notes
The first token after the ^ character is what is called a positive lookahead. Basically a positive look ahead will match the regex between the opening (?= and closing ) without actually moving the position of the cursor that is matching the string.
Notice I removed the \s token, since you usually want only a (space). The space can be replaced with the \s token, if tabs and other whitespace is desired there.
I also added a restriction that a name must start with a capital letter followed by only lower-case letters.
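The effect of the lookahead can be seen by trying a few inputs. This is a minimal sketch; the sample names are made up for illustration.

```javascript
// (?=.{3,16}$) checks the total length without consuming any characters;
// the rest of the pattern then checks the "First Last" shape.
const fullName = /^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/;

console.log(fullName.test("John Smith"));           // true: 10 characters
console.log(fullName.test("john smith"));           // false: no capital letters
console.log(fullName.test("Christopher Smithson")); // false: 20 characters
console.log(fullName.test("John  Smith"));          // false: two spaces
```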
Crude English Translation
To help your understanding, here is a simple English translation of what the regex is really doing. The part in italics is just copied from the first part of the English translation of the regex.
"Match any string that has 3-16 characters and starts with a capital alphabetical character followed by one or more (+) lowercase alphabetical characters, followed by a single space, followed by a capital alphabetical character followed by one or more (+) lowercase alphabetical characters, up to the end of the string."
Tools
There are a couple of tools I like to use when I am trying to tackle a challenging regex. They are listed below in no particular order:
https://regex101.com/ - Allows you to test regex expressions in real time. It also has a nifty little library to help you along.
http://www.regular-expressions.info/ - Basically a repository of knowledge on regex.
Edit/Update
You mentioned in your comments that you are using your regex in JavaScript. JavaScript delimits a regex literal with forward slashes. For this simple case, there are 2 options for using a regex to match a string.
First, use String's match method as follows
"John Smith".match(/^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/);
Second, create a regex and use its test() method. For example,
/^(?=.{3,16}$)[A-Z][a-z]+ [A-Z][a-z]+$/.test("John Smith");
The latter is probably what you want as it simply returns true or false depending on whether the regex actually matches the string or not.
So I've got /.+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(.{1})\w{2,}/ pattern I want to use for email validation on client-side, which doesn't work as expected.
I know that my pattern is simple and doesn't cover every standard possibility, but it's part of my regex training.
Local part of address should be valid only when it has at least one digit [0-9] or letter [a-zA-Z] and can be mixed with comma or plus sign or underscore (or all at once) and then # sign, then domain part, but no IP address literals, only domain names with at least one letter or digit, followed by one dot and at least two letters or two digits.
With my test strings, it doesn't validate a#b.com but does validate baz_bar.test+private#e-mail-testing-service..com. That is the wrong way round: it should validate a#b.com and reject baz_bar.test+private#e-mail-testing-service..com
What specific error do I have there, and where?
I can't locate it, sorry.
You need to change your regex
From: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]\#[\w+-?]+(\.{1})\w{2,}
To: .+[^\x20-\x2A\x2C\x2F\x3A-\x40\x5B-\x5E\x60\x7B-\xFF]?\#[\w+-]+(\.{1})\w{2,}
Notice that I added a ? before the # sign and removed the ? from the first "group" after the # sign. Adding that ? tells the regex that the whole "group" is optional.
See it working here: https://regex101.com/r/iX5zB5/2
You're requiring the local part (before #) to be at least two characters with the .+ followed by the character class [^...]. It's looking for any character followed by another character not in the list of exclusions you specify. That explains why "a#b.com" doesn't match.
The second problem is partly caused by the character class range +-? which includes the . character. I think you wanted [-\w+?]+. (Do you really want question marks?) And then later I think you wanted to look for a literal . character but it really ends up matching the first character that didn't match the previous block.
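The accidental range is easy to verify in isolation. A minimal sketch, using single-character test patterns that are not part of the original regex:

```javascript
// Inside a class, "+-?" is a range from "+" (0x2B) to "?" (0x3F).
const accidentalRange = /^[+-?]$/;

console.log(accidentalRange.test(".")); // true: "." is 0x2E
console.log(accidentalRange.test("5")); // true: digits are 0x30-0x39
console.log(accidentalRange.test("-")); // true: "-" is 0x2D, inside the range

// With the hyphen first it is a literal, so only "-", "+", and "?" match.
const literalHyphen = /^[-+?]$/;

console.log(literalHyphen.test(".")); // false
console.log(literalHyphen.test("-")); // true
```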
Between the regex provided and the explanatory text I'm not sure what rules you intend to implement though. And since this is an exercise it's probably better to just give hints anyway.
You will also want to use the ^ and $ anchors to make sure the entire string matches.
I am working with ASP.NET MVC 5 application in which I want to add dataannotation validation for Name field.
It should accept any combination of numbers, letters, and underscores only.
I tried this, but it is not working:
[RegularExpression("([a-zA-Z0-9_ .&'-]+)", ErrorMessage = "Invalid.")]
Try this regex, which I tested on regexr.com.
Criteria: alphanumeric characters, underscore, and space.
http://regexr.com/3agii
([a-zA-Z0-9_\s]+)
You are using a character class, that is, the part between the square brackets ([a-zA-Z0-9_ .&'-]). Within those square brackets you define all the characters that the class should match. So the problem is simple: your class allows characters that you don't want to match.
Based on your "try" you could change this to
[a-zA-Z0-9_]
Those seem to be the characters you want to match. But is it really what you need? Are those really the only characters that are possible for that field?
If yes then you are done.
If no, you probably want to add all characters of all languages. Luckily there is a Unicode property for that:
\p{L} All letter characters
There is another predefined group that could be useful for you:
\w matches any word character (The definition can also be found in the first link, includes the Unicode categories Ll,Lu,Lt,Lo,Lm,Nd,Pc, that is basically [a-zA-Z0-9_] but Unicode style with all letters and more connecting characters)
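The difference between the ASCII class and a Unicode property can be sketched as follows. This is JavaScript rather than .NET and is meant only to illustrate \p{L}: in JavaScript the u flag is required for \p{...}, and \w covers only [A-Za-z0-9_] rather than the Unicode-aware set described above. The sample string is made up.

```javascript
// ASCII-only class from the answer above.
const asciiOnly = /^[a-zA-Z0-9_]+$/;

// Unicode-aware variant: any letter (\p{L}), any decimal digit (\p{Nd}), underscore.
const unicodeAware = /^[\p{L}\p{Nd}_]+$/u;

const sample = "Zo\u00eb_1"; // "Zoë_1" with a precomposed e-diaeresis

console.log(asciiOnly.test(sample));         // false
console.log(unicodeAware.test(sample));      // true
console.log(unicodeAware.test("Zo\u00eb 1")); // false: space is not in the class
```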
But still, if you want to match real names this will not cover all possible names. I have another answer on this topic here
I've tried the following code, but it gives me nomatch.
re:run("qw#qc.com", "\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b").
regexp i got here http://www.regular-expressions.info/email.html
EDITED:
The following doesn't work either:
re:run("345345", "\b[0-9]+\b").
If the string contains only an email address, then this one will match:
re:run("qw#qc.com", "^[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,4}$", [caseless]).
I hesitate to answer this question, since I believe it relies on an incorrect assumption - that you can determine whether an email address is valid or not with a regular expression. See this question for more details; from a short glance I'd note that the regexp in your question doesn't accept the .museum and .рф top-level domains.
That said, you need to escape the backslashes. You want the string to contain backslashes, but in Erlang, backslashes are used inside strings to escape various characters, so any literal backslash needs to be written as \\. Try this:
3> re:run("qw#qc.com", "\\b[a-z0-9._%+-]+#[a-z0-9.-]+\\.[a-z]{2,4}\\b").
{match,[{0,9}]}
Or even better, this:
8> re:run("qw#qc.com", "\\b[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+#[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*\\b").
{match,[{0,9}]}
That's the regexp used in the HTML 5 standard, modified to use \\b instead of ^ and $.
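JavaScript string literals treat \b the same way Erlang strings do (as a backspace character), so the effect of the missing escape can be sketched there. This is an illustration in a different language, not Erlang code.

```javascript
// In a string literal, "\b" is the backspace character (code 8),
// not the regex word-boundary token.
const broken = new RegExp("\b[0-9]+\b");

// Doubling the backslash passes \b through to the regex engine.
const fixed = new RegExp("\\b[0-9]+\\b");

console.log("\b".charCodeAt(0));    // 8
console.log(broken.test("345345")); // false: it looks for literal backspaces
console.log(fixed.test("345345"));  // true
```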
Looks like you need a case-insensitive match?
Currently [A-Z0-9._%+-] (for example) only matches upper-case characters (plus numbers etc).
One solution is to specify [A-Za-z]. Another solution is to convert your email address to uppercase prior to matching.
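Both options can be sketched in JavaScript, where the i flag is the case-insensitive switch. The address is the thread's sample string as written, and the pattern is the one from the question with ^ and $ anchors.

```javascript
const upperOnly = /^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}$/;
const ignoreCase = /^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}$/i;
const address = "qw#qc.com";

console.log(upperOnly.test(address));               // false: input is lower-case
console.log(ignoreCase.test(address));              // true: flag ignores case
console.log(upperOnly.test(address.toUpperCase())); // true: input converted first
```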
I'm looking to match Twitter syntax with a regex.
How can I match anything that is "#______" that is, begins with an # symbol, and is followed by no spaces, just letters and numbers until the end of the word? (To tweeters, I want to match someone's name in a reply)
Go for
/#(\w+)/
to get the matching name extracted as well.
#\w+
That simple?
It should be noted that Twitter no longer allows usernames longer than 15 characters, so you can also match with:
#\w{1,15}
There are still apparently a few people with usernames longer than 15 characters, but testing on 15 would be better if you want to exclude likely false positives.
There are apparently no rules regarding whether underscores can be used at the beginning or end of usernames, multiple underscores, etc., and there are accounts with single-letter names, as well as someone with the username "_".
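A minimal sketch of extracting such names with the length limit follows. The extractMentions helper and the sample strings are hypothetical, and the (?!\w) lookahead is an added assumption: without it, a longer name would still match on its first 15 characters.

```javascript
// Sketch: extract "#name" mentions of 1 to 15 word characters.
// (?!\w) rejects a 16+ character name instead of truncating it.
const MENTION = /#(\w{1,15})(?!\w)/g;

function extractMentions(text) {
  // matchAll requires the g flag; m[1] is the captured name without "#".
  return Array.from(text.matchAll(MENTION), (m) => m[1]);
}

console.log(extractMentions("thanks #alice and #_ for the help"));
// [ 'alice', '_' ]
console.log(extractMentions("#this_name_is_far_too_long"));
// []
```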
#[\d\w]+
\d for a digit character
\w for a word character
[] to denote a character class
+ to represent one or more instances of the character class
Note that these specifiers for word and digit characters are language dependent. Check the language specification to be sure.
There is a very extensive API for how to get valid twitter names, mentions, etc. The Java version of the API provided by Twitter can be found on github twitter-text-java. You may want to take a look at it to see if this is something you can use.
I have used it to validate Twitter names and it works very well.