What is the regular expression to allow uppercase/lowercase (alphabetical characters), periods, spaces and dashes only? - regex

I am having problems creating a regex validator that checks to make sure the input has uppercase or lowercase alphabetical characters, spaces, periods, underscores, and dashes only. Couldn't find this example online via searches. For example:
These are ok:
Dr. Marshall
sam smith
.george con-stanza .great
peter.
josh_stinson
smith _.gorne
Anything containing other characters is not okay. That is numbers, or any other symbols.

The regex you're looking for is ^[A-Za-z.\s_-]+$
^ asserts that the regular expression must match at the beginning of the subject
[] is a character class - any character that matches inside this expression is allowed
A-Z allows a range of uppercase characters
a-z allows a range of lowercase characters
. matches a literal period; inside a character class it loses its usual "any character" meaning
\s matches whitespace (spaces, tabs, and line breaks); use a literal space instead if only plain spaces should be allowed
_ matches an underscore
- matches a dash (hyphen); we have it as the last character in the character class so it doesn't get interpreted as being part of a character range. We could also escape it (\-) instead and put it anywhere in the character class, but that's less clear
+ asserts that the preceding expression (in our case, the character class) must match one or more times
$ Finally, this asserts that we're now at the end of the subject
When you're testing regular expressions, you'll likely find a tool like regexpal helpful. This allows you to see your regular expression match (or fail to match) your sample data in real time as you write it.
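If no online tester is at hand, the same check can be scripted. The following is a minimal sketch in Python (an assumption, since the question names no language); the helper name is_valid_name is made up for illustration. It uses fullmatch() because a bare $ in Python would also accept one trailing newline.

```python
import re

# Character class from the answer above: letters, period, whitespace,
# underscore, and a trailing literal dash.
NAME_RE = re.compile(r"[A-Za-z.\s_-]+")

def is_valid_name(text: str) -> bool:
    """Return True when every character of text is in the allowed set."""
    # fullmatch() anchors at both ends, so ^ and $ are not needed here.
    return NAME_RE.fullmatch(text) is not None

samples = [
    "Dr. Marshall", "sam smith", ".george con-stanza .great",
    "peter.", "josh_stinson", "smith _.gorne",   # expected to pass
    "agent 007", "a+b", "",                      # expected to fail
]
for sample in samples:
    print(repr(sample), is_valid_name(sample))
```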

Check out the basics of regular expressions in a tutorial. All it requires is two anchors and a repeated character class:
^[a-zA-Z ._-]*$
If you use the case-insensitive modifier, you can shorten this to
^[a-z ._-]*$
Note that the space is significant (it is just a character like any other).
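As a rough illustration of the case-insensitive shortcut, here is a Python sketch (Python is an assumption; the modifier syntax differs between flavors). Note that because this answer uses * rather than +, the empty string also passes.

```python
import re

explicit = re.compile(r"^[a-zA-Z ._-]*$")
# re.ASCII restricts case folding to ASCII letters, so [a-z] plus
# IGNORECASE covers exactly a-z and A-Z.
shortened = re.compile(r"^[a-z ._-]*$", re.IGNORECASE | re.ASCII)

for sample in ["Dr. Marshall", "josh_stinson", "R2-D2", ""]:
    print(repr(sample),
          explicit.search(sample) is not None,
          shortened.search(sample) is not None)
```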

Related

Regular expression for "This Specific A" formatting rule

I'm trying to find the regex expression that validates a specific rule, but I'm quite a beginner with regular expressions.
Rule
There can be any number of words
Words are space-separated
Words only contain letters
Words start with a capital
The last word must be a single capitalized character
Expression
Here is where I am so far: ([A-Z][a-z]+[ ]*)*[A-Z]
Examples
Match
Example Name A
A New Example C
No match
a Test B
Wrong Name
Another_Wrong_Name A
Nop3 A
Notes:
Your regex matches words with two or more letters only before the final one-letter word. You need to match one or more letter words using [A-Z][a-z]*
You use a character class, [ ], to match a single space; this is redundant, so remove the brackets.
You need to match the entire string, with anchors, ^ and $, or \A and \z/\Z (depending on regex flavor).
You can use
^([A-Z][a-z]* )*[A-Z]$
^(?:[A-Z][a-z]* )*[A-Z]$
^(?:[A-Z][a-z]*\h)*[A-Z]$
^(?:[A-Z][a-z]*[^\S\r\n])*[A-Z]$
Note [^\S\r\n] and \h match horizontal whitespace, not just a regular space.
The non-capturing group, (?:...), is used merely for grouping patterns without keeping the text they matched in the dedicated memory slot, which is best practice, especially with repeated groups.
See this regex demo.
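To check the pattern against the examples from the question, a small Python sketch can be used (Python is an assumption; \h is not available in its re module, so the plain-space variant is shown).

```python
import re

# Non-capturing group variant from the answer: zero or more
# "Capitalized word + space" units, then a single capital letter.
RULE_RE = re.compile(r"^(?:[A-Z][a-z]* )*[A-Z]$")

should_match = ["Example Name A", "A New Example C"]
should_fail = ["a Test B", "Wrong Name", "Another_Wrong_Name A", "Nop3 A"]

for text in should_match + should_fail:
    # fullmatch() rejects a trailing newline that a bare $ would tolerate.
    print(repr(text), RULE_RE.fullmatch(text) is not None)
```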

Regex that excludes spaces and requires 2 capital letters or more

I'm trying to create a regular expression that matches strings with:
19 to 90 characters
symbols
at least 2 uppercase alphabetical characters
lowercase alphabetical characters
no spaces
I already know that for the size and space exclusion the regex would be:
^[^ ]{19,90}$
And I know that this one will match any string with at least 2 uppercase characters:
^(.*?[A-Z]){2,}.*$
What I don't know is how to combine them. There is no context for the strings.
Edit: I forgot to say that it is better if the regex excludes strings that finish with a .com or .jpeg or .png or any .something (that "something" being of 2-5 characters).
This regex should do what you want.
^(?=(?:\w*\W+)+\w*$)(?=(?:\S*?[A-Z]){2,}\S*?$)(?=(?:\S*?[a-z])+\S*?$)(?!.*?\.\w{2,5}$).{19,90}$
Basically it uses three positive lookaheads and a negative lookahead to guarantee the conditions that you specified:
(?=(?:\w*\W+)+\w*$)
ensures that there is at least one non-word (symbol) character
(?=(?:\S*?[A-Z]){2,}\S*?$)
ensures that there are at least two uppercase characters, and also excludes a match if there are any spaces in the string
(?=(?:\S*?[a-z])+\S*?$)
ensures that there is at least one lowercase character in the string. The negative lookahead
(?!.*?\.\w{2,5}$)
ensures that strings that end with a . and 2-5 characters are excluded
Finally,
.{19,90}
performs the actual match and ensures that there are between 19 and 90 characters.
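The idea of stacking lookaheads in front of one consuming pattern can be sketched in Python as follows. This is a simplified variant, not the exact pattern above: it assumes a "symbol" means any character that is not an ASCII letter or digit, and it enforces "no spaces" with \S in the consuming part. The name PASSWORD_RE is made up for illustration.

```python
import re

PASSWORD_RE = re.compile(
    r"(?=.*[a-z])"            # at least one lowercase letter
    r"(?=(?:.*[A-Z]){2})"     # at least two uppercase letters
    r"(?=.*[^A-Za-z0-9])"     # at least one symbol (assumed definition)
    r"(?!.*\.\w{2,5}$)"       # must not end in a file-like extension
    r"\S{19,90}"              # 19 to 90 characters, none of them whitespace
)

candidates = [
    "Abcdefghij!Klmnopqrs",    # satisfies every rule
    "abcdefghij!klmnopqrs",    # no uppercase letters
    "Abcdefghij!klmnopqrs",    # only one uppercase letter
    "Abcdefghij!Klm nopqrs",   # contains a space
    "Abcdefghij!Klmnop.png",   # ends in .png
    "Ab!K",                    # too short
]
for text in candidates:
    print(repr(text), PASSWORD_RE.fullmatch(text) is not None)
```

Each lookahead checks one rule without consuming input, so rules can be added or removed independently of the length check.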
Following your requirements, I suggest using the following pattern:
^(?=.*[a-z])(?=.*[A-Z].*[A-Z])(?=.*[^\s]).{19,90}$
Instead of just excluding spaces, I used \s since you probably don't want to allow tabs, newlines, etc. either. However, it is still unclear which symbols you want to allow, e.g. [a-zA-Z!"§$%&\/()=?+]
^(?=.*[a-z])(?=.*[A-Z].*[A-Z])(?=.*[^\s])(?=[a-zA-Z!"§$%&\/()=?+]).{19,90}$
To match your additional requirement not to match file-like extensions at the end of the string, add a negative look-ahead: (?!.*\.\w{2,5}$)
^(?=.*[a-z])(?=.*[A-Z].*[A-Z])(?=.*[^\s])(?=[a-zA-Z!"§$%&\/()=?+])(?!.*\.\w{2,5}$).{19,90}$
You can use backreferences as described here: https://www.ocpsoft.org/tutorials/regular-expressions/and-in-regex/
Another reference with examples here: https://www.regular-expressions.info/refcapture.html

Regex pattern to match string that's not followed by a colon

Using regex, I'm trying to match any string of characters that meets the following conditions (in the order displayed):
Contains a dollar sign $; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, underscores, periods (dots), opening brackets, and/or closing brackets [a-zA-Z0-9_.\[\]]*; then
one pipe character |; then
one hash sign #; then
at least one letter [a-zA-Z]; then
zero or more letters, numbers, and/or underscores [a-zA-Z0-9_]*; then
zero colons :
In other words, if a colon is found at the end of the string, then it should not count as a match.
Here are some examples of valid matches:
$tmp1|#hello
$x2.h|#hi_th3re
Valid match$here|#in_the middle of other characters
And here are some examples of invalid matches:
$tmp2|#not_a_match:"because there is a colon"
$c.4a|#also_no_match:
Here are some of the patterns I've tried:
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]*)(\|#)([a-zA-Z][a-zA-Z0-9_]*(?!.[:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*(?![:]))
(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:])
This pattern will do what you need
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+[\w]*+(?!:)
I am using possessive quantifiers to cut down the backtracking using [\w]*+. You can also use atomic groups instead of possessive quantifiers like
\$[A-Za-z]+[\w.\[\]]*[|]#[A-Za-z]+(?>[\w]*)(?!:)
NOTE
\w => [A-Za-z0-9_]
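Possessive quantifiers and atomic groups are not available in every flavor (Python's re module, for instance, only gained them in version 3.11). A hedged alternative sketch in Python: widen the negative lookahead to (?![\w:]) so the engine cannot backtrack one character and then "succeed" in front of a letter. The names naive and guarded are made up for illustration.

```python
import re

# Plain (?!:) after a greedy \w* can be satisfied by giving back one character.
naive = re.compile(r"\$[A-Za-z][\w.\[\]]*\|#[A-Za-z]\w*(?!:)", re.ASCII)
# Also forbidding a following word character blocks that escape route.
guarded = re.compile(r"\$[A-Za-z][\w.\[\]]*\|#[A-Za-z]\w*(?![\w:])", re.ASCII)

tests = [
    "$tmp1|#hello",
    "$x2.h|#hi_th3re",
    "Valid match$here|#in_the middle of other characters",
    '$tmp2|#not_a_match:"because there is a colon"',
    "$c.4a|#also_no_match:",
]
for text in tests:
    n, g = naive.search(text), guarded.search(text)
    print(repr(text), n.group() if n else None, g.group() if g else None)
```

On the two colon examples the naive pattern still reports a match that stops one character early, while the guarded pattern reports none.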
I tested your third pattern in Regex 101 and it appears to be working correctly:
^.*(\$[a-zA-Z])([a-zA-Z0-9_.\[\]]+)?(\|#)([a-zA-Z][a-zA-Z0-9_]*)([^:]).*$
The only change I needed to make to the regex to make it work was to add anchors ^ and $ to the start and end of the regex. I also allowed for your pattern to occur as a substring in the middle of a larger string.
By the way, you had the following example as a string which should not match:
$tmp2|#not_a_match:"because there is a colon"
However, even if we remove the colon from this string it will still not match because it contains quotes which are not allowed.

What does \'.- mean in a Regular Expression

I'm new to regular expressions and I'm having trouble finding what "\'.-" means.
'/^[A-Z \'.-]{2,20}$/i'
So far from my research, I have found that the regular expression starts (^) and requires two to twenty ({2,20}) alphabetical (A-Z) characters. The expression is also case insensitive (/i).
Any hints about what "\'.-" means?
The character class is the entire expression [A-Z \'.-], meaning any of A-Z, space, single quote, period, or hyphen. The \ is needed to protect the single quote, since it's also being used as the string quote. This charclass must be repeated 2 to 20 times, and because of the leading ^ and trailing $ anchors that must be the entire content of the matching string.
It means to escape the single quote (') that delimits the string holding the regex (so as not to prematurely end the string), and then a . which means a literal . and a - which means a literal -.
Inside a character class, the . is treated literally, and if the - isn't part of a valid range, e.g. a-z, then it is treated literally as well.
Your regex says Match the characters a-zA-Z '.- between 2 and 20 times as the entire string, with an optional trailing \n.
This regex is in a string. The backslash is there to escape the single quote so the string doesn't end early, in the middle of the regex. The dot and dash are just what they are, a period and a dash.
So, you were nearly right, except it's 2-20 characters that are letters, space, single quote, period, or dash.
It's quoting the quote.
The regular expression is ^[A-Z '.-]{2,20}$.
In the programming language you are using, you write it as a quoted string:
'SOMETHING'
To get a single quote in there, it's been backslashed.
Everything inside the square brackets is part of the character class, and will match a single character listed. In your example, the characters listed are the letters A through Z, a space, a single quote, a period, or a hyphen. (Note the hyphen must be listed last, or first, to avoid indicating a range, like A-Z.) Your full regular expression will match between 2 and 20 of the listed characters. The backslash before the single quote is needed so the compiler knows you are not ending the string that defines the regular expression.
Some examples of things this will match:
....................
abaca af - .
AAfa- - ..
.z
And so on.
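The same character class can be tried out in Python (an assumption; the question's host language is not stated). In this sketch the pattern sits in a double-quoted raw string, so the apostrophe needs no backslash at all, which shows the escape belongs to the string literal and not to the regex. It also shows the "optional trailing \n" behaviour of $ mentioned above.

```python
import re

# Letters (case-insensitive), space, apostrophe, period, hyphen; 2 to 20 of them.
NAME_RE = re.compile(r"^[A-Z '.-]{2,20}$", re.IGNORECASE)

for text in ["O'Brien-Smith Jr.", "." * 20, ".z", "A", "abc1", "." * 21]:
    print(repr(text), NAME_RE.search(text) is not None)

# $ also matches just before a final newline; fullmatch() does not allow that.
print(NAME_RE.search("ab\n") is not None, NAME_RE.fullmatch("ab\n") is not None)
```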

Regex for alphanumeric, but at least one letter

In my ASP.NET page, I have an input box that has to have the following validation on it:
Must be alphanumeric, with at least one letter (i.e. can't be ALL numbers).
^\d*[a-zA-Z][a-zA-Z0-9]*$
Basically this means:
Zero or more ASCII digits;
One alphabetic ASCII character;
Zero or more alphanumeric ASCII characters.
Try a few tests and you'll see this'll pass any alphanumeric ASCII string where at least one non-numeric ASCII character is required.
The key to this is the \d* at the front. Without it the regex gets much more awkward to do.
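A few quick tests of this pattern, sketched in Python rather than ASP.NET (an assumption made for brevity; the helper name is hypothetical). re.ASCII is set so that \d means [0-9] only, matching the "ASCII digits" reading above.

```python
import re

# Leading digits, then one mandatory letter, then any alphanumerics.
ALNUM_WITH_LETTER = re.compile(r"\d*[a-zA-Z][a-zA-Z0-9]*", re.ASCII)

def is_alnum_with_letter(text: str) -> bool:
    return ALNUM_WITH_LETTER.fullmatch(text) is not None

for text in ["123a", "a", "abc123", "123", "", "12-a"]:
    print(repr(text), is_alnum_with_letter(text))
```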
Most answers to this question are correct, but there's an alternative, that (in some cases) offers more flexibility if you want to change the rules later on:
^(?=.*[a-zA-Z].*)([a-zA-Z0-9]+)$
This will match any sequence of alphanumerical characters, but only if the look-ahead at the start also succeeds, that is, only if the string contains at least one letter. It's a little-known trick in regular expressions that allows you to handle some very difficult validation problems.
For example, say you need to add another constraint: the string should be between 6 and 12 characters long. The obvious solutions posted here wouldn't work, but using the look-ahead trick, the regex simply becomes:
^(?=.*[a-zA-Z].*)([a-zA-Z0-9]{6,12})$
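To illustrate how the look-ahead form accommodates later rule changes, here is a Python sketch (Python is an assumption). The extra "at least one digit" rule is hypothetical and only there to show that another look-ahead can be bolted on without touching the rest.

```python
import re

LETTER = r"(?=.*[a-zA-Z])"     # at least one letter somewhere
DIGIT = r"(?=.*[0-9])"         # hypothetical extra rule: at least one digit
BODY = r"[a-zA-Z0-9]{6,12}"    # 6 to 12 alphanumeric characters

letter_rule = re.compile(LETTER + BODY)
letter_and_digit_rule = re.compile(LETTER + DIGIT + BODY)

def check(rule: re.Pattern, text: str) -> bool:
    return rule.fullmatch(text) is not None

for text in ["abc123", "abcdef", "123456", "abc12", "abc123def4567"]:
    print(repr(text), check(letter_rule, text), check(letter_and_digit_rule, text))
```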
^[\p{L}\p{N}]*\p{L}[\p{L}\p{N}]*$
Explanation:
[\p{L}\p{N}]* matches zero or more Unicode letters or numbers
\p{L} matches one letter
[\p{L}\p{N}]* matches zero or more Unicode letters or numbers
^ and $ anchor the string, ensuring the regex matches the entire string. You may be able to omit these, depending on which regex matching function you call.
Result: you can have any alphanumeric string except there's got to be a letter in there somewhere.
\p{L} is similar to [A-Za-z] except it will include all letters from all alphabets, with or without accents and diacritical marks. It is much more inclusive, using a larger set of Unicode characters. If you don't want that flexibility substitute [A-Za-z]. A similar remark applies to \p{N} which could be replaced by [0-9] if you want to keep it simple. See the MSDN page on character classes for more information.
The less fancy non-Unicode version would be
^[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*$
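Not every regex engine understands \p{L} and \p{N}; Python's built-in re module, for one, does not. As a rough approximation under that assumption, [^\W\d_] can stand in for "a letter" and [^\W_] for "a letter or number". This is a sketch only: Python's \w and \d do not line up exactly with the Unicode L and N categories.

```python
import re

# [^\W_]   : any word character except underscore (letters and numbers)
# [^\W\d_] : additionally excludes decimal digits, leaving letters
UNICODE_ALNUM_WITH_LETTER = re.compile(r"[^\W_]*[^\W\d_][^\W_]*")

for text in ["日本語123", "héllo", "123", "١٢٣", "abc_1"]:
    print(repr(text), UNICODE_ALNUM_WITH_LETTER.fullmatch(text) is not None)
```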
^[0-9]*[A-Za-z][0-9A-Za-z]*$
is the regex that will do what you're after. The ^ and $ match the start and end of the string to prevent other characters. You could replace the [0-9A-Za-z] block with \w (which also allows the underscore), but I prefer the more verbose form because it's easier to extend with other characters if you want.
Add a regular expression validator to your asp.net page as per the tutorial on MSDN: http://msdn.microsoft.com/en-us/library/ms998267.aspx.
^\w*[\p{L}]\w*$
This one's not that hard. The regular expression reads: match a line starting with any number of word characters (letters, numbers, and connector punctuation such as the underscore, which you might not want), that contains one letter character (that's the [\p{L}] part in the middle), followed by any number of word characters again.
If you want to exclude punctuation, you'll need a heftier expression:
^[\p{L}\p{N}]*[\p{L}][\p{L}\p{N}]*$
And if you don't care about Unicode you can use a boring expression:
^[A-Za-z0-9]*[A-Za-z][A-Za-z0-9]*$
^[0-9]*[a-zA-Z][a-zA-Z0-9]*$
Can be
any number ended with a character,
or an alphanumeric expression started with a character
or an alphanumeric expression started with a number, followed by a character and ended with an alphanumeric subexpression