Regex to match only letters - regex

How can I write a regex that matches only letters?

Use a character set: [a-zA-Z] matches one letter from A–Z, in lowercase or uppercase. [a-zA-Z]+ matches one or more letters, and ^[a-zA-Z]+$ matches only strings that consist entirely of one or more letters (^ and $ mark the beginning and end of a string respectively).
If you want to match letters other than A–Z, you can either add them to the character set, as in [a-zA-ZäöüßÄÖÜ], or use a predefined character class such as the Unicode character property class \p{L}, which matches any Unicode letter.
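As a rough illustration of the difference between the quantified class and the anchored pattern, here is a minimal Python sketch. The helper name is_letters_only is made up for this example, and re.fullmatch is used in place of explicit ^ and $ anchors.

```python
import re

LETTERS = re.compile(r"[a-zA-Z]+")

def is_letters_only(text: str) -> bool:
    """Return True if text is one or more ASCII letters and nothing else."""
    # fullmatch requires the pattern to cover the whole string,
    # which is the job ^ and $ do in the anchored pattern
    return LETTERS.fullmatch(text) is not None

print(is_letters_only("Hello"))     # True
print(is_letters_only("Hello123"))  # False
print(is_letters_only(""))          # False

# Without anchoring, the same class just finds the first run of letters
print(LETTERS.search("123abc456").group())  # abc
```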

\p{L} matches anything that is a Unicode letter if you're interested in alphabets beyond the Latin one

Depending on your meaning of "character":
[A-Za-z] - all letters (uppercase and lowercase)
[^0-9] - all non-digit characters

The closest option available is
[\u\l]+
which matches a sequence of uppercase and lowercase letters. However, it is not supported by all editors/languages, so it is probably safer to use
[a-zA-Z]+
as other users suggest

You would use
/[a-z]/gi
[] - matches any one character from the set inside the brackets
a-z - covers the entire lowercase alphabet
g - searches globally, throughout the whole string
i - ignores case, so uppercase letters match as well

Java:
String s= "abcdef";
if(s.matches("[a-zA-Z]+")){
System.out.println("string only contains letters");
}

In python, I have found the following to work:
[^\W\d_]
This works because we are creating a new character class (the []) which excludes (^) any character from the class \W (everything NOT in [a-zA-Z0-9_]), also excludes any digit (\d) and also excludes the underscore (_).
That is, we have taken the character class [a-zA-Z0-9_] and removed the 0-9 and _ bits. You might ask, wouldn't it just be easier to write [a-zA-Z] then, instead of [^\W\d_]? You would be correct if dealing only with ASCII text, but when dealing with unicode text:
\W
Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_].
(from the Python re module documentation)
That is, we are taking everything considered to be a word character in unicode, removing everything considered to be a digit character in unicode, and also removing the underscore.
For example, the following code snippet
import re
regex = r"[^\W\d_]"  # raw string, so \W and \d reach the regex engine unchanged
test_string = "A;,./>>?()*)&^*&^%&^#Bsfa1 203974"
re.findall(regex, test_string)
Returns
['A', 'B', 's', 'f', 'a']
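To make the Unicode versus ASCII difference concrete, the following sketch runs the same class with and without re.ASCII. The sample string is invented for this example, and the accented letters are written as escapes only to keep the source unambiguous.

```python
import re

# "café naïve 42_x", with the accented letters written as escapes
text = "caf\u00e9 na\u00efve 42_x"

# Default (Unicode) mode: accented letters count as word characters
unicode_letters = re.findall(r"[^\W\d_]+", text)

# re.ASCII narrows \W and \d to ASCII, so the class behaves like [a-zA-Z]
ascii_letters = re.findall(r"[^\W\d_]+", text, flags=re.ASCII)

print(unicode_letters)  # ['café', 'naïve', 'x']
print(ascii_letters)    # ['caf', 'na', 've', 'x']
```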

The pattern that a few people have written, /^[a-zA-Z]$/i, is not what you want here: without a quantifier it matches only a string that is exactly one letter long, and the /i (case-insensitive) flag is redundant because the class already lists both cases. To find every run of letters in a string, use /g (global) instead, and drop the ^ and $ anchors:
/[a-zA-Z]+/g
[a-zA-Z]+ matches a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
g modifier: global. All matches (don't return on first match)
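For comparison, a small Python sketch of the same idea: re.findall plays the role of the g modifier, and the anchored pattern without a quantifier is shown next to it. The sample text is invented for this example.

```python
import re

text = "abc 123 Def"

# Like /[a-zA-Z]+/g: collect every run of letters anywhere in the string
runs = re.findall(r"[a-zA-Z]+", text)

# Like /^[a-zA-Z]$/: with no quantifier, only a one-letter string matches
single = re.compile(r"^[a-zA-Z]$")

print(runs)                       # ['abc', 'Def']
print(bool(single.search("a")))   # True
print(bool(single.search("ab")))  # False
```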

/[a-zA-Z]+/
Super simple example. Regular expressions are extremely easy to find online.
http://www.regular-expressions.info/reference.html

For PHP, following will work fine
'/^[a-zA-Z]+$/'

Use the shorthand character class
\D
Matches any character except the digits 0-9. Note that this also accepts punctuation and whitespace, so it is broader than letters only.
^\D+$

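Use [:alpha:] inside a bracket expression ([[:alpha:]]) in flavors that support POSIX character classes; it matches letters only. \w is a shorthand for the characters that may appear in words, which includes digits and the underscore as well as letters.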

So, I've been reading a lot of the answers, and most of them don't take exceptions into account, like letters with accents or diaeresis (á, à, ä, etc.).
I wrote a function in TypeScript that should carry over to any language with regular expressions. It is my own implementation for my own use case: I added a range of letters for each kind of accent I wanted to accept, and I convert the character to upper case before applying the RegExp, which saves listing the lowercase ranges.
function isLetter(char: string): boolean {
return char.toUpperCase().match('[A-ZÀ-ÚÄ-Ü]+') !== null;
}
If you want to add another range of letters with another kind of accent, just add it to the regex. Same goes for special symbols.
I implemented this function with TDD and I can confirm this works with, at least, the following cases:
character | isLetter
${'A'} | ${true}
${'e'} | ${true}
${'Á'} | ${true}
${'ü'} | ${true}
${'ù'} | ${true}
${'û'} | ${true}
${'('} | ${false}
${'^'} | ${false}
${"'"} | ${false}
${'`'} | ${false}
${' '} | ${false}
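Hand-written ranges are easy to get slightly wrong; for instance, À-Ú also covers the multiplication sign × (U+00D7). As an alternative sketch, not the author's method, a Python version can ask the Unicode database for the character's category instead of listing ranges. The function name is_letter mirrors the one above but is otherwise an assumption of this example.

```python
import unicodedata

def is_letter(char: str) -> bool:
    """Return True if char is a single character in a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return len(char) == 1 and unicodedata.category(char).startswith("L")

# Same kinds of cases as the table above, plus the multiplication sign (U+00D7)
for ch in ["A", "e", "\u00c1", "\u00fc", "(", "^", "'", "`", " ", "\u00d7"]:
    print(repr(ch), is_letter(ch))
```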

If you mean any letters in any character encoding, then a good approach might be to delete non-letters like spaces \s, digits \d, and other special characters like:
[!##\$%\^&\*\(\)\[\]:;'",\. ...more special chars... ]
Or negate those classes to describe letters directly:
\S \D and [^ ..special chars..]
Pros:
Works with all regex flavors.
Easy to write, sometimes save lots of time.
Cons:
Long, sometimes not perfect, but character encoding can be broken as well.

You can try this regular expression : [^\W\d_] or [a-zA-Z].

Lately I have used this pattern in my forms to check people's names, which may contain letters, blanks and accented characters. Note that it uses A-Za-z rather than A-z, because the A-z range also covers [, \, ], ^, _ and `.
pattern="[A-Za-zÀ-ú\s]+"

JavaScript
If you want to return matched letters:
('Example 123').match(/[A-Z]/gi) // Result: ["E", "x", "a", "m", "p", "l", "e"]
If you want to replace matched letters with stars ('*') for example:
('Example 123').replace(/[A-Z]/gi, '*') // Result: "******* 123"

/^[A-Za-z]+$/.test('asd')
// true
/^[A-Za-z]+$/.test('asd0')
// false
/^[A-Za-z]+$/.test('0asd')
// false

pattern = /[a-zA-Z]/
puts "[a-zA-Z]: #{pattern.match('mine blossom')}" # OK
puts "[a-zA-Z]: #{pattern.match('456')}"
puts "[a-zA-Z]: #{pattern.match('')}"
puts "[a-zA-Z]: #{pattern.match('#$%^&*')}"
puts "[a-zA-Z]: #{pattern.match('#$%^&*A')}" # OK

import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("^[a-zA-Z]+$");
if (pattern.matcher("a").find()) {
    // do something
}

Related

Matching a lower-case character on the position before an uppercase character (camelCase)?

I have a regex in a piece of Typescript code that is used to match strings where there is a space, a dash/underscore or camelcase.
Because this pattern also is used to split the string later, in the case of the camelcase I actually need to match the lowercase character immediately before the camelcase/uppercase character, because I am trying to catch the camelcase. I am trying to reduce a string into two "initials" basically, so if I would input my alias for example "saddexProductions" or "Saddex Productions" etc, the output would be "SP". If there is no indicator that the string consists of two parts, for example "Saddexproductions", the output will be "Sa". If I match the uppercase character in the middle of the string though and split there, that character will be removed and the result with input "saddexProductions" would be "SR".
Here is what I have come up with so far:
const splitRegex: RegExp = /\s|(?<=.)([a-z](?<=[A-Z]{1}))|\-|\_/;
Specifically, it is this part that is relevant:
(?<=.)([a-z](?<=[A-Z]{1}))
All the other scenarios I have described but this one give the desired result. There can be pretty much anything in front and following the camelcase, but it is always the single lowercase character before the uppercase character that needs to be matched, not the uppercase character.
How would I accomplish this? Thanks in advance.
You can use
const splitRegex: RegExp = /[-_\s]|([a-z](?=[A-Z]))/;
Details:
[-_\s] - a character class matching a -, _ or a whitespace
| - or
([a-z](?=[A-Z])) - a capturing group with ID=1 that matches a lowercase ASCII letter followed with an uppercase ASCII letter without adding the latter to the overall match value (as it is inside a positive lookahead that is a non-consuming regex construct).
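Python's re module handles lookaheads the same way, so the behaviour can be checked with a short sketch. The variable names are made up for this example. Note that, as with JavaScript's split, the captured letter is returned as its own element in the result.

```python
import re

split_regex = re.compile(r"[-_\s]|([a-z](?=[A-Z]))")

# The lookahead checks for the uppercase letter without consuming it,
# so the match itself is only the lowercase "x"
m = split_regex.search("saddexProductions")
print(m.group())  # x
print(m.span())   # (5, 6)

# Because of the capturing group, split keeps the matched letter as its own item
parts = split_regex.split("saddexProductions")
print(parts)      # ['sadde', 'x', 'Productions']
```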

Regex that only allows empty string, letters, numbers or spaces?

Need help coming up with a regex that only allows numbers, letters, empty string, or spaces.
^[_A-z0-9]*((-|\s)*[_A-z0-9])*$
This one is the closest I've found but it allows underscores and hyphen.
Only letters, numbers, space, or empty string?
Then 1 character class will do.
^[A-Za-z0-9 ]*$
^ : start of the string or line (depending on the flag)
[A-Za-z0-9 ]* : zero or more upper-case or lower-case letters, or digits, or spaces.
$ : end of the string or line (depending on the flag)
The A-z range contains more than just letters.
You can see that in the ASCII table.
And \s for whitespace also includes tabs or linebreaks (depending on the flag).
But if you also want those, then just use that instead of the space.
^[A-Za-z0-9\s]*$
Also, depending on the regex engine/dialect that your language/tool uses, you could use \p{L} for any unicode letter.
Since [A-Za-z] only includes the normal ascii letters.
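A quick way to check the single-class pattern is a sketch like the one below. The helper name is_allowed and the sample inputs are invented for this example.

```python
import re

ALLOWED = re.compile(r"[A-Za-z0-9 ]*")

def is_allowed(text: str) -> bool:
    """Return True for the empty string or any mix of ASCII letters, digits and spaces."""
    return ALLOWED.fullmatch(text) is not None

print(is_allowed(""))           # True
print(is_allowed("abc 123"))    # True
print(is_allowed("a_b"))        # False
print(is_allowed("a-b"))        # False
print(is_allowed("tab\there"))  # False, a tab is not a plain space
```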
Your regex is more complicated than it needs to be.
The first part is fine: you are allowing letters and numbers, and you can simply add the space character to the same class.
Then the * quantifier, which means zero or more, takes care of the empty-string case.
/^[a-z0-9 ]*$/gmi
Notice that I'm not using A-z as you were, because that means any character between A (ASCII 65, octal 101) and z (ASCII 122, octal 172). It therefore also matches the characters in between (91 to 96, octal 133 to 140: [, \, ], ^, _ and `), which are neither digits nor letters. I've used a-z instead, which allows lowercase letters, together with the i flag, which makes the match case-insensitive.
Matching only certain characters is equivalent to not matching any other character, so you could use the regex r = /[^a-z\d ]/i to determine if the string contains any character other than the ones permitted. In Ruby that would be implemented as follows.
"aBc d01e e$9" !~ r #=> false
"aBc d01e ex9" !~ r #=> true
In this situation there may not be much to choose between this approach and attempting to match /\A[a-z\d ]*\z/i, but in other situations the use of a negative match can simplify the regex considerably.
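The same negative-match idea can be sketched in Python. The function name only_permitted is an assumption of this example, and [0-9] is used instead of \d because Python's \d also matches non-ASCII digits by default.

```python
import re

# Any single character outside the permitted set
DISALLOWED = re.compile(r"[^a-z0-9 ]", re.IGNORECASE)

def only_permitted(text: str) -> bool:
    """Return True if text contains no character other than letters, digits and spaces."""
    return DISALLOWED.search(text) is None

print(only_permitted("aBc d01e e$9"))  # False
print(only_permitted("aBc d01e ex9"))  # True
print(only_permitted(""))              # True
```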

Why is this regex allowing a caret?

http://regexr.com/3ars8
^(?=.*[0-9])(?=.*[A-z])[0-9A-z-]{17}$
Should match "17 alphanumeric chars, hyphens allowed too, must include at least one letter and at least one number"
It'll correctly match:
ABCDF31U100027743
and correctly decline to match:
AB$DF31U100027743
(and almost any other non-alphanumeric char)
but will apparently allow:
AB^DF31U100027743
Because your character class [A-z] matches this symbol.
[A-z] matches [, \, ], ^, _, `, and the English letters.
Actually, it is a common mistake. You should use [a-zA-Z] instead to only allow English letters.
So, this regex (with i option) won't capture your string.
^(?=.*[0-9])(?=.*[a-z])[0-9a-z-]{17}$
In my opinion, it is always safer to use Ignorecase option to avoid such an issue and shorten the regex.
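To see exactly which characters sit between Z and a, the range can be enumerated. The sketch below does that in Python and then compares the original pattern with a corrected one; the names loose and strict are made up for this example.

```python
import re

# Every character in the range A..z that is not an ASCII letter
extras = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

loose = re.compile(r"^(?=.*[0-9])(?=.*[A-z])[0-9A-z-]{17}$")
strict = re.compile(r"^(?=.*[0-9])(?=.*[A-Za-z])[0-9A-Za-z-]{17}$")

sample = "AB^DF31U100027743"
print(bool(loose.search(sample)))   # True, the caret falls inside A-z
print(bool(strict.search(sample)))  # False
print(bool(strict.search("ABCDF31U100027743")))  # True
```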
A regex range follows ASCII order, and the printable ASCII characters run from the space to the tilde.
The [A-z] token therefore matches the uppercase letters, the six characters [, \, ], ^, _ and `, and the lowercase letters, while [ -~] matches everything from the space to the tilde.
You're allowing A-z (capital 'A' through lower 'z'). You don't say what regex package you're using, but it's not necessarily clear that A-Z and a-z are contiguous; there could be other characters in between. Try this instead:
^(?=.*[0-9])(?=.*[A-Za-z])[0-9A-Za-z-]{17}$
It seems to meet your criteria for me in regexpal.

Regular Expression related: first character alphabet second onwards alphanumeric+some special characters

I have one question related with regular expression. In my case, I have to make sure that
first letter is alphabet, second onwards it can be any alphanumeric + some special characters.
Regards,
Anto
Try something like this:
^[a-zA-Z][a-zA-Z0-9.,$;]+$
Explanation:
^ Start of line/string.
[a-zA-Z] Character is in a-z or A-Z.
[a-zA-Z0-9.,$;] Alphanumeric or `.` or `,` or `$` or `;`.
+ One or more of the previous token (change to * for zero or more).
$ End of line/string.
The special characters I have chosen are just an example. Add your own special characters as appropriate for your needs. Note that a few characters need escaping inside a character class otherwise they have a special meaning in the regular expression.
I am assuming that by "alphabet" you mean A-Z. Note that in some other countries there are also other characters that are considered letters.
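As a sketch of how the class could be built when the list of special characters varies, the Python snippet below escapes them before placing them inside the brackets. SPECIALS and is_valid are hypothetical names for this example, and the chosen characters are the same sample set as above.

```python
import re

# Hypothetical set of extra characters; adjust to your needs
SPECIALS = ".,$;"

# re.escape guards characters such as ], ^, - or \ that are special inside a class
pattern = re.compile(r"[a-zA-Z][a-zA-Z0-9" + re.escape(SPECIALS) + r"]+")

def is_valid(text: str) -> bool:
    """First character must be a letter; the rest alphanumeric or one of SPECIALS."""
    return pattern.fullmatch(text) is not None

print(is_valid("a1.,$;"))  # True
print(is_valid("1abc"))    # False, starts with a digit
print(is_valid("a b"))     # False, space is not in the class
print(is_valid("a"))       # False, + requires at least one more character
```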
Try this :
/^[a-zA-Z]/
where
^ -> Starts with
[a-zA-Z] -> characters to match
I think the simplest answer is to pick and match only the first character with regex.
String str = "s12353467457458";
if ((""+str.charAt(0)).matches("^[a-zA-Z]")){
System.out.println("Valid");
}

Regular Expression to test an entire word

I have this expression, ([a-zA-Z]|ñ|Ñ)*, which I want to use to block every character except letters and Ñ from being entered in a textbox.
The problem is that it returns a match for A9023 but also for 32""". How can I make it return a match for A9023 but not for 32"""?
Thanks.
You need to add assertions for the start and the end of the string:
^([a-zA-Z]|ñ|Ñ)*$
Otherwise the regular expression matches at any position. Additionally, you can also write ([a-zA-Z]|ñ|Ñ)* as the character class [a-zA-ZñÑ]*:
^[a-zA-ZñÑ]*$
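The reason the unanchored pattern "matches" almost anything is that * also accepts zero repetitions. A short Python sketch makes that visible; the sample inputs are invented for this example.

```python
import re

pattern = r"([a-zA-Z]|ñ|Ñ)*"

# Unanchored: * allows an empty match at position 0, so any input "matches"
m = re.search(pattern, '32"""')
print(repr(m.group()))  # ''

# Anchored character class: the whole string must consist of allowed letters
anchored = re.compile(r"^[a-zA-ZñÑ]*$")
print(bool(anchored.search('32"""')))      # False
print(bool(anchored.search("123ABC456")))  # False
print(bool(anchored.search("España")))     # True
```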
Are you sure you don't mean ^([a-zA-Z]|ñ|Ñ)*$? You might be finding the characters you want but not excluding the ones you don't. The expression I mentioned pins the match to the beginning (^) and the end ($) of the string, so that nothing else will pass. Otherwise:
123ABC456
...will pass your match, because it found zero or more letters, even though there were also other characters.
You didn't say which regex flavor (which programming language) you're using, but you might want to consider either
^\p{L}*$
if your regex flavor supports Unicode properties or
^[^\W\d_]*$
if it doesn't.
Reason: Your regex will allow only unaccented letters and Ñ - is there a real language that uses the latter without also having accented letters?
\p{L} means "any letter in any 'language'",
[^\W\d_] means "any character that is neither a non-alphanumeric, a digit or an underscore", which is just a fancy but necessary way to say "any letter" (\w is a shorthand for "letter, digit or underscore", \W is the inverse of that).