I need to match a string with an identifier.
Pattern
A word is considered an identifier if:
It contains no characters other than alphanumeric characters.
It does not start with a digit.
Input
The given input string will not contain any leading or trailing spaces or other white-space characters.
Code
I tried using the following regular expressions
\D[a-zA-Z]\w*\D
[ \t\n][a-zA-Z]\w*[ \t\n]
^\D[a-zA-Z]\w*$
None of them works.
How can I achieve this?
Note: I want to find every identifier in a string that may contain several of them (or just one). For example, in This is an i0dentifier 1abs, the expected results are This, is, an and i0dentifier.
Note that in your ^\D[a-zA-Z]\w*$ regex, \D can match non-alphanumeric chars, since \D matches any non-digit char, and \w also matches underscores, which are not alphanumeric chars.
I suggest
\b[A-Za-z]+[0-9][A-Za-z0-9]*\b
It matches
\b - word boundary
[A-Za-z]+ - one or more letters (the identifier should start with a letter)
[0-9] - a digit (required)
[A-Za-z0-9]* - zero or more ASCII letters/digits
\b - word boundary.
See the regex demo.
In Python:
identifiers = re.findall(r'\b[A-Za-z]+[0-9][A-Za-z0-9]*\b', text)
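As a quick check against the sentence from the question, here is a minimal sketch. The second pattern, stored in `relaxed`, is a hypothetical variant and not part of the answer above: it drops the required digit so that plain words are returned too, which is what the question's expected results list.

```python
import re

text = "This is an i0dentifier 1abs"

# Pattern from the answer: letters, then one required digit, then letters/digits.
with_digit = re.findall(r'\b[A-Za-z]+[0-9][A-Za-z0-9]*\b', text)

# Hypothetical relaxed variant (an assumption): a letter followed by letters/digits.
relaxed = re.findall(r'\b[A-Za-z][A-Za-z0-9]*\b', text)

print(with_digit)  # ['i0dentifier']
print(relaxed)     # ['This', 'is', 'an', 'i0dentifier']
```

In both cases 1abs is skipped: there is no word boundary between 1 and a, so the match cannot start in the middle of that word.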
\D matches any non-digit character, which includes not only letters but also punctuation, whitespace and so on, and you definitely do not want those at the beginning.
You can use ^[A-Za-z][A-Za-z0-9]*$ which can be described as
^: Start of string
[A-Za-z]: A letter
[A-Za-z0-9]*: An alphanumeric character, zero or more times
$: End of string
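A minimal sketch of this check in Python, assuming the whole input must be a single identifier. The helper name `is_identifier` is made up for illustration. It uses `re.fullmatch` rather than `^...$` because in Python `$` also matches just before a trailing newline.

```python
import re

# Same character classes as the pattern above; fullmatch supplies the anchoring.
IDENTIFIER = re.compile(r'[A-Za-z][A-Za-z0-9]*')

def is_identifier(word: str) -> bool:
    """Return True if the whole string is a letter followed by letters/digits."""
    return IDENTIFIER.fullmatch(word) is not None

for word in ["i0dentifier", "This", "1abs", "abc_123", "abc\n"]:
    print(repr(word), is_identifier(word))
```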
An even simpler pattern for an identifier, not using a negative lookahead like Wiktor's answer:
^[^0-9][A-Za-z0-9]*$ decomposed and explained:
^[^0-9]: the word starts (^) with a character that is not ([^...]) a digit (0-9). More exactly, only the first char must be a non-digit; the second character can be a digit. Note that this also admits any other non-digit first character, such as punctuation, not only letters.
[A-Za-z0-9]*: the rest of the word contains no characters other than alphanumeric ones (not even a hyphen or underscore) up to the end ($).
See demo on regex101.
Positive alternative
As already suggested by Arvind Kumar Avinash:
If (according to both rules) the first char must not be a digit but only a letter, then we could also change the first part of the regex above from "not-numeric" to "only-alpha".
[A-Za-z][A-Za-z0-9]* explained:
[A-Za-z]: first char must be an alpha
[A-Za-z0-9]*: optional second and following chars can be any alpha-numeric
Same effect, see demo on regex101.
Tests
| input | result | reason |
|---|---|---|
| aB123 | matches identifier | |
| Ab123 | matches identifier | |
| XXXX12YZ | matches identifier | |
| a2b3 | matches identifier | |
| a | matches identifier | |
| Z | matches identifier | |
| 0 | no match | starts with a digit |
| 1Ab | no match | starts with a digit |
| 12abc | no match | starts with a digit |
| abc_123 | no match | contains underscore, not alphanum |
| r2-d2 | no match | contains hyphen, not alphanum |
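The table can be replayed in Python as a rough check. This sketch assumes `re.fullmatch` semantics and adds one extra input, `_abc`, which is not in the table, to show where the two patterns differ: `[^0-9]` accepts any first character that is not a digit, including an underscore.

```python
import re

NOT_DIGIT_FIRST = re.compile(r'[^0-9][A-Za-z0-9]*')  # first pattern
ALPHA_FIRST = re.compile(r'[A-Za-z][A-Za-z0-9]*')    # positive alternative

# Inputs and expected results taken from the table above.
cases = {
    "aB123": True, "Ab123": True, "XXXX12YZ": True, "a2b3": True,
    "a": True, "Z": True,
    "0": False, "1Ab": False, "12abc": False,
    "abc_123": False, "r2-d2": False,
}

results = {
    text: (bool(NOT_DIGIT_FIRST.fullmatch(text)), bool(ALPHA_FIRST.fullmatch(text)))
    for text in cases
}
for text, outcome in results.items():
    print(text, outcome)

# Extra input (not in the table): the two patterns disagree here.
print(bool(NOT_DIGIT_FIRST.fullmatch("_abc")))  # True
print(bool(ALPHA_FIRST.fullmatch("_abc")))      # False
```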
^([a-zA-Z0-9_-]+)$ matches:
BAP-78810
BAP-148080
But does not match:
B8241066 C
Q2111999 A
Q2111999 B
How can I modify regex pattern to match any space and/or special character?
For the example data, you can write the pattern as:
^[a-zA-Z0-9_-]+(?: [A-Z])?$
^ Start of string
[a-zA-Z0-9_-]+ Match 1+ chars listed in the character class
(?: [A-Z])? Optionally match a space and a char A-Z
$ End of string
Or a more exact match:
^[A-Z]+-?\d+(?: [A-Z])?$
^ Start of string
[A-Z]+-? Match 1+ chars A-Z and optional -
\d+(?: [A-Z])? Match 1+ digits and optional space and char A-Z
$ End of string
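Both patterns can be tried against the sample data from the question with a short Python sketch. The variable names and the extra input `___` are made up for illustration; the extra input shows why the second pattern is the more exact one.

```python
import re

loose = re.compile(r'^[a-zA-Z0-9_-]+(?: [A-Z])?$')
exact = re.compile(r'^[A-Z]+-?\d+(?: [A-Z])?$')

samples = ["BAP-78810", "BAP-148080", "B8241066 C", "Q2111999 A", "Q2111999 B"]

for s in samples:
    print(s, bool(loose.match(s)), bool(exact.match(s)))

# Hypothetical input: only the loose pattern accepts a run of underscores.
print(bool(loose.match("___")), bool(exact.match("___")))  # True False
```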
Whenever you want to match something that can be either a space or a special character, you can use the dot symbol ., which by default matches any single character except a line break. Your regex pattern would then be modified to:
^([a-zA-Z0-9_-])+.$
Here the dot matches the space, or any other single character. To match the examples provided, where exactly one alphanumeric character follows the space, you could add \w:
^([a-zA-Z0-9_-])+.\w$
Note that \w is equivalent to [A-Za-z0-9_]
Further, be careful when you use . as it makes your pattern less specific and therefore more prone to false positives.
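A small Python sketch of how the two dot-based patterns behave. The inputs `B8241066#C` and the variable names are made up for illustration; they show the kind of false positive the warning above refers to.

```python
import re

dot_only = re.compile(r'^([a-zA-Z0-9_-])+.$')
dot_word = re.compile(r'^([a-zA-Z0-9_-])+.\w$')

print(bool(dot_only.match("B8241066 C")))  # False: 'C' is left over after the dot
print(bool(dot_word.match("B8241066 C")))  # True
print(bool(dot_word.match("B8241066#C")))  # True: the dot also accepts '#'
print(bool(dot_word.match("BAP-78810")))   # True: the dot matched a digit after backtracking
```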
I suggest using this approach
^[A-Z][A-Z\d -]{6,}$
The first character must be an uppercase letter, followed by at least 6 uppercase letters, digits, spaces or -.
I removed the group because there was only one group and it was the entire regex.
You can also use \w - which includes A-Z,a-z and 0-9, as well as _ (underscore). To make it case-insensitive, without explicitly adding a-z or using \w, you can use a flag - often an i.
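In Python, the pattern and the case-insensitive flag could be tried like this. It is only a sketch: the lowercase and short inputs are made up for illustration, and `re.IGNORECASE` is Python's spelling of the i flag.

```python
import re

strict = re.compile(r'^[A-Z][A-Z\d -]{6,}$')
ignore_case = re.compile(r'^[A-Z][A-Z\d -]{6,}$', re.IGNORECASE)

for s in ["BAP-78810", "B8241066 C", "Q2111999 A"]:
    print(s, bool(strict.match(s)))           # all True

print(bool(strict.match("bap-78810")))        # False: lowercase letters
print(bool(ignore_case.match("bap-78810")))   # True with the i flag
print(bool(strict.match("AB-1")))             # False: fewer than 7 characters
```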
I am working on regex with the following conditions:
Must contain from 1 to 63 alphanumeric characters or hyphens.
First character must be a letter.
Cannot end with a hyphen or contain two consecutive hyphens.
So far I have come up with this regex:
^[a-zA-Z0-9](?!.*--)[a-zA-Z0-9-]{0,61}[A-Za-z0-9]$
But it fails the length constraint and also allows patterns like "a-". How can I meet the conditions?
I would phrase your requirements as:
^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$
Here is a brief explanation of what each part of the above regex does:
^ from the start of the string
(?=.{1,63}$) assert that the string is between 1 and 63 characters long
(?!.*--) assert that two hyphens do not appear together anywhere
[a-zA-Z] first character is a letter (mandatory in all matches)
([a-zA-Z0-9\-]*[a-zA-Z0-9])?
The final portion says to match a final character which is alphanumeric, but not dash, possibly preceded by alphanumeric characters or dash.
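The three requirements can be exercised with a short Python sketch. The helper name `is_valid` and the test inputs are made up for illustration.

```python
import re

LABEL = re.compile(r'^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$')

def is_valid(name: str) -> bool:
    """Return True if the whole string satisfies the pattern above."""
    return LABEL.fullmatch(name) is not None

checks = {
    "a": True,
    "a-b": True,
    "a1": True,
    "a-": False,      # ends with a hyphen
    "a--b": False,    # two consecutive hyphens
    "1ab": False,     # first character is not a letter
    "a" * 63: True,   # longest allowed length
    "a" * 64: False,  # one character too long
}

for name, expected in checks.items():
    print(name[:12], len(name), is_valid(name), expected)
```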
My take on this would be:
^[A-Za-z](?!.*?--)[A-Za-z0-9\-]{0,62}(?<!-)$
Explanation:
^ - Matches the start of the string.
[A-Za-z] - Matches the first letter.
(?!.*?--) - Ensures that there are no two consecutive hyphens in the rest of the string.
[A-Za-z0-9\-]{0,62} - Matches the remaining alphanumeric and hyphen characters.
(?<!-) - Ensures that the string doesn't end with a hyphen.
$ - Matches the end of the string.
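A minimal Python sketch of the lookbehind variant. Python's `re` supports fixed-width lookbehind such as `(?<!-)`; the helper name `is_valid_label` and the inputs are made up for illustration.

```python
import re

LOOKBEHIND = re.compile(r'^[A-Za-z](?!.*?--)[A-Za-z0-9\-]{0,62}(?<!-)$')

def is_valid_label(name: str) -> bool:
    """Return True if the whole string satisfies the lookbehind pattern."""
    return LOOKBEHIND.fullmatch(name) is not None

for name in ["a", "a-b", "a-", "a--b", "1ab", "a" * 63, "a" * 64]:
    print(name[:12], len(name), is_valid_label(name))
```

The length limit here comes from the quantifier itself: one leading letter plus at most 62 further characters gives 63 in total.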
I'm attempting to match the last character in a WORD.
A WORD is a sequence of non-whitespace characters
'[^\n\r\t\f ]', or an empty line matching ^$.
The expression I made to do this is:
"[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"
The regex is meant to match a non-whitespace character that is followed by a whitespace character or the end of the line.
But I don't know how to stop it from including the following whitespace character in the result, or why it doesn't capture a character preceding the end of the line.
Using the string "Hi World!", I would expect: the "i" and "!" to be captured.
Instead I get: "i ".
What steps can I take to solve this problem?
"Word" that is a sequence of non-whitespace characters scenario
Note that the non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char, so it becomes part of the match. The pattern also does not match at the end of the string, because $ is not an end-of-string anchor inside a character class; there it is parsed as a literal $ symbol.
You may use
\S(?!\S)
See the regex demo
The \S matches a non-whitespace char that is not followed by another non-whitespace char (due to the (?!\S) negative lookahead).
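The difference between consuming and looking ahead can be seen with the string from the question. This is a sketch in Python, not necessarily the asker's environment; the pattern in `original` is the question's expression with the extra escaping removed.

```python
import re

text = "Hi World!"

# Original attempt: the group consumes the whitespace, and \$ inside [...]
# is a literal dollar sign, so the end of the string is never matched.
original = re.findall(r'[^ \n\t\r\f](?:[ \$\n\t\r\f])', text)

# Lookahead version: the next character is checked but not consumed.
fixed = re.findall(r'\S(?!\S)', text)

print(original)  # ['i ']
print(fixed)     # ['i', '!']
```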
General "word" case
If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use
\w\b
Here, \w matches a "word" char, and the word boundary asserts there is no word char right after.
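On the question's sample string the two definitions of "word" give different answers, as this short Python sketch shows (variable names are made up for illustration):

```python
import re

text = "Hi World!"

# "Word" = run of \w characters: the '!' is not part of the word.
last_word_chars = re.findall(r'\w\b', text)

# "Word" = run of non-whitespace characters: the '!' is the last character.
last_nonspace_chars = re.findall(r'\S(?!\S)', text)

print(last_word_chars)      # ['i', 'd']
print(last_nonspace_chars)  # ['i', '!']
```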
See another regex demo.
In a Word document, if I want to highlight the last a in para, I first search for [space]para[space] to make sure I only find the word I want, and highlight it when it is found.
Next, within that selection, I search for [a ] (the a followed by the added space), which gives me only the last a, and I highlight it or color it differently.
I'm trying to match a string that contains alphanumeric, hyphen, underscore and space.
Hyphen, underscore, space and numbers are optional, but the first and last characters must be letters.
For example, these should all match:
abc
abc def
abc123
ab_cd
ab-cd
I tried this:
^[a-zA-Z0-9-_ ]+$
but it also matches a space, underscore or hyphen at the start or end, when they should only be allowed in between.
Use a simple character class wrapped with letter chars:
^[a-zA-Z]([\w -]*[a-zA-Z])?$
This matches input that starts and ends with a letter, including just a single letter.
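A quick Python sketch of this pattern against the question's examples plus a few made-up edge cases. One thing it makes visible: abc123 from the question ends in a digit, so under the "last character must be a letter" rule this pattern rejects it.

```python
import re

PATTERN = re.compile(r'^[a-zA-Z]([\w -]*[a-zA-Z])?$')

inputs = ["abc", "abc def", "ab_cd", "ab-cd", "a", "abc123", "-abc", "abc ", "_ab"]

for s in inputs:
    print(repr(s), bool(PATTERN.fullmatch(s)))
```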
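There is a pitfall in your regex: the hyphen sits in the middle of the character class, where it can be read as a range operator. On its own, [9-_] means "every char between 9 and _ inclusive". Many engines treat a hyphen placed directly after a complete range such as 0-9 as a literal, but that is easy to misread and not worth relying on.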
If you want a literal dash in a character class, put it first or last or escape it.
Also, prefer the use of \w "word character", which is all letters and numbers and the underscore in preference to [a-zA-Z0-9_] - it's easier to type and read.
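The effect of hyphen placement can be checked with a few lines of Python. This sketch only describes the behavior of Python's `re` module; other engines may treat the ambiguous placement differently.

```python
import re

# A hyphen between two characters is a range:
# ';' (0x3B) lies between '9' (0x39) and '_' (0x5F).
print(bool(re.fullmatch(r'[9-_]', ';')))                  # True

# Hyphen placed last or escaped: unambiguously a literal.
print(bool(re.fullmatch(r'[a-zA-Z0-9_ -]+', 'a;b')))      # False
print(bool(re.fullmatch(r'[a-zA-Z0-9_ -]+', 'ab-cd')))    # True
print(bool(re.fullmatch(r'[a-zA-Z0-9\-_ ]+', 'ab-cd')))   # True

# In Python's re, a hyphen directly after the 0-9 range is read as a literal.
print(bool(re.fullmatch(r'[a-zA-Z0-9-_ ]+', 'a;b')))      # False
```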
Check this working in fiddle http://refiddle.com/refiddles/56a07cec75622d3ff7c10000
This will fix the issue
^[a-zA-Z]+[a-zA-Z0-9-_ ]*[a-zA-Z0-9]$
I tried using following regex:
/^\w+([\s-_]\w+)*$/
This allows alphanumeric, underscore, space and dash.
More details
As per your requirement of including space, hyphen, underscore and alphanumeric characters, you can use the \w shorthand character set for [a-zA-Z0-9_]. Escape the hyphen using \- as it is usually used for character ranges inside a character set.
To negate the space and hyphen at the beginning and end I have used [^\s\-].
So complete regex becomes [^\s\-][\w \-]+[^\s\-]
Here is the working demo.
You can use this regex:
^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$
This will only allow alphanumerics at start and end.
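A short Python sketch of this pattern on the question's examples and a few made-up negative cases. Unlike the letters-only variants, it accepts abc123 because digits are allowed at the start and end.

```python
import re

PATTERN = re.compile(r'^[a-zA-Z0-9]+(?:[\w -]*[a-zA-Z0-9]+)*$')

inputs = ["abc", "abc def", "abc123", "ab_cd", "ab-cd", "_abc", "abc-", " abc"]

for s in inputs:
    print(repr(s), bool(PATTERN.fullmatch(s)))
```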
I'm trying to use a regex to match the following:
I want to capture all characters that are followed by a - and then a numeric character.
So for example, if the string was python-proj-5.0 I would want to get python-proj.
I tried [^-0-9]* but it seems that only matches either a - or numeric characters but not a - preceded by numeric characters.
A pattern like this should work:
(.*)-[\d.]+
This will match any sequence of zero or more characters, captured in group 1, followed by a hyphen, then one or more digits or . characters.
Or using a lookahead:
.*(?=-[\d.]+)
This will match any sequence of zero or more characters which is followed by a hyphen, then one or more digits or . characters. The hyphen and the number which follows will not be included in the match.
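Both variants can be compared in Python on the example from the question. This is a sketch: the variable names are made up, and in real code `re.search` returns `None` when nothing matches, so the result should be checked before calling `.group()`.

```python
import re

name = "python-proj-5.0"

# Capture-group variant: the wanted part is group 1.
captured = re.search(r'(.*)-[\d.]+', name).group(1)

# Lookahead variant: the wanted part is the whole match.
looked_ahead = re.search(r'.*(?=-[\d.]+)', name).group(0)

print(captured)      # python-proj
print(looked_ahead)  # python-proj
```

Because .* is greedy, both patterns split at the last hyphen that is followed by digits or dots.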