Regex for name type - regex

I am working on regex with the following conditions:
Must contain from 1 to 63 alphanumeric characters or hyphens.
First character must be a letter.
Cannot end with a hyphen or contain two consecutive hyphens.
I am able to get the regex like:
^[a-zA-Z0-9](?!.*--)[a-zA-Z0-9-]{0,61}[A-Za-z0-9]$
But it fails the length constraint and also allows patterns like "a-". How can I meet all of the conditions?

I would phrase your requirements as:
^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$
Demo
Here is a brief explanation of what each part of the above regex does:
^ from the start of the match
(?=.{1,63}$) assert that the string is between 1 and 63 characters long
(?!.*--) assert that two hyphens do not appear together anywhere
[a-zA-Z] first character is a letter (mandatory in all matches)
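([a-zA-Z0-9\-]*[a-zA-Z0-9])? an optional tail after the first letter
The tail, when present, must end with an alphanumeric character (not a hyphen), possibly preceded by any number of alphanumeric characters or hyphens. Making it optional lets a single letter match on its own.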
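To see the pattern above in action, here is a minimal Python sketch. The helper name is_valid_name and the sample strings are my own, chosen only for illustration; the pattern itself is copied from the answer.

```python
import re

# Pattern copied from the answer: length lookahead, no "--",
# a leading letter, and an optional tail ending in a letter or digit.
NAME_RE = re.compile(
    r'^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$'
)

def is_valid_name(s):
    """Hypothetical helper: True if s satisfies all three rules."""
    return NAME_RE.match(s) is not None

samples = ["a", "a-b", "abc-1", "a-", "a--b", "1ab", "-ab",
           "a" * 63, "a" * 64, ""]
for s in samples:
    print(repr(s), is_valid_name(s))
```

The 63-character string is accepted and the 64-character one is rejected, which covers the length rule the original attempt missed.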

My take on this would be:
^[A-Za-z](?!.*?--)[A-Za-z0-9\-]{0,62}(?<!-)$
Try it out here
Explanation:
^ - Matches the start of the string.
[A-Za-z] - Matches the first letter.
(?!.*?--) - Ensures that there are no two consecutive hyphens in the rest of the string.
[A-Za-z0-9\-]{0,62} - Matches the remaining alphanumeric and hyphen characters.
(?<!-) - Ensures that the string doesn't end with a hyphen.
$ - Matches the end of the string.
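As a quick sanity check, the sketch below runs this lookbehind-based pattern next to the lookahead-based one from the earlier answer and compares their verdicts. The sample list is my own and is not exhaustive, so agreement here is only evidence, not proof, that the two are equivalent.

```python
import re

# Both patterns are copied from the answers in this thread.
LOOKBEHIND_RE = re.compile(r'^[A-Za-z](?!.*?--)[A-Za-z0-9\-]{0,62}(?<!-)$')
LOOKAHEAD_RE = re.compile(
    r'^(?=.{1,63}$)(?!.*--)[a-zA-Z]([a-zA-Z0-9\-]*[a-zA-Z0-9])?$'
)

# Illustrative inputs (assumed, not from the question).
samples = ["a", "a-b-c", "a-", "a--b", "9a", "",
           "a" + "-b" * 31,   # 63 characters, valid
           "a" * 64]          # one character too long

results = {
    s: (LOOKBEHIND_RE.match(s) is not None, LOOKAHEAD_RE.match(s) is not None)
    for s in samples
}
for s, (behind, ahead) in results.items():
    print(repr(s)[:20], behind, ahead)
```

Here the length limit comes from the quantifier {0,62} (one letter plus at most 62 more characters) rather than from a separate lookahead.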

Related

Regular expression to find words starting with no number

I need to match a string with an identifier.
Pattern
Any word will be considered as identifier if
Word doesn't contain any character other than alphanumeric characters.
Word doesn't start with number.
Input
The given input string will not contain any preceding or trailing spaces or white-space characters.
Code
I tried using the following regular expressions
\D[a-zA-Z]\w*\D
[ \t\n][a-zA-Z]\w*[ \t\n]
^\D[a-zA-Z]\w*$
None of them works.
How can I achieve this?
Note I want to match a string that contains multiple identifiers (also can be one). For example This is an i0dentifier 1abs, where i0dentifier, This, is, an are expected results.
Note that in your ^\D[a-zA-Z]\w*$ regex, \D can match non-alphanumeric chars since \D matches any non-digit chars, and \w also matches underscores, which is not an alphanumeric char.
I suggest
\b[A-Za-z][A-Za-z0-9]*\b
It matches
\b - word boundary
[A-Za-z] - a letter (the identifier must start with a letter)
[A-Za-z0-9]* - zero or more ASCII letters/digits
\b - word boundary.
See the regex demo.
In Python:
identifiers = re.findall(r'\b[A-Za-z][A-Za-z0-9]*\b', text)
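The question's example expects This, is and an as well, none of which contain a digit. Here is a minimal sketch for the multi-identifier case, assuming a digit is optional and an identifier is simply a letter followed by ASCII letters/digits. The second pattern, with whitespace lookarounds, is my own variation for inputs where tokens are separated only by whitespace.

```python
import re

text = "This is an i0dentifier 1abs"

# Assumption: a digit is optional. \b keeps "abs" in "1abs" from matching,
# because there is no word boundary between "1" and "a".
identifiers = re.findall(r'\b[A-Za-z][A-Za-z0-9]*\b', text)
print(identifiers)

# Caveat: \b also sees a boundary next to "-", so "r2-d2" gives two hits.
split_hits = re.findall(r'\b[A-Za-z][A-Za-z0-9]*\b', "r2-d2")
print(split_hits)

# Variation (assumed requirement): accept only tokens delimited by
# whitespace or the string ends.
strict = re.findall(r'(?<!\S)[A-Za-z][A-Za-z0-9]*(?!\S)', "r2-d2 abc x_1")
print(strict)
```

Which of the two fits depends on whether something like r2-d2 should count as two identifiers or as none.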
\D matches any non-digit character: not only letters but also punctuation, whitespace and so on, and you definitely do not want those at the beginning.
You can use ^[A-Za-z][A-Za-z0-9]*$ which can be described as
^: Start of string
[A-Za-z]: A letter
[A-Za-z0-9]*: An alphanumeric character, zero or more times
$: End of string
Demo
An even simpler pattern for identifier - not using negative lookahead like Wiktor's answer:
^[^0-9][A-Za-z0-9]*$ decomposed and explained:
^[^0-9]: the word starts (^) with a character that is not ([^...]) a digit (0-9). More exactly, the first char is not a digit, but the second character can be. Note that [^0-9] also accepts a non-alphanumeric first character such as _ or -.
[A-Za-z0-9]*: the rest of the word contains only alphanumeric characters (not even a hyphen or underscore) until the end ($).
See demo on regex101.
Positive alternative
As already suggested by Arvind Kumar Avinash:
If, according to both rules, the first char must be a letter rather than merely a non-digit, we can replace the first part of the regex above, changing "not numeric" to "only alpha".
^[A-Za-z][A-Za-z0-9]*$ (same anchors as above) explained:
[A-Za-z]: first char must be an alpha
[A-Za-z0-9]*: optional second and following chars can be any alpha-numeric
Same effect, see demo on regex101.
Tests
| input | result | reason |
|---|---|---|
| aB123 | matches identifier | |
| Ab123 | matches identifier | |
| XXXX12YZ | matches identifier | |
| a2b3 | matches identifier | |
| a | matches identifier | |
| Z | matches identifier | |
| 0 | no match | starts with a digit |
| 1Ab | no match | starts with a digit |
| 12abc | no match | starts with a digit |
| abc_123 | no match | contains underscore, not alphanum |
| r2-d2 | no match | contains hyphen, not alphanum |
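The test cases above can be replayed with a short Python sketch. The anchored "only alpha" pattern is used as the reference. The extra inputs _abc, -abc and " abc" are my own additions, there to show where the [^0-9] variant is more permissive than the rules allow.

```python
import re

STRICT_RE = re.compile(r'^[A-Za-z][A-Za-z0-9]*$')   # first char must be a letter
LOOSE_RE = re.compile(r'^[^0-9][A-Za-z0-9]*$')      # first char is any non-digit

# Inputs and expected results taken from the tests listed above.
expected = {
    "aB123": True, "Ab123": True, "XXXX12YZ": True, "a2b3": True,
    "a": True, "Z": True,
    "0": False, "1Ab": False, "12abc": False,
    "abc_123": False, "r2-d2": False,
}
actual = {word: STRICT_RE.match(word) is not None for word in expected}
for word, got in actual.items():
    print(f"{word:10} {'matches identifier' if got else 'no match'}")

# Extra inputs (assumed for illustration): accepted by the loose pattern only.
leaky = [w for w in ["_abc", "-abc", " abc"]
         if LOOSE_RE.match(w) and not STRICT_RE.match(w)]
print("accepted only by [^0-9]:", leaky)
```

Both patterns agree on every listed test; they differ only when the first character is neither a digit nor a letter.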

RegEx: How to match a whole string with fixed-length region with negative look ahead conditions that are overridden afterwards?

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I currently am stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot make numbers work directly following my region if a dash is present, as the negative look ahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert that the first 5 chars are digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
Regex demo
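A minimal Python sketch that runs the lookahead-based pattern against the examples from the question (N=5). The lists are copied from the question; the variable names are my own.

```python
import re

# Pattern copied from the answer above.
REGION_RE = re.compile(r'^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$')

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

matched = [s for s in should_match + should_fail if REGION_RE.match(s)]
print(matched)
```

The {0,3} bound is tied to N=5: a dash followed by a digit can start at index 3 at the latest and still lie entirely within the region. For another N the bound would presumably become N-2.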
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert that the first 5 chars are digits or -
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
Regex demo
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non-word boundary succeeds at a position between two word characters (from \w, i.e. [A-Za-z0-9_]) or between two non-word characters (from \W, i.e. [^A-Za-z0-9_]), and also between a non-word character and the start or end of the string.
With it, each \B\d in the region always follows a digit and can't follow a dash.
demo
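To illustrate how \B behaves here, this Python sketch first probes \B\d on two tiny strings and then runs the full pattern on the question's examples. The two probe strings are my own.

```python
import re

# \B\d needs a word character right before the digit.
after_digit = re.search(r'\B\d', "11")   # second "1" follows a digit
after_dash = re.search(r'\B\d', "-1")    # "1" follows a dash: word boundary
print(after_digit.start(), after_dash)

# Full pattern copied from the answer above (N=5).
NONBOUNDARY_RE = re.compile(r'^[-\d](?:-|\B\d){4}[A-Z\d-]*$')

should_match = ["12345ABC", "12345123", "1234-1", "1234--1", "1----1AB"]
should_fail = ["1-2-3-A", "----1AB", "--1--1A"]

matched = [s for s in should_match + should_fail if NONBOUNDARY_RE.match(s)]
print(matched)
```

Once the five-character region has been consumed, the trailing [A-Z\d-]* no longer uses \B, so a digit may directly follow a dash there.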
Other way (if lookbehinds are allowed):
^\d*-*(?<=^.{5})[A-Z\d-]*$
demo

Regex match string 3-6 characters long, at least one letter, no duplicate "-"

I have to match a string that is 3-6 characters long, contains at least one letter, but can have letters, numbers and only 1 "-".
The "-" must not be at the start or at the beginning.
Match:
string
str-ng
st-ng
s1-1g
st-1g
Do not match:
strings
-string
string-
st--ng
s-tn-g
1111
st
The closest I've gotten is this:
^((?!-.*-)[0-9A-Z]{3,6})$
But this seems to split the match at the -, so it matches s-tri but not st-ri, because there aren't 3 chars on each side.
Maybe you can use:
^(?=.*[a-z])(?!-|.*-$|.*-.*-)[a-z\d-]{3,6}$
See the online demo
^ - Start string anchor.
(?=.*[a-z]) - Positive lookahead to make sure there is at least one letter.
(?!-|.*-$|.*-.*-) - Negative lookahead to prevent a hyphen at the beginning or at the end or multiple.
[a-z\d-]{3,6} - Three to six times a character from the given class.
$ - End string anchor.
Note that I used the case-insensitive flag.
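Since the pattern relies on the case-insensitive flag, here is a small Python sketch showing it with re.IGNORECASE against the lists from the question. The uppercase input STR-NG is my own addition, there to show what changes without the flag.

```python
import re

SRC = r'^(?=.*[a-z])(?!-|.*-$|.*-.*-)[a-z\d-]{3,6}$'
WITH_FLAG = re.compile(SRC, re.IGNORECASE)
WITHOUT_FLAG = re.compile(SRC)

should_match = ["string", "str-ng", "st-ng", "s1-1g", "st-1g"]
should_fail = ["strings", "-string", "string-", "st--ng", "s-tn-g", "1111", "st"]

matched = [s for s in should_match + should_fail if WITH_FLAG.match(s)]
print(matched)

# Uppercase input (assumed for illustration) only passes with the flag.
print(bool(WITH_FLAG.match("STR-NG")), bool(WITHOUT_FLAG.match("STR-NG")))
```

If the flag is not an option, spelling out [a-zA-Z] in both the lookahead and the character class should give the same result.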
You can use
^(?=.{3,6}$)(?=[^a-zA-Z]*[A-Za-z])[0-9a-zA-Z]+(?:-[0-9a-zA-Z]+)?$
See the regex demo. Details:
^ - start of string
(?=.{3,6}$) - string must contain three to six chars other than line break chars
(?=[^a-zA-Z]*[A-Za-z]) - there must be at least one ASCII letter in the string
[0-9a-zA-Z]+ - one or more alphanumeric ASCII chars
(?:-[0-9a-zA-Z]+)? - an optional sequence of - and then one or more alphanumeric ASCII chars
$ - end of string.
Looking at the pattern that you tried, you meant to exclude the match when there are 2 hyphens present using the negative lookahead.
Also this part [0-9A-Z]{3,6} does not match a hyphen.
Reading
The "-" must not be at the start or at the beginning.
You might do that using
^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{2,5}$
Regex demo
If you meant also no - at the end:
^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])[a-zA-Z0-9][a-zA-Z0-9-]{1,4}[a-zA-Z0-9]$
Explanation
^ Start of string
(?![^\n-]*-[^\n-]*-) Assert not 2 times -
(?=[^a-zA-Z\n]*[a-zA-Z]) Assert a char a-zA-Z
[a-zA-Z0-9] Match One of the listed without -
[a-zA-Z0-9-]{1,4} Repeat 1-4 times any of the listed including -
[a-zA-Z0-9] Match One of the listed without -
$ End of string
Regex demo
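The two variants differ only in whether a trailing hyphen is accepted. A minimal Python sketch makes that visible. The inputs str- and st- are my own, picked because they fit the length limit and end in a hyphen; HEAD is just a name for the two shared lookaheads.

```python
import re

# Shared lookaheads, copied from the patterns above.
HEAD = r'^(?![^\n-]*-[^\n-]*-)(?=[^a-zA-Z\n]*[a-zA-Z])'
ALLOW_TRAILING = re.compile(HEAD + r'[a-zA-Z0-9][a-zA-Z0-9-]{2,5}$')
NO_TRAILING = re.compile(HEAD + r'[a-zA-Z0-9][a-zA-Z0-9-]{1,4}[a-zA-Z0-9]$')

should_match = ["string", "str-ng", "st-ng", "s1-1g", "st-1g"]
should_fail = ["strings", "-string", "string-", "st--ng", "s-tn-g", "1111", "st"]

for name, rx in [("allow trailing -", ALLOW_TRAILING), ("no trailing -", NO_TRAILING)]:
    print(name, [s for s in should_match + should_fail if rx.match(s)])

# Inputs that separate the two variants (assumed for illustration).
differs = [s for s in ["str-", "st-", "-str"]
           if bool(ALLOW_TRAILING.match(s)) != bool(NO_TRAILING.match(s))]
print(differs)
```

Note that string- from the question is rejected by both variants simply because it is seven characters long, so it does not reveal which reading of the rule was intended.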

Regexp at least 8 symbols and only one uppercase character

I need a regular expression for a string that has at least 8 symbols and only one uppercase character. I'm using Java.
For example, it should match:
Asddffgf
asdAsadasd
asdasdaA
But not:
adadAasdasAsad
AsdaAadssadad
asdasdAsadasdA
I tried this: ^[a-z]*[A-Z][a-z]*$ This works well, but I need at least 8 symbols.
Then I tried this: (^[a-z]*[A-Z][a-z]*$){8,} But it doesn't work.
^(?=[^A-Z]*[A-Z][^A-Z]*$).{8,}$
https://regex101.com/r/zTrbyX/6
Explanation:
^ - Anchor to the beginning of the string, so that the following lookahead restriction doesn't skip anything.
(?= ) - Positive lookahead; assert that the beginning of the string is followed by the contained pattern.
[^A-Z]*[A-Z][^A-Z]*$ - A sequence of any number of characters that are not capital letters, then a single capital letter, then more non-capital letters until the end of the string. This ensures that there will be one and only one capital letter throughout the string.
.{8,} - Any non-newline character eight or more times.
$ - Anchor at the end of the string (possibly unnecessary depending on your requirements).
In your first regex ^[a-z]*[A-Z][a-z]*$ you could append a positive lookahead (?=[a-zA-Z]{8,}) right after the ^.
That will assert that what follows matches at least 8 times a lower or uppercase character.
^(?=[a-zA-Z]{8,})[a-z]*[A-Z][a-z]*$
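The two suggestions are not quite equivalent: the first accepts any characters as long as exactly one is an uppercase letter, while the second accepts letters only. Below is a small sketch of the difference. It is written in Python for brevity even though the question is about Java; both patterns use only lookaheads and character classes, so I would expect them to carry over to java.util.regex unchanged. The inputs asd1Asad! and Asddffg are my own.

```python
import re

ANY_CHARS = re.compile(r'^(?=[^A-Z]*[A-Z][^A-Z]*$).{8,}$')
LETTERS_ONLY = re.compile(r'^(?=[a-zA-Z]{8,})[a-z]*[A-Z][a-z]*$')

should_match = ["Asddffgf", "asdAsadasd", "asdasdaA"]
should_fail = ["adadAasdasAsad", "AsdaAadssadad", "asdasdAsadasdA"]

for name, rx in [("any chars", ANY_CHARS), ("letters only", LETTERS_ONLY)]:
    print(name, [s for s in should_match + should_fail if rx.match(s)])

# Extra inputs (assumed): a digit and "!" split the two; 7 chars fails both.
print(bool(ANY_CHARS.match("asd1Asad!")), bool(LETTERS_ONLY.match("asd1Asad!")))
print(bool(ANY_CHARS.match("Asddffg")), bool(LETTERS_ONLY.match("Asddffg")))
```

Which one to pick depends on whether "symbols" in the question means letters only or any character.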

Character set not matching a pattern

I'm trying to use a regex to match the following:
I want to capture all characters that are followed by a - and then a numeric character.
So for example, if the string was python-proj-5.0 I would want to get python-proj.
I tried [^-0-9]* but it seems that only matches either a - or numeric characters but not a - preceded by numeric characters.
A pattern like this should work:
(.*)-[\d.]+
This will match any sequence of zero or more characters, captured in group 1, followed by a hyphen, then one or more digits or . characters.
Or using a lookahead:
.*(?=-[\d.]+)
This will match any sequence of zero or more characters which is followed by a hyphen, then one or more digits or . characters. The hyphen and the number which follows will not be included in the match.
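Both forms can be tried in Python as sketched below. The second input, lib-2-3.1, is my own and shows that the greedy .* backs off only as far as the last hyphen that is followed by digits or dots; the non-greedy variant is likewise my addition, not part of the answer.

```python
import re

name = "python-proj-5.0"

# Capture-group form: the prefix is in group 1.
captured = re.match(r'(.*)-[\d.]+', name).group(1)

# Lookahead form: the prefix is the whole match.
looked = re.search(r'.*(?=-[\d.]+)', name).group(0)
print(captured, looked)

# Greedy vs. non-greedy on an input with two numeric parts (assumed example).
greedy = re.match(r'(.*)-[\d.]+', "lib-2-3.1").group(1)
lazy = re.match(r'(.*?)-[\d.]+', "lib-2-3.1").group(1)
print(greedy, lazy)
```

If the version suffix can itself contain hyphens, the non-greedy form may be closer to what is wanted.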