what does this regular expression mean? - regex

^(?!-)[a-z\d\-]{1,100}$

Here's an explanation using regex comment mode, so this expanded form can itself be used as a regex:
(?x) # flag to enable comment mode
^ # start of line/string.
(?!-) # negative lookahead for literal hyphen (-) character, so fails if the next position contains one.
[a-z\d\-] # character class matches a single alpha (a-z), digit (\d) or hyphen (\-).
{1,100} # match the above [class] upto 100 times, at least once.
$ # end of line/string.
In short, it's matching upto 100 lowercase alphanumerics or hyphen, but the first character must not be hyphen.
Could be attempting to validate a serial number, or similar, but it's too general to say for sure.
Not all regex engines support negative lookaheads. If you're trying to figure out what it is doing in order to adapt for an engine without negative lookaheads, you can use:
^[a-z\d][a-z\d-]{0,99}$

(?!-) == negative lookahead
start of line not followed by a - that contains at least 1 to 100 characters that can be a-z or 0-9 or a - followed by the end of the line, though the \d in the character class is probably wrong and should be specified by 0-9 otherwise the a-z takes care of a 'd' character, depends on the regex flavor.

A string of letters, digits and dashes. Between 1 and 100 characters. The first character is not a dash.

Related

Regex - All before an underscore, and all between second underscore and the last period?

How do I get everything before the first underscore, and everything between the last underscore and the period in the file extension?
So far, I have everything before the first underscore, not sure what to do after that.
.+?(?=_)
EXAMPLES:
111111_SMITH, JIM_END TLD 6-01-20 THR LEWISHS.pdf
222222_JONES, MIKE_G URS TO 7.25 2-28-19 SA COOPSHS.pdf
DESIRED RESULTS:
111111_END TLD 6-01-20 THR LEWISHS
222222_G URS TO 7.25 2-28-19 SA COOPSHS
You can match the following regular expression that contains no capture groups.
^[^_]*|(?!.*_).*(?=\.)
Demo
This expression can be broken down as follows.
^ # match the beginning of the string
[^_]* # match zero or more characters other than an underscore
| # or
(?! # begin negative lookahead
.*_ # match zero or more characters followed by an underscore
) # end negative lookahead
.* # match zero or more characters greedily
(?= # begin positive lookahead
\. # match a period
) # end positive lookahead
.*_ means to match zero or more characters greedily, followed by an underscore. To match greedily (the default) means to match as many characters as possible. Here that includes all underscores (if there are any) before the last one. Similarly, .* followed by (?=\.) means to match zero or more characters, possibly including periods, up to the last period.
Had I written .*?_ (incorrectly) it would match zero or more characters lazily, followed by an underscore. That means it would match as few characters as possible before matching an underscore; that is, it would match zero or more characters up to, but not including, the first underscore.
If instead of capturing the two parts of the string of interest you wanted to remove the two parts of the string you don't want (as suggested by the desired results of your example), you could substitute matches of the following regular expression with empty strings.
_.*_|\.[^.]*$
Demo
This regular expression reads, "Match an underscore followed by zero of more characters followed by an underscore, or match a period followed by zero or more characters that are not periods, followed by the end of the string".
You could use 2 capture groups:
^([^_\n]+_).*\b([^\s_]*_.*)(?=\.)
^ Start of string
([^_\n]+_) Capture group 1, match any char except _ or a newline followed by matching a _
.*\b Match the rest of the line and match a word boundary
([^\s_]*_.*) Capture group 2, optionally match any char except _ or a whitespace char, then match _ and the rest of the line
(?=\.) Positive lookahead, assert a . to the right
See a regex demo.
Another option could be using a non greedy version to get to the first _ and make sure that there are no following underscores and then match the last dot:
^([^_\n]+_).*?(\S*_[^_\n]+)\.[^.\n]+$
See another regex demo.
Looks like you're very close. You could eliminate the names between the underscores by finding this
(_.+?_)
and replacing the returned value with a single underscore.
I am assuming that you did not intend your second result to include the name MIKE.

RegEx: How to match a whole string with fixed-length region with negative look ahead conditions that are overriden afterwards?

The strings I parse with a regular expression contain a region of fixed length N where there can either be numbers or dashes. However, if a dash occurs, only dashes are allowed to follow for the rest of the region. After this region, numbers, dashes, and letters are allowed to occur.
Examples (N=5, starting at the beginning):
12345ABC
12345123
1234-1
1234--1
1----1AB
How can I correctly match this? I currently am stuck at something like (?:\d|-(?!\d)){5}[A-Z0-9\-]+ (for N=5), but I cannot make numbers work directly following my region if a dash is present, as the negative look ahead blocks the match.
Update
Strings that should not be matched (N=5)
1-2-3-A
----1AB
--1--1A
You could assert that the first 5 characters are either digits or - and make sure that there is no - before a digit in the first 5 chars.
^(?![\d-]{0,3}-\d)(?=[\d-]{5})[A-Z\d-]+$
^ Start of string
(?![\d-]{0,3}-\d) Make sure that in the first 5 chars there is no - before a digit
(?=[\d-]{5}) Assert at least 5 digits or -
[A-Z\d-]+ Match 1+ times any of the listed characters
$ End of string
Regex demo
If atomic groups are available:
^(?=[\d-]{5})(?>\d+-*|-{5})[A-Z\d_]*$
^ Start of string
(?=[\d-]{5}) Assert at least 5 chars - or digit
(?> Atomic group
\d+-* Match 1+ digits and optional -
| or
-{5} match 5 times -
) Close atomic group
[A-Z\d_]* Match optional chars A-Z digit or _
$ End of string
Regex demo
Use a non-word-boundary assertion \B:
^[-\d](?:-|\B\d){4}[A-Z\d-]*$
A non word-boundary succeeds at a position between two word characters (from \w ie [A-Za-z0-9_]) or two non-word characters (from \W ie [^A-Za-z0-9_]). (and also between a non-word character and the limit of the string)
With it, each \B\d always follows a digit. (and can't follow a dash)
demo
Other way (if lookbehinds are allowed):
^\d*-*(?<=^.{5})[A-Z\d-]*$
demo

Regular expression to Star with number and then numbers or one letter

I have this regular expression
^[0-9]+[a-zA-Z0-9]*
But I need one that always stars with a number then it can be another number or a letter, but it can not be number letter number. The letter will always be the last. Like this example
102A OK
1A OK
2 OK
110 OK
10A1 WRONG
BV WRONG
The letter cannot be between two numbers.
You could use a negative lookahead.
^(?!\d+[a-z]\d)\d.*
with the case-indifferent flag set.
Demo
A match of this regular expression signifies that the string does not contain a 3-character substring consisting of a digit, a letter, a digit, in that order. If the entire string is to be matched when the match is successful, add .* to the end of the regex.
The regex engine performs the following operations.
^ match beginning of line
(?! begin negative lookahead
\d+[a-z]\d match digits-letter-digit
) end negative lookahead
\d match a digit
Note that \d at the end must follow the negative lookahead. If the regex were ^\d(?!.\d+[a-z]\d) and the string were 1A1 the negative lookahead would fail to find digit-letter-digit in A1 and the overall match would succeed (incorrectly).
Because the negative lookahead is pinned to the beginning of the line and consumes no characters, if it fails (match succeeds) the search for \d at the end of the regex begins at the beginning of the line.
You could match a char 1-9 followed by optional digits 0-9 and optional chars a-zA-Z.
If you use [a-zA-Z0-9] the character class will match any of the listed in any order.
If you separate the chars and the digits, the letter can not come before the digits and, as the * quantifier matches 0 or more times, you can also match a single digit.
^[1-9][0-9]*[a-zA-Z]*$
Regex demo

Regexp at least 8 symbols and only one uppercase character

I need a regular expression for a string with has at least 8 symbols and only one uppercase character. Java
For example, it should match:
Asddffgf
asdAsadasd
asdasdaA
But not:
adadAasdasAsad
AsdaAadssadad
asdasdAsadasdA
I tried this: ^[a-z]*[A-Z][a-z]*$ This works good, but I need at least 8 symbols.
Then I tried this: (^[a-z]*[A-Z][a-z]*$){8,} But it doesn't work
^(?=[^A-Z]*[A-Z][^A-Z]*$).{8,}$
https://regex101.com/r/zTrbyX/6
Explanation:
^ - Anchor to the beginning of the string, so that the following lookahead restriction doesn't skip anything.
(?= ) - Positive lookahead; assert that the beginning of the string is followed by the contained pattern.
[^A-Z]*[A-Z][^A-Z]*$ - A sequence of any number of characters that are not capital letters, then a single capital letter, then more non capital letters until the end of the string. This insures that there will be one and only one capital letter throughout the string.
.{8,} - Any non-newline character eight or more times.
$ - Anchor at the end of the string (possibly unnecessary depending on your requirements).
In your first regex ^[a-z]*[A-Z][a-z]*$ you could append a positive lookahead (?=[a-zA-Z]{8,}) right after the ^.
That will assert that what follows matches at least 8 times a lower or uppercase character.
^(?=[a-zA-Z]{8,})[a-z]*[A-Z][a-z]*$

Regex to find repeating numbers

Can anyone help me or direct me to build a regex to validate repeating numbers
eg : 11111111, 2222, 99999999999, etc
It should validate for any length.
\b(\d)\1+\b
Explanation:
\b # match word boundary
(\d) # match digit remember it
\1+ # match one or more instances of the previously matched digit
\b # match word boundary
If 1 should also be a valid match (zero repetitions), use a * instead of the +.
If you also want to allow longer repeats (123123123) use
\b(\d+)\1+\b
If the regex should be applied to the entire string (as opposed to finding "repeat-numbers in a longer string), use start- and end-of-line anchors instead of \b:
^(\d)\1+$
Edit: How to match the exact opposite, i. e. a number where not all digits are the same (except if the entire number is simply a digit):
^(\d)(?!\1+$)\d*$
^ # Start of string
(\d) # Match a digit
(?! # Assert that the following doesn't match:
\1+ # one or more repetitions of the previously matched digit
$ # until the end of the string
) # End of lookahead assertion
\d* # Match zero or more digits
$ # until the end of the string
To match a number of repetitions of a single digit, you can write ([0-9])\1*.
This matches [0-9] into a group, then matches 0 or more repetions (\1) of that group.
You can write \1+ to match one or more repetitions.
Use a backreference:
(\d)\1+
Probably you want to use some sort of anchors ^(\d)\1+$ or \b(\d)\1+\b
I used this expression to give me all phone numbers that are all the same digit.
Basically, it means to give 9 repetitions of the original first repetition of a given number, which results in 10 of the same number in a row.
([0-9])\1{9}
(\d)\1+? matches any digit repeating
you can get repeted text or numbers easily by backreference take a look on following example:
this code simply means whatever the pattern inside [] . ([inside pattern]) the \1 will go finding same as inside pattern forward to that.