regex pattern - what is ((?=.*\d)|(?=.*\W+)) and (?![.\n]) - regex

Can someone kindly explain this regex pattern to me?
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
In particular, what exactly do these two parts do?
((?=.*\d)|(?=.*\W+))
and
(?![.\n])
Thank you.

These are all lookahead assertions (positive and negative). They check that the text that follows obeys some rules without consuming any of it.
# assert that
(?=^.{8,}$) # there are at least 8 characters
( # and
(?=.*\d) # there is at least a digit
| # or
(?=.*\W+) # there is one or more "non word" characters (\W is equivalent to [^a-zA-Z0-9_])
) # and
(?![.\n]) # the next character (here, the first one) is not a literal . or a newline and
(?=.*[A-Z]) # there is at least one upper case letter and
(?=.*[a-z]) # there is at least one lower case letter
.*$ # in a string of any characters
(?! ... ) is the syntax for a negative lookahead (match if there is no ...), (?= ... ) is for a positive lookahead (match if there is ...). This looks a lot like password validation!
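To see what each lookahead contributes, here is a small JavaScript sketch that runs the pattern against a few sample strings. The samples are made up for illustration and are not from the question. Note that (?![.\n]) only inspects the character right after the current position, which here is the first character of the string.

```javascript
// The pattern from the question, unchanged.
const pattern = /(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$/;

// Hypothetical sample inputs, each chosen to exercise one rule.
const samples = [
  "Passw0rd",   // digit, upper, lower, 8 characters
  "Password!",  // non-word character instead of a digit
  "Password",   // neither a digit nor a non-word character
  "passw0rd",   // no upper case letter
  "Pw0rd",      // shorter than 8 characters
  ".Passw0rd",  // starts with a literal dot
  "Pass.w0rd",  // dot later in the string
];

// Map each sample to whether the pattern accepts it.
const results = Object.fromEntries(samples.map((s) => [s, pattern.test(s)]));
console.log(results);
```

The last two samples differ only in where the dot sits: a leading dot is rejected, while a dot further along is accepted.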

String matches to end of line
At least 8 characters in length
At least one digit or non-word character exists (not a-zA-Z0-9_)
The first character is not a literal dot or a newline
At least one uppercase letter exists
At least one lowercase letter exists
This seems to be a RegEx for validating a password.
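If the goal is to tell a user which rule failed, one possible alternative is to split the single pattern into separate checks. The sketch below is an illustration in JavaScript, not part of the original pattern; the rule names and the failedRules helper are hypothetical.

```javascript
// Each rule as its own named check (the names are hypothetical).
const rules = {
  minLength8OnOneLine: (s) => /^.{8,}$/.test(s),
  digitOrNonWord: (s) => /\d|\W/.test(s),
  notStartingWithDotOrNewline: (s) => !/^[.\n]/.test(s),
  upperCase: (s) => /[A-Z]/.test(s),
  lowerCase: (s) => /[a-z]/.test(s),
};

// Returns the names of the rules that the candidate string violates.
function failedRules(s) {
  return Object.entries(rules)
    .filter(([, check]) => !check(s))
    .map(([name]) => name);
}

console.log(failedRules("Passw0rd")); // []
console.log(failedRules("password")); // [ 'digitOrNonWord', 'upperCase' ]
```

An empty array means the string would also be accepted by the combined pattern.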


Modify this yup validation to change max length to 9 if the string does not include a dash

I'm trying to write a yup validator that validates a field's max length, depending on whether a dash is included in the string. If a dash is included, the max length is 10, if there is no dash, the max length should be 9.
For example:
'string-111' should have a max length of 10.
'string111' should have a max length of 9.
My current code looks like:
import * as Yup from 'yup';
export default Yup.object().shape({
description: Yup.string()
.matches(
/^[a-zA-Z0-9-]*$/,
'Invoice # can only contain letters, numbers and dashes'
)
.max(10, 'Invoice # has a max length of 10 characters'),
});
I see the yup documentation https://github.com/jquense/yup has a .when() method, but it seems to be used in very specific cases in their examples. Here, the user can place the dash anywhere in the string.
Any ideas on how to rewrite this validator, so that when there is no dash in the string, the maxlength should be 9?
You could either match 10 chars where a hyphen can occur at any place, using a positive lookahead, or match 9 chars consisting only of a-z0-9.
^(?:(?=[a-z0-9-]{10}$)[a-z0-9]*-[a-z0-9]*|[a-z0-9]{9})$
Explanation
^ Start of string
(?: Non capture group
(?= Positive lookahead, assert what is on the right is
[a-z0-9-]{10}$ Match 10 times either a-z0-9 or - till the end of the string
) Close lookahead
[a-z0-9]*-[a-z0-9]* Match a hyphen between chars a-z0-9
| Or
[a-z0-9]{9} Match 9 chars a-z0-9
) Close group
$ End of string
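A quick way to check the behaviour is to run the pattern against a few inputs. The following JavaScript sketch uses the two examples from the question plus some made-up ones.

```javascript
// The pattern from the answer, unchanged (lower case only, as posted).
const invoicePattern = /^(?:(?=[a-z0-9-]{10}$)[a-z0-9]*-[a-z0-9]*|[a-z0-9]{9})$/;

const checks = {
  "string-111": invoicePattern.test("string-111"), // 10 chars, one hyphen
  "string111": invoicePattern.test("string111"),   // 9 chars, no hyphen
  "string1111": invoicePattern.test("string1111"), // 10 chars, no hyphen
  "str-ing-11": invoicePattern.test("str-ing-11"), // 10 chars, two hyphens
  "abc-1": invoicePattern.test("abc-1"),           // shorter than 10 chars
};
console.log(checks);
```

As posted, the pattern accepts exactly 10 characters with one hyphen or exactly 9 without, and only lower case letters. Shorter strings such as abc-1 are rejected, so the quantifiers and character ranges would need adjusting if the lengths are meant as maximums or if upper case letters should be allowed.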
I worked up a solution I liked but found it had already been posted by @Thefourthbird, so I tried a different tack and came up with this:
/^(?=(?:-*[^-]-*){9}$)(?=(?:[^-]*-[^-]*){0,1}$).*/gm
You can see that this regex contains two positive lookaheads, both beginning at the start of a line. The first ensures that the string contains 9 non-hyphens; the second requires that there be at most one hyphen.
The linked demo explains in detail how this regex works, but we can also make it self-documenting by writing it in free-spacing mode:
/
^ # match beginning of string
(?= # begin a positive lookahead
(?:-*[^-]-*){9} # match 9 strings, each with one char that is
# not a hyphen, possibly preceded and/or
# followed by hyphens
$ # match the end of a line
) # end positive lookahead
(?= # begin a positive lookahead
(?:[^-]*-[^-]*){0,1} # match 0 or 1 strings, each containing one hyphen,
# possibly preceded and/or followed by non-hyphens
$ # match the end of the string
) # end positive lookahead
.* # match 0+ characters (the entire string)
/gmx # global, multiline and free-spacing regex
# definition modes
If desired, [^-] could be replaced with [a-zA-Z0-9], \p{Alnum} or something else, depending on requirements.

Regular Expression to match strings

I want to match all the strings satisfying the following rules:
should consist of lower-case letters and digits and dashes
should start with a letter or a number
should end with a letter or number
total string length should be at least 3 and at most 20 characters
dot . is optional, there shouldn't be two or more consecutive dots .
dash - is optional, there shouldn't be two or more consecutive dashes -
dot . and dash - shouldn't be consecutive // the string aaa.-aaabbb is invalid
underscore not allowed
I have come up with this regex:
^[a-z0-9]([a-z0-9]+\.?\-?[a-z0-9]+){1,18}[a-z0-9]$
[a-z0-9] //should start/end with a letter or a number
([a-z0-9]+\.?\-?[a-z0-9]+){1,18} //other rules
However it is failing in some scenarios like -
abcdefghijklmnopqrstuvwxyz //should fail total number of chars greater than 20
aaa.-aaabbb //should fail as dot '.' and dash '-' are consecutive
Can anyone please help me in correcting this regex?
You can achieve this with a lookahead assertion:
^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$
Explanation:
^ # Start of string
(?! # Assert that the following can't be matched:
.* # Any number of characters
[.-]{2} # followed by .. or -- or .- or -.
) # End of lookahead
[a-z0-9] # Match lowercase letter/digit
[a-z0-9.-]{1,18} # Match 1-18 of the allowed characters
[a-z0-9] # Match lowercase letter/digit
$ # End of string
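The pattern can be exercised with the two failing inputs from the question and a few extra made-up ones. This is a JavaScript sketch; the pattern itself uses no flavor-specific features beyond lookahead.

```javascript
// The pattern from the answer, unchanged.
const namePattern = /^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$/;

const nameInputs = [
  "abc",                         // minimum length of 3
  "a.b-c",                       // single dot and dash, not adjacent
  "aaa.-aaabbb",                 // dot directly followed by a dash
  "a..b",                        // two consecutive dots
  "-abc",                        // starts with a dash
  "abc_def",                     // underscore is not allowed
  "abcdefghijklmnopqrstuvwxyz",  // 26 characters
  "a".repeat(20),                // exactly 20 characters
];

// Record the verdict for each input and print it.
const outcome = new Map(nameInputs.map((s) => [s, namePattern.test(s)]));
for (const [s, ok] of outcome) console.log(`${s} -> ${ok}`);
```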
I came up with this, which uses a negative lookahead similar to Tim's solution but applies it differently. Because it only performs the lookahead when it sees a dot or a dash, it may need less backtracking, which may make it very slightly faster.
^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$
Explanation:
^ # Start of string
[a-z0-9] # Must start with a letter or number
( # Begin Group
[a-z0-9] # Match a letter or number
| # OR
([-.](?![.-])) # Match a dot or dash that is not followed by a dot or dash
){1,18} # Match group 1 to 18 times
[a-z0-9] # Must end with a letter or number
$ # End of string
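To gain some confidence that this variant accepts the same strings as the pattern with the lookahead up front, both can be compared exhaustively over a small alphabet. The sketch below is in JavaScript; the allStrings helper and the chosen alphabet are assumptions made for the illustration, and the check says nothing about relative speed.

```javascript
const withLookaheadUpFront = /^(?!.*[.-]{2})[a-z0-9][a-z0-9.-]{1,18}[a-z0-9]$/;
const withLocalLookahead = /^[a-z0-9]([a-z0-9]|([-.](?![.-]))){1,18}[a-z0-9]$/;

// Enumerate every string of length 0..maxLen over a small alphabet.
function* allStrings(alphabet, maxLen) {
  let level = [""];
  for (let len = 0; len <= maxLen; len++) {
    yield* level;
    if (len < maxLen) {
      level = level.flatMap((prefix) => alphabet.map((ch) => prefix + ch));
    }
  }
}

// Collect every string on which the two patterns give different verdicts.
let compared = 0;
const disagreements = [];
for (const s of allStrings(["a", "1", ".", "-", "_"], 6)) {
  compared++;
  if (withLookaheadUpFront.test(s) !== withLocalLookahead.test(s)) {
    disagreements.push(s);
  }
}
console.log(`${compared} strings compared, ${disagreements.length} disagreements`);
```

The enumeration only covers short strings, so the upper length limit of 20 has to be checked separately.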

Utf8 correct regex for CamelCase (WikiWord) in perl

There was an earlier question about a CamelCase regex. Combined with tchrist's post, I'm wondering what the correct UTF-8 CamelCase regex is.
Starting with (brian d foy's) regex:
/
\b # start at word boundary
[A-Z] # start with upper
[a-zA-Z]* # followed by any alpha
(?: # non-capturing grouping for alternation precedence
[a-z][a-zA-Z]*[A-Z] # next bit is lower, any zero or more, ending with upper
| # or
[A-Z][a-zA-Z]*[a-z] # next bit is upper, any zero or more, ending with lower
)
[a-zA-Z]* # anything that's left
\b # end at word
/x
and modifying to:
/
\b # start at word boundary
\p{Uppercase_Letter} # start with upper
\p{Alphabetic}* # followed by any alpha
(?: # non-capturing grouping for alternation precedence
\p{Lowercase_Letter}[a-zA-Z]*\p{Uppercase_Letter} ### next bit is lower, any zero or more, ending with upper
| # or
\p{Uppercase_Letter}[a-zA-Z]*\p{Lowercase_Letter} ### next bit is upper, any zero or more, ending with lower
)
\p{Alphabetic}* # anything that's left
\b # end at word
/x
I have a problem with the lines marked '###'.
In addition, how should the regex be modified if numbers and the underscore are treated as equivalent to lowercase letters, so that W2X3 is a valid CamelCase word?
Updated: (ysth comment)
In the following, "any" means "uppercase or lowercase or number or underscore".
The regex should match CamelWord, CaW
start with uppercase letter
optional any
lowercase letter or number or underscore
optional any
upper case letter
optional any
Please do not mark this as a duplicate, because it is not. The original question (and its answers) considered only ASCII.
I really can’t tell what you’re trying to do, but this should be closer to what your original intent seems to have been. I still can’t tell what you mean to do with it, though.
m{
\b
\p{Upper} # start with uppercase code point (NOT LETTER)
\w* # optional ident chars
# note that upper and lower are not related to letters
(?: \p{Lower} \w* \p{Upper}
| \p{Upper} \w* \p{Lower}
)
\w*
\b
}x
Never use [a-z]. And in fact, don’t use \p{Lowercase_Letter} or \p{Ll}, since those are not the same as the more desirable and more correct \p{Lowercase} and \p{Lower}.
And remember that \w is really just an alias for
[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]
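For readers outside Perl, here is a rough JavaScript port of the pattern above. It is a sketch under some assumptions: \w and \b are ASCII-only in JavaScript, so the word-character class is spelled out following the expansion quoted above, the word boundaries are emulated with lookarounds, and the long property names \p{Uppercase} and \p{Lowercase} stand in for Perl's \p{Upper} and \p{Lower}. The names W and camelCase are hypothetical.

```javascript
// Word-character class, spelled out because \w is ASCII-only in JavaScript.
const W = String.raw`[\p{Alphabetic}\p{Mark}\p{Decimal_Number}\p{Letter_Number}\p{Connector_Punctuation}]`;

const camelCase = new RegExp(
  [
    String.raw`(?<!${W})`,                          // emulated word boundary
    String.raw`\p{Uppercase}${W}*`,                 // upper case code point, optional word chars
    String.raw`(?:\p{Lowercase}${W}*\p{Uppercase}`, // lower ... upper
    String.raw`|\p{Uppercase}${W}*\p{Lowercase})`,  // or upper ... lower
    String.raw`${W}*`,                              // rest of the word
    String.raw`(?!${W})`,                           // emulated word boundary
  ].join(""),
  "u" // the u flag enables \p{...} property escapes
);

const words = ["CamelCase", "CaW", "\u00C9coleNormale", "Word", "camelCase"];
console.log(words.map((w) => [w, camelCase.test(w)]));
```

This port keeps the pattern's behaviour as posted, so it does not treat digits or the underscore as lowercase letters; a word such as W2X3 is still rejected.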

Confused on why this regular expression does not work?

/^(?=.*\d)(?=.*[!#&.$#]).{7,16}$/
It should allow between 7 and 16 characters and contain at least 1 numeric character and 1 special character and can't start with a number. I tried testing it but it does not work?
The only thing that I assume "does not work", which is a bit of a vague problem description to be honest, is the fact that it CAN start with a digit. Besides that, it works as you described.
Fix it like this:
/^(?=.*\d)(?=.*[!#&.$#])\D.{6,15}$/
A short explanation (in case you did not write the regex yourself):
^ # match the beginning of the input
(?= # start positive look ahead
.* # match any character except line breaks and repeat it zero or more times
\d # match a digit: [0-9]
) # end positive look ahead
(?= # start positive look ahead
.* # match any character except line breaks and repeat it zero or more times
[!#&.$#] # match any character from the set {'!', '#', '$', '&', '.', '#'}
) # end positive look ahead
\D # match a non-digit: [^0-9]
.{6,15} # match any character except line breaks and repeat it between 6 and 15 times
$ # match the end of the input
The first two conditions are fulfilled but the third (must not start with a digit) is not, because the .* in ^(?=.*\d) also matches when there is a digit at the first position.
Try this instead:
/^(?=\D+\d)(?=.*[!#&.$#]).{7,16}$/
Here \D (anything except a digit) ensures that there is at least one non-digit character at the start.
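The difference between the original pattern and the two suggested fixes shows up as soon as an input starts with a digit. The JavaScript sketch below compares all three on a few made-up inputs; the character classes are copied as they appear in the patterns above.

```javascript
const original = /^(?=.*\d)(?=.*[!#&.$#]).{7,16}$/;
const fixWithNonDigit = /^(?=.*\d)(?=.*[!#&.$#])\D.{6,15}$/;
const fixWithLookahead = /^(?=\D+\d)(?=.*[!#&.$#]).{7,16}$/;

// Hypothetical inputs: valid, leading digit, no digit, too short, no special char.
const candidates = ["abc#1234", "1abc#234", "abcdefg#", "ab#1", "abcd1234"];

// One row per input, one column per pattern.
const table = candidates.map((s) => ({
  input: s,
  original: original.test(s),
  fixWithNonDigit: fixWithNonDigit.test(s),
  fixWithLookahead: fixWithLookahead.test(s),
}));
console.table(table);
```

Only the second input separates the patterns: the original accepts 1abc#234, while both fixes reject it.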

what does this regular expression mean?

^(?!-)[a-z\d\-]{1,100}$
Here's an explanation using regex comment mode, so this expanded form can itself be used as a regex:
(?x) # flag to enable comment mode
^ # start of line/string.
(?!-) # negative lookahead for literal hyphen (-) character, so fails if the next position contains one.
[a-z\d\-] # character class matches a single alpha (a-z), digit (\d) or hyphen (\-).
{1,100} # match the above [class] up to 100 times, at least once.
$ # end of line/string.
In short, it's matching up to 100 lowercase alphanumerics or hyphens, but the first character must not be a hyphen.
Could be attempting to validate a serial number, or similar, but it's too general to say for sure.
Not all regex engines support negative lookaheads. If you're trying to figure out what it is doing in order to adapt for an engine without negative lookaheads, you can use:
^[a-z\d][a-z\d-]{0,99}$
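A small spot check can confirm that the version without a lookahead behaves the same as the original. The JavaScript sketch below compares the two on some made-up probes, including the length boundaries.

```javascript
const withLookahead = /^(?!-)[a-z\d\-]{1,100}$/;
const withoutLookahead = /^[a-z\d][a-z\d-]{0,99}$/;

// Hypothetical probes: empty, hyphen positions, disallowed characters, lengths 100 and 101.
const probes = [
  "", "a", "-", "-abc", "abc-", "a-b-c", "ABC", "a_b",
  "a".repeat(100), "a".repeat(101), "-" + "a".repeat(99),
];

// Keep only the probes on which the two patterns give different verdicts.
const mismatches = probes.filter(
  (s) => withLookahead.test(s) !== withoutLookahead.test(s)
);
console.log(`mismatches: ${mismatches.length}`);
```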
(?!-) is a negative lookahead.
The pattern matches the start of a line not followed by a -, then 1 to 100 characters that can be a-z, 0-9 or -, then the end of the line. Whether \d inside a character class means "digit" depends on the regex flavor: where it is not treated as a shorthand there, it may match only a literal d (which a-z already covers), and 0-9 should be written out instead.
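How \d behaves inside a character class is easy to check for a given engine. The sketch below shows the result for JavaScript only, where the shorthand keeps its digit meaning inside brackets. Other flavors, such as POSIX bracket expressions, treat the backslash literally and would need 0-9 written out.

```javascript
// A class containing only \d, so a literal "d" cannot match by accident via a-z.
const digitInClass = /^[\d]$/;

const matchesDigit = digitInClass.test("5");
const matchesLetterD = digitInClass.test("d");
console.log({ matchesDigit, matchesLetterD });
```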
A string of letters, digits and dashes. Between 1 and 100 characters. The first character is not a dash.