what is regular expression means [duplicate]


This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I want to understand what regular expression means
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
Below is what I understand the expression to mean. Can you confirm or explain?
(?=^.{8,}$) string should be 8 characters or more
((?=.*\d)|(?=.*\W+)) string should consist of one number or one special character. What is the plus after W?
(?![.\n]) Not sure what this part means.
(?=.*[A-Z])(?=.*[a-z]).*$ My understanding is that $ means end of expression, so it's kinda confusing why there's a dot and an asterisk before it. Also why is there only one $? Shouldn't it have two: one $ after .*[A-Z] and one after .*[a-z]? To say that this section is supposed to make sure that user typed one small and one capital letter?
I am using this regex in an HTML form for practice and it's working fine.
All together, this regular expression should enforce the following, and it does:
uppercase, lowercase, number/special character, and a minimum of 8 characters
Edit: I am also trying to understand it on regex101.com, as @ymonad suggested in a comment.

Regex:
(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$
Explanation:
(?=^.{8,}$) - Positive lookahead to validate that the test string has at least 8 characters (excluding newline characters) between the start and the end of the string
(?=.*\d) - Positive lookahead to validate that the test string contains a digit
| - OR
(?=.*\W+) - Positive lookahead to validate that the input string has at least one non-word character, i.e. a character outside [a-zA-Z0-9_]. The + is redundant here, since a single such character already satisfies the lookahead
(?![.\n]) - a Negative lookahead to validate that the input string does not have a newline character \n or a dot . at the current position
(?=.*[A-Z]) - Positive lookahead to validate that the input string has an Upper case letter [A-Z]
(?=.*[a-z]) - Positive lookahead to validate that the input string has a lower case letter [a-z]
.* - Until this point we were just validating our input string against various rules. Now, using .* we are matching 0+ occurrences of any character except a new line character
$ - asserts the end of the string
Also, as pointed out in the comments, THIS SITE can be a good start.
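
To see the lookaheads above in action, here is a minimal Python sketch that runs the regex from the question against a few sample passwords. The helper name is_valid_password and the sample strings are illustrative assumptions, not part of the original question; Python's re module is used only because it accepts the same lookahead syntax.

```python
import re

# The regex from the question, unchanged.
PASSWORD_RE = re.compile(
    r"(?=^.{8,}$)((?=.*\d)|(?=.*\W+))(?![.\n])(?=.*[A-Z])(?=.*[a-z]).*$"
)


def is_valid_password(candidate: str) -> bool:
    """Hypothetical helper: True when the pattern matches from the start."""
    return PASSWORD_RE.match(candidate) is not None


# Made-up sample inputs, one per rule.
samples = [
    "Passw0rd",    # upper, lower, digit, 8 characters
    "Password!",   # special character instead of a digit
    "password1",   # no uppercase letter
    "PASSWORD1",   # no lowercase letter
    "Pw0rd",       # shorter than 8 characters
    ".Passw0rd",   # starts with a dot, rejected by (?![.\n])
]

for sample in samples:
    print(f"{sample!r}: {is_valid_password(sample)}")
```

Only the first two samples are accepted. The last one shows that (?![.\n]) is checked at the start of the string, because every lookahead in this pattern is evaluated at position 0.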

Related

Regex for a promo code that has the following rules

I need to build a regex that satisfies the following:
Rules to be applied:
exactly 14 characters
only letters (latin characters) and numbers
at least 3 letters
Regex still confuses me, so I am struggling to get the correct output. I want to use it with Swift and SwiftUI in an app I am making.
(?=(.*[a-zA-Z]){3,}([0-9]){0,}){14,14}$
I tried this, but I know it is not right.

I would use a positive lookahead for the length requirement:
^(?=.{14}$)(?:[A-Za-z0-9]*[A-Za-z]){3}[A-Za-z0-9]*$
This pattern says to match:
^ from the start of the input
(?=.{14}$) assert exact length of 14
(?:
[A-Za-z0-9]*[A-Za-z] zero or more alphanumeric followed by one alpha
){3} repeat that group three times
[A-Za-z0-9]* any alphanumeric zero or more times
$ end of the input

You need to use
^(?=(?:[0-9]*[a-zA-Z]){3})[a-zA-Z0-9]{14}$
Details
^ - start of string
(?=(?:[0-9]*[a-zA-Z]){3}) - requires at least three repetitions of a letter preceded by zero or more digits
[a-zA-Z0-9]{14} - fourteen letters/digits
$ - end of string.
See the regex demo.
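
As a quick check, the sketch below runs both answer patterns against the same inputs. It is written in Python only for illustration (the question targets Swift), and the names PROMO_A, PROMO_B, check and the sample codes are assumptions made for this example.

```python
import re

# First answer: length lookahead, then three "alphanumerics ending in a letter" groups.
PROMO_A = re.compile(r"^(?=.{14}$)(?:[A-Za-z0-9]*[A-Za-z]){3}[A-Za-z0-9]*$")
# Second answer: letter-count lookahead, then exactly 14 alphanumerics.
PROMO_B = re.compile(r"^(?=(?:[0-9]*[a-zA-Z]){3})[a-zA-Z0-9]{14}$")


def check(code: str) -> tuple:
    """Hypothetical helper: result of each pattern for one promo code.

    fullmatch is used so that a trailing newline, which `$` alone
    tolerates in Python, is rejected as well.
    """
    return (
        PROMO_A.fullmatch(code) is not None,
        PROMO_B.fullmatch(code) is not None,
    )


# Made-up sample codes.
samples = [
    "ABC12345678901",  # 14 characters, 3 letters
    "12345A6789B0C1",  # 14 characters, 3 letters spread out
    "AB123456789012",  # 14 characters, only 2 letters
    "ABC1234567890",   # 13 characters
    "ABC1234567890!",  # 14 characters, but contains '!'
]

for code in samples:
    print(code, check(code))
```

Both patterns accept the first two codes and reject the other three.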

Validating an obfuscation token

I am building a secure algorithm to protect against obfuscation attacks. The user is validated with a token that must satisfy the following conditions:
1. The username consists of lowercase letters only and is at least 5 characters long.
2. The username is followed by #.
3. The first two characters after # are important: always a digit and a character. This part contains at least one digit, one lowercase letter, and one uppercase letter.
4. In between there can be any number of digits or letters only.
5. At the end, the digit and character must exactly match the digit and character from point 3.
6. It must end with #.
7. The part between the two # must be at least 5 characters long.
8. The complete token consists only of two #, lowercase and uppercase letters, and digits.
I don't know regular expressions, but my guide told me this task is easily achieved at validation time with them. After searching the internet for a long time, I found some similar links, tried to combine them, and got this:
^[a-z]{5,}#[a-zA-Z0-9]{2}[A-Z][0-9A-Za-z]*[a-zA-Z0-9]{2}#$
But this matches only one test case. I don't know how to handle the part between the two hashes. I have tried to explain my problem as well as my English allows. Please help.
Below test cases should pass
userabcd#4a39A234a#
randomuser#4A39a234A#
abcduser#2Aa39232A#
abcdxyz#1q39A231q#
randzzs#1aB1a#
Below test cases should fail:
randuser#1aaa1a#
randuser#1112#
randuser#a1a1##
randuser#1aa#
u#4a39a234a#
userstre#1qqeqe123231q$
user#1239a23$a#
useabcd#4a39a234a#12

You may try:
^[a-z]{5,}#(?=[^a-z\n]*[a-z])(?=[^A-Z\n]*[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$
Explanation of the above regex:
^, $ - Represent the start and end of the line respectively.
[a-z]{5,} - Matches a username of 5 or more lowercase letters.
# - Matches # literally.
(?=[^a-z]*[a-z]) - A positive lookahead asserting at least one lowercase letter.
(?=[^A-Z]*[A-Z]) - A positive lookahead asserting at least one uppercase letter.
(\d[a-zA-Z]) - A capturing group matching the first 2 characters, i.e. a digit and a letter. If you want the other order, use [a-zA-Z]\d.
[a-zA-Z\d]* - Matches zero or more characters from the given character set.
\1 - A back-reference matching exactly what the capturing group matched.
You can find the demo of the above regex here.
Note: If you match one string at a time, as you would in practice, remove \n from the character sets.
You can use this regex as an alternative.
^[a-z]{5,}#(?=.*?[a-z])(?=.*?[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$
Recommended reading: Principle of contrast
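
The test cases from the question can be replayed against this regex with a short Python sketch. The \n is dropped from the negated classes, as the note above suggests for single-string matching; the names TOKEN_RE and is_valid_token are assumptions made for this example.

```python
import re

# Regex from the answer, without \n in the negated character classes
# because each token is checked as a single string.
TOKEN_RE = re.compile(
    r"^[a-z]{5,}#(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$"
)

# Test cases copied from the question.
should_pass = [
    "userabcd#4a39A234a#",
    "randomuser#4A39a234A#",
    "abcduser#2Aa39232A#",
    "abcdxyz#1q39A231q#",
    "randzzs#1aB1a#",
]
should_fail = [
    "randuser#1aaa1a#",
    "randuser#1112#",
    "randuser#a1a1##",
    "randuser#1aa#",
    "u#4a39a234a#",
    "userstre#1qqeqe123231q$",
    "user#1239a23$a#",
    "useabcd#4a39a234a#12",
]


def is_valid_token(token: str) -> bool:
    """Hypothetical helper: True when the whole token matches."""
    return TOKEN_RE.fullmatch(token) is not None


for token in should_pass + should_fail:
    print(f"{token}: {is_valid_token(token)}")
```

The back-reference \1 is what ties the closing digit-letter pair to the opening one, so a token such as randzzs#1aB1a# passes while randuser#1112# does not.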

Regex for excluding strings that start with consecutive leading zeroes or are only alphabets [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I am looking for a regex that selects only the strings that do not start with consecutive zeroes or consecutive letters before the underscore in the strings below.
For example:
ABC_DE-001 is invalid
abc is invalid (only alphabets)
0_DE-001 is invalid (1 zero before underscore)
000_DE-001 is invalid (sequence of 3 consecutive zeroes)
00_DE-001 is invalid (sequence of 2 consecutive zeroes)
01_DE-001 is valid (0 followed by some other number is valid)
10_DE-001 is valid (starts with 1)
100_DE-001 is valid (starts with 1)
One of the approaches I tried was:
(0[1-9]+|[1-9][0-9]+|0[0*$][1-9])_[A-Z0-9]+[-][0-9]{3}
I am not sure though if any scenario is missed with this. Also, how can the same thing be achieved using negative or positive lookaround?

For your example data, you might use an optional zero, ^0?, since a single leading zero can occur but not more than one.
^0?[1-9][0-9]*_[A-Z]+-[0-9]{3}$
Regex demo
That will match
^0? An optional zero at the start of the string
[1-9][0-9]* Match a digit 1-9 followed by 0+ digits
_[A-Z]+ Match an _ followed by 1+ times A-Z
-[0-9]{3} Match - followed by 3 digits
$ Assert the end of the string
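
A small Python sketch can confirm that this pattern agrees with the expectations listed in the question. The names LEADING_RE and expected are assumptions introduced for this example; the strings and their valid/invalid labels come from the question.

```python
import re

# Pattern from the answer above.
LEADING_RE = re.compile(r"^0?[1-9][0-9]*_[A-Z]+-[0-9]{3}$")

# Example strings from the question, mapped to the expected outcome.
expected = {
    "ABC_DE-001": False,
    "abc": False,
    "0_DE-001": False,
    "000_DE-001": False,
    "00_DE-001": False,
    "01_DE-001": True,
    "10_DE-001": True,
    "100_DE-001": True,
}

results = {text: LEADING_RE.fullmatch(text) is not None for text in expected}

for text, matched in results.items():
    print(f"{text}: {'valid' if matched else 'invalid'}")
```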

You can try with negative look ahead groups:
grep -Pi '^(?![a-z]+(?:_|$|\s)|0+(?:_|$|\s))' test.txt
Explanation:
-Pi - use PCRE and ignore case. These options are specific to grep; adapt them to your case. If you cannot make the regex processor ignore case, just replace [a-z] with [a-zA-Z]. And of course, PCRE support is required.
^ - beginning of the line
(?!rgx) - look ahead without moving the cursor to check that the line doesn't match the enclosed regular expression rgx.
[a-z]+(?:_|$|\s)|0+(?:_|$|\s) :
don't keep consecutive letters ([a-z]+) followed by an underscore, the end of the line, or a whitespace character ((?:_|$|\s))
don't keep consecutive zeroes (0+) followed by an underscore, the end of the line, or a whitespace character ((?:_|$|\s))
(?:) stands for a non-capturing group (its content is not stored; use it when you don't need the capture, to improve performance)
Output:
01_DE-001 is valid (0 followed by some other number is valid)
10_DE-001 is valid (starts with 1)
100_DE-001 is valid (starts with 1)
Since grep only keeps matching lines (its default behavior), the lines that are not displayed were treated as invalid.
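
For readers without grep -P, the same filter can be sketched in Python. This is an illustration under assumptions: the lines list stands in for test.txt (its real contents are not shown in the answer, so the lines from the question are used), re.IGNORECASE plays the role of -i, and keep_valid is a hypothetical helper name.

```python
import re

# Same pattern as in the grep command; re.IGNORECASE replaces grep's -i.
KEEP_RE = re.compile(r"^(?![a-z]+(?:_|$|\s)|0+(?:_|$|\s))", re.IGNORECASE)

# Assumed stand-in for test.txt: the example lines from the question.
lines = [
    "ABC_DE-001 is invalid",
    "abc is invalid (only alphabets)",
    "0_DE-001 is invalid (1 zero before underscore)",
    "000_DE-001 is invalid (sequence of 3 consecutive zeroes)",
    "00_DE-001 is invalid (sequence of 2 consecutive zeroes)",
    "01_DE-001 is valid (0 followed by some other number is valid)",
    "10_DE-001 is valid (starts with 1)",
    "100_DE-001 is valid (starts with 1)",
]


def keep_valid(candidates):
    """Hypothetical helper: return the lines the negative lookahead lets through."""
    return [line for line in candidates if KEEP_RE.search(line) is not None]


kept = keep_valid(lines)
for line in kept:
    print(line)
```

Note that the pattern consumes no characters: it only asserts at the start of each line that neither forbidden prefix follows.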

Regex quantifier not restricting match [duplicate]

This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
I would like to match 1 or more capital letters, [A-Z]+ followed by 0 or more numbers, [0-9]* but the entire string needs to be less than or equal to 8 characters in total.
No matter what regex I come up with the total length seems to be ignored. Here is what I've tried.
^[A-Z]+[0-9]*{1,8}$ //Range ignored, will not work on regex101.com but will on rubular.com/
^([A-Z]+[0-9]*){1,8}$ //Range ignored
^(([A-Z]+[0-9]*){1,8})$ //Range ignored
Is this not possible in regex? Do I just need to do the range check in the language I'm writing in? That's fine, but I thought it would be cleaner to keep it all in regex syntax. Thanks

The behaviour is expected. When you write the following pattern:
^([A-Z]+[0-9]*){1,8}$
The {1,8} quantifier tells the regex to repeat the preceding element, in this case the capturing group, between one and eight times. Because [A-Z]+ and [0-9]* inside the group are themselves unbounded, each repetition can consume any number of characters, so the total length is never restricted.
You need to use a lookahead to obtain the desired behaviour:
^(?=.{1,8}$)[A-Z]+[0-9]*$
^ Assert beginning of string.
(?=.{1,8}$) Ensure that the string that follows is between one and eight characters in length.
[A-Z]+[0-9]* Match one or more upper case letters followed by zero or more digits.
$ Asserts position end of string.
See working demo here.

The regex ^([A-Z]+[0-9]*){1,8}$ would match [A-Z]+[0-9]* 1 - 8 times. That would match for example a repetition of 8 times A1A1A1A1A1A1A1A1 but not a repetition of 9 times A1A1A1A1A1A1A1A1A1
You might use a positive lookahead (?=[A-Z0-9]{1,8}$) to assert the length of the string:
^(?=[A-Z0-9]{1,8}$)[A-Z]+[0-9]*$
That would match
^ From the start of the string
(?=[A-Z0-9]{1,8}$) Positive lookahead to assert that what follows matches any of the characters in the character class [A-Z0-9] 1 - 8 times and assert the end of the string.
[A-Z]+[0-9]*$ Match an uppercase character one or more times, followed by a digit zero or more times, and assert the end of the string.
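
The difference between quantifying the group and asserting the length with a lookahead can be seen in a short Python sketch. The names GROUP_QUANTIFIED, WITH_LOOKAHEAD, compare and the sample strings are assumptions made for this example.

```python
import re

# Attempt from the question: {1,8} repeats the group, it does not limit length.
GROUP_QUANTIFIED = re.compile(r"^([A-Z]+[0-9]*){1,8}$")
# Fix from the answers: a lookahead asserts the total length first.
WITH_LOOKAHEAD = re.compile(r"^(?=.{1,8}$)[A-Z]+[0-9]*$")


def compare(text: str) -> tuple:
    """Hypothetical helper: (group-quantified result, lookahead result)."""
    return (
        GROUP_QUANTIFIED.match(text) is not None,
        WITH_LOOKAHEAD.match(text) is not None,
    )


# Made-up sample inputs.
samples = [
    "ABC12345",    # 8 characters: accepted by both
    "ABC123456",   # 9 characters: one repetition of the group is enough
    "ABCDEFGHIJ",  # 10 letters: still one repetition of the group
    "A1A1",        # letters after digits: two repetitions of the group
    "A1" * 9,      # needs nine repetitions, so even the group form rejects it
]

for text in samples:
    print(text, compare(text))
```

The second and third samples show why the range seemed to be ignored: a single repetition of the group can already cover a string of any length.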

Matching any password except one containing repeating characters [duplicate]

Edit: Thanks for the advice to make my question clearer :)
The Match is looking for 3 consecutive characters:
Regex Match =AaA653219
Regex Match = AA5556219
The code is ASP.NET 4.0. Here is the whole function:
public ValidationResult ApplyValidationRules()
{
    ValidationResult result = new ValidationResult();
    // 8-20 characters with at least one digit and one letter.
    Regex regEx = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
    bool valid = regEx.IsMatch(_Password);
    if (!valid)
        result.Errors.Add("Passwords must be 8-20 characters in length, contain at least one alpha character and one numeric character");
    return result;
}
I've tried for over 3 hours to make this work, referencing the below with no luck =/
How can I find repeated characters with a regex in Java?
.net Regex for more than 2 consecutive letters
I have started with this for 8-20 characters a-Z 0-9 :
^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$
As Regex regEx = new Regex(@"^(?=.*\d)(?=.*[a-zA-Z]).{8,20}$");
I've tried adding variations of the below with no luck:
/(.)\1{9,}/
.*([0-9A-Za-z])\\1+.*
((\\w)\\2+)+".
Any help would be much appreciated!

http://regexr.com?34vo9
The regular expression:
^(?=.{8,20}$)(([a-z0-9])\2?(?!\2))+$
The first lookahead ((?=.{8,20}$)) checks the length of your string. The second portion does your double character and validity checking by:
(
([a-z0-9]) Matching a character and storing it in a back reference.
\2? Optionally match one more EXACT COPY of that character.
(?!\2) Make sure the upcoming character is NOT the same character.
)+ Do this ad nauseam.
$ End of string.
Okay. I see you've added some additional requirements. My basic formula still works, but we have to give you more of a step-by-step approach. So:
^...$
Your whole regular expression will be dropped into start and end characters, for obvious reasons.
(?=.{n,m}$)
Length checking. Put this at the beginning of your regular expression with n as your minimum length and m as your maximum length.
(?=(?:[^REQ]*[REQ]){n,m})
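Required characters. Place this at the beginning of your regular expression, with REQ as your required character class, to require at least n of those characters. Because the lookahead is not anchored to the end of the string, the upper bound m does not cap the count. You may drop the (?: ..){n,m} wrapper to require just one such character.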
(?:([VALID])\1?(?!\1))+
The rest of your expression. Replace VALID with your valid Characters. So, your Password Regex is:
^(?=.{8,20}$)(?=[^A-Za-z]*[A-Za-z])(?=[^0-9]*[0-9])(?:([\w\d*?!:;])\1?(?!\1))+$
'Splained:
^
(?=.{8,20}$) 8 to 20 characters
(?=[^A-Za-z]*[A-Za-z]) At least one Alpha
(?=[^0-9]*[0-9]) At least one Numeric
(?:([\w\d*?!:;])\1?(?!\1))+ Valid Characters, not repeated thrice.
$
http://regexr.com?34vol Here's the new one in action.
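
The assembled pattern can be tried out with a minimal Python sketch (the question itself uses .NET, so this is only an illustration). The pattern is split across string literals to mirror the breakdown above; NO_TRIPLE_RE, is_acceptable and every sample except the two strings from the question are assumptions.

```python
import re

# The password regex from the answer, split into its building blocks.
NO_TRIPLE_RE = re.compile(
    r"^(?=.{8,20}$)"                   # 8 to 20 characters
    r"(?=[^A-Za-z]*[A-Za-z])"          # at least one letter
    r"(?=[^0-9]*[0-9])"                # at least one digit
    r"(?:([\w\d*?!:;])\1?(?!\1))+$"    # valid characters, none three times in a row
)


def is_acceptable(password: str) -> bool:
    """Hypothetical helper: True when the password satisfies every block."""
    return NO_TRIPLE_RE.match(password) is not None


samples = [
    "AaA653219",  # from the question: no character three times in a row
    "AA5556219",  # from the question: "555" is a triple
    "aabbcc11",   # doubles are allowed
    "aaab1234",   # "aaa" is a triple
    "abcdefgh",   # no digit
    "abc1",       # too short
]

for password in samples:
    print(f"{password}: {is_acceptable(password)}")
```

The comparison is case sensitive, so AaA does not count as a repeated character while 555 does.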

Tightened up the matching criteria, as they were too broad; for example, "not A-Za-z" matches a lot more than is intended. The previous regex was matching on the string "ThiIsNot". For the most part, passwords are only going to contain alphanumeric and punctuation characters, so I limited the scope, which made all matches more accurate. Used character classes for human readability. Added an exclusion list, and differentiated upper and lower case letters.
^(?=.{8,20}$)(?!(?:.*[01IiLlOo]))(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1})(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1})(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1})(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1})(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$
The breakdown:
^(?=.{8,20}$) - Positive lookahead that the string is between 8 and 20 chars
(?!(?:.*[01IiLlOo])) - Negative lookahead for any blacklisted chars
(?=(?:[\[[:digit:]\]\[[:punct:]\]]*[\[[:alpha:]\]]){2}) - Verify that at least 2 alpha chars exist
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:upper:]\]]*[\[[:lower:]\]]){1}) - Verify that at least 1 lowercase alpha exists
(?=(?:[\[[:digit:]\]\[[:punct:]\]\[[:lower:]\]]*[\[[:upper:]\]]){1}) - Verify that at least 1 uppercase alpha exists
(?=(?:[\[[:alpha:]\]\[[:punct:]\]]*[\[[:digit:]\]]){1}) - Verify that at least 1 digit exists
(?=(?:[\[[:alnum:]\]]*[\[[:punct:]\]]){1}) - Verify that at least 1 special/punctuation char exists
(?:([\[[:alnum:]\]\[[:punct:]\]])\1?(?!\1))+$ - Verify that no char is repeated more than twice in a row
