I could not seem to understand (?=.*?[A-Z]) this expression [duplicate]

I could not seem to understand (?=.*?[A-Z]) this expression [duplicate] - regex

This question already has answers here:
Regex lookahead, lookbehind and atomic groups
(5 answers)
Closed 5 years ago.
I'm trying to learn a more advanced regular expressions for a password validator I'm working on because I think using regular expressions would be the best way out. I am using Java as my programming language
So for my pattern people suggested this (?=.*?[A-Z]) as to say "at least one upper case in the string". I have tried searching it at least but nothing seems to make it clear ?=.*? how this part makes sure it at least there.
here is the whole pattern ^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}$
from what i understand
? means optional and occurs once
= means well i don't know yet
. is a wildcard
[A-Z] is the range of uppercase letters from A-Z
TLDR: So my question is how does this (?=.*?[A-Z]) make it sure atleast one uppercase letter is included? Any in-depth explanation?

(?= is the start of a look-ahead group — the question mark does not mean the same as a ? elsewhere
.*? is a non-greedy match against anything or nothing. The question-mark here also does not mean 'optional'.
[A-Z] is a character set containing the upper case ASCII letters A through to Z.
) is the end of the look-ahead group
So the net result is:
"Look ahead and see if, after maybe some characters, there is an upper case letter."
Your full expression, ^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}$, can be read as:
"Match if the string contains an upper case letter, and a lower case letter, and a digit, and a non-alphanumeric, and there are at least 8 characters in total."

The regex is using a feature named positive lookahead, this is part of the regex lookarounds:
Positive lookahead: (?=...). Ex: a(?=b) matches a if followed by b
Negative lookahead: (?!...). Ex: a(?!b) matches a if not followed by b
Positive lookbehind: (?<=...). Ex: (?<=a)b matches b if preceded by a
Negative lookbehind: (?<!...). Ex: (?<=a)b matches b if not preceded by a
For your whole regex, you can see easily your pattern with this diagram:
Diagram link
Related to (?=.*?[A-Z]), it is being used after the ^. So, ^(?=.*?[A-Z])$ means match a line that start and end with whatever thing but having a uppercase character at the end

Related

Using regex to find abbreviations

I am trying to create a regular expression that will identify possible abbreviations within a given string in Python. I am kind of new to RegEx and I am having difficulties creating an expression though I beleive it should be somewhat simple. The expression should pick up words that have two or more capitalised letter. The expression should also be able to pick up words where a dash have been used in-between and report the whole word (both before and after the dash). If numbers are also present they should also be reported with the word.
As such, it should pick up:
ABC, AbC, ABc, A-ABC, a-ABC, ABC-a, ABC123, ABC-123, 123-ABC.
I have already made the following expression: r'\b(?:[a-z]*[A-Z\-][a-z\d[^\]*]*){2,}'.
However this does also pick up these wrong words:
A-bc, a-b-c
I believe the problem is that it looks for either multiple capitalised letters or dashes. I wish for it to only give me words that have atleast two or more capitalised letters. I understand that it will also "mistakenly" take words as "Abc-Abc" but I don't believe there is a way to avoid these.

If a lookahead is supported and you don't want to match double -- you might use:
\b(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b
Explanation
\b A word boundary
(?= Positive lookahead, assert that from the current location to the right is
(?:[a-z\d-]*[A-Z]){2} Match 2 times the optionally the allowed characters and an uppercase char A-Z
) Close the lookahead
[A-Za-z\d]+ match 1+ times the allowed characters without the hyphen
(?:-[A-Za-z\d]+)* Optionally repeat - and 1+ times the allowed characters
\b A word boundary
See a regex101 demo.
To also not not match when there are hyphens surrounding the characters you can use negative lookarounds asserting not a hyphen to the left or right.
\b(?<!-)(?=(?:[a-z\d-]*[A-Z]){2})[A-Za-z\d]+(?:-[A-Za-z\d]+)*\b(?!-)
See another regex demo.

What does "(?!$)" inside a regexp mean? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed last year.
In a section of Sevelte tutorial, there's a piece of code like this:
view = pin ? pin.replace(/\d(?!$)/g, '•') : 'enter your pin';
I know \d means a digit, but can't figure out what (?!$) means.
(And because it's composed of all punctuation, I can't manage to google for an explanation.)
Please help, thanks.

(?!$) Is a negative lookahead stipulation, where (?!) declares the negative lookahead and $ is what that the expression is "looking ahead" for (in this case, an end anchor).
A negative lookahead is an inverse of a positive lookahead, so it will be more intuitive to understand if you know what a positive lookahead is first: A digit followed by a positive lookahead \d(?=$) basically looks for anything that would be matched by \d$ but does not return the part inside the lookahead stipulation when returning a match. \d(?=$) will match any digit that is directly behind the end of the string. A negative lookahead will simply match every digit that is NOT directly behind the end of the string instead, ergo using \d(?!$) and replacing matches with a * basically turns every digit in the string into a * except for the last one.
For the sake of being thorough, you should know that (?<=) is a positive lookbehind that looks for matches in the characters immediately before the given token instead of after, and (?<!) is a negative lookbehind.
Regex101.com and RegExr.com are fantastic resources to use when you are learning regex, because you can insert a regular expression you don't understand and get a piece-by-piece explanation of an expression you don't understand and test strings in real time to experiment with what the expression captures and what it doesn't. Even if the built-in explanations don't make sense, you can still use them in situations like this to find out what something is called so you can search for it.

\d matches all digits
(?!something) means 'Negative Lookahead' for something
$ matches the end of a string
So when \d(?!$) is used, it matches all digits before the last character
In this string:
$$//www12.example#news.com<~>998123000dasas00--987
This will be matched (7 will not because it is the last character):
129981230000098
Referred to this answer
and Regex Cheat Sheet

Regex - Why don't these two expressions produce the same result? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I'm currently using this website to create some regular expressions for a programming language I want to build, at the moment I'm just setting up an expression for identifiers.
In my language, identifiers are expressed like most languages:
They cannot begin with a digit, or special character other than an underscore
After the first character they can contain alphanumeric and underscore characters
Given those rules I've come up with the following expression by myself:
^\D\w+$
Obviously, it doesn't account for special characters, however the following expression does (which I didn't make myself):
^(?!\d)\w+$
Why does the second expression account special characters? Shouldn't they be producing the same results?

I will explain why the second regex works.
The second regex uses a lookahead. After matching the start of the string, the engine checks whether the next character is a digit but it does not match it! This is important because if the next character is not a digit, it tries to use \w to match that same character, which it couldn't if the character is a symbol, if it is a digit, the negative lookahead fails and nothing is matched.
\D on the other hand, will match the character if it is not a digit, and \w will match whatever comes after that. That means all symbols are accepted.

This ^(?!\d)\w+$ means a string consisted of word characters [a-zA-Z0-9_] that doesn't start with a digit.
This ^\D\w+$ means a non-digit character followed by at least one character from [a-zA-Z0-9_] set.
So #ab01 is matched by second regex while first regex rejects it.

(?!\d)\w+ means "match a word which is not prepended with digits". But as you're wrapping it with ^ and $ characters it is basically the same as just ^\w+$ which is obviously not the same as ^\D\w+$. ^(?!\d).+\w+$ (note ".+" in the middle) would behave the same as ^\D\w+$

RegEx more than multiple characters before number

I really don't use RegEx that much. You could say I am RegEx n00b. I have been working on this issue for a half a day.
I am trying to write a pattern that looks backward from a number character. For example:
1. bob1 => bob
2. cat3 => cat
3. Mary34 => Mary
So far I have this (?![A-Z][a-z]{1,})([A-Za-z_])
It only matches for individual characters, I want all the characters before the number character. I tried to add the ^ and $ into my pattern and using an online simulator. I am unsure where to put the ^ and $.
NOTE: I am using RegEx for the .NET Framework

You may use a regex like
[\p{L}_]+(?=\d)
or
[\w-[\d]]+(?=\d)
See the regex demo
Pattern details
[\p{L}_]+ - any 1 or more letters (both lower- and uppercase) and/or _
OR
[\w-[\d]]+ - 1 or more word chars except digits (the -[] inside a character class is a character class subtraction construct)
(?=\d) - a positive lookahead that requires a digit to appear immediately to the right of the current location

If we break down your RegEx, we see:
(?![A-Z][a-z]{1,}) which says "look ahead to find a string that is NOT one uppercase letter followed one or more lowercase letters" and ([A-Za-z_]) which says "match one letter or underscore". This should end up matching any single lowercase letter.
If I understand what you want to achieve, then you want all of the letters before a number. I would write something like that as:
\b([a-zA-Z]+)[0-9]
This will start at a word boundary \b, match one or more letters, and require a digit right after the matched string.
(The syntax I used seems to match this document about .NET RegEx: https://learn.microsoft.com/en-us/dotnet/standard/base-types/regular-expressions)
In light of Wiktor Stribizew's comment, here is a pure match RegEx:
\b[a-zA-Z_]+(?=[0-9])
This matches the pattern and then looks ahead for the digit. This is better than my first lookahead attempt. (Thank you Wiktor.)
http://www.rexegg.com/regex-lookarounds.html

regex included a-zA-Z, digit, and some symbol with length limit

I try to create a regex to match lower and uppercase of A-Z, digits and ##$_ symbols with length limit of 4 to 16 for all of string.
My useless regex:
/^([a-zA-Z])|(\d)|(##\$_){4,16}$/
I test Online regex generators Like http://www.jslab.dk/tools.regex.php but don't have a good result .

Your regex /^([a-zA-Z])|(\d)|(##\$_){4,16}$/ matches for a single letter OR a single digit OR 4 to 16 characters of "##\$_".
The groups around the alternatives are useless.
One solution would be to make a group around the whole alternation
/^([a-zA-Z]|\d|##\$_){4,16}$/
but the better solution would be to add everything to one character class
/^[a-zA-Z##$_\d]{4,16}$/
See it here on Regexr
you can maybe simplify it further, since [a-zA-Z\d_] is the same than \w, when \w is not unicode based!
/^[\w##$]{4,16}$/

\w includes lowercase and UPPERCASE letters, digits and the _ character
RegEx Pattern: ^[\w#\#\$]{4,16}$
Explained demo here: http://regex101.com/r/rK1yH2

The expression that you need is this one:
( ([a-zA-Z])|(\d)|(##\$_) ){4,6}
The problem that you have in yours is that the last {2,6} are affecting only to the last group of brackets, not to the whole expression. Also make sure that the "/^" and "$/" are mandatory for your case, because the "^" means "not", so I'm not sure why you have it there.
You can also see it graphically here: http://www.debuggex.com/

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

I could not seem to understand (?=.*?[A-Z]) this expression [duplicate] - regex

Related

Using regex to find abbreviations

What does "(?!$)" inside a regexp mean? [duplicate]

Regex - Why don't these two expressions produce the same result? [duplicate]

RegEx more than multiple characters before number

regex included a-zA-Z, digit, and some symbol with length limit

Categories

Resources