Confusion in JavaScript RegExp ?= Quantifier [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
What is the difference between
(?=.\d)(?=.[a-z])(?=.[A-Z])
and
(.\d)(.[a-z])(.[A-Z])
When I test the string a2A, only the first RegExp returns true. Can anyone explain this to me?

The difference lies in the lookahead operator (?=...) that wraps each term of the first regex. A lookahead matches the sub-regex it guards as usual but consumes no input, so the part of the regex that follows it starts matching at the same position.
This means that the first regex should not match, contrary to your tests (which engine did you use?). Given any initial matching position, the character right after it would have to be a digit, a lowercase letter, and an uppercase letter, all at the same time.
Observe that this will not happen if the . ('any char') is quantified:
(?=.*\d)(?=.*[a-z])(?=.*[A-Z])
Each LA term may skip an arbitrary amount of material before matching the character class, and this amount may differ between the subexpressions.
The second alternative (with or without quantification) will not match the test string a2A. Its groups consume what they match, so it requires a digit, then a lowercase letter, then an uppercase letter, in that order, and a2A does not provide that.
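As a quick check of the reasoning above, here is a minimal JavaScript sketch that runs all four variants against the test string a2A. The variable names are made up for illustration, and the extra input "2aA" is an assumption added to show a string that the consuming version does accept.

```javascript
// The test string from the question.
const input = "a2A";

// Unquantified lookaheads: all three inspect the character right after the
// current position, so they can never agree.
const fixedLookaheads = /(?=.\d)(?=.[a-z])(?=.[A-Z])/;

// Quantified lookaheads: each one may skip a different amount of text.
const scanningLookaheads = /(?=.*\d)(?=.*[a-z])(?=.*[A-Z])/;

// Plain groups consume what they match, so the order of the terms is fixed.
const fixedGroups = /(.\d)(.[a-z])(.[A-Z])/;
const scanningGroups = /(.*\d)(.*[a-z])(.*[A-Z])/;

const results = {
  fixedLookaheads: fixedLookaheads.test(input),       // false
  scanningLookaheads: scanningLookaheads.test(input), // true
  fixedGroups: fixedGroups.test(input),               // false
  scanningGroups: scanningGroups.test(input),         // false
};
console.log(results);

// A digit, then a lowercase letter, then an uppercase letter, in that order.
console.log(scanningGroups.test("2aA")); // true
```

Only the quantified lookahead version accepts a2A, which suggests that the asterisks may have been lost when the question was posted.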

Related

RegEx to find count of special characters in String [duplicate]

This question already has answers here:
How to get the count of only special character in a string using Regex?
(6 answers)
Closed 2 years ago.
I need to form a RegEx that produces output only if two or more special characters occur in the given string.
1) abcd##qwer - Match
2) abcd#dsfsdg#fffj - Match
3) abcd#qwetg - No Match
4) acwexyz - No Match
5) abcd#ds#$%fsdg#fffj - Match
Can anyone help me with this?
Note: I need to use this regular expression in an existing tool, not in a programming language.
UPDATE after OP edit
The edited OP introduces a small amount of additional complexity that necessitates a different pattern entirely. The key points are that (a) the set of "special characters" is now much smaller, (b) these characters must appear at least twice, and (c) they may appear at any position in the string.
To implement this, you would use something like:
(?:.*?[##$%].*?){2,}
Opens a non-capturing group,
Which contains any number of characters (as few as possible), followed by
Any character in the set ##$%
Followed by any number of characters (again as few as possible)
Repeats the group at least twice, so at least two special characters must occur in the string.
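The OP needs the pattern inside a tool rather than a program, but a short JavaScript sketch is a convenient way to check it against the five sample strings from the question. The pattern is copied verbatim from above; the variable names are made up for illustration.

```javascript
// Pattern copied verbatim from the answer.
const atLeastTwoSpecials = /(?:.*?[##$%].*?){2,}/;

// The five sample strings from the question.
const samples = [
  "abcd##qwer",
  "abcd#dsfsdg#fffj",
  "abcd#qwetg",
  "acwexyz",
  "abcd#ds#$%fsdg#fffj",
];

const outcomes = samples.map((s) => atLeastTwoSpecials.test(s));
console.log(outcomes); // [ true, true, false, false, true ]
```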
Original answer
By "special characters", I assume you mean anything outside standard alphanumeric characters. You can use the pattern below in most flavors of Regex:
([^A-Za-z0-9])\1
This (a) uses a negated character class to match any single non-alphanumeric character, then (b) uses the backreference \1 to check that the same character appears immediately after it.
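A small JavaScript sketch of how this backreference pattern behaves: it only fires when the same special character appears twice in a row. The third input string is a made-up example, not one from the question.

```javascript
// Pattern copied verbatim from the original answer.
const adjacentRepeat = /([^A-Za-z0-9])\1/;

// Two identical special characters side by side.
const doubled = adjacentRepeat.test("abcd##qwer");         // true
// The same character twice, but not adjacent.
const separated = adjacentRepeat.test("abcd#dsfsdg#fffj"); // false
// Two adjacent special characters that differ (hypothetical input).
const different = adjacentRepeat.test("abcd#$qwer");       // false

console.log(doubled, separated, different);
```

This is why the edited question, where the characters may be anywhere in the string, needed a different pattern.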

negation classes regex [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I wrote this regex to tokenize a text: "\b\w+\b"
but someone suggested that I change it to \b[^\W\d_]+\b
Can anyone explain to me why this second way (using negation) is better?
Thanks
The first one matches all letters, digits and the underscore. Depending on the regex engine, this may include Unicode letters and digits. (The word boundaries are superfluous in this case, by the way.)
The second regex matches only letters (it excludes non-word characters, digits and the underscore). Because of the word boundaries, it will only match a run of letters if it is surrounded by non-word characters or the start/end of the string.
If your regex engine supports it, you might want to use [[:alpha:]] or \p{L} (or [A-Za-z] if you only need ASCII letters) instead to make your intent clearer.
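To make the difference concrete, here is a minimal JavaScript sketch. The sample text and variable names are made up for illustration. The third pattern, without word boundaries, is an assumption added to show how much the boundaries change the result.

```javascript
// Hypothetical sample text containing an underscore and digits.
const text = "foo_bar baz42 qux";

// \w+ keeps letters, digits and underscores together.
const wordTokens = text.match(/\b\w+\b/g);
console.log(wordTokens); // [ 'foo_bar', 'baz42', 'qux' ]

// Letters only, and the run must have a word boundary on both sides.
// "foo" is followed by "_" and "baz" by "4", so neither run qualifies.
const letterTokens = text.match(/\b[^\W\d_]+\b/g);
console.log(letterTokens); // [ 'qux' ]

// Without the boundaries, every run of letters is returned.
const letterRuns = text.match(/[^\W\d_]+/g);
console.log(letterRuns); // [ 'foo', 'bar', 'baz', 'qux' ]
```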

How to build a regular expression which prohibits hyphens from appearing at the start and end of a string? [duplicate]

This question already has answers here:
RegEx for allowing alphanumeric at the starting and hyphen thereafter
(4 answers)
Closed 5 years ago.
I want to build a regular expression which only matches [A-Za-z0-9\-] with an additional rule that hyphens (-) are not allowed to appear at the start and at the end.
For example:
my-site is matched.
m is matched.
mysite- is not matched.
-mysite is not matched.
Currently, I've come up with ^[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9]+$.
But this doesn't match m.
How can I change my regular expression so that it fits my needs?
Use lookarounds:
^(?!-)[A-Za-z0-9-]*(?<!-)$
The reason this works is that lookarounds don't consume input, so the lookahead and the lookbehind can both assert on the same character. That is what lets the single-character string m pass.
Note that you don't need to escape the dash within the character class if it's the first or last character.
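A short JavaScript sketch that checks the pattern against the four examples from the question. The empty string and the variant with + are assumptions added for illustration: because of the *, the pattern as written also accepts an empty string, and replacing * with + is one way to rule that out. Lookbehind needs an engine that supports it.

```javascript
// Pattern copied verbatim from the answer.
const noEdgeHyphen = /^(?!-)[A-Za-z0-9-]*(?<!-)$/;

// The four examples from the question, plus the empty string.
const candidates = ["my-site", "m", "mysite-", "-mysite", ""];
const verdicts = candidates.map((s) => noEdgeHyphen.test(s));
console.log(verdicts); // [ true, true, false, false, true ]

// Hypothetical variant: + instead of * requires at least one character.
const noEdgeHyphenNonEmpty = /^(?!-)[A-Za-z0-9-]+(?<!-)$/;
const nonEmptyVerdicts = candidates.map((s) => noEdgeHyphenNonEmpty.test(s));
console.log(nonEmptyVerdicts); // [ true, true, false, false, false ]
```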

The use of ".*" in regex for password validation [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I came across this regex used for password validation:
(?=.*[a-z])(?=.*[A-Z])(?=.*[\d])(?=.*[^a-zA-Z\d])(?=\S+$).{8,}
There are only two things that are unclear to me about this regex:
What is .* used for, and why doesn't this regex work without it?
What is the difference or benefit of using [\d] instead of \d? The regex works just fine in both cases.
.* matches any sequence of characters: . matches any character (other than newline, which is not relevant here) and * matches zero or more of the preceding pattern. It is used in the lookaheads to search for a match anywhere in the rest of the password. Lookaheads consume no input, so without .* all four of them would test the same character, the one at the current position. That character would have to be a lowercase letter, an uppercase letter, a digit, and a non-alphanumeric character at once, which is impossible, so the regex could never match. With .*, the password must contain at least one of each, but they can be anywhere in the password.
There's no difference between \d and [\d]. Whoever wrote this might just use the brackets out of habit, or perhaps to make it easier to add other characters to the character class later.
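Both points can be checked with a minimal JavaScript sketch. The first pattern is copied from the question; the other two are variations written for this comparison, and the sample passwords are made up.

```javascript
// Pattern copied verbatim from the question.
const passwordRule =
  /(?=.*[a-z])(?=.*[A-Z])(?=.*[\d])(?=.*[^a-zA-Z\d])(?=\S+$).{8,}/;

// Variation with every .* removed from the character-class lookaheads.
const withoutDotStar =
  /(?=[a-z])(?=[A-Z])(?=[\d])(?=[^a-zA-Z\d])(?=\S+$).{8,}/;

// Variation with \d written without the brackets.
const bareDigit =
  /(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[^a-zA-Z\d])(?=\S+$).{8,}/;

// Hypothetical sample passwords.
const passwords = ["Passw0rd!", "password", "Sh0rt!a", "Pass w0rd!"];

const withRule = passwords.map((p) => passwordRule.test(p));
console.log(withRule); // [ true, false, false, false ]

// No single character can be in all four classes, so nothing matches.
const withoutStar = passwords.map((p) => withoutDotStar.test(p));
console.log(withoutStar); // [ false, false, false, false ]

// [\d] and \d behave identically.
const withBareDigit = passwords.map((p) => bareDigit.test(p));
console.log(withBareDigit); // [ true, false, false, false ]
```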

My regular expression matches too much. How can I tell it to match the smallest possible pattern? [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 4 years ago.
I have this RegEx:
('.+')
It has to match character literals like in C. For example, if I have 'a' b 'a' it should match the a's and the ''s around them.
However, it also matches the b (it should not), probably because it is, strictly speaking, also between ''s.
I use this for syntax highlighting, so the b gets highlighted as well.
I'm fairly new to regular expressions. How can I tell the regex not to match this?
It is being greedy and matching the first apostrophe and the last one and everything in between.
This version only allows characters that aren't apostrophes between the two quotes:
('[^']+')
Another alternative is to try non-greedy matches.
('.+?')
Have you tried a non-greedy version, e.g. ('.+?')?
There are usually two modes of matching (or two sets of quantifiers): maximal (greedy) and minimal (non-greedy). The first results in the longest possible match, the second in the shortest. You can read about it (although in a Perl context) in the Perl Cookbook (Section 6.15).
Try:
('[^']+')
Inside square brackets, a leading ^ negates the class: it matches every character except the ones listed. This way, it won't match all of 'a' b 'a' in one go, because there's a ' in between; instead it will give you both instances of 'a'.
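The three patterns suggested so far can be compared with a minimal JavaScript sketch, using the input from the question. The second input, "''x'", is a made-up edge case added to show that the non-greedy and negated-class versions are not always interchangeable.

```javascript
// The input from the question.
const source = "'a' b 'a'";

// Greedy: runs from the first apostrophe to the last one.
const greedy = source.match(/('.+')/g);
console.log(greedy); // [ "'a' b 'a'" ]

// Non-greedy: stops at the nearest closing apostrophe.
const lazy = source.match(/('.+?')/g);
console.log(lazy); // [ "'a'", "'a'" ]

// Negated class: apostrophes are not allowed between the quotes.
const negated = source.match(/('[^']+')/g);
console.log(negated); // [ "'a'", "'a'" ]

// Hypothetical edge case: .+? may still step over an apostrophe.
const edge = "''x'";
const lazyEdge = edge.match(/('.+?')/g);
const negatedEdge = edge.match(/('[^']+')/g);
console.log(lazyEdge);    // [ "''x'" ]
console.log(negatedEdge); // [ "'x'" ]
```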
You need to escape the quotes:
\'[^\']+\'
Edit: Hmm, well, I suppose this answer depends on what language/system you're using.