What does "(?!$)" inside a regexp mean? [duplicate]

This question already has an answer here: Reference - What does this regex mean? (1 answer)
Closed last year.
In a section of the Svelte tutorial, there's a piece of code like this:
view = pin ? pin.replace(/\d(?!$)/g, '•') : 'enter your pin';
I know \d means a digit, but can't figure out what (?!$) means.
(And because it's composed of all punctuation, I can't manage to google for an explanation.)
Please help, thanks.

(?!$) is a negative lookahead stipulation, where (?!...) declares the negative lookahead and $ is what the expression is "looking ahead" for (in this case, the end-of-string anchor).
A negative lookahead is the inverse of a positive lookahead, so it is easier to understand once you know what a positive lookahead is. A digit followed by a positive lookahead, \d(?=$), matches the same text as \d$, except that whatever the lookahead checks is not included in the returned match. \d(?=$) matches any digit that sits immediately before the end of the string. A negative lookahead does the opposite: \d(?!$) matches every digit that is NOT immediately before the end of the string. So using \d(?!$) and replacing the matches with a * turns every digit into a * except a digit that is the very last character of the string (for a PIN, that is the last digit).
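To make the contrast concrete, here is a minimal Python sketch of the Svelte line. It is a port, assuming Python's re.sub as a stand-in for JavaScript's String.prototype.replace with the g flag; mask_pin and mask_last_digit are hypothetical names. One difference to keep in mind: Python's $ also matches just before a trailing newline, while JavaScript's $ without the m flag matches only at the very end.

```python
import re


# Hypothetical port of: pin ? pin.replace(/\d(?!$)/g, '•') : 'enter your pin'
def mask_pin(pin: str) -> str:
    if not pin:
        return "enter your pin"
    # \d(?!$): a digit that is NOT immediately followed by the end of the string.
    return re.sub(r"\d(?!$)", "•", pin)


# \d(?=$): only a digit that IS immediately followed by the end of the string.
def mask_last_digit(pin: str) -> str:
    return re.sub(r"\d(?=$)", "•", pin)


print(mask_pin("1234"))         # •••4
print(mask_pin(""))             # enter your pin
print(mask_pin("12a"))          # ••a  (neither digit is at the very end)
print(mask_last_digit("1234"))  # 123•
```

The "12a" case shows that the rule is about position, not about which digit comes last: a digit is only kept when nothing follows it.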
For the sake of being thorough, you should know that (?<=) is a positive lookbehind that looks for matches in the characters immediately before the given token instead of after, and (?<!) is a negative lookbehind.
Regex101.com and RegExr.com are fantastic resources when you are learning regex: you can paste in an expression you don't understand, get a piece-by-piece explanation of it, and test strings in real time to see what the expression matches and what it doesn't. Even if the built-in explanations don't make sense, you can still use them in situations like this to find out what something is called so you can search for it.

\d matches all digits
(?!something) means 'Negative Lookahead' for something
$ matches the end of a string
So \d(?!$) matches every digit except one that is the last character of the string
In this string:
$$//www12.example#news.com<~>998123000dasas00--987
These digits will be matched (the final 7 will not, because it is the last character):
129981230000098
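The example can be checked with a short Python sketch, assuming re.findall as a stand-in for whichever tool the answer used; it collects every digit that \d(?!$) matches in the sample string:

```python
import re

sample = "$$//www12.example#news.com<~>998123000dasas00--987"
# Every digit that is not immediately followed by the end of the string.
digits = re.findall(r"\d(?!$)", sample)
print("".join(digits))  # 129981230000098
```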
Based on this answer
and a Regex Cheat Sheet

Related

Having trouble deciphering a sentence tokenizer regex

The following regex is supposed to act as a sentence tokenizer pattern, but I'm having some trouble deciphering what exactly it's doing:
(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)(?<=\.|\?|\!)\s
I understand that it's using positive and negative lookbehinds, as the accepted answer of this post explains (they give the example of a negative lookbehind like this: (?<!B)A). But what is considered A in the above regex?
The regex is checking for breaks between sentences. The negative lookbehinds prevent false matches that represent abbreviations instead of the ends of sentences. They mean:
(?<!\w\.\w.) Don't match anything that looks like A.b., 2.c., or 1.3. (Probably they meant for the second period to also be \. to match only a period, but as written it will match any character at the end, for example A.b! or g.Z4)
(?<![A-Z][a-z]\.) Don't match anything that looks like Cf., Dr., Mr., etc. Note this only checks the two letters before the period, so "Mrs." is not excluded and will be matched incorrectly.
(?<![A-Z]\.) Don't match anything that looks like A. or C.
Then if these all pass, it has a positive lookbehind (?<=\.|\?|\!) to check for ., ? or !.
And finally it matches on any whitespace \s.
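A minimal Python sketch of the tokenizer, assuming it is used with re.split (the sample text is made up for illustration). Python's re accepts these lookbehinds because each one has a fixed width. The last call shows the "Mrs." caveat.

```python
import re

# The lookbehinds only inspect text before the whitespace; \s is the only
# thing actually consumed, so re.split drops it and keeps the punctuation.
SENTENCE_BREAK = re.compile(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<![A-Z]\.)(?<=\.|\?|\!)\s")

text = "Mr. Smith met Dr. Jones. They talked about the U.S.A. economy! Was it useful? Yes."
for sentence in SENTENCE_BREAK.split(text):
    print(sentence)
# Mr. Smith met Dr. Jones.
# They talked about the U.S.A. economy!
# Was it useful?
# Yes.

# "Mrs." is not excluded because [A-Z][a-z]\. only sees "rs.".
print(SENTENCE_BREAK.split("Mrs. Brown left."))  # ['Mrs.', 'Brown left.']
```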

Negative Lookahead Faults Regex

I have a regular expression:
^\/admin\/(?!(e06772ed-7575-4cd4-8cc6-e99bb49498c5)).*$
My input string:
/admin/e06772ed-7575-4cd4-8cc6-e99bb49498c5
As I understand, negative lookahead should check if a group (e06772ed-7575-4cd4-8cc6-e99bb49498c5) has a match, or am I incorrect?
Since the input string contains a match for the group, why does the negative lookahead not work? By that I mean that I expect the group e06772ed-7575-4cd4-8cc6-e99bb49498c5 in my regex to match the e06772ed-7575-4cd4-8cc6-e99bb49498c5 part of the input string.
Removing negative lookahead makes this regex work correctly.
Tested with regex101.com
The takeaway message of this question is: a lookaround matches a position, not a string.
(?!e06772ed-7575-4cd4-8cc6-e99bb49498c5)
will match any position that is not followed by e06772ed-7575-4cd4-8cc6-e99bb49498c5.
This means that:
^\/admin\/(?!(e06772ed-7575-4cd4-8cc6-e99bb49498c5)).*$
will match:
/admin/abc
and even:
/admin/e99bb49498c5
but not:
/admin/e06772ed-7575-4cd4-8cc6-e99bb49498c5/daffdakjf;adjk;af
This is also why there is a match when you get rid of the ?!: what remains is an ordinary group that matches the UUID literally, so the string matches exactly.
Also, you can drop the parentheses inside your lookahead; they are not needed for grouping here.
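A small Python sketch can confirm the four cases listed above, using the question's pattern and the same pattern without the redundant inner group (Python's re is an assumption here; the question was tested on regex101.com). The last lines show that a lookaround on its own produces a zero-width match, i.e. a position.

```python
import re

UUID = "e06772ed-7575-4cd4-8cc6-e99bb49498c5"
# The pattern from the question, and the same pattern without the inner group.
original = re.compile(r"^\/admin\/(?!(e06772ed-7575-4cd4-8cc6-e99bb49498c5)).*$")
simplified = re.compile(r"^/admin/(?!e06772ed-7575-4cd4-8cc6-e99bb49498c5).*$")

paths = ["/admin/abc", "/admin/e99bb49498c5", "/admin/" + UUID, "/admin/" + UUID + "/daffdakjf;adjk;af"]
for path in paths:
    print(path, original.search(path) is not None, simplified.search(path) is not None)

# The first position in UUID that is NOT followed by UUID is index 1; the match is empty.
pos = re.search(r"(?!" + re.escape(UUID) + r")", UUID)
print(pos.span())  # (1, 1)
```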

I could not seem to understand (?=.*?[A-Z]) this expression [duplicate]

This question already has answers here: Regex lookahead, lookbehind and atomic groups (5 answers)
Closed 5 years ago.
I'm trying to learn more advanced regular expressions for a password validator I'm working on, because I think regular expressions would be the best approach. I am using Java as my programming language.
So for my pattern, people suggested (?=.*?[A-Z]) to mean "at least one uppercase letter in the string". I have tried searching for it, but nothing makes it clear how the ?=.*? part ensures that at least one is there.
Here is the whole pattern: ^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}$
From what I understand:
? means optional (occurs zero or one time)
= means... well, I don't know yet
. is a wildcard
[A-Z] is the range of uppercase letters from A-Z
TL;DR: How does (?=.*?[A-Z]) make sure at least one uppercase letter is included? Any in-depth explanation?
(?= is the start of a lookahead group; the question mark here does not mean the same as a ? elsewhere.
.*? is a non-greedy (lazy) match against anything or nothing. The question mark here also does not mean 'optional'.
[A-Z] is a character set containing the upper case ASCII letters A through to Z.
) is the end of the look-ahead group
So the net result is:
"Look ahead and see if, after maybe some characters, there is an upper case letter."
Your full expression, ^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}$, can be read as:
"Match if the string contains an uppercase letter, a lowercase letter, a digit, and one of the symbols in [#?!#$%^&*-], and there are at least 8 characters in total."
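A minimal Python sketch of the validator. The question uses Java, whose java.util.regex should treat this pattern the same way; Python's re is used here only to keep the example short, and is_valid_password is a hypothetical name. The final line shows that a lookahead only checks: its match is empty and starts at position 0.

```python
import re

# Pattern copied from the question (the duplicated '#' in the class is harmless).
PASSWORD = re.compile(r"^(?=.*?[A-Z])(?=.*?[a-z])(?=.*?[0-9])(?=.*?[#?!#$%^&*-]).{8,}$")


def is_valid_password(candidate: str) -> bool:
    return PASSWORD.match(candidate) is not None


print(is_valid_password("Passw0rd!"))   # True
print(is_valid_password("password1!"))  # False: no uppercase letter
print(is_valid_password("Password1"))   # False: no symbol from the class
print(is_valid_password("Pa1!"))        # False: fewer than 8 characters

# The lookahead consumes nothing, so the match it produces is empty.
print(re.match(r"(?=.*?[A-Z])", "abcD").span())  # (0, 0)
```

Each (?=...) is tested from the same starting position (right after ^), which is why the four conditions can be stacked in any order before .{8,}$ does the actual matching.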
The regex is using a feature named positive lookahead, which is one of the regex lookarounds:
Positive lookahead: (?=...). Ex: a(?=b) matches a if followed by b
Negative lookahead: (?!...). Ex: a(?!b) matches a if not followed by b
Positive lookbehind: (?<=...). Ex: (?<=a)b matches b if preceded by a
Negative lookbehind: (?<!...). Ex: (?<!a)b matches b if not preceded by a
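The four examples above can be tried directly in Python's re (used here as an assumption; the syntax is the same in most engines). Each test string is built so that only the second candidate character qualifies, and the lookaround character is never part of the match.

```python
import re

cases = [
    (r"a(?=b)", "ac ab"),   # positive lookahead: 'a' followed by 'b'
    (r"a(?!b)", "ab ac"),   # negative lookahead: 'a' not followed by 'b'
    (r"(?<=a)b", "cb ab"),  # positive lookbehind: 'b' preceded by 'a'
    (r"(?<!a)b", "ab cb"),  # negative lookbehind: 'b' not preceded by 'a'
]
spans = []
for pattern, text in cases:
    m = re.search(pattern, text)
    spans.append(m.span())
    print(pattern, repr(text), m.group(), m.span())
# a(?=b) 'ac ab' a (3, 4)
# a(?!b) 'ab ac' a (3, 4)
# (?<=a)b 'cb ab' b (4, 5)
# (?<!a)b 'ab cb' b (4, 5)
```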
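Regarding (?=.*?[A-Z]): it is used right after the ^, so it is tested at the start of the string and asserts that an uppercase letter appears somewhere ahead, without consuming any characters; the .{8,}$ at the end then does the actual matching. On its own, ^(?=.*?[A-Z])$ would never match, because the lookahead consumes nothing, so $ would require the string to end right there, leaving no room for an uppercase letter.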

Vim syntax region - lookbehind confusion

Define the following in .vimrc or execute it on the Vim command line:
syn match ndbMethods "[^. \t\n\r]\@<=[_a-z][_a-zA-Z0-9]*(\@="
hi ndbMethods guibg=#222222
View results with a C-style method call in the active buffer:
foo();
You will see the initial character of the method name is not matched.
The intention is for the lookbehind pattern to force a beginning of line, literal . or whitespace to precede any matched method's first character.
Oddly enough, making this a negative lookahead (\@<!) seems to work!
Would someone be kind enough to explain why this lookbehind is incorrect?
At f, looking behind, you probably want to check for [. \t\n\r], not [^. \t\n\r]. Currently you're saying "something that follows a character that is not one of these", so the condition is only met upon reaching the o, since the character before it, f, is indeed not one of those characters. So you have to either un-negate the character class, or, as you discovered, negate the lookbehind.
I think you're getting your terms confused, too.
\@<= positive lookbehind
\@<! negative lookbehind
\@= positive lookahead
\@! negative lookahead
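Python's re has the same four lookarounds, so a rough Python port of the Vim pattern can reproduce the behaviour described in the question. This is an approximation: Vim's regex engine is not identical to Python's, and only single-line strings are tested here. In this port the un-negated class finds nothing in "foo();", because nothing precedes the f at the very start of the string; the negated lookbehind does not have that problem.

```python
import re

# Vim lookbehind -> (?<=...), negative lookbehind -> (?<!...), lookahead -> (?=...);
# the literal "(" has to be escaped in Python.
POSITIVE = re.compile(r"(?<=[^. \t\n\r])[_a-z][_a-zA-Z0-9]*(?=\()")   # as in the question
NEGATED = re.compile(r"(?<![^. \t\n\r])[_a-z][_a-zA-Z0-9]*(?=\()")    # negated lookbehind
UNNEGATED = re.compile(r"(?<=[. \t\n\r])[_a-z][_a-zA-Z0-9]*(?=\()")   # un-negated class

results = {}
for line in ["foo();", "obj.bar();", "x = foo();"]:
    found = [p.search(line) for p in (POSITIVE, NEGATED, UNNEGATED)]
    results[line] = [m.group() if m else None for m in found]
    print(line, results[line])
# foo(); ['oo', 'foo', None]
# obj.bar(); ['ar', 'bar', 'bar']
# x = foo(); ['oo', 'foo', 'foo']
```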

Regular expression to match last number in a string

I need to extract the last number that is inside a string. I'm trying to do this with regex and negative lookaheads, but it's not working. This is the regex that I have:
\d+(?!\d+)
Here are some example strings, just to give you an idea, and what the regex should match:
ARRAY[123] matches 123
ARRAY[123].ITEM[4] matches 4
B:1000 matches 1000
B:1000.10 matches 10
And so on. The regex matches the numbers, but it matches all of them. I don't get why the negative lookahead is not working. Anyone care to explain?
Your regex \d+(?!\d+) says
match any number if it is not immediately followed by a number.
which is not what you want: a number is the last one only if no other number follows it anywhere later in the string (not just immediately after it).
When translated to regex we have:
(\d+)(?!.*\d)
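A short Python check of this lookahead approach against the question's sample strings (last_number is a hypothetical helper name). Note that . does not match newlines by default, so for multi-line input you would need re.DOTALL for the lookahead to see digits on later lines.

```python
import re
from typing import Optional

# A run of digits with no digit anywhere after it.
LAST_NUMBER = re.compile(r"(\d+)(?!.*\d)")


def last_number(text: str) -> Optional[str]:
    m = LAST_NUMBER.search(text)
    return m.group(1) if m else None


for sample in ["ARRAY[123]", "ARRAY[123].ITEM[4]", "B:1000", "B:1000.10"]:
    print(sample, "->", last_number(sample))
# ARRAY[123] -> 123
# ARRAY[123].ITEM[4] -> 4
# B:1000 -> 1000
# B:1000.10 -> 10
```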
I took it this way: you need to make sure the match is close enough to the end of the string; close enough in the sense that only non-digits may intervene. What I suggest is the following:
/(\d+)\D*\z/
\z at the end anchors the match to the very end of the string.
\D* before that means that an arbitrary number of non-digits can intervene between the match and the end of the string.
(\d+) is the matching part. It is in parenthesis so that you can pick it up, as was pointed out by Cameron.
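The same anchored idea as a Python sketch; Python spells Ruby's end-of-string anchor \z as \Z, and last_number_anchored is a hypothetical helper name.

```python
import re
from typing import Optional

# Digits, then only non-digits up to the very end of the string.
TRAILING_NUMBER = re.compile(r"(\d+)\D*\Z")


def last_number_anchored(text: str) -> Optional[str]:
    m = TRAILING_NUMBER.search(text)
    return m.group(1) if m else None


print(last_number_anchored("ARRAY[123].ITEM[4]"))  # 4
print(last_number_anchored("B:1000.10"))           # 10
print(last_number_anchored("no digits"))           # None
```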
You can use
.*(?:\D|^)(\d+)
to get the last number in capture group 1; this works because the matcher first gobbles up all the characters with .*, then backtracks until it reaches a non-digit character (or the start of the string) that is followed by digits, which is the last such boundary in the string, and then matches that final group of digits.
Your negative lookahead isn't doing what you expect because, on the string "1 3" for example, the 1 is matched by \d+ and the negative lookahead then succeeds, since the next character (a space) does not start a sequence of one or more digits. The lookahead never looks past the space to the 3, so 1 is accepted as a match.
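A Python sketch of the backtracking approach, plus the "1 3" case showing why the question's \d+(?!\d+) reports every number (re.findall lists all non-overlapping matches). Python's re is an assumption; the question does not name its regex engine.

```python
import re

GREEDY_LAST = re.compile(r".*(?:\D|^)(\d+)")

found = {}
for sample in ["ARRAY[123]", "ARRAY[123].ITEM[4]", "B:1000", "B:1000.10", "42"]:
    found[sample] = GREEDY_LAST.match(sample).group(1)
    print(sample, "->", found[sample])
# ARRAY[123] -> 123
# ARRAY[123].ITEM[4] -> 4
# B:1000 -> 1000
# B:1000.10 -> 10
# 42 -> 42

# The original pattern accepts "1" because the next character is a space,
# and then matches "3" separately.
print(re.findall(r"\d+(?!\d+)", "1 3"))  # ['1', '3']
```

The "42" case relies on the ^ alternative: with no non-digit before the number, .* backtracks to nothing and ^ matches at the start.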
Note that your example regex doesn't have any groups in it, so I'm not sure how you were extracting the number.
I still had issues with managing the capture groups
(for example, if using Inline Modifiers (?imsxXU)).
This worked for my purposes:
.(?:\D|^)\d(\D)