Vim syntax region - lookbehind confusion - regex

Define the following in your .vimrc or run it from the Vim command line:
syn match ndbMethods "[^. \t\n\r]\@<=[_a-z][_a-zA-Z0-9]*(\@="
hi ndbMethods guibg=#222222
View results with a C-style method call in the active buffer:
foo();
You will see the initial character of the method name is not matched.
The intention is for the lookbehind pattern to require that the beginning of a line, a literal ., or whitespace precede the first character of any matched method name.
Oddly enough, making this a negative lookahead (\@<!) seems to work!
Would someone be kind enough to explain why this lookbehind is incorrect?

Updated: At f, looking behind, you probably want to check for [. \t\n\r], not [^. \t\n\r]. Currently you're saying "something that follows a character that isn't one of these", so the condition is first met at the o, because the character before it, f, is indeed not one of those characters. So you have to either un-negate the character class or, as you discovered, negate the lookbehind.
I think you're getting your terms confused, too.
\@<= positive lookbehind
\@<! negative lookbehind
\@= positive lookahead
\@! negative lookahead
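To make the difference concrete, here is a sketch that translates the question's pattern into Python's re. It assumes that Vim's lookbehind operators behave like Python's (?<=...) and (?<!...) for this pattern, and it uses a plain string in place of the buffer line. It also shows a subtlety of the un-negated fix: a positive lookbehind needs some character to be present, so it cannot match at the very start of the text, while the negated lookbehind can.

```python
import re

# Python translations of the Vim pattern (assumption: equivalent semantics here).
ORIGINAL = r"(?<=[^. \t\n\r])[_a-z][_a-zA-Z0-9]*(?=\()"            # positive lookbehind, negated class
UNNEGATED = r"(?<=[. \t\n\r])[_a-z][_a-zA-Z0-9]*(?=\()"            # positive lookbehind, plain class
NEGATED_LOOKBEHIND = r"(?<![^. \t\n\r])[_a-z][_a-zA-Z0-9]*(?=\()"  # negative lookbehind, negated class

print(re.findall(ORIGINAL, "foo();"))            # ['oo']: only the o has a non-separator before it
print(re.findall(UNNEGATED, "foo();"))           # []: nothing at all precedes the f
print(re.findall(UNNEGATED, "obj.foo();"))       # ['foo']: a literal . precedes the f
print(re.findall(NEGATED_LOOKBEHIND, "foo();"))  # ['foo']: "no non-separator before" also holds at the start
```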

Related

What does "(?!$)" inside a regexp mean? [duplicate]

In a section of the Svelte tutorial, there's a piece of code like this:
view = pin ? pin.replace(/\d(?!$)/g, '•') : 'enter your pin';
I know \d means a digit, but can't figure out what (?!$) means.
(And because it's made up entirely of punctuation, I can't manage to google for an explanation.)
Please help, thanks.
(?!$) is a negative lookahead assertion, where (?!) declares the negative lookahead and $ is what the expression is "looking ahead" for (in this case, an end anchor).
A negative lookahead is the inverse of a positive lookahead, so it is easier to understand once you know what a positive lookahead is. A digit followed by a positive lookahead, \d(?=$), looks for anything that \d$ would match, but leaves the part inside the lookahead out of the returned match; it matches any digit that sits directly before the end of the string. A negative lookahead instead matches every digit that is NOT directly before the end of the string, so using \d(?!$) and replacing the matches with • (as the tutorial does) turns every digit in the string into a • except the last one.
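Here is a Python sketch of the same idea. mask_pin is a hypothetical helper that mirrors the Svelte ternary, and Python's re.sub stands in for JavaScript's replace with the g flag (both replace every match). One difference to keep in mind: Python's $ also matches just before a trailing newline, whereas JavaScript's $ without the m flag matches only at the very end.

```python
import re

def mask_pin(pin: str) -> str:
    # Hypothetical helper: replace every digit NOT immediately followed by the end of the string.
    return re.sub(r"\d(?!$)", "•", pin) if pin else "enter your pin"

print(mask_pin("1234"))                # •••4
print(mask_pin(""))                    # enter your pin
print(re.findall(r"\d(?=$)", "1234"))  # ['4']: the positive lookahead keeps only the last digit
```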
For the sake of being thorough, you should know that (?<=) is a positive lookbehind that looks for matches in the characters immediately before the given token instead of after, and (?<!) is a negative lookbehind.
Regex101.com and RegExr.com are fantastic resources when you are learning regex: you can paste in a regular expression you don't understand, get a piece-by-piece explanation of it, and test strings in real time to see what the expression captures and what it doesn't. Even if the built-in explanations don't make sense, you can still use them in situations like this to find out what something is called so you can search for it.
\d matches any digit
(?!something) is a negative lookahead for something
$ matches the end of the string
So \d(?!$) matches every digit except one that is the last character of the string
In this string:
$$//www12.example#news.com<~>998123000dasas00--987
These digits will be matched, listed together here (the final 7 is not matched because it is the last character):
129981230000098
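The result can be reproduced with a short Python sketch; re.findall returns each matched digit separately, and joining them gives the string above.

```python
import re

s = "$$//www12.example#news.com<~>998123000dasas00--987"
matched = re.findall(r"\d(?!$)", s)  # one entry per matched digit
print("".join(matched))              # 129981230000098
```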
Based on this answer and the Regex Cheat Sheet.

How do I match what's between the quotes excluding these?

I want to match what's between the quotes, but exclude the quotes themselves. I tried positive and negative lookahead, which works for the closing quote, but I cannot exclude the opening one. What am I doing wrong?
Here is the example I'm using:
A: $("div"),
B: $("img.some_class"),
B: $("img.some_class.another_class"),
C: $("#some_id"),
D: $(".some_class"),
E: $("input#some_id"),
F: $("div#some_id.some_class.some_other"),
G: $("div.some_class#some_id")
Here is my regex so far:
/(?!").*(?=")/g
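Try this:
/\("\K[^"]+/g
\K resets the start of the reported match, so everything matched before it is left out of the result. (\K is supported by PCRE and Perl, but not by JavaScript's regex engine.)
For example, in A: $("div"), it matches ("div but returns only div as the match.
Here is a demo.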
There are not two, but four different lookaround modifiers, because you need to specify two different aspects:
Are you asserting that something is there (positive) or is not there (negative)?
Are you asserting that it's before the specified pattern (lookbehind) or after it (lookahead)?
The four combinations are generally written like this:
?= for positive lookahead
?! for negative lookahead
?<= for positive lookbehind
?<! for negative lookbehind
You've used a negative lookahead when you wanted a positive lookbehind, so the fixed version of what you wrote would be:
/(?<=").*(?=")/g
Beware the "greediness" of .*, which will match as much of the string as possible; you might want to use .*? to make it "non-greedy", or explicitly say "anything other than a quote mark" ([^"]*).
Another approach is to match the quotes normally, rather than with a lookaround, but "capture" the part between them: /"(.*?)"/. How you get to the "captured group" will vary depending on your programming language / tool, which you haven't specified.
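To see why greediness matters, and why capturing is often simpler than lookarounds, here is a small Python sketch. Python's re is only a stand-in flavor, and the input line with two quoted strings is hypothetical. Because the lookarounds don't consume the quotes, the lazy version even reuses the closing quote of "a" as an opening quote.

```python
import re

line = '$("a"), $("b")'  # hypothetical input with two quoted strings

print(re.findall(r'(?<=").*(?=")', line))   # ['a"), $("b']: greedy .* runs to the last quote
print(re.findall(r'(?<=").*?(?=")', line))  # ['a', '), $(', 'b']: unconsumed quotes get reused
print(re.findall(r'"(.*?)"', line))         # ['a', 'b']: quotes are consumed, group 1 is returned
```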
The pattern (?!").*(?=") first asserts that what is directly to the right is not a double quote, (?!"), which succeeds because in the example data the character there is not a double quote.
Then .* is greedy and will match 0+ times any character except a newline and will match until the end of the string. Then it will backtrack to fulfill the assertion (?=") where directly on the right is a double quote.
If a positive lookbehind is supported, you might change the (?!") to (?<=") and the pattern could look like (?<=\$\(")[^"]+(?="\)) to not match empty double quotes.
Taking the dollar sign and the opening and closing parenthesis into account, you could use a capturing group and a negated character class [^"]+ to match any char except a double quote:
\$\("([^"]+)"\)
Regex demo
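Both patterns from this answer can be checked against the question's sample lines with a Python sketch. This assumes Python's re behaves like the asker's flavor for these constructs; note that Python requires a lookbehind to be fixed-width, which \$\(" is.

```python
import re

# The question's sample lines, copied into one string.
text = """A: $("div"),
B: $("img.some_class"),
B: $("img.some_class.another_class"),
C: $("#some_id"),
D: $(".some_class"),
E: $("input#some_id"),
F: $("div#some_id.some_class.some_other"),
G: $("div.some_class#some_id")"""

lookaround = re.compile(r'(?<=\$\(")[^"]+(?="\))')  # the match itself is the text between the quotes
capturing = re.compile(r'\$\("([^"]+)"\)')          # the text between the quotes is group 1

print(lookaround.findall(text))
print(capturing.findall(text) == lookaround.findall(text))  # True
```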
Using a lookbehind and a lookahead, as you asked:
/(?<=").*(?=")/g
Test here: https://regex101.com/r/kCEuow/2
You might also consider capturing the substring with a group:
/"([^"]+)"/g
Test the regex: https://regex101.com/r/kCEuow/1

Don't match regex when trailed by character

Current regex: [[\/\!]*?[^\[\]]*?]
The goal is to match [size=16] and [/size] in the following test case, but not [abc].
[size=16]1234[/size]
[abc](htt)
The regex currently also matches the 3rd tag, [abc], which is always followed by a parenthesis. So I was thinking about using logic like: if the group's next char == "(", do not match.
But I don't really know how to write logic like that in regex...
Lookaround assertions look behind or ahead to see whether there's a match, and then let the match proceed (or not) depending on the result.
A negative lookahead assertion looks like this:
(?!regex)
Stick it on the end, giving it an escaped opening parenthesis, and you're good to go:
[[\/\!]*?[^\[\]]*?](?!\()
https://regex101.com/r/2jEApI/1
What you want is a "negative lookahead".
A "lookaround" is a group which gets matched, but not included in the result. They start with (? and end with ).
There are two types of lookaround, lookahead and lookbehind:
A "lookbehind" looks backward and is indicated with a < immediately after the ? (i.e. ?<), but that's not what you're here for.
A "lookahead" looks forward and is the default if there is no < after the ?.
Both types can be either positive or negative:
A positive lookaround requires the included group to be present to form a match and is indicated with an =.
A negative lookaround requires that the included group is NOT present to form a match and is indicated with an !.
Once you have the basic structure of a positive or negative lookahead or lookbehind, the contents in the middle are normal regular expression syntax, the same as in any other group, so in your case you'll need an escaped left parenthesis, \(.
Put it all together and you just need to tack this on the end of what you have: (?!\()
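Here is a quick Python check of the fix. The leading [[\/\!] class from the question is written as [\[/!] (the same three characters: [, / and !) so that recent Python versions don't emit a "possible nested set" FutureWarning; otherwise the pattern is unchanged.

```python
import re

text = "[size=16]1234[/size]\n[abc](htt)"

without_lookahead = r"[\[/!]*?[^\[\]]*?\]"
with_lookahead = r"[\[/!]*?[^\[\]]*?\](?!\()"  # reject a ] that is immediately followed by (

print(re.findall(without_lookahead, text))  # ['[size=16]', '[/size]', '[abc]']
print(re.findall(with_lookahead, text))     # ['[size=16]', '[/size]']
```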

Perl Regex "Not" (negative lookahead)

I'm not terribly certain what the correct wording for this type of regex would be, but basically what I'm trying to do is match any string that starts with "/" but is not followed by "bob/", as an example.
So these would match:
/tom/
/tim/
/steve
But these would not
tom
tim
/bob/
I'm sure the answer is terribly simple, but I had a difficult time searching for "regex not" anywhere. I'm sure there is a fancier word for what I want that would pull good results, but I'm not sure what it would be.
Edit: I've changed the title to indicate the correct name for what I was looking for
You can use a negative lookahead (documented under "Extended Patterns" in perlre):
/^\/(?!bob\/)/
TLDR: Negative Lookaheads
If you wanted a negative lookahead just to find "foo" when it isn't followed by "bar"...
$string =~ m/foo(?!bar)/g;
Working Demo Online
Source
To quote the docs...
(?!pattern)
(*nla:pattern)
(*negative_lookahead:pattern)
A zero-width negative lookahead assertion. For example /foo(?!bar)/ matches any occurrence of "foo" that isn't followed by "bar". Note however that lookahead and lookbehind are NOT the same thing. You cannot use this for lookbehind. (Source: PerlDocs.)
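The documentation's example carries over directly to Python's re, which uses the same (?!pattern) syntax; the alphabetic spellings such as (*nla:pattern) are, as far as I know, Perl-specific. A sketch with a made-up input string, printing where each match starts:

```python
import re

text = "foobar foobaz food"  # made-up input
# foo(?!bar): "foo" only where the following characters are not "bar"
print([m.start() for m in re.finditer(r"foo(?!bar)", text)])  # [7, 14]
```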
Negative Lookaheads For Your Case
The accepted answer is great, but it leaves no explanation, so let me add one...
/^\/(?!bob\/)/
^ matches only at the start of the string.
\/ matches the / character, which has to be escaped because / is the regex delimiter in Perl (as in s/find/replacewith/).
(?!...) prevents a match if what follows is ....
bob\/ is the ... value: don't match bob/. Once more, the / has to be escaped.
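The accepted pattern can be tried on the question's examples in Python. There the / needs no escaping, because Python patterns are plain strings rather than /-delimited literals. Note that "/bob" without a trailing slash, or "/bobby/", would still match, since only the exact text bob/ is excluded.

```python
import re

pattern = re.compile(r"^/(?!bob/)")
for s in ["/tom/", "/tim/", "/steve", "tom", "tim", "/bob/"]:
    print(s, bool(pattern.match(s)))
# /tom/ True, /tim/ True, /steve True, tom False, tim False, /bob/ False
```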

How does the regular expression ‘(?<=#)[^#]+(?=#)’ work?

I have the following regex in a C# program, and have difficulties understanding it:
(?<=#)[^#]+(?=#)
I'll break it down to what I think I understood:
(?<=#) a group, matching a hash. what's `?<=`?
[^#]+ one or more non-hashes (used to achieve non-greediness)
(?=#) another group, matching a hash. what's the `?=`?
So the problem I have is the ?<= and ?< part. From reading MSDN, ?<name> is used for naming groups, but in this case the angle bracket is never closed.
I couldn't find ?= in the docs, and searching for it is really difficult, because search engines will mostly ignore those special chars.
They are called lookarounds; they allow you to assert if a pattern matches or not, without actually making the match. There are 4 basic lookarounds:
Positive lookarounds: see if we CAN match the pattern...
(?=pattern) - ... to the right of current position (look ahead)
(?<=pattern) - ... to the left of current position (look behind)
Negative lookarounds - see if we can NOT match the pattern
(?!pattern) - ... to the right
(?<!pattern) - ... to the left
As an easy reminder, for a lookaround:
= is positive, ! is negative
< is look behind, otherwise it's look ahead
References
regular-expressions.info/Lookarounds
But why use lookarounds?
One might argue that lookarounds in the pattern above aren't necessary, and #([^#]+)# will do the job just fine (extracting the string captured by \1 to get the non-#).
Not quite. The difference is that since a lookaround doesn't match the #, it can be "used" again by the next attempt to find a match. Simplistically speaking, lookarounds allow "matches" to overlap.
Consider the following input string:
and #one# and #two# and #three#four#
Now, #([a-z]+)# will give the following matches (as seen on rubular.com):
and #one# and #two# and #three#four#
    \___/     \___/     \_____/
Compare this with (?<=#)[a-z]+(?=#), which matches:
and #one# and #two# and #three#four#
     \_/       \_/       \___/ \__/
Unfortunately this couldn't be demonstrated on rubular.com at the time of writing, since it didn't support lookbehind. However, it does support lookahead, so we can do something similar with #([a-z]+)(?=#), which matches (as seen on rubular.com):
and #one# and #two# and #three#four#
    \__/      \__/      \____/\___/
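The three results can be reproduced in Python, which supports both lookahead and fixed-width lookbehind. With a single group, re.findall returns the group's text rather than the whole match.

```python
import re

s = "and #one# and #two# and #three#four#"

# Consuming both hashes: the '#' between "three" and "four" can only be used once.
print(re.findall(r"#([a-z]+)#", s))         # ['one', 'two', 'three']
# Lookarounds only assert the hashes, so neighbouring matches can share them.
print(re.findall(r"(?<=#)[a-z]+(?=#)", s))  # ['one', 'two', 'three', 'four']
# Lookahead-only variant: consume the leading '#', merely assert the trailing one.
print(re.findall(r"#([a-z]+)(?=#)", s))     # ['one', 'two', 'three', 'four']
```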
References
regular-expressions.info/Flavor Comparison
As another poster mentioned, these are lookarounds, special constructs for changing what gets matched and when. This says:
(?<=#) require a `#` immediately before this point,
without including it in the match
[^#]+ one or more characters that are not `#`, and
(?=#) require a `#` immediately after this point,
again without including it in the match
So this will match all the characters in between two #s.
Lookaheads and lookbehinds are very useful in many cases. Consider, for example, the rule "match all bs not followed by an a." Your first attempt might be something like b[^a], but that's not right: this will also match the bu in bus or the bo in boy, but you only wanted the b. And it won't match the b in cab, even though that's not followed by an a, because there are no more characters to match.
To do that correctly, you need a lookahead: b(?!a). This says "match a b that is not followed by an a, without making the following character part of the match". Thus it'll match just the b in bolo, which is what you want; likewise it'll match the b in cab.
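A Python sketch of the comparison, run on each word separately so that "cab" really ends after its b:

```python
import re

for w in ["bus", "boy", "cab", "bar"]:
    print(w, re.findall(r"b[^a]", w), re.findall(r"b(?!a)", w))
# bus ['bu'] ['b']
# boy ['bo'] ['b']
# cab [] ['b']
# bar [] []
```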
They're called look-arounds: http://www.regular-expressions.info/lookaround.html