Solving regex with positive lookbehind

Regexp problem. I'd like the first four strings below to match. The output should be only the 3 characters between the last _ and the dot.
These should match:
_20101_Bp16tt20_KG2.asc
_201_Bondp0_KGB.ASC
_2011_rndiep16tt20_232.AsC
_20101_odiep16tt20_ab3.ASC
and should return KG2, KGB, 232 and ab3 respectively.
These should not match:
_2_ordep16tt.asc
__Bndt20_pippo_K.asc
I am able to select the whole block KG2.asc with ((?<=_)(...)(\.(?i)(asc))). However, I just want KG2. I think I should apply a positive lookbehind, but all my attempts failed. Could you help me?

You could make use of \K and a positive lookahead:
_\K[A-Za-z0-9]{3}(?=\.(?i)asc$)
How it works:
_ Match the underscore literally
\K Reset the start of the reported match, so the _ is not part of the result
[A-Za-z0-9]{3} Match exactly 3 characters that are upper/lower case letters or digits (replace with a dot if you want to match any character)
(?=\.(?i)asc$) Positive lookahead asserting that what follows is a dot and asc in any letter case, followed by the end of the string
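If you need the same result in JavaScript, here is a minimal sketch, assuming an engine with lookbehind support (ES2018 or later). JavaScript supports neither \K nor an inline (?i) in the middle of a pattern, so this version swaps in a lookbehind (?<=_) for _\K and the i flag for (?i); the rest of the pattern is unchanged.

```javascript
// Lookbehind (?<=_) stands in for _\K; the i flag stands in for (?i).
const pattern = /(?<=_)[A-Za-z0-9]{3}(?=\.asc$)/i;

const inputs = [
  "_20101_Bp16tt20_KG2.asc",
  "_201_Bondp0_KGB.ASC",
  "_2011_rndiep16tt20_232.AsC",
  "_20101_odiep16tt20_ab3.ASC",
  "_2_ordep16tt.asc",
  "__Bndt20_pippo_K.asc",
];

// Without the g flag, match() returns an array whose [0] is the matched text, or null.
const results = inputs.map((s) => {
  const m = s.match(pattern);
  return m ? m[0] : null;
});
// results: ['KG2', 'KGB', '232', 'ab3', null, null]

console.log(results);
```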

Use a lookahead for the extension as well:
((?<=_)(...)(?=\.(?i)(asc)))
See https://regexr.com/40jfa

Maybe this expression helps:
'_201_Bondp0_KGB.ASC'.match(/(?<=_)(...)(?=\.)/g)
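Note that this expression only requires a dot after the three characters; it does not check the .asc extension. A minimal sketch comparing it with a variant that puts the extension in the lookahead (the input _x_ABC.txt is made up for illustration):

```javascript
// Original: any dot after the three characters is enough.
const loose = /(?<=_)(...)(?=\.)/g;
// Variant: the three characters must be followed by .asc (any case) at the end.
const strict = /(?<=_)(...)(?=\.asc$)/gi;

const real = "_201_Bondp0_KGB.ASC";
const other = "_x_ABC.txt"; // hypothetical input with another extension

// With the g flag, match() returns all matches as an array, or null if none.
const realLoose = real.match(loose);     // ['KGB']
const realStrict = real.match(strict);   // ['KGB']
const otherLoose = other.match(loose);   // ['ABC']
const otherStrict = other.match(strict); // null

console.log(realLoose, realStrict, otherLoose, otherStrict);
```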

Related

JavaScript Negative Lookaround

I want to match ___ except within {}:
https://regex101.com/r/PYRWIA/1
I don't understand why it still matches with
/___\s*\n*(?!})/
First of all, there is no need to use \n in your regex, since \s also matches line breaks.
The second issue is the use of * (0 or more occurrences): because \s* and \n* may match nothing, the engine can backtrack until the negative lookahead is tested right after the last underscore, where the next character is a line break rather than }, so the lookahead succeeds.
You can use either of these 2 patterns:
___(?!\s*})
___\s+(?!})
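A small JavaScript check of all three patterns on made-up inputs (not the data behind the regex101 link). It also shows a limitation of the second pattern: since \s+ can give characters back, it is fooled by more than one whitespace character before the }, and it never matches ___ that is not followed by whitespace at all. The first pattern, which keeps \s* inside the lookahead, handles both cases.

```javascript
// The pattern from the question, and the two suggested patterns.
const original = /___\s*\n*(?!})/;
const lookaheadStar = /___(?!\s*})/;
const plusThenLookahead = /___\s+(?!})/;

// Hypothetical inputs: the first two are treated as "inside {}" (the ___ is
// followed only by whitespace and a }), the last two as outside.
const samples = ["___\n}", "___\n\n}", "___ text", "___text"];

const results = {
  original: samples.map((s) => original.test(s)),                   // [true, true, true, true]
  lookaheadStar: samples.map((s) => lookaheadStar.test(s)),         // [false, false, true, true]
  plusThenLookahead: samples.map((s) => plusThenLookahead.test(s)), // [false, true, true, false]
};

console.log(results);
```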

RegEx: Excluding a pattern from the match

I know some regex basics, but I'm not a pro and I'm still learning. Currently I'm using the following very simple regex to match any digit in a given sentence:
\d
Now I want to match all the digits, except that patterns like e074663, e123444 or e7736 should be excluded from the match. So, for the following input,
Edit 398e997979 the Expression 9798729889 & T900980980098ext to see e081815 matches. Roll over matches or e081815 the expression e081815 for details.e081815 PCRE & JavaScript flavors of RegEx are e081815 supported. Validate your expression with Tests mode e081815.
Only the digits outside the e081815 tokens should be matched. I tried the following, without success:
(^[e\d])(\d)
Also, going forward, more patterns will need to be excluded, e.g. cg636553 or cg followed by any digits. Any help in this regard would be much appreciated. Thanks!
Try this:
(?<!\be)(?<!\d)\d+
Test it live on regex101.com.
Explanation:
(?<!\be) # make sure we're not right after a word boundary and "e"
(?<!\d) # make sure we're not right after a digit
\d+ # match one or more digits
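The same lookbehinds work in JavaScript (ES2018+). Here is a minimal sketch running the pattern (?<!\be)(?<!\d)\d+ over the sample sentence from the question. Note that 997979 in 398e997979 is matched, because that e does not start a word. The cg variant is an assumption based on the follow-up in the question: it adds (?<!\bcg) and is tried on a made-up string.

```javascript
const text =
  "Edit 398e997979 the Expression 9798729889 & T900980980098ext to see e081815 matches. " +
  "Roll over matches or e081815 the expression e081815 for details.e081815 PCRE & " +
  "JavaScript flavors of RegEx are e081815 supported. Validate your expression with Tests mode e081815.";

// (?<!\be) : not directly after an "e" that starts a word
// (?<!\d)  : not in the middle of a run of digits
const answer = /(?<!\be)(?<!\d)\d+/g;
const found = text.match(answer);
// found: ['398', '997979', '9798729889', '900980980098']

// Hypothetical extension for the cg prefix: one more negative lookbehind.
const extended = /(?<!\be)(?<!\bcg)(?<!\d)\d+/g;
const found2 = "cg636553 and 42 but not e7736".match(extended);
// found2: ['42']

console.log(found, found2);
```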
If you want to match individual digits, you can use the \G anchor, which matches at the position where the previous match ended (or at the start of the string on the first attempt):
(?:(?<!\be)(?<=\D)|\G)\d
Another option is to use a capturing group with lookarounds:
(?:\b(?!e|cg)|(?<=\d)\D)[A-Za-z]?(\d+)
(?: Non capture group
\b(?!e|cg) Word boundary, assert what is directly to the right is not e or cg
| Or
(?<=\d)\D Match any char except a digit, asserting what is directly on the left is a digit
) Close group
[A-Za-z]? Match an optional char a-zA-Z
(\d+) Capture 1 or more digits in group 1
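A minimal JavaScript sketch (ES2018+ for the lookbehind, ES2020 for matchAll) of reading group 1 from this pattern; the sample string is made up for illustration.

```javascript
const pattern = /(?:\b(?!e|cg)|(?<=\d)\D)[A-Za-z]?(\d+)/g;
const sample = "Edit 398 and T900 but not e081815 or cg636553"; // hypothetical input

// matchAll yields one match object per hit; index 1 holds the captured digits.
const digits = [...sample.matchAll(pattern)].map((m) => m[1]);
// digits: ['398', '900']

console.log(digits);
```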

Regex pattern matching for contains a character

I'm looking for a regex pattern that does exactly this:
It must be exactly 12 characters long, alphanumeric apart from the hyphens
It must contain the hyphen - exactly twice
No spaces are allowed.
I have tried the following regex:
^([a-zA-Z0-9]*-[a-zA-Z0-9]*){2}$
Some sample cases:
-1234abcd-ab
abcd12-avc-a
-abcd-abcdacb
ac12-acdsde-
The regex should match all of the above.
It should not match these:
-abcd-abcd--a
abcd-abcdefg
I've been using the regex ^([a-zA-Z0-9]*-[a-zA-Z0-9]*){2}$ to match the above patterns, but the problem is that it doesn't check for a length of 12, and I'm not sure how to add that to the pattern. Help would be appreciated.
Use this:
^(?=.{12}$)(?=[^-]*-[^-]*-[^-]*$)[a-zA-Z0-9-]+$ /gm
The first positive lookahead asserts that the total length is 12.
The second positive lookahead asserts the presence of exactly two hyphens.
The rest matches the allowed characters (letters, digits and hyphens) up to the end of the line, so a line containing a space or any other character is rejected.
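A JavaScript check of the samples from the question, comparing the pattern with its trailing character class anchored by $ against the same pattern without that $ (the string containing a space is a made-up case). Two things show up. First, without the final $ a string containing a space slips through, because [a-zA-Z0-9-]+ only has to match a prefix. Second, -abcd-abcdacb is rejected by both versions, because as written in the question it is 13 characters long.

```javascript
// Whole string anchored: lookaheads check length and hyphen count, the tail checks characters.
const anchored = /^(?=.{12}$)(?=[^-]*-[^-]*-[^-]*$)[a-zA-Z0-9-]+$/;
// Same pattern without the final $: the character class only has to match a prefix.
const unanchored = /(?=^.{12}$)(?=^[^-]*-[^-]*-[^-]*$)[a-zA-Z0-9-]+/;

const samples = [
  "-1234abcd-ab",
  "abcd12-avc-a",
  "-abcd-abcdacb", // 13 characters as written in the question
  "ac12-acdsde-",
  "-abcd-abcd--a",
  "abcd-abcdefg",
  "ab cd-ef-ghi", // hypothetical: 12 characters, two hyphens, one space
];

const anchoredResults = samples.map((s) => anchored.test(s));
// [true, true, false, true, false, false, false]
const unanchoredResults = samples.map((s) => unanchored.test(s));
// [true, true, false, true, false, false, true]

console.log(anchoredResults, unanchoredResults);
```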

How do I match what's between the quotes excluding these?

I want to match what's between the quotes, but without the quotes themselves. I tried positive and negative lookaheads, which work for the closing quote, but I cannot exclude the opening one. What am I doing wrong?
Here is the example I'm using:
A: $("div"),
B: $("img.some_class"),
B: $("img.some_class.another_class"),
C: $("#some_id"),
D: $(".some_class"),
E: $("input#some_id"),
F: $("div#some_id.some_class.some_other"),
G: $("div.some_class#some_id")
Here is my regex so far:
/(?!").*(?=")/g
Try this:
/\("\K[^"]+/g
\K resets the start of the reported match, so the returned value starts here.
For example, on line A it consumes ("div but returns only div as the match.
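\K is not available in JavaScript. Assuming an engine with lookbehind support (ES2018+), a lookbehind gives the same result. A minimal sketch over the example lines from the question:

```javascript
const source = `A: $("div"),
B: $("img.some_class"),
B: $("img.some_class.another_class"),
C: $("#some_id"),
D: $(".some_class"),
E: $("input#some_id"),
F: $("div#some_id.some_class.some_other"),
G: $("div.some_class#some_id")`;

// (?<=\(") requires ( and " right before the match without including them,
// which is what \("\K achieves in PCRE.
const selectors = source.match(/(?<=\(")[^"]+/g);

console.log(selectors);
```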
There are not two but four different lookaround assertions, because you need to specify two different aspects:
Are you asserting that something is there (positive) or is not there (negative)?
Are you asserting that it's before the specified pattern (lookbehind) or after it (lookahead)?
The four combinations are generally written like this:
?= for positive lookahead
?! for negative lookahead
?<= for positive lookbehind
?<! for negative lookbehind
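To see the four forms side by side, here is a minimal JavaScript sketch (ES2018+ for the lookbehinds) on the made-up string a1b2a3. In every case the a is only checked, never part of the match:

```javascript
const s = "a1b2a3"; // hypothetical input

const posAhead = s.match(/\d(?=a)/g);   // ['2']       digit followed by "a"
const negAhead = s.match(/\d(?!a)/g);   // ['1', '3']  digit not followed by "a"
const posBehind = s.match(/(?<=a)\d/g); // ['1', '3']  digit preceded by "a"
const negBehind = s.match(/(?<!a)\d/g); // ['2']       digit not preceded by "a"

console.log(posAhead, negAhead, posBehind, negBehind);
```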
You've used a negative lookahead when you wanted a positive lookbehind, so the fixed version of what you wrote would be:
/(?<=").*(?=")/g
Beware the "greediness" of .*, which will match as much of the string as possible; you might want to use .*? to make it "non-greedy", or explicitly say "anything other than a quote mark" ([^"]*).
Another approach is to match the quotes normally, rather than with a lookaround, but "capture" the part between them: /"(.*?)"/. How you get to the "captured group" will vary depending on your programming language / tool, which you haven't specified.
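A minimal JavaScript sketch of the difference, using a made-up line with two quoted strings. Even the non-greedy lookaround version returns the ", " between the strings, because the closing quote of "a" also satisfies (?<=") for the next match (the same happens with [^"]*). Matching the quotes and capturing the inside avoids this.

```javascript
const line = '$("a", "b")'; // hypothetical line with two quoted strings

const greedy = line.match(/(?<=").*(?=")/g); // ['a", "b']        runs to the last quote
const lazy = line.match(/(?<=").*?(?=")/g);  // ['a', ', ', 'b']  also returns the text between the strings
const captured = [...line.matchAll(/"(.*?)"/g)].map((m) => m[1]); // ['a', 'b']

console.log(greedy, lazy, captured);
```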
The pattern (?!").*(?=") first asserts that what is directly on the right is not a double quote (?!"), which succeeds at the start of each example line because the first character there (A, B, C, ...) is not a double quote.
Then .* is greedy: it matches any character except a newline 0+ times, up to the end of the line, and then backtracks until the assertion (?=") succeeds, that is, until the next character is a double quote.
If your engine supports positive lookbehind, you could change (?!") to (?<="); to avoid matching empty double quotes, the pattern could look like (?<=\$\(")[^"]+(?="\)).
Taking the dollar sign and the opening and closing parenthesis into account, you could use a capturing group and a negated character class [^"]+ to match any char except a double quote:
\$\("([^"]+)"\)
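A minimal JavaScript sketch (ES2020 for matchAll) of reading the capturing group. It uses a few of the example lines plus a made-up X line with empty quotes, which [^"]+ skips:

```javascript
const source = 'A: $("div"),\nD: $(".some_class"),\nX: $(""),\nG: $("div.some_class#some_id")';

// Group 1 holds the text between the quotes; [^"]+ requires at least one character.
const inner = [...source.matchAll(/\$\("([^"]+)"\)/g)].map((m) => m[1]);
// inner: ['div', '.some_class', 'div.some_class#some_id']

console.log(inner);
```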
Using lookahead and lookbehind, as you asked:
/(?<=").*(?=")/g
Test Here : https://regex101.com/r/kCEuow/2
You might also consider capturing the substring in a group:
/"([^"]+)"/g
Test the regex : https://regex101.com/r/kCEuow/1

Regex invalid on match

I need a regex that reports a string as invalid when it matches. Specifically, the match is a string that starts with A or M followed by four digits, e.g. A1223. The four digits can be any sequence.
I'm sure lookarounds are the way to handle this, but I haven't fully grasped regex as a concept yet. So far I've worked out how to capture the matched strings separately from the other strings with the following:
([\s\S]*?)(A[\d][\d][\d][\d]|M[\d][\d][\d][\d])
Appreciate the help.
Regex doesn't really have match negation, but you can (ab)use a negative lookahead assertion to do inverted matching:
^((?!\s[AM]\d{4}).){6}
To match all strings that do not start with A or M followed by 4 digits:
With a negative lookahead:
^(?![AM]\d{4}).*
With a consuming pattern and a () capture group:
[AM]\d{4}.*|(.+)
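A minimal JavaScript sketch of both approaches; the sample strings are made up. With the capture-group version, a string counts as valid only when group 1 took part in the match:

```javascript
// Approach 1: negative lookahead anchored at the start; a match means "valid".
const lookahead = /^(?![AM]\d{4}).*/;

// Approach 2: the forbidden shape is consumed by the first alternative;
// group 1 only takes part in the match when that alternative fails.
const trashCan = /[AM]\d{4}.*|(.+)/;

const isValidLookahead = (s) => lookahead.test(s);
const isValidTrashCan = (s) => {
  const m = s.match(trashCan);
  return m !== null && m[1] !== undefined;
};

const samples = ["A1223", "M0001", "B1234", "xA1223"]; // hypothetical inputs
const viaLookahead = samples.map(isValidLookahead); // [false, false, true, true]
const viaTrashCan = samples.map(isValidTrashCan);   // [false, false, true, true]

console.log(viaLookahead, viaTrashCan);
```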