Shortest match in regex from end

Given the input string fooxxxxxxfooxxxboo, I am trying to write a regex that matches fooxxxboo, i.e. starting from the second foo through the last boo.
I tried the following:
foo.*?boo matches the complete string fooxxxxxxfooxxxboo
foo.*boo also matches the complete string fooxxxxxxfooxxxboo
I read Greedy vs. Reluctant vs. Possessive Quantifiers and I understand the difference, but I am trying to match the shortest string, counted from the end, that matches the regex, i.e. something like the regex being evaluated from the back.
Is there any way I can match only the last portion?

Use a negative lookahead assertion:
foo(?:(?!foo).)*?boo
(?:(?!foo).)*? - Non-greedy match, zero or more times, of any character that is not the start of foo. That is, before matching each character, the engine checks that the text at that position is not the letter f followed by two o's. Only if that check passes is the character matched.
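To see the difference concretely, here is a minimal sketch using Python's built-in re module. Python is an assumption here, since the answer itself is engine-agnostic. The sketch runs both patterns on the question's input, and the variable names are illustrative only.

```python
import re

s = "fooxxxxxxfooxxxboo"

# Plain lazy quantifier: the match still starts at the first foo.
lazy = re.search(r"foo.*?boo", s).group()

# Tempered version: a character is consumed only if "foo" does not start there,
# so the match can never stretch across a second foo.
tempered = re.search(r"foo(?:(?!foo).)*?boo", s).group()

print(lazy)      # fooxxxxxxfooxxxboo
print(tempered)  # fooxxxboo
```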
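Why does the regex foo.*?boo match the complete string fooxxxxxxfooxxxboo?
Because the engine tries start positions from left to right and reports the first one that succeeds. The foo in your regex can match either foo in the string, but it is tried at the first one first, and the following .*? then does a non-greedy match up to the string boo. So there are two candidate matches, fooxxxxxxfooxxxboo and fooxxxboo. Laziness only makes each candidate as short as possible for its own start position; it does not make the engine prefer a later start. Because the second candidate lies within the first, and matches do not overlap, the regex engine reports only the first.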
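You can check this explanation by trying the lazy pattern anchored at each start position separately, which a normal search does not do. The following is a minimal Python sketch. Pattern.match(string, pos) anchors the attempt at pos, and the loop is purely illustrative.

```python
import re

s = "fooxxxxxxfooxxxboo"
pattern = re.compile(r"foo.*?boo")

# Try the pattern anchored at every start position, collecting each
# candidate match instead of stopping at the first success.
per_start = []
for i in range(len(s)):
    m = pattern.match(s, i)
    if m:
        per_start.append((i, m.group()))

print(per_start)  # [(0, 'fooxxxxxxfooxxxboo'), (9, 'fooxxxboo')]

# A normal scan reports non-overlapping matches only,
# so the second candidate is swallowed by the first.
print(pattern.findall(s))  # ['fooxxxxxxfooxxxboo']
```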

.*(foo.*?boo)
Try this, and grab the capture, i.e. $1 or \1.
See demo.
https://regex101.com/r/nL5yL3/9
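Here is a minimal Python sketch of the capture approach, assuming Python's re, where the captured text is read with group(1) rather than $1. The leading greedy .* first runs to the end of the string and then backtracks. As a result, the group starts at the last foo that can still be followed by a boo.

```python
import re

s = "fooxxxxxxfooxxxboo"

# .* grabs everything, then gives characters back until foo.*?boo can match,
# so group 1 begins at the rightmost workable foo.
m = re.search(r".*(foo.*?boo)", s)

print(m.group(0))  # fooxxxxxxfooxxxboo  (whole match, includes the .* prefix)
print(m.group(1))  # fooxxxboo
```

Note that . does not match newlines here unless re.DOTALL is passed. The same caveat applies to the negative-lookahead answer.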

Related

How do greedy / lazy (non-greedy) / possessive quantifiers work internally?

I noticed that there are 3 different classes of quantifiers: greedy, lazy (i.e. non-greedy) and possessive.
I know that, loosely speaking, greedy quantifiers try to get the longest match by first reading in the entire input string and then truncating characters one by one if the attempts keep failing; lazy quantifiers try to get the shortest match by first reading in the empty string and then adding characters one by one if the attempts keep failing; possessive quantifiers start the same way as greedy quantifiers, but stop matching if the first attempt fails.
However, I'm not sure exactly how the above is implemented 'internally', and would like to ask for clarification (hopefully with examples).
For example, say we have the input string as "fooaaafoooobbbfoo".
If the regex is "foo.*" (greedy), will the foo in the regex first match the foo in the input string, and then will .* read in aaafoooobbbfoo as 'the entire string'? Or will .* first read in fooaaafoooobbbfoo as 'the entire string', and then truncate fooaaafoooobbbfoo to try matching the foo in the regex? If it is the latter, will fooaaafoooobbbfoo be truncated from its left or from its right in each attempt?
Will the answers to the above questions change if I replace "foo.*" with ".*foo" or "foo.*foo" as my regex? What about if I change those greedy quantifiers to lazy ones and possessive ones?
And if there is more than one quantifier in a regex, how will the engine deal with the priority (if that matters)?
Thanks in advance!

For your input string fooaaafoooobbbfoo:
Case 1: When you're using this regex:
foo.*
First, remember that the engine works from left to right.
With that in mind, the above regex will match the first foo, which is at the start of the input, and then .* will greedily match the longest possible text, which is the rest of the input after foo. At this point matching stops, as there is nothing left to match after .* in your pattern.
Case 2: When you're using this regex:
.*foo
Here again .* will greedily match the longest possible text, then give back characters one at a time from the right until the last foo, which is right at the end of the input, can match.
Case 3: When you're using this regex:
foo.*foo
This will match the first foo found in the input, i.e. the foo at the start; then .* will greedily match the longest possible text before matching the last foo, which is right at the end of the input.
Case 4: When you're using this regex with lazy quantifier:
foo.*?foo
This will match the first foo found in the input, i.e. the foo at the start; then .*? will lazily match the shortest possible text before matching the next foo, which is the second instance of foo, starting at (0-based) position 6 in the input.
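To make cases 3 and 4 concrete, below is a toy model of a backtracking matcher for foo, then .* or .*?, then foo, tried at position 0 only. The trace function is hypothetical and is not how any real engine is implemented. It records how many characters the dot-star part is given on each attempt. This shows that the greedy version gives characters back from the right, while the lazy version adds them one at a time. The last two lines cross-check the results against Python's re.

```python
import re

def trace(text, greedy=True, prefix="foo", suffix="foo"):
    """Hypothetical toy model: match prefix + (.* or .*?) + suffix at position 0.
    Returns the dot-star lengths tried, in order, and the match (or None)."""
    if not text.startswith(prefix):
        return [], None
    start = len(prefix)
    longest = len(text) - start  # assumes text contains no newlines
    if greedy:
        # Greedy: take everything, then give back one char at a time from the right.
        candidates = range(longest, -1, -1)
    else:
        # Lazy: take nothing, then add one char at a time.
        candidates = range(0, longest + 1)
    tried = []
    for n in candidates:
        tried.append(n)
        if text.startswith(suffix, start + n):
            return tried, text[:start + n + len(suffix)]
    return tried, None

s = "fooaaafoooobbbfoo"
print(trace(s, greedy=True))   # ([14, 13, 12, 11], 'fooaaafoooobbbfoo')
print(trace(s, greedy=False))  # ([0, 1, 2, 3], 'fooaaafoo')

# Cross-check against the real engine.
print(re.match(r"foo.*foo", s).group())   # fooaaafoooobbbfoo
print(re.match(r"foo.*?foo", s).group())  # fooaaafoo
```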
Case 5: When you're using this regex with possessive quantifier:
foo.*+foo
This will match the first foo found in the input, i.e. the foo at the start; then .*+ uses a possessive quantifier, which means match as many times as possible, without giving back. It greedily consumes everything up to the end of the input. Since the possessive quantifier doesn't allow the engine to backtrack, nothing is left for the final foo in the pattern, so the attempt fails. The same happens at every later start position, so the regex finds no match at all.
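Python's re only accepts the possessive .*+ from Python 3.11 on, so this sketch emulates it with the well-known (?=(...))\1 trick. The lookahead captures the rest of the input, and \1 consumes exactly that. Because lookarounds are atomic, the engine never comes back to try a shorter capture. This emulation is a substitute technique, not the possessive quantifier itself.

```python
import re
import sys

s = "fooaaafoooobbbfoo"

# Possessive-like: (?=(.*)) captures the rest of the line, \1 consumes it,
# and the atomic lookahead is never revisited, so the final foo cannot match.
possessive_like = re.search(r"foo(?=(.*))\1foo", s)
print(possessive_like)  # None

# The ordinary greedy version is allowed to backtrack, so it succeeds.
print(re.search(r"foo.*foo", s).group())  # fooaaafoooobbbfoo

# Python 3.11+ accepts the possessive syntax directly.
if sys.version_info >= (3, 11):
    print(re.search(r"foo.*+foo", s))  # None
```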

Regex match last occurrence of substring among the same substrings in the string

For example, we have a string:
asd/asd/asd/asd/1#s_
I need to match this part: /asd/1#s_ or asd/1#s_
How is it possible to do with plain regex?
I've tried a negative lookahead like this, but it didn't work:
\/(?:.(?!\/))?(asd)(\/(([\W\d\w]){1,})|)$
It matches '/asd/asd/asd/asd/asd/asd/1#s_'
from 'prefix/asd/asd/asd/asd/asd/asd/1#s_',
and I need to match '/asd/1#s_' without all the preceding /asd/'s.
The match should work with plain regex,
without any helper functions of any programming language.
I use https://regexr.com/ to check whether the regex matches or not.
Here are the possible strings:
prefix/asd/asd/asd/1#s
prefix/asd/asd/asd/1s#
prefix/asd/asd/asd/s1#
prefix/asd/asd/asd/s#1
prefix/asd/asd/asd/#1s
prefix/asd/asd/asd/#s1
and the asd part could be replaced with any word, like
prefix/a1sd/a1sd/a1sd/1#s
prefix/a1sd/a1sd/a1sd/1s#
...
So I need to match the last repeating part with everything to its right,
and everything to the right could be letters, non-letters, or digits, in any order.
A more complicated string example:
prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd
this should match that part:
/a1sd/22$$#!/123/321/asd

If you want the match only, you can use \K to reset the match buffer right before the parts that you want to match:
^.*\K/a\d?sd/\S+
The pattern matches:
^ Start of string
.* Match any char except a newline as many times as possible, backtracking until the rest of the pattern can match, so the rest lands on the last occurrence
\K Forget what is matched until now
/a\d?sd/ Match a, an optional digit and sd between forward slashes
\S+ Match 1+ non-whitespace chars
See a regex demo
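Python's built-in re module does not support \K. This minimal sketch assumes that environment and reads the same text from a capturing group instead. It uses the sample strings from the question.

```python
import re

# Same idea as ^.*\K/a\d?sd/\S+ but with a capturing group,
# because Python's built-in re has no \K.
pattern = re.compile(r"^.*(/a\d?sd/\S+)")

samples = [
    "asd/asd/asd/asd/1#s_",
    "prefix/asd/asd/asd/1#s",
    "prefix/a1sd/a1sd/a1sd/1s#",
    "prefix/a1sd/a1sd/a1sd/1s#/ds/dsse/a1sd/22$$#!/123/321/asd",
]

results = []
for text in samples:
    m = pattern.search(text)
    # The greedy ^.* backtracks from the end, so group 1 begins at the last
    # slash-delimited asd or a<digit>sd segment followed by non-space chars.
    results.append(m.group(1) if m else None)

print(results)
# ['/asd/1#s_', '/asd/1#s', '/a1sd/1s#', '/a1sd/22$$#!/123/321/asd']
```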

RegEx negative lookahead on pattern

I want to find all expressions that don't end with ":".
I tried it like this:
[a-z]{2,}(?!:)
On this text:
foobar foobaz:
foobaz
foobaz:
The problem is that it only drops the last character before the ":" instead of rejecting the whole match.
Here is the example: https://regex101.com/r/jtLRvz/1
How can I get the negative lookahead work for the whole regular expression?

When [a-z]{2,}(?!:) is applied to baz:, [a-z]{2,} grabs 2 or more lowercase ASCII letters at once (baz) and the negative lookahead (?!:) checks the char immediately to the right. It is :, so the engine looks for another way to match the string. Since {2,} can also match just two chars instead of the three it currently holds, it backtracks to ba. That is followed by z rather than :, so the lookahead passes and ba is returned as a valid match.
Add a-z to the lookahead pattern to make sure the char right after 2 or more lowercase ASCII letters is not a letter and not a colon:
[a-z]{2,}(?![a-z:])
             ^^^
See the regex demo
If your regex engine supports possessive quantifiers or atomic groups, you can use them to prevent backtracking into the [a-z]{2,} subpattern:
[a-z]{2,}+(?!:)
(?>[a-z]{2,})(?!:)
See another regex demo.
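Here is a minimal Python sketch comparing the original attempt with both fixes on the question's text. Python's re only understands possessive quantifiers and atomic groups from 3.11 on. For that reason, the third pattern uses the (?=(...))\1 lookahead trick as a stand-in for (?>[a-z]{2,}). That substitution is an assumption for portability, not part of the answer.

```python
import re

text = "foobar foobaz:\nfoobaz\nfoobaz:"

def matches(pattern):
    # Whole-match text of every non-overlapping match.
    return [m.group() for m in re.finditer(pattern, text)]

# Original attempt: backtracking lets [a-z]{2,} drop the last letter.
naive = matches(r"[a-z]{2,}(?!:)")
# Fix 1: the next char must be neither a letter nor a colon.
fixed = matches(r"[a-z]{2,}(?![a-z:])")
# Fix 2: atomic-group emulation, so the letters are never given back.
atomic = matches(r"(?=([a-z]{2,}))\1(?!:)")

print(naive)   # ['foobar', 'fooba', 'foobaz', 'fooba']
print(fixed)   # ['foobar', 'foobaz']
print(atomic)  # ['foobar', 'foobaz']
```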

RegEx - String To Help Match

I read somewhere that it is possible to have a RegEx in which the preceding and following strings are not matched themselves, but instead help resolve ambiguities.
For example, I would like a RegEx that matches only "TESTING" from the second line ("defTESTINGghi") and nothing from lines one and three.
abcTESTINGdef
defTESTINGghi
ghiTESTINGjkl

If supported, you can use the \K escape sequence. \K resets the starting point of the reported match, so any previously consumed characters are no longer included. The positive lookahead asserts that the match is followed by ghi.
def\KTESTING(?=ghi)
Or, depending on what you mean by the preceding and following parts not being matched, why not simply use a capturing group to capture only the desired subpattern?
def(TESTING)ghi
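Python's built-in re has no \K, so here is a minimal sketch of the capturing-group alternative on the three sample lines. It also reports which line the captured TESTING came from. The line-number computation is illustrative only.

```python
import re

text = "abcTESTINGdef\ndefTESTINGghi\nghiTESTINGjkl"

# def and ghi must be present for the match, but only group 1 is kept.
found = []
for m in re.finditer(r"def(TESTING)ghi", text):
    line_no = text.count("\n", 0, m.start(1)) + 1  # 1-based line of the capture
    found.append((m.group(1), line_no))

print(found)  # [('TESTING', 2)]
```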

You could try the regexes below to match the string TESTING only on the second line.
Through positive lookahead and lookbehind,
(?<=def)TESTING(?=ghi)
Matches the string TESTING only if it comes right after def and is followed by ghi.
Through positive lookahead,
TESTING(?=ghi)
Matches the string TESTING only if it's followed by ghi.
Through negative lookahead,
TESTING(?!def|jkl)
Matches the string TESTING if it's not followed by def or jkl.
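Here is a quick Python check of the three patterns, run line by line over the sample input. The dictionary of labels is just for illustration. For this input all three select only line 2. However, only the first one actually checks what comes before TESTING; the other two would also match a line such as xyzTESTINGghi.

```python
import re

lines = ["abcTESTINGdef", "defTESTINGghi", "ghiTESTINGjkl"]

patterns = {
    "lookbehind + lookahead": r"(?<=def)TESTING(?=ghi)",
    "lookahead only": r"TESTING(?=ghi)",
    "negative lookahead": r"TESTING(?!def|jkl)",
}

results = {}
for label, pattern in patterns.items():
    # 1-based numbers of the lines on which the pattern finds TESTING
    results[label] = [n for n, line in enumerate(lines, 1) if re.search(pattern, line)]
    print(label, results[label])  # [2] for each pattern with this input

# Only the lookbehind version checks what precedes TESTING:
print(bool(re.search(r"TESTING(?=ghi)", "xyzTESTINGghi")))          # True
print(bool(re.search(r"(?<=def)TESTING(?=ghi)", "xyzTESTINGghi")))  # False
```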

We Keep Coding
