RegEx lookahead on .* - regex

I have a pattern that needs to find the last occurrence of string1 unless string2 is found anywhere in the subject, then it needs the first occurrence of string1. In order to solve this I wrote this inefficient negative lookahead.
/(.(?!.*?string2))*string1/
It takes several seconds to run (prohibitively long on subjects lacking any occurrence of either string). Is there a more efficient way to accomplish this?

You should be able to use the following:
/string1(?!.*?string2)/
This will match string1 as long as string2 is not found later in the string, which I think meets your requirements.
Edit: After seeing your update, try the following:
/.*?string1(?=.*?string2)|.*string1/

You could also do if/else statements in your regex !
(?(?=.*string2).*(string1).*$|^.*?(string1))
Explanation:
(? # If
(?=.*string2) # Lookahead, if there is string2
.*(string1).*$ # Then match the last string1
| # Else
^.*?(string1) # Match the first string1
)
If string1 is found, you'll find it in group 1.

Ok now, i have understand what you want, a bit long but optimized to be fast:
nutria\d. -> string1
RABBIT -> string2
The pattern (example in PHP):
$pattern = <<<LOD
~(?J) # allow multiple capture groups with the same name
### capture the first nutria if RABBIT isn't found before ###
^ (?>[^Rn]++|R++(?!ABBIT)|n++(?!utria\d.))* (?<res>nutria\d.)
### try to capture the last nutria without RABBIT until the end ###
(?>
(?>
(?> [^Rn]++ | R++(?!ABBIT) | n++(?!utria\d.) )*
(?<res>nutria\d.)
)* # repeat as possible to catch the last nutria
(?> [^R]++ | R++(?!ABBIT) )* $ # the end without RABBIT
)? # /!\important/!\ this part is optional, then only the first captured
# nutria is in the result when RABBIT is found in this part
| # OR
### capture the first nutria when RABBIT is found before
^(?> [^n]++ | n++(?!utria\d.) )* (?<res>nutria\d.)
~x
LOD;
$subjects = array( 'groundhog nutria1A beaver nutria1B',
'polecat nutria2A badger RABBIT nutria2B',
'weasel RABBIT nutria3A nutria3B nutria3C',
'vole nutria4A marten nutria4B marmot nutria4C RABBIT');
foreach($subjects as $subject) {
if (preg_match($pattern, $subject, $match))
echo '<br/>'.$match['res'];
}
The pattern is designed to fail as fast as possible using atomic groups and possessive quantifiers with alternations and thus avoids catastrophic backtracking using the least possible lookaheads (only when a n or an R is found, and it fails quickly)

Try this regex:
string1(?!.*?string1)|string1(?=.*?string2)
Live Demo: http://www.rubular.com/r/uAjOqaTkYH
Edit live on Debuggex

Try using the possessive operator .*+, it uses less memory (it doesn't store the entire backtrace of matching cases). It may also run faster because of this.

Related

Remove the text outside the first brackets in R

I know that it was asked a lot of times, but I've tried to adapt the other answers to my need and I was not able to make it work using SKIP and FAIL (I'm a bit confused, I've to admit)
I'm using R actually.
The url I need to clean is:
url <- "posts.fields(id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0))"
and I need to retain only the content inside the first brackets that are always prefixed by the word "fields" (while "posts" may vary). In other words something like
id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0)
As you may see there're some nesting inside. But I eventually could change my source code to accept this string too (removing every parhentesis by every prefix)
id,from,message,comments,likes
I don't know on how to remove the trailing parhentesis which balances the first.
If it's good enough to just remove everything up to and including the first open parenthesis and also remove the last close parenthesis and thereafter then:
sub("^.*?\\((.*)\\)[^)]*$", "\\1", url)
Note:
If it's good enough to just remove the first open parenthesis and last close parenthesis then try this:
sub("\\((.*)\\)", "\\1", url)
Using lazy .* instead of greedy:
sub(".*?fields\\((.*)\\)", "\\1", url)
[1] "id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0)"
You need to use a recursive pattern:
sub("[^.]*+(?:\\.(?!fields\\()[^.]*)*+\\.fields\\(([^()]*+(?:\\((?1)\\)[^()]*)*+)\\)(?s:.*)", "\\1", url, perl=T)
demo
details:
# reach the dot before "fields("
[^.]*+ # all except a dot (possessive)
(?: # open a non-capturing group
\\. # a literal dot
(?!fields\\() # not followed by "fields("
[^.]* # all except a dot
)*+ # repeat the group zero or more times
\\.fields\\(
# match a content between parenthesis with any level of nesting
( # open the capture group 1
[^()]*+ # 0 or more character that are not brackets (possessive)
(?: # open a non capturing group
\\(
(?1) # recursion in group 1
\\) #
[^()]* # all that is not a bracket
)*+ # close the non capturing group and repeat 0 or more time (possessive)
) # close the capture group 1
\\)
(?s:.*) # end of the string
Possessive quantifiers are used here to limit the backtracking when for any reason a part of the pattern fails.

Match double hyphens in comments of malformed XML

I'm to parse XML files that do not conform to the "no double hyphens in comments" -standard, which makes MSXML complain. I am looking for a way of deleting offending hyphens.
I am using StringRegExpReplace(). I attempted following regular expressions:
<!--(.*)--> : correctly gets comments
<!--(-*)--> : fails to be a correct regex (also tried escaping and using \x2D)
Given the right pattern, I would call:
StringRegExpReplace($xml_string,$correct_pattern,"") ;replace with nothing
How to match remaining extra hyphens within an XML comment, while leaving the remaining text alone?
You can use this pattern:
(?|\G(?!\A)(?|-{2,}+([^->][^-]*)|(-[^-]+)|-+(?=-->)|-->[^<]*(*SKIP)(*FAIL))|[^<]*<+(?>[^<]+<+)*?(?:!--\K|[^<]*\z\K(*ACCEPT))(?|-*+([^->][^-]*)|-+(?=-->)|-?+([^-]+)|-->[^<]*(*SKIP)(*FAIL)()))
details:
(?|
\G(?!\A) # contiguous to the precedent match (inside a comment)
(?|
-{2,}+([^->][^-]*) # duplicate hyphens, not part of the closing sequence
|
(-[^-]+) # preserve isolated hyphens
|
-+ (?=-->) # hyphens before closing sequence, break contiguity
|
-->[^<]* # closing sequence, go to next <
(*SKIP)(*FAIL) # break contiguity
)
|
[^<]*<+ # reach the next < (outside comment)
(?> [^<]+ <+ )*? # next < until !-- or the end of the string
(?: !-- \K | [^<]*\z\K (*ACCEPT) ) # new comment or end of the string
(?|
-*+ ([^->][^-]*) # possible hyphens not followed by >
|
-+ (?=-->) # hyphens before closing sequence, break contiguity
|
-?+ ([^-]+) # one hyphen followed by >
|
-->[^<]* # closing sequence, go to next <
(*SKIP)(*FAIL) () # break contiguity (note: "()" avoids a mysterious bug
) # in regex101, you can remove it)
)
With this replacement: \1
online demo
The \G feature ensures that matches are consecutive.
Two ways are used to break the contiguity:
a lookahead (?=-->)
the backtracking control verbs (*SKIP)(*FAIL) that forces the pattern to fail and all characters matched before to not be retried.
So when contiguity is broken or at the begining the first main branch will fail (cause of the \G anchor) and the second branch will be used.
\K removes all on the left from the match result.
(*ACCEPT) makes the pattern succeed unconditionnaly.
This pattern uses massively the branch reset feature (?|...(..)...|...(..)...|...), so all capturing groups have the same number (in other words there is only one group, the group 1.)
Note: even this pattern is long, it needs few steps to obtain a match. The impact of non-greedy quantifiers is reduced as much as possible, and each alternatives are sorted and as efficient as possible. One of the goals is to reduce the total number of matches needed to treat a string.
(?<!<!)--+(?!-?>)(?=(?:(?!-->).)*-->)
matches -- (or ---- etc.) only between <!-- and -->. You need to set the /s parameter to allow the dot to match newlines.
Explanation:
(?<!<!) # Assert that we're not right at the start of a comment
--+ # Match two or more dashes --
(?= # only if the following can be matched further onwards:
(?!-?>) # First, make sure we're not at the end of the comment.
(?: # Then match the following group
(?!-->) # which must not contain -->
. # but may contain any character
)* # any number of times
--> # as long as --> follows.
) # End of lookahead assertion.
Test it live on regex101.com.
I suppose the correct AutoIt syntax would be
StringRegExpReplace($xml_string, "(?s)(?<!<!)--+(?!-?>)(?=(?:(?!-->).)*-->)", "")

Regex to find strings not containing a specified value

I'm using notepad++'s regular expression search function to find all strings in a .txt document that do not contain a specific value (HIJ in the below example), where all strings begin with the same value (ABC in the below example).
How would I go about doing this?
Example
Every String starts with ABC
ABC is never used in a string other than at the beginning,
ABCABC123 would be two strings --"ABC" and "ABC123"
HIJ may appear multiple times in a string
I need to find the strings that do not contain HIJ
Input is one long file with no line breaks, but does contain special characters (*, ^, #, ~, :) and spaces
Example Input:
ABC1234HIJ56ABC7#HIJABC89ABCHIJ0ABE:HIJABC12~34HI456J
Example Input would be viewed as the following strings
ABC1234HIJ56
ABC7#HIJ
ABC89
ABCHIJ0ABE:HIJ
ABC12%34HI456J
The Third and Fifth strings both lack "HIJ" and therefore are included in the output, all others are not included in the output.
Example desired output:
ABC89
ABC12~34HI456J
I am 99% new to RegEx and will be looking more into it in the future, as my job description suddenly changed earlier this week when someone else in the company left suddenly, and therefore I have been doing this manually by searching (ABC|HIJ) and going through the search function's results looking for "ABC" appearing twice in a row. Supposedly the former employee was able to do this in an automated way, but left no documentation.
Any help would be appreciated!
This question is a repeat of a prior question I asked, but I was very very bad at formatting a question and it seems to have sunk beyond noticeable levels.
You can find the items you want with:
ABC(?:[^HA]+|H(?!IJ)|A(?!BC))*+(?=ABC|$)
Note: in this first pattern, you can replace (?=ABC|$) with (?!HIJ)
pattern details:
ABC
(?: # non-capturing group
[^HA]+ # all that is not a H or an A
| # OR
H(?!IJ) # an H not followed by IJ
|
A(?!BC) # an A not followed by BC
)*+ # repeat the group
(?=ABC|$) # followed by "ABC" or the end of the string
Note: if you want to remove all that is not the items you want you can make this search replace:
search: (?:ABC(?:[^HA]+|H(?!IJ)|A(?!BC))*+HIJ.*?(?=ABC|$))+|(?=ABC)
replace: \r\n
you could use this pattern
(ABC(?:(?!HIJ).)*?)(?=ABC|\R)
Demo
( # Capturing Group (1)
ABC # "ABC"
(?: # Non Capturing Group
(?! # Negative Look-Ahead
HIJ # "HIJ"
) # End of Negative Look-Ahead
. # Any character except line break
) # End of Non Capturing Group
*? # (zero or more)(lazy)
) # End of Capturing Group (1)
(?= # Look-Ahead
ABC # "ABC"
| # OR
\R # <line break>
) # End of Look-Ahead
You can use the following expression to match your criterion:
(^ABC(?:(?!HIJ).)*$)
This starts with ABC and looks ahead (negative) for HIJ pattern. The pattern works for the separated strings.
For a single line pattern (as provided in your question), a slight modification of this works (as follows):
(ABC(?:(?!HIJ).)*?)(?=ABC|$)

Can a Regex Return the Number of the Line where the Match is Found?

In a text editor, I want to replace a given word with the number of the line number on which this word is found. Is this is possible with Regex?
Recursion, Self-Referencing Group (Qtax trick), Reverse Qtax or Balancing Groups
Introduction
The idea of adding a list of integers to the bottom of the input is similar to a famous database hack (nothing to do with regex) where one joins to a table of integers. My original answer used the #Qtax trick. The current answers use either Recursion, the Qtax trick (straight or in a reversed variation), or Balancing Groups.
Yes, it is possible... With some caveats and regex trickery.
The solutions in this answer are meant as a vehicle to demonstrate some regex syntax more than practical answers to be implemented.
At the end of your file, we will paste a list of numbers preceded with a unique delimiter. For this experiment, the appended string is :1:2:3:4:5:6:7 This is a similar technique to a famous database hack that uses a table of integers.
For the first two solutions, we need an editor that uses a regex flavor that allows recursion (solution 1) or self-referencing capture groups (solutions 2 and 3). Two come to mind: Notepad++ and EditPad Pro. For the third solution, we need an editor that supports balancing groups. That probably limits us to EditPad Pro or Visual Studio 2013+.
Input file:
Let's say we are searching for pig and want to replace it with the line number.
We'll use this as input:
my cat
dog
my pig
my cow
my mouse
:1:2:3:4:5:6:7
First Solution: Recursion
Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested).
The recursive structure lives in a lookahead, and is optional. Its job is to balance lines that don't contain pig, on the left, with numbers, on the right: think of it as balancing a nested construct like {{{ }}}... Except that on the left we have the no-match lines, and on the right we have the numbers. The point is that when we exit the lookahead, we know how many lines were skipped.
Search:
(?sm)(?=.*?pig)(?=((?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?:(?1)|[^:]+)(:\d+))?).*?\Kpig(?=.*?(?(2)\2):(\d+))
Free-Spacing Version with Comments:
(?xsm) # free-spacing mode, multi-line
(?=.*?pig) # fail right away if pig isn't there
(?= # The Recursive Structure Lives In This Lookahead
( # Group 1
(?: # skip one line
^
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
)
(?:(?1)|[^:]+) # recurse Group 1 OR match all chars that are not a :
(:\d+) # match digits
)? # End Group
) # End lookahead.
.*?\Kpig # get to pig
(?=.*?(?(2)\2):(\d+)) # Lookahead: capture the next digits
Replace: \3
In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.
Second Solution: Group that Refers to Itself ("Qtax Trick")
Supported languages: Apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested). The solution is easy to adapt to .NET by converting the \K to a lookahead and the possessive quantifier to an atomic group (see the .NET Version a few lines below.)
Search:
(?sm)(?=.*?pig)(?:(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*+.*?\Kpig(?=[^:]+(?(1)\1):(\d+))
.NET version: Back to the Future
.NET does not have \K. It its place, we use a "back to the future" lookbehind (a lookbehind that contains a lookahead that skips ahead of the match). Also, we need to use an atomic group instead of a possessive quantifier.
(?sm)(?<=(?=.*?pig)(?=(?>(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*).*)pig(?=[^:]+(?(1)\1):(\d+))
Free-Spacing Version with Comments (Perl / PCRE Version):
(?xsm) # free-spacing mode, multi-line
(?=.*?pig) # lookahead: if pig is not there, fail right away to save the effort
(?: # start counter-line-skipper (lines that don't include pig)
(?: # skip one line
^ #
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
)
# for each line skipped, let Group 1 match an ever increasing portion of the numbers string at the bottom
(?= # lookahead
[^:]+ # skip all chars that are not colons
( # start Group 1
(?(1)\1) # match Group 1 if set
:\d+ # match a colon and some digits
) # end Group 1
) # end lookahead
)*+ # end counter-line-skipper: zero or more times
.*? # match
\K # drop everything we've matched so far
pig # match pig (this is the match!)
(?=[^:]+(?(1)\1):(\d+)) # capture the next number to Group 2
Replace:
\2
Output:
my cat
dog
my 3
my cow
my mouse
:1:2:3:4:5:6:7
In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.
Choice of Delimiter for Digits
In our example, the delimiter : for the string of digits is rather common, and could happen elsewhere. We can invent a UNIQUE_DELIMITER and tweak the expression slightly. But the following optimization is even more efficient and lets us keep the :
Optimization on Second Solution: Reverse String of Digits
Instead of pasting our digits in order, it may be to our benefit to use them in the reverse order: :7:6:5:4:3:2:1
In our lookaheads, this allows us to get down to the bottom of the input with a simple .*, and to start backtracking from there. Since we know we're at the end of the string, we don't have to worry about the :digits being part of another section of the string. Here's how to do it.
Input:
my cat pi g
dog p ig
my pig
my cow
my mouse
:7:6:5:4:3:2:1
Search:
(?xsm) # free-spacing mode, multi-line
(?=.*?pig) # lookahead: if pig is not there, fail right away to save the effort
(?: # start counter-line-skipper (lines that don't include pig)
(?: # skip one line that doesn't have pig
^ #
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
)
# Group 1 matches increasing portion of the numbers string at the bottom
(?= # lookahead
.* # get to the end of the input
( # start Group 1
:\d+ # match a colon and some digits
(?(1)\1) # match Group 1 if set
) # end Group 1
) # end lookahead
)*+ # end counter-line-skipper: zero or more times
.*? # match
\K # drop match so far
pig # match pig (this is the match!)
(?=.*(\d+)(?(1)\1)) # capture the next number to Group 2
Replace: \2
See the substitutions in the demo.
Third Solution: Balancing Groups
This solution is specific to .NET.
Search:
(?m)(?<=\A(?<c>^(?:(?!pig)[^\r\n])*(?:\r?\n))*.*?)pig(?=[^:]+(?(c)(?<-c>:\d+)*):(\d+))
Free-Spacing Version with Comments:
(?xm) # free-spacing, multi-line
(?<= # lookbehind
\A #
(?<c> # skip one line that doesn't have pig
# The length of Group c Captures will serve as a counter
^ # beginning of line
(?:(?!pig)[^\r\n])* # zero or more chars not followed by pig
(?:\r?\n) # newline chars
) # end skipper
* # repeat skipper
.*? # we're on the pig line: lazily match chars before pig
) # end lookbehind
pig # match pig: this is the match
(?= # lookahead
[^:]+ # get to the digits
(?(c) # if Group c has been set
(?<-c>:\d+) # decrement c while we match a group of digits
* # repeat: this will only repeat as long as the length of Group c captures > 0
) # end if Group c has been set
:(\d+) # Match the next digit group, capture the digits
) # end lokahead
Replace: $1
Reference
Qtax trick
On Which Line Number Was the Regex Match Found?
Because you didn't specify which text editor, in vim it would be:
:%s/searched_word/\=printf('%-4d', line('.'))/g (read more)
But as somebody mentioned it's not a question for SO but rather Super User ;)
I don't know of an editor that does that short of extending an editor that allows arbitrary extensions.
You could easily use perl to do the task, though.
perl -i.bak -e"s/word/$./eg" file
Or if you want to use wildcards,
perl -MFile::DosGlob=glob -i.bak -e"BEGIN { #ARGV = map glob($_), #ARGV } s/word/$./eg" *.txt

RegExp infinite loop only in perl, why?

I have a regular expression to test whether a CSV cell contains a correct file path:
EDIT The CSV lists filepaths that does not yet exists when script runs (I cannot use -e), and filepath can include * or %variable% or {$variable}.
my $FILENAME_REGEXP = '^(|"|""")(?:[a-zA-Z]:[\\\/])?[\\\/]{0,2}(?:(?:[\w\s\.\*-]+|\{\$\w+}|%\w+%)[\\\/]{0,2})*\1$';
Since CSV cells sometimes contains wrappers of double quotes, and sometimes the filename itself needs to be wrapped by double quotes, I made this grouping (|"|""") ... \1
Then using this function:
sub ValidateUNCPath{
my $input = shift;
if ($input !~ /$FILENAME_REGEXP/){
return;
}
else{
return "This is a Valid File Path.";
}
}
I'm trying to test if this phrase is matching my regexp (It should not match):
"""c:\my\dir\lord"
but my dear Perl gets into infinite loop when:
ValidateUNCPath('"""c:\my\dir\lord"');
EDIT actually it loops on this:
ValidateUNCPath('"""\aaaaaaaaa\bbbbbbb\ccccccc\Netwxn00.map"');
I made sure in http://regexpal.com that my regexp correctly catches those non-symmetric """ ... " wrapping double quotes, but Perl got his own mind :(
I even tried the /g and /o flags in
/$FILENAME_REGEXP/go
but it still hangs. What am I missing ?
First off, nothing you have posted can cause an infinite loop, so if you're getting one, its not from this part of the code.
When I try out your subroutine, it returns true for all sorts of strings that are far from looking like paths, for example:
.....
This is a Valid File Path.
.*.*
This is a Valid File Path.
-
This is a Valid File Path.
This is because your regex is rather loose.
^(|"|""") # can match the empty string
(?:[a-zA-Z]:[\\\/])? # same, matches 0-1 times
[\\\/]{0,2} # same, matches 0-2 times
(?:(?:[\w\s\.\*-]+|\{\$\w+}|%\w+%)[\\\/]?)+\1$ # only this is not optional
Since only the last part actually have to match anything, you are allowing all kinds of strings, mainly in the first character class: [\w\s\.\*-]
In my personal opinion, when you start relying on regexes that look like yours, you're doing something wrong. Unless you're skilled at regexes, and hope noone who isn't will ever be forced to fix it.
Why don't you just remove the quotes? Also, if this path exists in your system, there is a much easier way to check if it is valid: -e $path
If the regex engine was naïve,
('y') x 20 =~ /^.*.*.*.*.*x/
would take a very long time to fail since it has to try
20 * 20 * 20 * 20 * 20 = 3,200,000 possible matches.
Your pattern has a similar structure, meaning it has many components match wide range of substrings of your input.
Now, Perl's regex engine is highly optimised, and far far from naïve. In the above pattern, it will start by looking for x, and exit very very fast. Unfortunately, it doesn't or can't similarly optimise your pattern.
Your patterns is a complete mess. I'm not going to even try to guess what it's suppose to match. You will find that this problem will solve itself once you switch to a correct pattern.
Update
Edit: From trial and error, the below grouping sub-expression [\w\s.*-]+ is causing backtrack problem
(?:
(?:
[\w\s.*-]+
| \{\$\w+\}
| %\w+%
)
[\\\/]?
)+
Fix #1,
Unrolled method
'
^
( # Nothing
|" # Or, "
|""" # Or, """
)
# Here to end, there is no provision for quotes (")
(?: # If there are no balanced quotes, this will fail !!
[a-zA-Z]
:
[\\\/]
)?
[\\\/]{0,2}
(?:
[\w\s.*-]
| \{\$\w+\}
| %\w+%
)+
(?:
[\\\/]
(?:
[\w\s.*-]
| \{\$\w+\}
| %\w+%
)+
)*
[\\\/]?
\1
$
'
Fix #2, Independent Sub-Expression
'
^
( # Nothing
|" # Or, "
|""" # Or, """
)
# Here to end, there is no provision for quotes (")
(?: # If there are no balanced quotes, this will fail !!
[a-zA-Z]
:
[\\\/]
)?
[\\\/]{0,2}
(?>
(?:
(?:
[\w\s.*-]+
| \{\$\w+\}
| %\w+%
)
[\\\/]?
)+
)
\1
$
'
Fix #3, remove the + quantifier (or add +?)
'
^
( # Nothing
|" # Or, "
|""" # Or, """
)
# Here to end, there is no provision for quotes (")
(?: # If there are no balanced quotes, this will fail !!
[a-zA-Z]
:
[\\\/]
)?
[\\\/]{0,2}
(?:
(?:
[\w\s.*-]
| \{\$\w+\}
| %\w+%
)
[\\\/]?
)+
\1
$
'
Thanks to sln this is my fixed regexp:
my $FILENAME_REGEXP = '^(|"|""")(?:[a-zA-Z]:[\\\/])?[\\\/]{0,2}(?:(?:[\w\s.-]++|\{\$\w+\}|%\w+%)[\\\/]{0,2})*\*?[\w.-]*\1$';
(I also disallowed * char in directories, and only allowed single * in (last) filename)