I am trying to make regular expression for Valid sharepoint folder name, which have conditions:
Cannot begin or end with a dot,
Cannot contain consecutive dots and
Cannot contain any of the following characters: ~ " # % & * : < > ? / \ { | }.
Wrote Regex for 1st and 3rd point:
[^\.]([^~ " # % & * : < > ? / \ { | }]+) [^\.]$
and for third (?!.*\.\.).*)$ but they are not working properly and have to integrate them into one expression.
Please help.
What about just
^\w(?:\w+\.?)*\w+$
I made a small test here
EDIT
This also works
^\w(?:\w\.?)*\w+$
How about:
/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/
explanation:
The regular expression:
(?-imsx:^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?/\\{|}]+).+$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
[~"#%&*:<>?/\\{|}] any character of: '~', '"', '#', '%',
+ '&', '*', ':', '<', '>', '?', '/', '\\',
'{', '|', '}' (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In action (perl script):
my $re = qr/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/;
while(<DATA>) {
chomp;
say /$re/ ? "OK : $_" : "KO : $_";
}
__DATA__
.abc
abc.
a..b
abc
output:
KO : .abc
KO : abc.
KO : a..b
OK : abc
Related
I have a string of text:
\n new"test \n aaaa" \n ta \n `this is a \n newline that should be kept`
My goal is to match all \n's outside of backticks (`), quotes ("), or single quotes ('). Based off another question (https://stackoverflow.com/a/48953880/14465957), I switched the positive lookahead used to a negative one, which now matches all newlines outside of quotes ("). However, it doesn't work when I attempted to ignore single and back ticks.
What am I doing wrong?
Working quotes:
https://regex101.com/r/ooqz5d/1/
If you're using PCRE, you can use a control verb to skip everything inside of a quote closure:
(['"`]).*?\1(*SKIP)(*F)|\\n
(['"`]) any type of quote, put it in group 1
.*? any characters, non greedy
\1 the quote that captured in group 1
(*SKIP)(*F) skip the current match, which is a quote closure
|\\n match a \n
See the test cases
Also, if you need to ignore escaped quotes(\", \' etc), you may try
(['"`])(?:(?<!\\)\\(?:\\\\)*\1|(?!\1).)*\1(*SKIP)(*F)|\\n
Check the test cases
Using JavaScript
For JavaScript, you can't use control verbs. But you can use group capture to replace outbound \n
Regex
((['"`])[\s\S]*?\2)|\\n
Substitution
$1
const regex = /((['"`])[\s\S]*?\2)|\\n/g;
const text = String.raw`\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n', text);
const result = text.replace(regex, '$1');
console.log('after\n', result);
Real line breaks
const regex = /((['"`])[\s\S]*?\2)|\n/g;
const text = `\nnew"test\naaaa"\nta\n\`this is a \nnewline that should be kept\`\ntest\n'this \n should also be kept'\n`;
console.log('before\n----\n', text);
const result = text.replace(regex, '$1');
console.log('after\n----\n', result);
Use
text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1')
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^"\\]* any character except: '"', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
" '"'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^'\\]* any character except: ''', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
. any character except \n
--------------------------------------------------------------------------------
[^`\\]* any character except: '`', '\\' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
` '`'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\\ '\'
--------------------------------------------------------------------------------
n 'n'
JavaScript code:
const text = String.raw`\nnew"test\naaaa\\\n"\nta\n\`this is a \nnewline that should be kept\`\n'this is a \nnew test'\n`
console.log(text.replace(/("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'|`[^`\\]*(?:\\.[^`\\]*)*`)|\\n/g, '$1'))
I am having a great trouble to remove the errors in unicode encoded corpus.
In following form
രണവര്ഗ്ഗത്തിനകത്തു=ഭരണവര്ഗ്ഗത്തിന്:stemഅകത്തു|:suffix
ഭസ്മമാക്കിക്കളയുകയും=ഭസ്മം:stemആക്കിക്കളയുകയും|:suffix
ഭസ്മമാക്കി=ഭസ്മം:stemആക്കി|:suffix
ഭാഗത്തുനിന്നുണ്ടാകണം=ഭാഗത്ത്:stemനിന്ന്:stemഉണ്ടാകണം|:suffix,:
ഭാഗമായ=ഭാഗം:stemആയ|:suffix
ഭാര്യമാരില്നിന്നും=ഭാര്യമാരില്:stemനിന്നും|:suffix:suffix
ഭാര്യമാരുണ്ടായിരുന്നവരില്നിന്നു=ഭാര്യമാര്:stemഉണ്ടായിരുന്നവരില്:stemനിന്നു|:suffix,:suffix:suffix
ഭാര്യയായി=ഭാര്യ:stemആയി|:suffix
ഭാഷ്യകര്ത്താവായ=ഭാഷ്യകര്ത്താവ്:stemആയ|:suffix:suffix
ഭിത്തികളൊക്കെ=ഭിത്തികള്:stemഒക്കെ|:suffix
ഭിന്നതയില്ലെന്നും=ഭിന്നത:stemഇല്ല:stemഎന്നും|:suffix,:suffix0
ഭൂപ്രഭുക്കളെന്ന്=ഭൂപ്രഭുക്കള്:stemഎന്ന്|:suffix0
ഭൂമിയില്നിന്ന്=ഭൂമിയില്:stemനിന്ന്|:suffix
ഭൂമിയിലുള്ള=ഭൂമിയില്:stemഉള്ള|:suffix
ഭൂമിയെപ്പോലൊരു=ഭൂമിയെ:stemപോലെ:stemഒരു|:suffix,:suffix0
ഭൂമുഖവീക്ഷണനായി=ഭൂമുഖവീക്ഷണന്:stemആയി|:suffix:suffix
ഭൂസഞ്ചാരംപോലെ=ഭൂസഞ്ചാരം:stemപോലെ|:suffix
ഭേദിക്കേണ്ടതായി=ഭേദിക്കേണ്ടതാ്:stemആയി|:suffix:suffix
ഭൗതികവാദികളാണ്=ഭൗതികവാദികള്:stemആണ്|:suffix0
മക്കളയച്ചു=മക്കള്:stemഅയച്ചു|:suffix
മക്കള്ക്കാണ്=മക്കള്ക്ക്:stemആണ്|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix
മഞ്ചേശ്വരത്താണ്=മഞ്ചേശ്വരത്ത്:stemആണ്|:suffix:suffix
മഞ്ഞുവെള്ളത്തിലാഴ്ത്തി=മഞ്ഞുവെള്ളത്തില്:stemആഴ്ത്തി|:suffix:suffix
മടങ്ങാണിതിന്=മടങ്ങ്:stemആണ്:stemഇതിന്|:suffix,:suffix
മടിയനായിരുന്നു=മടിയന്:stemആയിരുന്നു|:suffix
Where I need to remove two stem together and two suffixes together. In the case of two stems I need keep first stem and convert the second into suffix. In the case of two suffixes like this :suffix:suffix, :suffix,:suffix0 I need to keep only one suffix
use strict;
use warnings qw/ all FATAL /;
use List::Util 'reduce';
while ( <> ) {
my ($word, $ss) = / \( ( /[^()]* ) \) /gx;
my #ss = split ' ', $ss;
my $str = reduce { sprintf 'S (%s) (%s)', $a, $b } #ss;
printf "%s (%s)\n", $word, $str;
}
This is the perl code I am trying to change but that code is not sufficient to handle the complexities. Is there any way to handle the kinds of errors.
**Expected output**
`ഭാര്യമാരുണ്ടായിരുന്നവരില്നിന്നു=ഭാര്യമാര്:stemഉണ്ടായിരുന്നവരില്:stemനിന്നു|:suffix,:suffix:suffix` to
ഭാര്യമാരുണ്ടായിരുന്നവരില്നിന്നു=ഭാര്യമാര്:stemഉണ്ടായിരുന്നവരില്:suffixനിന്നു|:suffix
ഭാഷ്യകര്ത്താവായ=ഭാഷ്യകര്ത്താവ്:stemആയ|:suffix:suffix to
ഭാഷ്യകര്ത്താവായ=ഭാഷ്യകര്ത്താവ്:stemആയ|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix to
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix
Any one interested in helping me?
Description
^([^:]+:stem[^:]+)(?::stem(?=.*?(:suffix))|)([^:]+?\|:suffix[^:]*)(?::suffix[^:]*)*$
Replace with: \1\2\3
This regular expression will do the following:
Assumes that each line will have a suffix string this is then pattern matched and pulled into the capture group 2
If there is a second stem it is replaced with suffix
Removes all but the first suffix entries
Example
Live Demo
https://regex101.com/r/rJ9gW3/2
Sample text
ഭാര്യമാരുണ്ടായിരുന്നവരില്നിന്നു=ഭാര്യമാര്:stemഉണ്ടായിരുന്നവരില്:stemനിന്നു|:suffix,:suffix:suffix
ഭാഷ്യകര്ത്താവായ=ഭാഷ്യകര്ത്താവ്:stemആയ|:suffix:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix
Sample Matches
ഭാര്യമാരുണ്ടായിരുന്നവരില്നിന്നു=ഭാര്യമാര്:stemഉണ്ടായിരുന്നവരില്:suffixനിന്നു|:suffix,
ഭാഷ്യകര്ത്താവായ=ഭാഷ്യകര്ത്താവ്:stemആയ|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
:stem ':stem'
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
:stem ':stem'
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[^:]+? any character except: ':' (1 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------
I'm trying to make regex that picks all words that are a-z and with or without the symbol '.
the word needs to be at least 2 characters
cant start with the ' symbol
two ' symbols can't be next to each other
and "two character" words can't end with the ' symbol
I have being working for hours on that regex and i can't make it work:
/\b[a-z]([a-z(\')](?!\1))+\b/
it does not work and i don't know why! (the two ' symbols next to each other)
any ideas?
([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})
Live # RegExPal
You probably will not need to use \b as regex is greedy and will consume all words as a whole.
This version can't be tested with RegexPal (does not recognize the lookbehind) but has custom word borders:
(?<![a-z'])([a-z](?:[a-z]|'(?!'))+[a-z']|[a-z]{2})(?![a-z'])
This should work (disclaimer: untested)
/\b(?![a-z]{2}'\b)[a-z]((?!'')['a-z])+\b/
Yours does not because you are attempting to nest a parenthesized expression inside a character class. That only adds ( and ) to the class, it will not set the value of your next \1 code.
(Edit) Added the constraint on aa'.
Assuming words are delimited by spaces:
(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)
In action in a perl script:
my $re = qr/(?:^|\s)((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))(?:$|\s)/;
while(<DATA>) {
chomp;
say (/$re/ ? "OK: $_" : "KO: $_");
}
__DATA__
ab
abc
a'
ab''
abc'
a''b
:!ù
output:
OK: ab
OK: abc
KO: a'
OK: ab''
OK: abc'
KO: a''b
KO: :!ù
Explanation:
The regular expression:
(?-imsx:\b((?:[a-z]{2})|(?:[a-z](?!.*'')[a-z']{2,}))\b)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[a-z]{2} any character of: 'a' to 'z' (2 times)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[a-z] any character of: 'a' to 'z'
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
'' '\'\''
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
[a-z']{2,} any character of: 'a' to 'z', ''' (at
least 2 times (matching the most
amount possible))
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
I want any path string that leads to a file with an extension of '.log' or a path that contains the directory 'tmp' to be excluded from the match
I'm nearly there:
(?!tmp).+?\.(?!log|tmp).+
http://rubular.com/r/Ubkz7MIEGH
What I want is for
tmp/hello.jpg
to be excluded in the same way that
hello.log
hmm.tmp
Are excluded.
Just try with following regex:
^(?!(?:.*log$)|tmp).*$
How about:
^(?!.*\btmp\b)(?!.+\.log\b)(.+)$
Explanation:
The regular expression:
(?-imsx:^(?!.*\btmp\b)(?!.+\.log\b)(.+)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
----------------------------------------------------------------------
tmp 'tmp'
----------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
log 'log'
----------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
^(?!tmp).*(?<!\.tmp|log)$
It's just a negative lookbehind. Live demo
When learning regular expressions, I once saw the following four examples. How can I understand their differences?
/ABC (?i:s) XYZ/
/ABC (?x: [A-Z] \.? \s )?XYZ/
/ABC (?ix: [A-Z] \.? \s )?XYZ/
/ABC (?x-i: [A-Z] \.? \s )?XYZ/i
What do the i and x flags mean?
Those are very straightforward. A quick look at the documentation would answer your questions. You might also find YAPE::Regex::Explain useful.
$ perl -MYAPE::Regex::Explain -e'
print YAPE::Regex::Explain->new($_)->explain
for
qr/ABC (?i:s) XYZ/,
qr/ABC (?x: [A-Z] \.? \s )?XYZ/,
qr/ABC (?ix: [A-Z] \.? \s )?XYZ/,
qr/ABC (?x-i: [A-Z] \.? \s )?XYZ/i;
'
The regular expression:
(?-imsx:ABC (?i:s) XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?i: group, but do not capture (case-
insensitive) (with ^ and $ matching
normally) (with . not matching \n)
(matching whitespace and # normally):
----------------------------------------------------------------------
s 's'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
XYZ ' XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
The regular expression:
(?-imsx:ABC (?x: [A-Z] \.? \s )?XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?x: group, but do not capture (disregarding
whitespace and comments) (case-sensitive)
(with ^ and $ matching normally) (with .
not matching \n) (optional (matching the
most amount possible)):
----------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
XYZ 'XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
The regular expression:
(?-imsx:ABC (?ix: [A-Z] \.? \s )?XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?ix: group, but do not capture (case-
insensitive) (disregarding whitespace and
comments) (with ^ and $ matching normally)
(with . not matching \n) (optional
(matching the most amount possible)):
----------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
XYZ 'XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
The regular expression:
(?i-msx:ABC (?x-i: [A-Z] \.? \s )?XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?i-msx: group, but do not capture (case-insensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?x-i: group, but do not capture (disregarding
whitespace and comments) (case-sensitive)
(with ^ and $ matching normally) (with .
not matching \n) (optional (matching the
most amount possible)):
----------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
XYZ 'XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
/expr/flags applies flags to expr.
(?flags:subexpr) applies flags to subexpr.
i sets to ignore case, x sets to ignore whitespaces in the regexp body.
More detailed information is available on www.regular-expressions.info.