Differences among four nearly identical regular expressions

When learning regular expressions, I once saw the following four examples. How can I understand their differences?
/ABC (?i:s) XYZ/
/ABC (?x: [A-Z] \.? \s )?XYZ/
/ABC (?ix: [A-Z] \.? \s )?XYZ/
/ABC (?x-i: [A-Z] \.? \s )?XYZ/i
What do the i and x flags mean?

Those are very straightforward. A quick look at the documentation would answer your questions. You might also find YAPE::Regex::Explain useful.
$ perl -MYAPE::Regex::Explain -e'
print YAPE::Regex::Explain->new($_)->explain
for
qr/ABC (?i:s) XYZ/,
qr/ABC (?x: [A-Z] \.? \s )?XYZ/,
qr/ABC (?ix: [A-Z] \.? \s )?XYZ/,
qr/ABC (?x-i: [A-Z] \.? \s )?XYZ/i;
'
The regular expression:
(?-imsx:ABC (?i:s) XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?i: group, but do not capture (case-
insensitive) (with ^ and $ matching
normally) (with . not matching \n)
(matching whitespace and # normally):
----------------------------------------------------------------------
s 's'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
XYZ ' XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
The regular expression:
(?-imsx:ABC (?x: [A-Z] \.? \s )?XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?x: group, but do not capture (disregarding
whitespace and comments) (case-sensitive)
(with ^ and $ matching normally) (with .
not matching \n) (optional (matching the
most amount possible)):
----------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
XYZ 'XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
The regular expression:
(?-imsx:ABC (?ix: [A-Z] \.? \s )?XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?ix: group, but do not capture (case-
insensitive) (disregarding whitespace and
comments) (with ^ and $ matching normally)
(with . not matching \n) (optional
(matching the most amount possible)):
----------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
XYZ 'XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
The regular expression:
(?i-msx:ABC (?x-i: [A-Z] \.? \s )?XYZ)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?i-msx: group, but do not capture (case-insensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
ABC 'ABC '
----------------------------------------------------------------------
(?x-i: group, but do not capture (disregarding
whitespace and comments) (case-sensitive)
(with ^ and $ matching normally) (with .
not matching \n) (optional (matching the
most amount possible)):
----------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
----------------------------------------------------------------------
\.? '.' (optional (matching the most amount
possible))
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
XYZ 'XYZ'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

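/expr/flags applies flags to the whole of expr.
(?flags:subexpr) applies flags to subexpr only. Flags after a hyphen are switched off, so (?x-i:subexpr) turns x on and i off inside the group, even when the enclosing regex was given /i.
The i flag makes matching case-insensitive; the x flag makes the regex ignore whitespace (and allow # comments) in the pattern body.
More detailed information is available on www.regular-expressions.info.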
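The scoping rules can be checked by running the four patterns against a few strings. What follows is a minimal sketch in Python rather than Perl, assuming Python 3.6 or later, whose re module accepts the same (?flags:...) and (?flags-flags:...) group syntax. The sample strings and the helper name matches are made up for illustration and do not come from the question.

```python
import re

# The four patterns from the question. The trailing /i of the fourth
# Perl pattern becomes the re.IGNORECASE argument in Python.
p1 = re.compile(r"ABC (?i:s) XYZ")
p2 = re.compile(r"ABC (?x: [A-Z] \.? \s )?XYZ")
p3 = re.compile(r"ABC (?ix: [A-Z] \.? \s )?XYZ")
p4 = re.compile(r"ABC (?x-i: [A-Z] \.? \s )?XYZ", re.IGNORECASE)


def matches(pattern, text):
    """Return True when the whole of text matches pattern."""
    return pattern.fullmatch(text) is not None


# Illustrative inputs (assumptions, not taken from the question).
samples = ["ABC S XYZ", "ABC J. XYZ", "ABC j. XYZ", "abc J. xyz"]

for text in samples:
    # One True/False per pattern, in the order p1..p4.
    print(text, [matches(p, text) for p in (p1, p2, p3, p4)])
```

In this sketch "ABC j. XYZ" separates (?x:...) from (?ix:...), because only the latter lets [A-Z] accept a lowercase letter. "abc J. xyz" is accepted only by the fourth pattern, where the outer i flag covers ABC and XYZ while (?x-i:...) keeps [A-Z] case-sensitive.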

Related

Printing in patterns in perl

I am having great trouble removing errors from a Unicode-encoded corpus.
The entries have the following form:
രണവര്‍ഗ്ഗത്തിനകത്തു=ഭരണവര്‍ഗ്ഗത്തിന്:stemഅകത്തു|:suffix
ഭസ്മമാക്കിക്കളയുകയും=ഭസ്മം:stemആക്കിക്കളയുകയും|:suffix
ഭസ്മമാക്കി=ഭസ്മം:stemആക്കി|:suffix
ഭാഗത്തുനിന്നുണ്ടാകണം=ഭാഗത്ത്:stemനിന്ന്:stemഉണ്ടാകണം|:suffix,:
ഭാഗമായ=ഭാഗം:stemആയ|:suffix
ഭാര്യമാരില്‍നിന്നും=ഭാര്യമാരില്‍:stemനിന്നും|:suffix:suffix
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:stemനിന്നു|:suffix,:suffix:suffix
ഭാര്യയായി=ഭാര്യ:stemആയി|:suffix
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix:suffix
ഭിത്തികളൊക്കെ=ഭിത്തികള്‍:stemഒക്കെ|:suffix
ഭിന്നതയില്ലെന്നും=ഭിന്നത:stemഇല്ല:stemഎന്നും|:suffix,:suffix0
ഭൂപ്രഭുക്കളെന്ന്=ഭൂപ്രഭുക്കള്‍:stemഎന്ന്|:suffix0
ഭൂമിയില്‍നിന്ന്=ഭൂമിയില്‍:stemനിന്ന്|:suffix
ഭൂമിയിലുള്ള=ഭൂമിയില്‍:stemഉള്ള|:suffix
ഭൂമിയെപ്പോലൊരു=ഭൂമിയെ:stemപോലെ:stemഒരു|:suffix,:suffix0
ഭൂമുഖവീക്ഷണനായി=ഭൂമുഖവീക്ഷണന്‍:stemആയി|:suffix:suffix
ഭൂസഞ്ചാരംപോലെ=ഭൂസഞ്ചാരം:stemപോലെ|:suffix
ഭേദിക്കേണ്ടതായി=ഭേദിക്കേണ്ടതാ്:stemആയി|:suffix:suffix
ഭൗതികവാദികളാണ്=ഭൗതികവാദികള്‍:stemആണ്|:suffix0
മക്കളയച്ചു=മക്കള്‍:stemഅയച്ചു|:suffix
മക്കള്‍ക്കാണ്=മക്കള്‍ക്ക്:stemആണ്|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix
മഞ്ചേശ്വരത്താണ്=മഞ്ചേശ്വരത്ത്:stemആണ്|:suffix:suffix
മഞ്ഞുവെള്ളത്തിലാഴ്ത്തി=മഞ്ഞുവെള്ളത്തില്‍:stemആഴ്ത്തി|:suffix:suffix
മടങ്ങാണിതിന്=മടങ്ങ്:stemആണ്:stemഇതിന്|:suffix,:suffix
മടിയനായിരുന്നു=മടിയന്‍:stemആയിരുന്നു|:suffix
I need to remove doubled stems and doubled suffixes. When there are two stems, I need to keep the first stem and convert the second into a suffix. When there are two suffixes, such as :suffix:suffix or :suffix,:suffix0, I need to keep only one suffix.
use strict;
use warnings qw/ all FATAL /;
use List::Util 'reduce';
while ( <> ) {
    # Capture the contents of the parenthesised groups on the line
    my ($word, $ss) = / \( ( [^()]* ) \) /gx;
    my @ss = split ' ', $ss;
    # Fold the pieces into a nested 'S (..) (..)' string
    my $str = reduce { sprintf 'S (%s) (%s)', $a, $b } @ss;
    printf "%s (%s)\n", $word, $str;
}
This is the Perl code I am trying to adapt, but it is not sufficient to handle these complexities. Is there any way to handle these kinds of errors?
Expected output
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:stemനിന്നു|:suffix,:suffix:suffix to
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:suffixനിന്നു|:suffix
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix:suffix to
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix to
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix
Is anyone interested in helping me?

Description
^([^:]+:stem[^:]+)(?::stem(?=.*?(:suffix))|)([^:]+?\|:suffix[^:]*)(?::suffix[^:]*)*$
Replace with: \1\2\3
This regular expression will do the following:
Assume that each line has a suffix string; when a second stem is present, the lookahead finds that string and captures it into group 2
Replace a second stem, if there is one, with suffix
Remove all but the first suffix entry
Example
Live Demo
https://regex101.com/r/rJ9gW3/2
Sample text
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:stemനിന്നു|:suffix,:suffix:suffix
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix:suffix
Sample Matches
ഭാര്യമാരുണ്ടായിരുന്നവരില്‍നിന്നു=ഭാര്യമാര്‍:stemഉണ്ടായിരുന്നവരില്‍:suffixനിന്നു|:suffix,
ഭാ‌ഷ്യകര്‍ത്താവായ=ഭാ‌ഷ്യകര്‍ത്താവ്:stemആയ|:suffix
മഞ്ചേരിയിലേക്കാണ്=മഞ്ചേരിയിലേക്ക്:stemആണ്|:suffix
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
:stem ':stem'
----------------------------------------------------------------------
[^:]+ any character except: ':' (1 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
:stem ':stem'
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
.*? any character except \n (0 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
( group and capture to \3:
----------------------------------------------------------------------
[^:]+? any character except: ':' (1 or more
times (matching the least amount
possible))
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
) end of \3
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
:suffix ':suffix'
----------------------------------------------------------------------
[^:]* any character except: ':' (0 or more
times (matching the most amount
possible))
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------
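The substitution can be tried outside regex101 as well. The sketch below uses Python and keeps the pattern from the answer unchanged. It assumes Python 3.5 or later, where a group that did not take part in the match (group 2 when there is no second stem) is replaced by an empty string. The function name fix_line and the ASCII sample lines are hypothetical stand-ins for the Malayalam entries, chosen so the structure is easy to see.

```python
import re

# The pattern from the answer, split over two lines for readability only.
PATTERN = re.compile(
    r"^([^:]+:stem[^:]+)(?::stem(?=.*?(:suffix))|)"
    r"([^:]+?\|:suffix[^:]*)(?::suffix[^:]*)*$"
)


def fix_line(line):
    """Apply the replacement to one corpus line (without its newline)."""
    return PATTERN.sub(r"\1\2\3", line)


# ASCII stand-ins for the corpus entries (illustrative assumptions).
print(fix_line("w=a:stemb:stemc|:suffix,:suffix:suffix"))
print(fix_line("w=ab:stemcd|:suffix:suffix"))
```

As in the sample matches above, any text that directly follows the first suffix and is not a colon, such as the comma in the first line, is kept because it falls inside group 3. It would need a separate clean-up step if it is unwanted.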

Regex to exclude match based on string before first forward slash

I want any path string that leads to a file with an extension of '.log' or a path that contains the directory 'tmp' to be excluded from the match
I'm nearly there:
(?!tmp).+?\.(?!log|tmp).+
http://rubular.com/r/Ubkz7MIEGH
What I want is for
tmp/hello.jpg
to be excluded in the same way that
hello.log
hmm.tmp
are excluded.

Just try the following regex:
^(?!(?:.*log$)|tmp).*$

How about:
^(?!.*\btmp\b)(?!.+\.log\b)(.+)$
Explanation:
The regular expression:
(?-imsx:^(?!.*\btmp\b)(?!.+\.log\b)(.+)$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
----------------------------------------------------------------------
tmp 'tmp'
----------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
log 'log'
----------------------------------------------------------------------
\b the boundary between a word char (\w)
and something that is not a word char
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
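A quick check of this pattern against the paths from the question might look like the Python sketch below. The names EXCLUDE_TMP_AND_LOG and is_allowed, and the path images/hello.jpg, are assumptions added for illustration.

```python
import re

# The pattern from the answer above, unchanged.
EXCLUDE_TMP_AND_LOG = re.compile(r"^(?!.*\btmp\b)(?!.+\.log\b)(.+)$")


def is_allowed(path):
    """Return True when path contains no 'tmp' word and no '.log' extension."""
    return EXCLUDE_TMP_AND_LOG.match(path) is not None


# The first three paths come from the question; the last is hypothetical.
for path in ["tmp/hello.jpg", "hello.log", "hmm.tmp", "images/hello.jpg"]:
    print(path, is_allowed(path))
```

Because of the \b word boundaries, tmp is rejected wherever it appears as a whole word, whether as a directory name or as an extension, while a name that merely contains those letters, such as tmpfile.jpg, still passes.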

^(?!tmp).*(?<!\.tmp|log)$
It's just a negative lookbehind.

Regex for valid sharepoint folder name

I am trying to write a regular expression for a valid SharePoint folder name, which must satisfy these conditions:
Cannot begin or end with a dot,
Cannot contain consecutive dots and
Cannot contain any of the following characters: ~ " # % & * : < > ? / \ { | }.
I wrote a regex for the 1st and 3rd points:
[^\.]([^~ " # % & * : < > ? / \ { | }]+) [^\.]$
and for the second, (?!.*\.\.).*)$, but they are not working properly, and I need to integrate them into one expression.
Please help.

What about just
^\w(?:\w+\.?)*\w+$
I made a small test here
EDIT
This also works
^\w(?:\w\.?)*\w+$

How about:
/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/
explanation:
The regular expression:
(?-imsx:^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?/\\{|}]+).+$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
$ before an optional \n, and the end of
the string
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
[~"#%&*:<>?/\\{|}]+ any character of: '~', '"', '#', '%',
'&', '*', ':', '<', '>', '?', '/', '\\',
'{', '|', '}' (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
.+ any character except \n (1 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
In action (perl script):
use strict;
use warnings;
use feature 'say';    # say is not enabled by default

my $re = qr/^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?\/\\{|}]+).+$/;
while (<DATA>) {
    chomp;
    say /$re/ ? "OK : $_" : "KO : $_";
}
__DATA__
.abc
abc.
a..b
abc
output:
KO : .abc
KO : abc.
KO : a..b
OK : abc
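The same check can be sketched in Python. The lookaheads are those of the pattern above, with the only difference that the forward slash needs no escaping there. The names FOLDER_NAME and is_valid_folder_name, and the extra inputs a:b and a.b, are illustrative assumptions.

```python
import re

# One negative lookahead per rule: no leading dot, no trailing dot,
# no consecutive dots, none of the forbidden characters.
FOLDER_NAME = re.compile(
    r'^(?!^\.)(?!.*\.$)(?!.*\.\.)(?!.*[~"#%&*:<>?/\\{|}]+).+$'
)


def is_valid_folder_name(name):
    """Return True when name passes all four lookahead checks."""
    return FOLDER_NAME.match(name) is not None


# The first four inputs mirror the Perl script; the last two are extra.
for name in [".abc", "abc.", "a..b", "abc", "a:b", "a.b"]:
    print("OK" if is_valid_folder_name(name) else "KO", ":", name)
```

Keeping each rule in its own lookahead means a condition can be added or dropped without touching the others.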

regular expressions to check length with multiple options

I need to validate a date format that can be either 11/11/11 or 11/22/2013, i.e. the year can be YY or YYYY, so the complete format is either MM/DD/YY or MM/DD/YYYY.
I've this code
^(\d{1,2})\/(\d{1,2})\/(\d{4})$
and I've tried
^(\d{1,2})\/(\d{1,2})\/(\d{2}{4})$ // doesn't work, does nothing
and
^(\d{1,2})\/(\d{1,2})\/(\d{2|4})$ // and it returns null every time
PS: I'm applying it with Javascript/jQuery

^(\d{1,2})\/(\d{1,2})\/(\d{2}|\d{4})$
Neither \d{2}{4} nor \d{2|4} is a correct way to express this. You have to match two digits and four digits separately and combine them with alternation: (\d{2}|\d{4})
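The question targets JavaScript, but the alternation behaves the same way in other engines. Here is a small sketch in Python, where the slashes need no escaping; the names DATE and parse_date are assumptions made for the example.

```python
import re

# Year is either exactly two or exactly four digits.
DATE = re.compile(r"^(\d{1,2})/(\d{1,2})/(\d{2}|\d{4})$")


def parse_date(text):
    """Return (month, day, year) as strings, or None if text does not match."""
    m = DATE.match(text)
    return m.groups() if m else None


for text in ["11/11/11", "11/22/2013", "11/22/201", "11/22/20135"]:
    print(text, parse_date(text))
```

The three-digit and five-digit years are rejected because the $ anchor forces the year alternative to consume everything after the second slash. The pattern checks only the shape of the input, not whether the month and day are in range.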

You could use:
^\d\d?/\d\d?/\d\d(?:\d\d)?$
explanation:
The regular expression:
(?-imsx:^\d\d?/\d\d?/\d\d(?:\d\d)?$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
^ the beginning of the string
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d? digits (0-9) (optional (matching the most
amount possible))
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d? digits (0-9) (optional (matching the most
amount possible))
----------------------------------------------------------------------
/ '/'
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
\d digits (0-9)
----------------------------------------------------------------------
)? end of grouping
----------------------------------------------------------------------
$ before an optional \n, and the end of the
string
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

Perl regular expressions

I'm reading some code that involves regular expression and having some trouble.
Can someone please explain it and give an example of text it would parse?
if(/\|\s*STUFF(\d+)\s*\|\s*STUFF(\d+)/)
{
$a = $1;
$b = $2;
}

One string it matches against is |STUFF1|STUFF2.
YAPE::Regex::Explain
(?-imsx:\|\s*STUFF(\d+)\s*\|\s*STUFF(\d+))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
STUFF 'STUFF'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
\| '|'
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ") (0 or
more times (matching the most amount
possible))
----------------------------------------------------------------------
STUFF 'STUFF'
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------

\|\s*STUFF(\d+)\s*\|\s*STUFF(\d+)
\| look for a literal pipe character |.
\s* look for any number (zero or more) of whitespace characters.
STUFF look for the string STUFF
(\d+) look for any number of digits (one or more), and save them to $1.
\s* look for any number of whitespace characters (zero or more)
then repeat once, and save the next digit sequence in $2.
If the regex matches, we know that $1 and $2 must be defined (i.e. they have a value).
In that case, we assign $1 to the variable $a and $2 to $b.
As no explicit string to match against is provided, the $_ variable is implicitly used.
Example text:
foo bar |STUFF123|STUFF456 baz bar foo
and
foo |
STUFF0
|STUFF1234567890bar
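Both example texts can be run through the pattern to see what ends up in the two capture groups. The sketch below uses Python, with search standing in for Perl's unanchored match; the names STUFF and extract are hypothetical.

```python
import re

# The pattern from the question, unchanged.
STUFF = re.compile(r"\|\s*STUFF(\d+)\s*\|\s*STUFF(\d+)")


def extract(text):
    """Return the two captured digit runs, or None when there is no match."""
    m = STUFF.search(text)
    return m.groups() if m else None


print(extract("foo bar |STUFF123|STUFF456 baz bar foo"))
# The second example spans several lines; \s* also matches newlines.
print(extract("foo |\nSTUFF0\n|STUFF1234567890bar"))
```

The first call yields the pair 123 and 456, and the second yields 0 and 1234567890. These are the values that the Perl code would assign to $a and $b.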

We Keep Coding
