Why is bracket mandatory here? - regex

1 . ^([0-9A-Za-z]{5})+$
vs
2 . ^[a-zA-Z0-9]{5}+$
My intention is to match any string of length n such that n is a multiple of 5.
Check here: https://regex101.com/r/sS6rW8/1.
Please explain why case 1 matches the string whereas case 2 does not.

Because {n}+ doesn't mean what you think it does. In PCRE syntax, a + placed directly after {n} makes it a possessive quantifier. In other words, a{5}+ is the same as (?>a{5}). It's like the second + in the expression a++, which is the same as using an atomic group (?>a+).
Possessiveness has no effect on a fixed-count {n}, because there is nothing to backtrack into, so your second pattern simply matches exactly 5 characters. It is more meaningful with {min,max}: a{2,5}+ is equivalent to (?>a{2,5}).
As a simple example, consider these patterns:
^(a{1,2})(ab) will match aab -> $1 is "a", $2 is "ab"
^(a{1,2}+)(ab) won't match aab -> $1 consumes "aa" possessively and $2 can't match
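The same contrast can be reproduced in Python as a rough sketch. Python's re module only accepts the a{1,2}+ syntax from version 3.11, so this example emulates the possessive quantifier with the lookahead-plus-backreference idiom (?=(...))\1. That substitution is an assumption made for portability and is not part of the answer above.

```python
import re

# Ordinary greedy quantifier: a{1,2} first takes "aa", then gives one "a"
# back so that (ab) can match.
greedy = re.match(r'^(a{1,2})(ab)', 'aab')
greedy_groups = greedy.groups() if greedy else None

# Emulated possessive quantifier: the lookahead captures "aa" and is not
# re-entered on backtracking, so \1 must consume "aa" and (ab) cannot match.
possessive = re.match(r'^(?=(a{1,2}))\1(ab)', 'aab')

print(greedy_groups)  # ('a', 'ab')
print(possessive)     # None
```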

In ^([0-9A-Za-z]{5})+$ you're saying "a run of exactly 5 letters or digits, repeated 1 or more times". The + applies to the entire group (whatever is inside the parentheses) and the {5} applies to [0-9A-Za-z].
Your second example has a no-backtracking (possessive) quantifier {5}+, which is different from (stuff{5})+.
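A small Python sketch of the difference between the two patterns. Since Python versions before 3.11 reject {5}+, the second pattern is written here as a plain {5}, which is what the answers above say it amounts to under PCRE; the sample strings are made up for illustration.

```python
import re

# Pattern 1: the + repeats the whole group, so any multiple of 5 is accepted.
grouped = re.compile(r'^([0-9A-Za-z]{5})+$')

# Pattern 2 behaves like a plain {5} (the extra + only makes it possessive),
# so it is written here as its equivalent: exactly 5 characters.
exactly_five = re.compile(r'^[a-zA-Z0-9]{5}$')

samples = ['abcde', 'abcde12345', 'abcdefg']
results = {
    s: (bool(grouped.match(s)), bool(exactly_five.match(s))) for s in samples
}
for s, pair in results.items():
    print(s, pair)
```

Only the grouped pattern accepts the 10-character string; neither accepts the 7-character one.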

Related

CMake regex simple digit match [duplicate]

What is the difference between:
(.+?)
and
(.*?)
when I use it in my php preg_match regex?
They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
By default a quantifier is greedy, meaning it matches as many characters as possible.
A ? after a quantifier makes it "ungreedy" (lazy), meaning it matches as few characters as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
You can test your regexes online e.g. on Regexr, see the greedy example here
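The greedy/ungreedy example can also be checked with a short Python sketch, using re.findall as a stand-in for PHP's preg_match_all (an assumption for illustration; the two functions are not identical in general):

```python
import re

text = 'abab'

greedy = re.findall(r'a.*b', text)   # .* grabs as much as it can
lazy = re.findall(r'a.*?b', text)    # .*? stops at the first possible b

print(greedy)  # ['abab']
print(lazy)    # ['ab', 'ab']
```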
The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).
In regex, {i,f} means "between i and f repetitions of the preceding element". Let's take a look at the following examples:
{3,7} means between 3 and 7 repetitions
{,10} means up to 10 repetitions with no lower limit (i.e. the lower limit is 0); not every flavor accepts an omitted lower limit, so {0,10} is the portable spelling
{3,} means at least 3 repetitions with no upper limit (i.e. the upper limit is infinity)
{,} means no lower or upper limit (i.e. the lower limit is 0 and the upper limit is infinity), again only where the flavor accepts it; {0,} is the portable spelling
{5} means exactly 5
Like most good languages, regex has abbreviations:
+ is shorthand for {1,}
* is shorthand for {0,}
? is shorthand for {0,1}
This means + requires at least 1 match, * accepts any number of matches including none, and ? accepts either zero matches or exactly 1.
Credit: Codecademy.com
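As a quick sanity check of those shorthands, here is a minimal Python sketch that compares each abbreviation with its explicit {min,max} spelling. The helper name same_behaviour and the sample strings are made up for this example.

```python
import re

# Each pair: (shorthand, explicit form with the lower bound written out).
pairs = [(r'a+', r'a{1,}'), (r'a*', r'a{0,}'), (r'a?', r'a{0,1}')]
samples = ['', 'a', 'aa', 'aaa', 'b']

def same_behaviour(short, explicit):
    # Compare full-string matching of both spellings on every sample.
    return all(
        bool(re.fullmatch(short, s)) == bool(re.fullmatch(explicit, s))
        for s in samples
    )

equivalent = [same_behaviour(short, explicit) for short, explicit in pairs]
print(equivalent)

# {5} means exactly five repetitions: five a's match, four do not.
five_ok = bool(re.fullmatch(r'a{5}', 'aaaaa'))
four_ok = bool(re.fullmatch(r'a{5}', 'aaaa'))
print(five_ok, four_ok)
```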
+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.
A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.
Consider the following string to match:
ab
The pattern (ab.*) matches, and the capture group contains ab.
The pattern (ab.+) does not match, because .+ needs at least one more character after ab.
But if you change the string to the following, (ab.+) matches and captures aba:
aba
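The three cases above, sketched in Python (first_group is a hypothetical helper written for this example):

```python
import re

def first_group(pattern, text):
    # Return the first capture group, or None when the pattern does not match.
    m = re.search(pattern, text)
    return m.group(1) if m else None

star_on_ab = first_group(r'(ab.*)', 'ab')    # .* may match nothing
plus_on_ab = first_group(r'(ab.+)', 'ab')    # .+ needs one more character
plus_on_aba = first_group(r'(ab.+)', 'aba')

print(star_on_ab, plus_on_ab, plus_on_aba)  # ab None aba
```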
+ is minimal one, * can be zero as well.
A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.
I think the previous answers fail to highlight a simple example.
Suppose we have an array:
numbers = [5, 15]
The regex ^[0-9]+ matches both 5 and 15, because each starts with at least one digit.
^[0-9]* also matches both, but it additionally succeeds with an empty match on input that starts with no digit at all, such as an empty string. The difference is that the + operator requires at least one occurrence of the preceding expression, while * does not.
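A minimal Python sketch of how ^[0-9]+ and ^[0-9]* behave on these inputs; the empty string and 'abc' are added here as extra test inputs and are not from the answer above:

```python
import re

numbers = ['5', '15', '']

plus = [bool(re.match(r'^[0-9]+', n)) for n in numbers]
star = [bool(re.match(r'^[0-9]*', n)) for n in numbers]

print(plus)  # [True, True, False]
print(star)  # [True, True, True]

# With *, a string that starts with a non-digit still "matches",
# but the match is empty.
empty_match = re.match(r'^[0-9]*', 'abc').group()
print(repr(empty_match))  # ''
```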

Regular expression for nnn or nnn.nnn

I have this regex
"^([0-9]{1,3})+(\.[0-9]{3})?$"
and it should allow only numbers in the formats n, nn, nnn and nnn.nnn.
In my case it also accepts the format nnnnn.nnn.
You should remove + and redundant parentheses:
^[0-9]{1,3}(\.[0-9]{3})?$
^^^^^^^^^^
Your pattern matches start of the string (^), 1 or more occurrences of 1 to 3 digits (with ([0-9]{1,3})+) and an optional sequence of a dot followed with 3 digits ((\.[0-9]{3})?) at the end of the string ($).
The [0-9]{1,3} will only match 1 to 3 digits.
See the regex demo.
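You need to remove the 1 from the expression, like this: ^([0-9]{3})+(\.[0-9]{3})?$ Note that this matches one or more groups of exactly three digits, so n and nn are no longer accepted.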
The + after the first parenthesis allows for an arbitrary number of repeats. If you mean {1,3} then you don't need the + at all.
This happens because of the + in the middle of your regex.
It means "one or more of the preceding element", so the pattern effectively allows 1 or more repetitions of ([0-9]{1,3}), followed by an optional (\.[0-9]{3}) at the end of the string.
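To see the effect of the stray +, here is a short Python sketch that runs the original pattern and the corrected one from the first answer against a few inputs. The sample strings are chosen for illustration.

```python
import re

original = re.compile(r'^([0-9]{1,3})+(\.[0-9]{3})?$')
fixed = re.compile(r'^[0-9]{1,3}(\.[0-9]{3})?$')

samples = ['1', '12', '123', '123.456', '12345.678', '1234']

original_hits = [s for s in samples if original.match(s)]
fixed_hits = [s for s in samples if fixed.match(s)]

print(original_hits)  # every sample: the + lets the digit group repeat
print(fixed_hits)     # ['1', '12', '123', '123.456']
```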

Match this regex on perl

I am fairly new to Perl, and even newer to regex.
I have been trying to match the following, without success:
First, 3 to 4 letters (ideally case insensitive)
Optionally a space (but not mandatory)
Then, also optionally, a known uppercase letter (M) and a digit from 1, 2, 3
An example of a valid string would be abc, but also DEFG M2. An invalid one would be mem M, for example.
What I have so far is:
$myExpr ~= m/^[a-z,A-z]{3,4}M[1,2,3]$/i
Not sure how to make the M and numbers optional
Why don't you try the following regular expression for it:
$myExpr =~ m/^([a-zA-Z]{3,4})(\s|)(M|)([1-3]|)$/;
([a-zA-Z]{3,4}) - Group of any character in the class [a-zA-Z], repeated 3 to 4 times.
(\s|) - Either a whitespace character or nothing.
(M|) - Either an uppercase M or nothing.
([1-3]|) - Either one character from the class [1-3] or nothing.
Alternatively, try the following,
which I personally recommend:
$myExpr =~ m/^([a-zA-Z]{3,4})(\s{0,1})(M{0,1})([1-3]{0,1})$/;
([a-zA-Z]{3,4}) - Group of any character in the class [a-zA-Z], repeated 3 to 4 times, i.e. it must contain a minimum of 3 characters and a maximum of 4.
(\s{0,1}) - Group of \s with 0 to 1 repetitions, i.e. it's optional.
(M{0,1}) - Group of the character M with 0 to 1 repetitions, i.e. it's optional.
([1-3]{0,1}) - Group of any digit from 1 to 3 with 0 to 1 repetitions, i.e. it's optional.
Note that both patterns make each part optional independently, so they also accept strings such as mem M or abc2.
Group your optional symbols with (?:) and use "zero or one" quantifier ?.
$myExpr =~ m/^[a-zA-Z]{3,4}(?: M[123])?$/
I've also fixed errors in your regexp: you don't use , in character classes, since that would literally mean "match ,"; I fixed the A-Z range; and I removed the /i modifier, since you didn't say whether you need a lowercase m, and the first range already covers both lowercase and uppercase letters.
You can use the following regex. You don't need a comma inside a character class []. Also remove the i modifier, since M must match as uppercase only.
$myExpr =~ m/^[a-zA-Z]{3,4}(?: M[123])?$/
If the space is optional, add a ? after the space as well (i.e. (?: ?M[123])?).
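The grouped-optional pattern carries over unchanged to other flavors. As a rough cross-check, this Python sketch tries it on the asker's examples; abcM1 is an extra input added here to exercise the optional-space variant.

```python
import re

# Space required before M when the suffix is present.
strict = re.compile(r'^[a-zA-Z]{3,4}(?: M[123])?$')
# Space itself optional inside the optional suffix.
optional_space = re.compile(r'^[a-zA-Z]{3,4}(?: ?M[123])?$')

samples = ['abc', 'DEFG M2', 'mem M', 'abcM1']

strict_hits = [s for s in samples if strict.match(s)]
optional_hits = [s for s in samples if optional_space.match(s)]

print(strict_hits)    # ['abc', 'DEFG M2']
print(optional_hits)  # ['abc', 'DEFG M2', 'abcM1']
```

Because M and the digit sit inside one optional group, mem M is rejected by both variants.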

Perl. Regex not matching desired when using {x,y} metacharacter

I'm trying to work with the {x,y} metacharacter, so please help me understand why
1. 'Hello' =~ /\w{2,}/; # Returns true. while..
2. 'Hello' =~ /\w{,6}/; # ..returns false ??!
\w{2,} stands for *'match [0-9A-Za-z_] character at least 2 times'*
\w{,6} stands for *'match [0-9A-Za-z_] character at most 6 times'*
Am I reading this correctly? If so, why doesn't the second one match?
According to perlre documentation -- Quantifiers, only *, +, ?, {n}, {n,}, {n,m} are recognized:
The following standard quantifiers are recognized:
* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
-> /{,6}/ matches '{,6}' literally.
Use /\w{0,6}/ or /\w{1,6}/ instead according to your need.
The first argument to the {n,m} expression is required. See the perlre man page, for example:
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times
A pattern like {,m} is not recognized. If you explicitly give the first argument as 1 it works:
print 'Hello' =~ /\w{1,6}/;
generates "1".
Actually:
\w{n,m} means match a word character at least n times, but at most m times.
\w{n,} means match a word character n or more times.
\w{n} means match a word character exactly n times.
However:
\w{,m} means match a word character followed by the literal text {,m}. This is because the n is required; you must specify the first argument to the {n,m} expression. (Perl 5.34 and later do accept {,m} as shorthand for {0,m}.)
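A short Python sketch of the portable spellings. Python's re module itself treats {,6} as a quantifier, so the literal reading described above is spelled out here with escaped braces; that is an illustration of the behaviour, not Perl code.

```python
import re

# An explicit lower bound works the same way across flavors.
zero_to_six = re.search(r'\w{0,6}', 'Hello').group()
one_to_six = re.search(r'\w{1,6}', 'Hello').group()
print(zero_to_six, one_to_six)  # Hello Hello

# The literal reading: one word character followed by the text "{,6}".
literal = re.compile(r'\w\{,6\}')
on_hello = literal.search('Hello')
on_braces = literal.search('x{,6}')

print(on_hello)           # None
print(on_braces.group())  # x{,6}
```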