regex replace in powershell command duplicates characters: a bug in powershell? [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 1 year ago.
What is the difference between:
(.+?)
and
(.*?)
when I use it in my php preg_match regex?

They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
Per default a quantifier is greedy, that means it matches as many characters as possible.
The ? after a quantifier changes the behaviour to make this quantifier "ungreedy", means it will match as little as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
You can test your regexes online e.g. on Regexr, see the greedy example here

The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).

In RegEx, {i,f} means "between i to f matches". Let's take a look at the following examples:
{3,7} means between 3 to 7 matches
{,10} means up to 10 matches with no lower limit (i.e. the low limit is 0)
{3,} means at least 3 matches with no upper limit (i.e. the high limit is infinity)
{,} means no upper limit or lower limit for the number of matches (i.e. the lower limit is 0 and the upper limit is infinity)
{5} means exactly 4
Most good languages contain abbreviations, so does RegEx:
+ is the shorthand for {1,}
* is the shorthand for {,}
? is the shorthand for {,1}
This means + requires at least 1 match while * accepts any number of matches or no matches at all and ? accepts no more than 1 match or zero matches.
Credit: Codecademy.com

+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.

A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.

Consider below is the string to match.
ab
The pattern (ab.*) will return a match for capture group with result of ab
While the pattern (ab.+) will not match and not returning anything.
But if you change the string to following, it will return aba for pattern (ab.+)
aba

+ is minimal one, * can be zero as well.

A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.

I think the previous answers fail to highlight a simple example:
for example we have an array:
numbers = [5, 15]
The following regex expression ^[0-9]+ matches: 15 only.
However, ^[0-9]* matches both 5 and 15. The difference is that the + operator requires at least one duplicate of the preceding regex expression

Related

CMake regex simple digit match [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 1 year ago.
What is the difference between:
(.+?)
and
(.*?)
when I use it in my php preg_match regex?
They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
Per default a quantifier is greedy, that means it matches as many characters as possible.
The ? after a quantifier changes the behaviour to make this quantifier "ungreedy", means it will match as little as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
You can test your regexes online e.g. on Regexr, see the greedy example here
The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).
In RegEx, {i,f} means "between i to f matches". Let's take a look at the following examples:
{3,7} means between 3 to 7 matches
{,10} means up to 10 matches with no lower limit (i.e. the low limit is 0)
{3,} means at least 3 matches with no upper limit (i.e. the high limit is infinity)
{,} means no upper limit or lower limit for the number of matches (i.e. the lower limit is 0 and the upper limit is infinity)
{5} means exactly 4
Most good languages contain abbreviations, so does RegEx:
+ is the shorthand for {1,}
* is the shorthand for {,}
? is the shorthand for {,1}
This means + requires at least 1 match while * accepts any number of matches or no matches at all and ? accepts no more than 1 match or zero matches.
Credit: Codecademy.com
+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.
A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.
Consider below is the string to match.
ab
The pattern (ab.*) will return a match for capture group with result of ab
While the pattern (ab.+) will not match and not returning anything.
But if you change the string to following, it will return aba for pattern (ab.+)
aba
+ is minimal one, * can be zero as well.
A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.
I think the previous answers fail to highlight a simple example:
for example we have an array:
numbers = [5, 15]
The following regex expression ^[0-9]+ matches: 15 only.
However, ^[0-9]* matches both 5 and 15. The difference is that the + operator requires at least one duplicate of the preceding regex expression

RegEx to check 24 hours time format fails

I have the following RegEx that is supposed to do 24 hours time format validation, which I'm trying out in https://rubular.com
/^[0-23]{2}:[0-59]{2}:[0-59]{2}$/
But the following times fails to match even if they look correct
02:06:00
04:05:00
Why this is so?
In character classes, you're supposed to denote the range of characters allowed (in contrast to the numbers you want to match in your example). For minutes and seconds, this is relatively straight-forward - the following expression
[0-5][0-9]
...will match any numerical string from "00" to "59".
But for the hours, you need to two separate expressions:
[01][0-9]|2[0-3]
...one to match "00" to "19" and one to match "20" to "23". Due to the alternative used (| character), these need to be grouped, which adds another bit of syntax (?:...). Finally we're just adding the anchors ^ and $ for beginning and end of string, which you already had where they belong.
^(?:[01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]$
You can check this solution out at regex101, if you like.
Your problem is that you understand characters ranges wrong: 0-23 doesn't mean "match any number from 0 to 23", it means: 0-2- match one digit: 0,1 or 2, then match 3.
Try this pattern: (?:[01][0-9]|2[0-3])(?::[0-5][0-9]){2}
Explanation:
(?:...) - non-capturing group
[01][0-9]|2[0-3] - alternation: match whether 0 or one followed by any digits fro 0 to 9 OR 2 followed by 0, 1, 2 or 3 (number from 00-23)
(?::[0-5][0-9]){2} - match : and [0-5][0-9] (basically number from 00-59) twice
Demo
use this (([0-1]\d|[2][0-3])):(([0-5][0-9])):(([0-5][0-9]))
Online demo

Extract information through regexp

I have a question about groups in a rule i created to extract dates from text.
Let's consider the following string:
fherfrefercr17hfeuetvbyeituew
The string is composed by everything at the beginning, then there is a number composed by one or two digits and then everything again. I need to extract only the number "17" from the string listed above.
With the following rule i extract only 7 and not 17.
.*(\d{1,2}).*
Can anyone help me with that please?
Overview
Given your pattern:
.*(\d{1,2}).*
This works in the following way:
.* Match any character any number of times
The quantifier here is considered to be greedy because it will match as many characters as possible so long as the pattern matches the string.
\d{1,2} Since your pattern says to match 1 or 2 digits and the previous token is greedy, the regex is just going to match a single digit because this still satisfies the pattern (the previous token stole the first digit).
Code
There are multiple ways you can fix this issue
Method 1
This will simply extract all numbers (1+ digits) from the string. If you want to only match 1 or two digits use \d\d? or \d{1,2} instead.
\d+
\d\d?
\d{1,2}
Method 2
This method turns the greedy quantifier * (in .*) into a lazy quantifier .*?. This will match any character any number of times, but as few as possible. The drawback to this method is that it's expensive because the engine needs to backtrack.
.*?\d{1,2}.*
Method 3
This method matches any non-digit character any number of times, then it matches one or two digits. This is likely the solution you're looking for.
\D*(\d{1,2}).*

Regular expression for nnn or nnn.nnn

I have this regex
"^([0-9]{1,3})+(\.[0-9]{3})?$"
and it should allow only n, nn, nnn and nnn.nnn format of the number.
In my case it is passing also and this format nnnnn.nnn
You should remove + and redundant parentheses:
^[0-9]{1,3}(\.[0-9]{3})?$
^^^^^^^^^^
Your pattern matches start of the string (^), 1 or more occurrences of 1 to 3 digits (with ([0-9]{1,3})+) and an optional sequence of a dot followed with 3 digits ((\.[0-9]{3})?) at the end of the string ($).
The [0-9]{1,3} will only match 1 to 3 digits.
See the regex demo.
You need to remove the 1 from the expression like : ^([0-9]{3})+(\.[0-9]{3})?$
The + after the first parenthesis allows for an arbitrary number of repeats. If you mean {1,3} then you don't need the + at all.
The reason this is happening is because of the + you have in the middle of your regex.
This means "one or more of the preceding element", thus it effectively means 1 one more ([0-9]{1,3}) and it must end with ([0-9]{3})?$

Match this regex on perl

I am fairly new with Perl, and even more so with regex.
Have been trying to match the following, but without success:
First, 3 to 4 letters (ideally case insensitive)
Optionally a space (but not mandatory)
Then, also optionally a known big-case letter (M) and a number out of 1,2,3
An example of a valid string would be abc, but also DEFG M2. Invalid would be mem M, for example
What I have so far is:
$myExpr ~= m/^[a-z,A-z]{3,4}M[1,2,3]$/i
Not sure how to make the M and numbers optional
Why don't you try the following regular expression for it:
$myExpr =~ m/^([a-zA-Z]{3,4})(\s|)(M|)([1-3]|)$/;
([a-zA-Z]{3,4}) - Group of any character in this class: [a-zA-Z] with 3 to 4 repetition.
(\s|) - Either there will be a white-space(space) or not.
(M|) - Either there will be a Uppercase M or not.
([1-3]|) - Either there will any charter this class: [1-3] or not.
(OR) Try the following
I personally recommend this
$myExpr =~ m/^([a-zA-Z]{3,4})(\s{0,1})(M{0,1})([1-3]{0,1})$/;
([a-zA-Z]{3,4}) - Group of any character in this class: [a-zA-Z] with 3 to 4 repetition i.e., it should contain minimum of 3 characters and maximum of 4.
(\s{0,1}) - Group of \s with 0 to 1 repetition i.e., it's optional.
(M{0,1}) - Group of character M with 0 to 1 repetition i.e., it's optional.
([1-3]{0,1}) - Group of any digit from 1 to 3 with 0 to 1 repetition i.e., it's optional.
Group your optional symbols with (?:) and use "zero or one" quantifier ?.
$myExpr =~ m/^[a-zA-Z]{3,4}(?: M[123])?$/
I've also fixed errors in your regexp: you don't use , in character classes - that'd literraly mean "match ,", fixed A-Z range and removed /i modifier, since you didn't say if you need lower case M and first range already covers both small and big letters.
You can use the following regex. You don't need to use comma inside character class []. And also remove i as you need to match with M.
$myExpr ~= m/^[a-zA-z]{3,4}(?: M[123])?$/
If you think your space is optional, then again add a ? after that space too (i.e. (?: ?M[123])).