I am having trouble crafting a regex. For example, in the string A123 4HEL5P6 789 I want to match all the numbers 4, 5, 6, 7, 8, 9 but not 1, 2, 3.
I have tried using a negative lookbehind with the regex (?<!^\w)\d+ but this still matches the numbers in the first word.
Edit: Any numbers in the first continuous sequence of characters should not be matched, the first continuous sequence being from start (^) to a whitespace (\s). In 09B8A HE1LP only 1 should be matched, not 0, 9, or 8, as these digits are in the first word.
If your dialect supports variable-length negative lookbehinds, then this should work:
const r = /(?<!^\w*)\d/g;
console.log(...'A123 4HEL5P6 789'.match(r)); // 4 5 6 7 8 9
Otherwise, you could use /^\w*|\d/g and discard the first match.
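For engines without variable-length lookbehind, the fallback can be sketched roughly as follows. This is only an illustration; the helper name digitsAfterFirstWord is hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: returns every digit that is not part of the first word.
function digitsAfterFirstWord(input) {
  // ^\w* always matches at index 0 (possibly as an empty string),
  // so the first element is the leading word and can be dropped.
  const matches = input.match(/^\w*|\d/g) || [];
  return matches.slice(1);
}

console.log(digitsAfterFirstWord('A123 4HEL5P6 789')); // → ['4', '5', '6', '7', '8', '9']
console.log(digitsAfterFirstWord('09B8A HE1LP'));      // → ['1']
```

Note that \w also covers digits and the underscore, which is what lets ^\w* swallow the whole first word.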
I'm trying to use the range pattern [01-12] in regex to match two digit mm, but this doesn't work as expected.
You seem to have misunderstood how character class definitions work in regex.
To match any of the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, or 12, something like this works:
0[1-9]|1[0-2]
References
regular-expressions.info/Character Classes
Numeric Ranges (has many examples on matching strings interpreted as numeric ranges)
Explanation
A character class, by itself, attempts to match one and exactly one character from the input string. [01-12] actually defines [012], a character class that matches one character from the input against any of the 3 characters 0, 1, or 2.
The - range definition goes from 1 to 1, which includes just 1. On the other hand, something like [1-9] includes 1, 2, 3, 4, 5, 6, 7, 8, 9.
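A quick way to convince yourself of this is to run both classes against the ten digits. The sketch below is in JavaScript; the ^...$ anchors and the non-capturing group around the alternation are additions for the test, not part of the pattern given above.

```javascript
const digits = '0123456789';

// [01-12] is the set {0, 1..1, 2}, i.e. the same as [012].
console.log(digits.match(/[01-12]/g)); // → ['0', '1', '2']
console.log(digits.match(/[012]/g));   // → ['0', '1', '2']

// The alternation matches the two-character strings 01 through 12.
const month = /^(?:0[1-9]|1[0-2])$/;
console.log(month.test('09')); // → true
console.log(month.test('12')); // → true
console.log(month.test('13')); // → false
console.log(month.test('00')); // → false
```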
Beginners often make the mistake of defining things like [this|that]. This doesn't "work". This character class is equivalent to [this|a], i.e. it matches one character from the input against any of the 6 characters t, h, i, s, | or a. More than likely (this|that) is what is intended.
References
regular-expressions.info/Brackets for Grouping and Alternation with the vertical bar
How ranges are defined
So it's obvious now that a pattern like between [24-48] hours doesn't "work". The character class in this case is equivalent to [248].
That is, - in a character class definition doesn't define a numeric range in the pattern. Regex engines don't really "understand" numbers in the pattern, with the exception of finite repetition syntax (e.g. a{3,5} matches between 3 and 5 a's).
Range definition instead uses the ASCII/Unicode encoding of the characters to define ranges. The character 0 is encoded in ASCII as decimal 48; 9 is 57. Thus, the character class [0-9] includes all characters whose values are between decimal 48 and 57 in the encoding. Rather sensibly, by design these are the characters 0, 1, ..., 9.
See also
Wikipedia/ASCII
Another example: A to Z
Let's take a look at another common character class definition [a-zA-Z]
In ASCII:
A = 65, Z = 90
a = 97, z = 122
This means that:
[a-zA-Z] and [A-Za-z] are equivalent
In most flavors, [a-Z] is likely to be an illegal character range
because a (97) is "greater than" Z (90)
[A-z] is legal, but also includes these six characters:
[ (91), \ (92), ] (93), ^ (94), _ (95), ` (96)
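The code points and both edge cases can be checked directly. This is a small JavaScript sketch; other flavors may report the reversed range differently, so the SyntaxError shown here is specific to JavaScript.

```javascript
// Code points that bound the common ranges.
console.log('0'.charCodeAt(0), '9'.charCodeAt(0)); // → 48 57
console.log('A'.charCodeAt(0), 'Z'.charCodeAt(0)); // → 65 90
console.log('a'.charCodeAt(0), 'z'.charCodeAt(0)); // → 97 122

// [A-z] spans 65..122, so it also accepts the six characters in between.
const extras = '[\\]^_`'.split('').filter((ch) => /[A-z]/.test(ch));
console.log(extras.length); // → 6

// [a-Z] runs backwards (97 down to 90); JavaScript rejects it when the regex is built.
let rangeError = null;
try {
  new RegExp('[a-Z]');
} catch (err) {
  rangeError = err;
}
console.log(rangeError instanceof SyntaxError); // → true
```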
Related questions
is the regex [a-Z] valid and if yes then is it the same as [a-zA-Z]
A character class in regular expressions, denoted by the [...] syntax, specifies the rules to match a single character in the input. As such, everything you write between the brackets specifies how to match a single character.
Your pattern, [01-12] is thus broken down as follows:
0 - match the single digit 0
or, 1-1, match a single digit in the range of 1 through 1
or, 2, match a single digit 2
So basically all you're matching is 0, 1 or 2.
In order to do the matching you want, matching two digits, ranging from 01-12 as numbers, you need to think about how they will look as text.
You have:
01-09 (i.e. first digit is 0, second digit is 1-9)
10-12 (i.e. first digit is 1, second digit is 0-2)
You will then have to write a regular expression for that, which can look like this:
  +-- a 0 followed by 1-9
  |
  |      +-- a 1 followed by 0-2
  |      |
<-+--> <-+-->
0[1-9]|1[0-2]
      ^
      |
      +-- vertical bar, this roughly means "OR" in this context
Note that trying to combine them in order to get a shorter expression will fail, by giving false positive matches for invalid input.
For instance, the pattern [0-1][0-9] would basically match the numbers 00-19, which is a bit more than what you want.
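To see exactly which inputs the shorter pattern lets through, one can enumerate the candidates. The sketch below is in JavaScript; the anchors and the variable names are additions for illustration.

```javascript
// Enumerate 00..19 as two-character strings and see which each pattern accepts.
const candidates = Array.from({ length: 20 }, (_, n) => String(n).padStart(2, '0'));

const tooBroad = /^[0-1][0-9]$/;
const exact = /^(?:0[1-9]|1[0-2])$/;

// Strings the short pattern accepts but the exact pattern rejects.
const falsePositives = candidates.filter((s) => tooBroad.test(s) && !exact.test(s));
console.log(falsePositives); // → ['00', '13', '14', '15', '16', '17', '18', '19']
```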
I tried finding a definite source for more information about character classes, but for now all I can give you is this Google Query for Regex Character Classes. Hopefully you'll be able to find some more information there to help you.
This also works if single-digit months are written without a leading zero:
^([1-9]|1[0-2])$
[1-9] matches the single digits 1 through 9
1[0-2] matches the two-digit values 10 through 12
There are some good examples here
The []s in a regex denote a character class. If no ranges are specified, it implicitly ors every character within it together. Thus, [abcde] is the same as (a|b|c|d|e), except that it doesn't capture anything; it will match any one of a, b, c, d, or e. All a range indicates is a set of characters; [ac-eg] says "match any one of: a; any character between c and e; or g". Thus, your match says "match any one of: 0; any character between 1 and 1 (i.e., just 1); or 2".
Your goal is evidently to specify a number range: any number between 01 and 12 written with two digits. In this specific case, you can match it with 0[1-9]|1[0-2]: either a 0 followed by any digit between 1 and 9, or a 1 followed by any digit between 0 and 2. In general, you can transform any number range into a valid regex in a similar manner. There may be a better option than regular expressions, however, or an existing function or module which can construct the regex for you. It depends on your language.
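As one possible non-regex-heavy alternative, the shape check and the numeric check can be split apart. This is a minimal JavaScript sketch; the function name isTwoDigitMonth is hypothetical.

```javascript
// Hypothetical helper: require exactly two digits, then compare numerically.
function isTwoDigitMonth(text) {
  if (!/^\d{2}$/.test(text)) return false; // shape: exactly two ASCII digits
  const n = Number(text);
  return n >= 1 && n <= 12;                // value: 1 through 12
}

console.log(isTwoDigitMonth('07')); // → true
console.log(isTwoDigitMonth('12')); // → true
console.log(isTwoDigitMonth('00')); // → false
console.log(isTwoDigitMonth('13')); // → false
console.log(isTwoDigitMonth('7'));  // → false
```

The same split scales to ranges such as 24-48, where a pure regex becomes harder to read.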
Use this (anchored with ^ and $ so that the whole string must match):
^(0?[1-9]|1[012])$
07: match
7: match
0: no match
00: no match
13: no match
21: no match
To test a value such as 07/2018, use this:
/^(0?[1-9]|1[012])\/([2-9][0-9]{3})$/
(Accepts dates from 01/2000 to 12/9999.)
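The two capture groups of the month/year pattern can be read back as numbers. The following is a JavaScript sketch; parseMonthYear is a hypothetical helper name, and the pattern is copied unchanged from the answer.

```javascript
const monthYear = /^(0?[1-9]|1[012])\/([2-9][0-9]{3})$/;

// Hypothetical helper: returns the parsed parts, or null when the input is rejected.
function parseMonthYear(text) {
  const m = monthYear.exec(text);
  return m ? { month: Number(m[1]), year: Number(m[2]) } : null;
}

console.log(parseMonthYear('07/2018')); // → { month: 7, year: 2018 }
console.log(parseMonthYear('7/2018'));  // → { month: 7, year: 2018 }
console.log(parseMonthYear('13/2018')); // → null
console.log(parseMonthYear('07/1999')); // → null
```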
As polygenelubricants says, yours would look for 0|1-1|2 rather than what you wish for, because character classes (things in []) match characters rather than strings.
My solution to keep mm-yyyy is ^0*([1-9]|1[0-2])-(20[2-4][0-9])$. Note that 0* accepts any number of leading zeros, and the year part covers 2020 through 2049.
I already have a regex to match only single digits in a comma-delimited string. I need to update it to match strings like the following:
5|5,4,3
2|1,2 , 3
The constraints are
it should start with a single digit in range of 1-5, followed by a pipe character (|)
the string following the pipe character should be a single digit in the range 1-7, optionally followed by a comma. This pattern can repeat. For example, the following strings are considered valid after the pipe character:
"6"
"1,7"
"1,2,3, 4,6"
"1, 4,5,7"
However, the following strings are considered invalid:
"8"
"8, 9,10"
I tried the following (and other variations)
\A[1-5]\|[1-7](?=(,|[1-7]))*
but it doesn't work as expected. For example, for the sample string
5|5,4, 3, 10,5
it just matches
5|5
I need to capture the digit before the pipe character and all the matching digits after the pipe character. For example, for the sample string 5|5,4, 3, 2, 1 the regex should capture
5
[5, 4, 3, 2, 1]
Note: I am using Ruby 2.2.1
Also, would you mind letting me know what mistake in my regex pattern kept it from working as expected?
Thanks.
You could try the regex below. It allows optional whitespace on both sides of each comma, so a string such as 2|1,2 , 3 also matches.
^([1-5])\|([1-7](?:\s*,\s*[1-7])*)$
Example:
> "5|5,4, 3, 2, 1".scan(/^([1-5])\|([1-7](?:\s*,\s*[1-7])*)$/)
=> [["5", "5,4, 3, 2, 1"]]
OR
> "5|5,4, 3, 2, 1".scan(/([1-5])\|([1-7](?: ?, ?[1-7])*)$/)
=> [["5", "5,4, 3, 2, 1"]]
You can try the following regex that will match digits and a group of comma/space separated digits after a pipe:
^[1-5]\|(?:(?:[1-7]\s*,\s*)+\s*[1-7]?|[1-7])\b
Here is a demo.
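As for why the original attempt stops at 5|5: a lookahead such as (?=...) only checks what follows without consuming it, so the match never advances past the first digit after the pipe. A capture-then-split approach is sketched below in JavaScript rather than Ruby; the pattern and the helper name parseRating are assumptions for illustration. In Ruby, ^ and $ match at line boundaries, so \A and \z would be the stricter anchors there.

```javascript
// Group 1: the digit before the pipe. Group 2: the comma-separated list after it.
const listPattern = /^([1-5])\|([1-7](?:\s*,\s*[1-7])*)$/;

// Hypothetical helper: returns [head, digits], or null when the string is invalid.
function parseRating(text) {
  const m = listPattern.exec(text);
  if (!m) return null;
  // Split the captured list on commas and strip the optional whitespace.
  const digits = m[2].split(',').map((part) => Number(part.trim()));
  return [Number(m[1]), digits];
}

console.log(parseRating('5|5,4, 3, 2, 1')); // → [5, [5, 4, 3, 2, 1]]
console.log(parseRating('2|1,2 , 3'));      // → [2, [1, 2, 3]]
console.log(parseRating('5|5,4, 3, 10,5')); // → null
```

A single regex group that repeats keeps only its last capture in most engines, which is why the list is captured as one string and split afterwards.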
I'm still learning regex and was hoping someone could tell me what this regex does exactly. Thank you.
\d{8,9}0101\d{3}
Breaking it apart:
\d{8,9}
That means either eight or nine digits (0-9).
0101
That means the literal string 0101
\d{3}
That means precisely three digits.
You can use Expresso to learn more.
Your regex means:
1. Any digit, repeated 8 or 9 times
2. then 0101
3. then any digit, repeated exactly 3 times
I would recommend to start from some source where theory could be found, later on using some tools where you can interactively check how this knowledge can be applied.
http://www.regular-expressions.info/posix.html <- This site contains information about POSIX standard for regular expressions.
Personally for testing matching I use rubular.com, but it references ruby's implementation of regexps. So it also depends on what regexp implementation you use.
In your case it is simple to answer and there should be no difference between different regexp implementations, though.
(A) \d{8,9} - a digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) repeated a minimum of 8 and a maximum of 9 times
(B) 0101 - the literal string 0101
(C) \d{3} - a digit repeated exactly 3 times
The regex matches A followed by B followed by C.
This finds 8 or 9 digits (numbers 0-9), followed by 0101 followed by exactly three digits...
(You should have been able to figure that out by searching!)
Autopsy:
\d{8,9} - a digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) repeated 8 to 9 times
0101 - a literal string of the characters 0101
\d{3} - a digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) repeated exactly 3 times
Note: repeated doesn't mean "the same character" but anything in the match. That means that "repeated exactly 3 times" for \d could be 111, 123, 989 etc.
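The breakdown can be checked against a few sample strings. This is a JavaScript sketch; the sample inputs are made up, and the ^ and $ anchors in idPattern are an addition, since the original pattern would also match inside a longer string.

```javascript
// Anchored variant: the whole string must be 8 or 9 digits, then 0101, then 3 digits.
const idPattern = /^\d{8,9}0101\d{3}$/;

console.log(idPattern.test('123456780101999'));  // → true  (8 digits + 0101 + 3 digits)
console.log(idPattern.test('1234567890101999')); // → true  (9 digits + 0101 + 3 digits)
console.log(idPattern.test('12345670101999'));   // → false (only 7 leading digits)
console.log(idPattern.test('123456780102999'));  // → false (literal 0101 is missing)

// The original, unanchored pattern also matches inside surrounding text.
console.log(/\d{8,9}0101\d{3}/.test('x123456780101999y')); // → true
```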