trying to understand what this regex means [duplicate] - regex

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
Trying to understand what the below regex means.
/^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$/
Sorry not exactly a coding question.

Let's break this regex into a few different parts:
^: asserts position at start of the string
[0-9]{2,3}: Match a number between 0 and 9, between 2 and 3 times
[- ]{0,1}: Matches a dash or a space, zero or one times (optional separator)
[0-9]{3}: Match a number between 0 and 9, exactly 3 times
[- ]{0,1}: Matches a dash or a space, zero or one times (optional separator)
[0-9]{3}: Match a number between 0 and 9, exactly 3 times
$: asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
Here are a few strings that would pass this regex:
123-123-123
123123123
12-123-123
12123123
Here's a good resource to learn/test regexes: regex101.com
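To check the breakdown above, here is a minimal Python sketch that runs the same pattern against a few strings. The helper name `passes` and the extra sample strings are illustrative assumptions, not part of the original question.

```python
import re

# The pattern from the question, without the surrounding / delimiters.
PHONE_LIKE = re.compile(r"^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$")


def passes(text: str) -> bool:
    """Return True if the whole string fits the pattern."""
    return PHONE_LIKE.search(text) is not None


# Each separator may be a dash, a space, or absent.
for sample in ["123-123-123", "123 123 123", "12123123", "1-123-123", "123--123-123"]:
    print(f"{sample!r}: {passes(sample)}")
```

The last two samples fail: the first group needs at least two digits, and each separator slot accepts at most one character.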

It matches two or three digits followed by (optionally) a dash or space, then 3 digits, again optional dash or space and 3 digits. It seems to try to match a telephone number written in different formats.

Related

How to make regex that matches all possible episode numbers from a tv show file format? [duplicate]

This question already has answers here:
Regex for matching season and episode
(5 answers)
Closed 7 months ago.
I would like to create a regex expression that matches all possible episode numbering formats from a tv show file format.
I currently have this regex which matches most but not all of the list of examples.
(?:(?<=e)|(?<=episode)|(?<=episode[\.\s]))(\d{1,2})|((?<=-)\d{1,2})
The one case it does not match is two episodes directly after one another: e0102 should match 01 and 02.
You can find the regex example with test cases here
As per your comment, I went by following assumptions:
Episode numbers are never more than three digits long;
Episode strings will therefore have either 1-3 digits, or 4 or 6 digits when a range of episodes is meant;
There is never a 5-digit number, assuming the same padding is used for both numbers in a range of episodes;
This means that a run of 4 or 6 digits needs to be split evenly.
Therefore, try the following:
e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)
Here is an online demo. You'll notice I added some more samples to showcase the above assumptions.
e(?:pisode)?\s* - Match either 'e' or 'episode' with 0+ trailing whitespace characters;
(\d{1,3}(?!\d)|\d\d\d??) - A 1st capture group that catches either 1-3 digits not followed by any other digit, or two digits plus a lazily matched optional third digit;
(?:-?e?(\d{1,3}))? - An optional non-capture group with a nested 2nd capture group: an optional hyphen and an optional literal 'e', followed by 1-3 digits;
(?!\d) - There is no trailing digit left.
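As a rough illustration of how the two capture groups behave, the pattern can be tried in Python. The function name `episode_numbers` and the sample strings are assumptions made for this sketch; only the pattern itself comes from the answer.

```python
import re
from typing import Optional, Tuple

# The pattern from the answer above, unchanged.
EPISODE = re.compile(r"e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)")


def episode_numbers(name: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (first, second) episode strings; second is None for a single episode."""
    match = EPISODE.search(name)
    if match is None:
        return None
    return match.group(1), match.group(2)


for sample in ["e0102", "e01-e02", "episode 5", "e012345"]:
    print(sample, "->", episode_numbers(sample))
```

For a run of 4 or 6 digits the first alternative fails (its lookahead sees another digit), so the second alternative takes two or three digits and leaves the rest for the second group, which gives the even split.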

Regex for excluding strings that start with consecutive leading zeroes or are only alphabets [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I am looking for a regex to select only the strings that do not start with consecutive zeroes or consecutive letters before the underscore in the strings below.
For ex:
ABC_DE-001 is invalid
abc is invalid (only alphabets)
0_DE-001 is invalid (1 zero before underscore)
000_DE-001 is invalid (sequence of 3 consecutive zeroes)
00_DE-001 is invalid (sequence of 2 consecutive zeroes)
01_DE-001 is valid (0 followed by some other number is valid)
10_DE-001 is valid (starts with 1)
100_DE-001 is valid (starts with 1)
One of the approaches I tried was:
(0[1-9]+|[1-9][0-9]+|0[0*$][1-9])_[A-Z0-9]+[-][0-9]{3}
I am not sure though if any scenario is missed with this. Also, how can the same thing be achieved using negative or positive lookaround?
For your example data, you might start with an optional zero ^0?, since a single leading zero can occur but not more than one.
^0?[1-9][0-9]*_[A-Z]+-[0-9]{3}$
Regex demo
That will match
^0? An optional zero at the start of the string
[1-9][0-9]* Match a digit 1-9 followed by 0+ digits
_[A-Z]+ Match an _ followed by 1+ times A-Z
-[0-9]{3} Match a - followed by 3 digits
$ Assert the end of the string
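The pattern can be checked against the examples from the question with a short Python sketch. The helper name `is_valid` is an assumption made for illustration.

```python
import re

# The pattern from the answer above, unchanged.
VALID_ID = re.compile(r"^0?[1-9][0-9]*_[A-Z]+-[0-9]{3}$")


def is_valid(text: str) -> bool:
    """Return True if the string has a valid numeric prefix before the underscore."""
    return VALID_ID.search(text) is not None


samples = ["ABC_DE-001", "abc", "0_DE-001", "000_DE-001",
           "00_DE-001", "01_DE-001", "10_DE-001", "100_DE-001"]
for sample in samples:
    print(f"{sample}: {'valid' if is_valid(sample) else 'invalid'}")
```

Note that this pattern is stricter than a pure exclusion: it also requires the uppercase letters, the dash, and the three trailing digits to be present.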
You can try with negative look ahead groups:
grep -Pi '^(?![a-z]+(?:_|$|\s)|0+(?:_|$|\s))' test.txt
Explanation:
-Pi - use PCRE (-P) and ignore case (-i). These options are specific to grep; adapt them to your tool. If you cannot make your regex engine ignore case, just replace [a-z] with [a-zA-Z]. PCRE support is required for the lookahead.
^ - beginning of the line
(?!rgx) - negative lookahead: checks, without moving the cursor, that the text ahead does not match the enclosed regular expression rgx.
[a-z]+(?:_|$|\s)|0+(?:_|$|\s) :
reject consecutive letters ([a-z]+) followed by an underscore, the end of the line, or a whitespace character ((?:_|$|\s))
reject consecutive zeroes (0+) followed by an underscore, the end of the line, or a whitespace character ((?:_|$|\s))
(?:) stands for a non-capturing group (its content is not stored, which can improve performance)
Output:
01_DE-001 is valid (0 followed by some other number is valid)
10_DE-001 is valid (starts with 1)
100_DE-001 is valid (starts with 1)
Since grep only prints matching lines (its default behavior), the lines that are not displayed were treated as invalid.
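For readers without grep -P at hand, the same negative lookahead can be sketched in Python, whose re module also supports lookaheads. The list `lines` stands in for the contents of test.txt and is an assumption based on the examples in the question; `re.IGNORECASE` plays the role of -i.

```python
import re

# Same lookahead as in the grep command; IGNORECASE replaces the -i option.
NOT_LETTERS_OR_ZEROES = re.compile(r"^(?![a-z]+(?:_|$|\s)|0+(?:_|$|\s))", re.IGNORECASE)

# Assumed stand-in for test.txt, copied from the question's examples.
lines = [
    "ABC_DE-001 is invalid",
    "abc is invalid (only alphabets)",
    "0_DE-001 is invalid (1 zero before underscore)",
    "000_DE-001 is invalid (sequence of 3 consecutive zeroes)",
    "00_DE-001 is invalid (sequence of 2 consecutive zeroes)",
    "01_DE-001 is valid (0 followed by some other number is valid)",
    "10_DE-001 is valid (starts with 1)",
    "100_DE-001 is valid (starts with 1)",
]

# Keep only the lines whose start passes the lookahead, as grep would.
kept = [line for line in lines if NOT_LETTERS_OR_ZEROES.search(line)]
for line in kept:
    print(line)
```

Unlike the full-match pattern in the other answer, this only inspects the prefix, so it says nothing about what follows the underscore.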

Regex quantifier not restricting match [duplicate]

This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
I would like to match 1 or more capital letters, [A-Z]+ followed by 0 or more numbers, [0-9]* but the entire string needs to be less than or equal to 8 characters in total.
No matter what regex I come up with the total length seems to be ignored. Here is what I've tried.
^[A-Z]+[0-9]*{1,8}$ //Range ignored, will not work on regex101.com but will on rubular.com/
^([A-Z]+[0-9]*){1,8}$ //Range ignored
^(([A-Z]+[0-9]*){1,8})$ //Range ignored
Is this not possible in regex? Do I just need to do the range check in the language I'm writing in? That's fine, but I thought it would be cleaner to keep it all in regex syntax. Thanks
The behaviour is expected. When you write the following pattern:
^([A-Z]+[0-9]*){1,8}$
The {1,8} quantifier tells the regex to repeat the preceding element, here the capturing group, between one and eight times. It does not limit the number of characters: because [A-Z]+ and [0-9]* are themselves unbounded and greedy, each repetition of the group can consume any number of characters, so the overall length is unrestricted.
You need to use a lookahead to obtain the desired behaviour:
^(?=.{1,8}$)[A-Z]+[0-9]*$
^ Assert beginning of string.
(?=.{1,8}$) Ensure that the string that follows is between one and eight characters in length.
[A-Z]+[0-9]* Match any upper case letters, one or more, and any digits, zero or more.
$ Asserts position end of string.
See working demo here.
The regex ^([A-Z]+[0-9]*){1,8}$ would match [A-Z]+[0-9]* 1 - 8 times. That would match for example a repetition of 8 times A1A1A1A1A1A1A1A1 but not a repetition of 9 times A1A1A1A1A1A1A1A1A1
You might use a positive lookahead (?=[A-Z0-9]{1,8}$) to assert the length of the string:
^(?=[A-Z0-9]{1,8}$)[A-Z]+[0-9]*$
That would match
^ From the start of the string
(?=[A-Z0-9]{1,8}$) Positive lookahead to assert that what follows matches any of the characters in the character class [A-Z0-9] 1 - 8 times and assert the end of the string.
[A-Z]+[0-9]*$ Match one or more times an uppercase character followed by zero or more times a digit and assert the end of the string.
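The difference between repeating the group and bounding the length with a lookahead can be seen in a small Python sketch. The names `REPEATED` and `BOUNDED` and the helper `matches` are assumptions made for illustration.

```python
import re

# Attempt from the question: {1,8} repeats the group, it does not count characters.
REPEATED = re.compile(r"^([A-Z]+[0-9]*){1,8}$")
# Lookahead version from the answer: at most 8 characters overall.
BOUNDED = re.compile(r"^(?=[A-Z0-9]{1,8}$)[A-Z]+[0-9]*$")


def matches(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None


# Twelve letters fit in a single repetition of the group, so length is not limited.
print(matches(REPEATED, "ABCDEFGHIJKL"))
# The lookahead rejects anything longer than eight characters.
print(matches(BOUNDED, "ABCDEFGHIJKL"))
print(matches(BOUNDED, "ABC123"))
```

The lookahead checks the length from the start of the string without consuming anything, after which the main pattern still enforces the letters-then-digits order.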

Regex validate number range [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 6 years ago.
I need to validate string input with regex, rules are:
String should not be number less than 2 and not bigger than 9999 (2-9999)
String should not have zeros before number (ex: no 0002, 0022, 0222)
I really need to accomplish this by regex so any other solution is not acceptable.
Try this:
/^([2-9]|[1-9][0-9]{1,3})$/
To implement your first condition:
String should not be number less than 2 and not bigger than 9999 (2-9999)
There are two cases:
Single digits : [2-9] This is a single character in the range between 2 and 9.
Multiple digits: [1-9][0-9]{1,3} This is a two-, three-, or four-digit number whose first digit is in the range 1 to 9 and whose remaining digits are in the range 0 to 9.
Note1: {1,3} limits the second character class to one, two, or three digits.
Note2: ^ means start of string and $ means end of string. The parentheses around the alternation make both anchors apply to both alternatives.
By the way, your second condition needs no separate handling in the pattern above: neither alternative can match a number that starts with 0, so all is fine.
Try this
^(?!0|1$)\d{1,4}$
Regex demo
Explanation:
^\d{1,4}$: matches 0-9999
(?!0)...: not have zeros before number (ex: no 0002, 0022, 0222)
(?!1$)...: not be number less than 2 (==1)
(?!…): Negative lookahead
\d: One digit from 0 to 9
^: Start of string or start of line depending on multiline mode
$: End of string or end of line depending on multiline mode
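Both approaches can be checked exhaustively in Python. This is a sketch under the assumption that the alternation is wrapped in a group so that ^ and $ apply to both branches; the names `GROUPED`, `LOOKAHEAD`, `UNGROUPED`, and `accepts` are illustrative.

```python
import re

# Alternation wrapped in a group, so the anchors cover both branches.
GROUPED = re.compile(r"^(?:[2-9]|[1-9][0-9]{1,3})$")
# Lookahead version: no leading 0, and not the single digit 1.
LOOKAHEAD = re.compile(r"^(?!0|1$)\d{1,4}$")
# Without the group, ^ binds only to the first branch and $ only to the second.
UNGROUPED = re.compile(r"^[2-9]|[1-9][0-9]{1,3}$")


def accepts(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None


# Compare each pattern with a plain numeric check over 0..10000.
for pattern in (GROUPED, LOOKAHEAD):
    mismatches = [n for n in range(10001)
                  if accepts(pattern, str(n)) != (2 <= n <= 9999)]
    print(pattern.pattern, "mismatches:", mismatches)

# The ungrouped alternation lets partially numeric strings through.
print(accepts(UNGROUPED, "2abc"), accepts(UNGROUPED, "abc10"))
```

The brute-force comparison against `2 <= n <= 9999` is only a sanity check; it does not replace thinking through inputs such as leading zeros or non-digit characters.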

RegEx for checking if number is less or greater than [duplicate]

This question already has answers here:
Regex for number check below a value
(6 answers)
Closed 8 years ago.
I need an expression to check whether a number, for example 7, is less than 30 and greater than 1. Can anybody provide an expression and an explanation?
^([2-9]|[1-2][0-9])$
The expression above will match, if:
the given string is one character long and that character is a number ranging from 2 to 9
the given string is two characters long, first character is 1 or 2 and the second character ranges from 0 to 9
Don't use regex, but if you want to, here you go:
^(?:[2-9]|[1-2][0-9])$
Debuggex Demo
Explanation:
This anchors to the beginning/end of the string (so we don't match 7 in the number 175) and then all of the logic happens in the non-capturing group. Either match the numbers [2-9] (greater than 1) OR match [1-2] followed by any digit [0-9] (range from 10-29). Notice that I used [0-9] instead of \d because it reads better and because, in Unicode-aware engines, \d can also match digits from other scripts (Arabic-Indic digits, etc.).
Side note: if you want to allow leading 0s (1 < 007 == 7 < 30), you can allow zero or more 0s after the start of the string:
^0*(?:[2-9]|[1-2][0-9])$
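A short Python sketch can confirm both variants against a plain numeric comparison, and also shows the \d versus [0-9] point for Python 3 string patterns. The names `BETWEEN`, `BETWEEN_PADDED`, and `accepts` are assumptions for illustration.

```python
import re

# Strictly greater than 1 and less than 30, no leading zeros.
BETWEEN = re.compile(r"^(?:[2-9]|[1-2][0-9])$")
# Same range, but with any number of leading zeros allowed.
BETWEEN_PADDED = re.compile(r"^0*(?:[2-9]|[1-2][0-9])$")


def accepts(pattern: re.Pattern, text: str) -> bool:
    return pattern.search(text) is not None


print(accepts(BETWEEN, "7"))           # the example number from the question
print(accepts(BETWEEN, "175"))         # anchors prevent matching the 7 inside 175
print(accepts(BETWEEN_PADDED, "007"))  # leading zeros tolerated

# In Python 3 str patterns, \d also matches non-ASCII decimal digits.
arabic_indic_three = "\u0663"
print(re.fullmatch(r"\d", arabic_indic_three) is not None)
print(re.fullmatch(r"[0-9]", arabic_indic_three) is not None)
```

If the input is already known to be numeric, converting it with `int(...)` and comparing directly is usually simpler, which is what "don't use regex" in the answer alludes to.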