How to make regex that matches all possible episode numbers from a tv show file format? [duplicate] - regex

This question already has answers here:
Regex for matching season and episode
(5 answers)
Closed 7 months ago.
I would like to create a regex expression that matches all possible episode numbering formats from a tv show file format.
I currently have this regex which matches most but not all of the list of examples.
(?:(?<=e)|(?<=episode)|(?<=episode[\.\s]))(\d{1,2})|((?<=-)\d{1,2})
The one it does not match is when there are two episodes directly after another e0102 should match 01 and 02.
You can find the regex example with test cases here

As per your comment, I went by the following assumptions:
Episode numbers are never more than three digits long;
Episode strings will therefore have either 1-3 digits, or 4 or 6 digits when they are meant to be a range of episodes;
There is never an integer of 5 digits, assuming the same padding is used for both numbers in a range of episodes;
This means that a length of either 4 or 6 digits needs to be split evenly.
Therefore, try the following:
e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)
Here is an online demo. You'll notice I added some more samples to showcase the above assumptions.
e(?:pisode)?\s* - Match either 'e' or 'episode' with 0+ trailing whitespace characters;
(\d{1,3}(?!\d)|\d\d\d??) - A 1st capture group to catch either 1-3 digits not followed by any other digit, or two digits with a lazily matched optional third (so that runs of 4 or 6 digits are split evenly);
(?:-?e?(\d{1,3}))? - An optional non-capture group with a nested 2nd capture group looking for optional hyphen and literal 'e' with trailing digits (1-3);
(?!\d) - There is no trailing digit left.
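To see how the two capture groups behave, here is a minimal sketch that runs the pattern above against a few of the formats discussed. The helper name extractEpisodes and the case-insensitive i flag are assumptions added for illustration; they are not part of the original answer.

```javascript
// Pattern from the answer above; the `i` flag is an assumption so that "E01" matches too.
const episodePattern = /e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)/i;

// Hypothetical helper: returns the captured episode strings, or null if nothing matches.
function extractEpisodes(name) {
  const m = name.match(episodePattern);
  if (!m) return null;
  // Group 2 is only set when a second episode number follows the first one.
  return m[2] === undefined ? [m[1]] : [m[1], m[2]];
}

console.log(extractEpisodes('e0102'));     // 4 digits, split evenly
console.log(extractEpisodes('e001002'));   // 6 digits, split evenly
console.log(extractEpisodes('e01-e02'));   // explicit range
console.log(extractEpisodes('episode 5')); // single episode
```

The lazy \d?? in the second alternative is what lets a 4-digit run split as 2+2 while a 6-digit run falls back to 3+3.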

Related

Regex for matching numbers with optional decimal digits [duplicate]

This question already has answers here:
Regular expression for floating point numbers
(20 answers)
Closed 3 years ago.
I am trying to extract a rating from a tweet using a regular expression. For example, for the tweet below, I want to get the user rating (9.75) and the maximum rating (10).
This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. 9.75/10
I used the regex below, but capture groups 1 and 2 contain 75 and 10. I am not sure why the user rating is captured only after the decimal point.
.*(\d+\.?\d+)\/(\d*\.?\d*)
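If you want both numbers to have optional decimals, place the one-or-more quantifier + and the zero-or-more quantifier * so that + matches the mandatory leading digits and * matches the optional decimals. The leading .* should also go: it is greedy, so it consumes everything up to and including 9. and leaves only 75 for the first group. Replace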
(\d+\.?\d+)\/(\d*\.?\d*)
with
(\d+\.?\d*)\/(\d+\.?\d*)
This will match at least one digit followed by maybe a dot and then again maybe some more digits.
Live link: https://regex101.com/r/qc5Zwz/1
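As a quick check of the difference, here is a small sketch comparing the pattern from the question with the corrected one on the tweet from the question. The helper name ratingGroups is hypothetical and only used for this illustration.

```javascript
const tweet = "This is Logan, the Chow who lived. He solemnly swears he's up to lots of good. 9.75/10";

// Pattern from the question: the greedy .* swallows "9." before the first group starts.
const original = /.*(\d+\.?\d+)\/(\d*\.?\d*)/;
// Corrected pattern from the answer: mandatory digits, then optional dot and decimals.
const corrected = /(\d+\.?\d*)\/(\d+\.?\d*)/;

// Hypothetical helper: returns [group 1, group 2], or null when there is no match.
function ratingGroups(pattern, text) {
  const m = text.match(pattern);
  return m ? [m[1], m[2]] : null;
}

console.log(ratingGroups(original, tweet));
console.log(ratingGroups(corrected, tweet));
console.log(ratingGroups(corrected, 'Solid 9/10'));
```

The last call shows that the corrected pattern also accepts a rating without decimals, which the original \d+\.?\d+ could not do for a single digit.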
\b(\d+(?:\.\d+)?)\/(\d+)\b
\b - expect a word boundary (e.g. the position between a space or other non-word character and a digit)
( - start capturing the 'rating'
\d+ - integer part
(?:\.\d+)? - wrap the decimal part, don’t capture it as a group; make it optional
) - end of 'rating' capturing group
\/ - expect a forward slash
(\d+) - capture the 'maximum'
\b - expect a word boundary again
const text = 'This is Logan, the Chow who lived. He solemnly swears he\'s up to lots of good. 9.75/10'
const pattern = /\b(\d+(?:\.\d+)?)\/(\d+)\b/
// In the result array, index 1 holds the rating and index 2 the maximum
console.log(text.match(pattern))
https://regex101.com/r/foO1DF/2

trying to understand what this regex means [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 3 years ago.
Trying to understand what the below regex means.
/^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$/
Sorry not exactly a coding question.
Let's break this regex into a few different parts:
^: asserts position at start of the string
[0-9]{2,3}: Match a digit between 0 and 9, two or three times
[- ]{0,1}: Match a dash or a space, zero or one time (optional separator)
[0-9]{3}: Match a digit between 0 and 9, exactly 3 times
[- ]{0,1}: Match a dash or a space, zero or one time (optional separator)
[0-9]{3}: Match a digit between 0 and 9, exactly 3 times
$: asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
Here are a few strings that would pass this regex:
123-123-123
123123123
12-123-123
12123123
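The strings listed above can be checked directly. The following is a small sketch; the extra samples with spaces, too few digits and too many digits are additions for illustration and are not from the original post.

```javascript
const phoneLike = /^[0-9]{2,3}[- ]{0,1}[0-9]{3}[- ]{0,1}[0-9]{3}$/;

const samples = [
  '123-123-123',  // from the answer
  '123123123',    // from the answer
  '12-123-123',   // from the answer
  '12123123',     // from the answer
  '12 123 123',   // added: spaces are accepted as separators too
  '1-123-123',    // added: leading group has only one digit
  '1234-123-123', // added: leading group has four digits
];

for (const s of samples) {
  console.log(s, phoneLike.test(s));
}
```

Because the separator class is [- ], a dash and a space are interchangeable, and the two separators do not have to be the same character.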
Here's a good resource to learn/test regexes: regex101.com
It matches two or three digits followed by (optionally) a dash or space, then 3 digits, again optional dash or space and 3 digits. It seems to try to match a telephone number written in different formats.

Regex check for specific phone numbers or extensions

String to be evaluated will either be a 10 digit number or a 4 digit number.
5551119900 (10 Digit)
9999 (4 Digit)
Need regex to test for specific list of 10 digit numbers or 4 digit numbers. I have the following Regex that almost works
55511199(00|01|02|10|20|30)|(0000|9901|9902|9903|9999)
Above is checking for
5551119900
5551119901
5551119902
5551119910
5551119920
5551119930
0000
9901
9902
9903
9999
ISSUE:
(1) Need match to be exactly 10 digits or 4 digits only.
(2) Pattern match (see link below) is showing an exact match and also a "Group 1". I'm not sure what the group match means or if that is a good thing.
Sample: https://regex101.com/r/BbplFG/1/
Try this version of your regex:
^(?:55511199(?:00|01|02|10|20|30)|(?:0000|9901|9902|9903|9999))$
Demo
I have made several changes here:
Used ?: inside terms in parentheses, to turn off group capturing
Placed the entire pattern inside parentheses
Added starting (^) and ending ($) anchors around the entire pattern
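To illustrate the effect of these changes, here is a short sketch that compares the original pattern with the anchored, non-capturing version. The variable names are chosen for this example only.

```javascript
// Pattern from the question: no anchors, capturing groups.
const unanchored = /55511199(00|01|02|10|20|30)|(0000|9901|9902|9903|9999)/;
// Pattern from the answer: anchored, non-capturing groups only.
const allowed = /^(?:55511199(?:00|01|02|10|20|30)|(?:0000|9901|9902|9903|9999))$/;

// Without anchors, a match anywhere inside the string is enough.
console.log(unanchored.test('55511199001')); // 11 digits, still matches
console.log(unanchored.test('5551119903'));  // not in the list, but contains 9903

// With anchors, the whole string must be one of the listed numbers.
console.log(allowed.test('5551119900'));
console.log(allowed.test('9999'));
console.log(allowed.test('55511199001'));
console.log(allowed.test('5551119903'));

// "Group 1" in the question is just the captured (00|01|...) part.
console.log(unanchored.exec('5551119910')[1]);
// With ?: there are no groups, so the result only contains the full match.
console.log('9999'.match(allowed).length);
```

The "Group 1" entry seen on regex101 is harmless; it only reports what the first pair of capturing parentheses matched.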

Regex - Exactly 7 digits no more no less

I am looking for help here. I want to write a regex that finds exactly a 7-digit number in a string, no more and no less.
For instance in this string:
1234567 RE:TKT-2744870-R6P1G0: Gentle Reminder
It should return only 1234567
In this one:
12345678 RE:TKT-2744870-R6P1G0: Gentle Reminder
It should return none.
Can you help me with this one.
thanks in advance.
The proper regex should include \d{7} (7 digits) and 2 "border criteria",
for both start and end of the match, to block matching of a fragment
from longer sequence of digits.
My first thought was that neither before nor after the match there can be any digit.
But as I see from your example, these border criteria should be extended.
The set of "forbidden" chars (either before or after the match) should
include also - and letters.
E.g. 2744870 in your example data contains just 7 digits (no more, no less),
but you still don't want it to be matched, apparently because it is surrounded by - chars.
To keep the regex short, I propose:
(?<![\w-])\d{7}(?![\w-])
Details:
(?<![\w-]) - Negative lookbehind for word char or -.
\d{7} - 7 digits.
(?![\w-]) - Negative lookahead for word char or -.
If you decide to extend the set of "forbidden" chars in both border criteria,
just add them to [...] fragments in lookbehind / lookahead (but - char
should remain at the end, otherwise it must be quoted with \).
A regex like (\d{7})[^\d] (from the other answer) is wrong,
as it matches the last 7 digits of any longer sequence of digits
(no "front border criterion").
It can also match 2744870 in both examples (surrounded by - chars), which is not
to be matched.
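A minimal sketch of the lookaround approach on the two sample strings from the question follows. The helper name findSevenDigitRuns is hypothetical, and the example assumes a JavaScript engine with lookbehind support (for instance a recent Node.js).

```javascript
// Lookaround pattern from this answer, with the g flag to collect every match.
const exactlySeven = /(?<![\w-])\d{7}(?![\w-])/g;
// Pattern criticised above: no check on what precedes the digits.
const naive = /(\d{7})[^\d]/;

const a = '1234567 RE:TKT-2744870-R6P1G0: Gentle Reminder';
const b = '12345678 RE:TKT-2744870-R6P1G0: Gentle Reminder';

// Hypothetical helper: returns all standalone 7-digit runs (empty array if none).
function findSevenDigitRuns(text) {
  return text.match(exactlySeven) || [];
}

console.log(findSevenDigitRuns(a)); // only the leading 1234567 qualifies
console.log(findSevenDigitRuns(b)); // nothing qualifies

// The naive pattern picks the tail of the 8-digit number in the second string.
console.log(naive.exec(b)[1]);
```

The ticket number 2744870 is rejected in both strings because the lookbehind sees the preceding - character.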
This one should do for your examples:
(\d{7})[^\d]
The first matching group contains the seven digits.
Alternatively, as suggested in the comments, you can use a negative lookahead to match only the seven digits without needing a capture group (note that the ^ anchor restricts this to the start of the string):
^\d{7}(?!\d)

Regex quantifier not restricting match [duplicate]

This question already has an answer here:
Restricting character length in a regular expression
(1 answer)
Closed 4 years ago.
I would like to match 1 or more capital letters, [A-Z]+ followed by 0 or more numbers, [0-9]* but the entire string needs to be less than or equal to 8 characters in total.
No matter what regex I come up with the total length seems to be ignored. Here is what I've tried.
^[A-Z]+[0-9]*{1,8}$ //Range ignored, will not work on regex101.com but will on rubular.com/
^([A-Z]+[0-9]*){1,8}$ //Range ignored
^(([A-Z]+[0-9]*){1,8})$ //Range ignored
Is this not possible in regex? Do I just need to do the range check in the language I'm writing in? That's fine but I thought it would be cleaner to keep it all in regex syntax. Thanks
The behaviour is expected. When you write the following pattern:
^([A-Z]+[0-9]*){1,8}$
The {1,8} quantifier tells the regex to repeat the preceding element, here the capturing group, between one and eight times. Because [A-Z]+ and [0-9]* are themselves unbounded, a single repetition of the group can already match a string of any length, so the quantifier does not limit the total length.
You need to use a lookahead to obtain the desired behaviour:
^(?=.{1,8}$)[A-Z]+[0-9]*$
^ Assert beginning of string.
(?=.{1,8}$) Ensure that the string that follows is between one and eight characters in length.
[A-Z]+[0-9]*$ Match any upper case letters, one or more, and any digits, zero or more.
$ Asserts position end of string.
See working demo here.
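As a quick sanity check of the lookahead version, here is a small sketch. The sample strings are made up for illustration.

```javascript
// Lookahead pattern from the answer: at most 8 characters, letters first, then digits.
const limited = /^(?=.{1,8}$)[A-Z]+[0-9]*$/;

const samples = [
  'ABC123',    // 6 characters, letters then digits
  'ABCDEFGH',  // exactly 8 characters, digits are optional
  'ABCDEFGH1', // 9 characters, too long
  '123',       // no leading letter
  'A1B',       // letter after a digit
];

for (const s of samples) {
  console.log(s, limited.test(s));
}
```

The lookahead only checks the length and consumes nothing, so the shape check [A-Z]+[0-9]*$ still starts at the beginning of the string.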
The regex ^([A-Z]+[0-9]*){1,8}$ would match [A-Z]+[0-9]* 1 - 8 times. That would match, for example, A1 repeated 8 times (A1A1A1A1A1A1A1A1) but not A1 repeated 9 times (A1A1A1A1A1A1A1A1A1).
You might use a positive lookahead (?=[A-Z0-9]{1,8}$) to assert the length of the string:
^(?=[A-Z0-9]{1,8}$)[A-Z]+[0-9]*$
That would match
^ From the start of the string
(?=[A-Z0-9]{1,8}$) Positive lookahead to assert that what follows matches any of the characters in the character class [A-Z0-9] 1 - 8 times and assert the end of the string.
[A-Z]+[0-9]*$ Match an uppercase character one or more times, followed by a digit zero or more times, and assert the end of the string.
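To make the difference concrete, the following sketch contrasts what the grouped {1,8} counts with what the lookahead limits. The variable names and the 10-letter sample are assumptions for illustration.

```javascript
// Pattern from the question: {1,8} counts repetitions of the group, not characters.
const grouped = /^([A-Z]+[0-9]*){1,8}$/;
// Pattern from this answer: the lookahead caps the total length at 8.
const withLookahead = /^(?=[A-Z0-9]{1,8}$)[A-Z]+[0-9]*$/;

const eightPairs = 'A1'.repeat(8); // 16 characters, 8 repetitions of the group
const ninePairs = 'A1'.repeat(9);  // 18 characters, 9 repetitions of the group
const tenLetters = 'ABCDEFGHIJ';   // 10 characters, but a single repetition

console.log(grouped.test(eightPairs));
console.log(grouped.test(ninePairs));
console.log(grouped.test(tenLetters));

console.log(withLookahead.test(eightPairs));
console.log(withLookahead.test(tenLetters));
console.log(withLookahead.test('ABC12345'));
```

The 10-letter string shows why the grouped quantifier appears to be ignored: one repetition of [A-Z]+ is enough to cover it.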