Google sheets to regexextract 4 digits only. Do not extract from longer number, say 1234 from 123456 [duplicate] - regex

This question already has answers here:
regular expression to match exactly 5 digits
(5 answers)
Closed 27 days ago.
I have data where users enter 3 types of data:
random
4 digit / 6 digit
6 digit / 4 digit
I am looking for a REGEXEXTRACT formula in Google Sheets that picks out only a group of exactly 4 digits. It should not be tricked by a longer number: for "Article 1234124 Store 4444" it should return 4444, not 1234.
1234/1234567 --> 1234
1234567/5555 --> 5555
Article 1234124 Store 4444 --> 4444
Something like the following would work, but Google Sheets does not support lookahead or lookbehind.
Not a digit, 4 digits, not a digit:
=REGEXEXTRACT(A2,"(?<!\d)\d{4}(?!\d)")
thanks

You could use the following regex pattern:
(?:^|\D)(\d{4})(?:\D|$)
This pattern says to match:
(?:^|\D) Either a non digit or the start of the string
(\d{4}) A 4 digit number
(?:\D|$) Either a non digit or the end of the string
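Because the 4 digits must be bounded on each side by a non-digit or by the start or end of the string, a run of 4 digits inside a longer number cannot match. REGEXEXTRACT returns the content of the capture group, so the surrounding non-digit characters are not part of the result.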
Sample code:
=REGEXEXTRACT(A2,"(?:^|\D)(\d{4})(?:\D|$)")
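To check the pattern outside of Sheets, here is a minimal Python sketch. It assumes Python's re module behaves the same as Sheets for this pattern, and the helper name extract_four_digits is made up for illustration. The pattern itself is the one from the formula above.

```python
import re

# Same pattern as in the formula above; group 1 holds the 4 digits.
FOUR_DIGITS = re.compile(r"(?:^|\D)(\d{4})(?:\D|$)")


def extract_four_digits(text):
    """Return the first standalone 4-digit group in text, or None."""
    match = FOUR_DIGITS.search(text)
    return match.group(1) if match else None


# The three samples from the question.
samples = ["1234/1234567", "1234567/5555", "Article 1234124 Store 4444"]
results = [extract_four_digits(s) for s in samples]
print(results)  # ['1234', '5555', '4444']
```

A string that contains only a longer number, such as 123456, gives None rather than a truncated match.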

Related

Regex how to match everything that doesn't follow the format with 3 digits followed by a dash then 4 digits? [duplicate]

This question already has answers here:
Regex: match everything but a specific pattern
(6 answers)
Closed 8 months ago.
\d{3}-\d{4}
This matches something like 123-4567.
So I changed it to ^(?!(\d{3}-\d{4})), but it doesn't match anything.
I have a list of phone numbers like
123-4567
NOW-4-WAX
12 345 67 89
I would like to match anything except the format xxx-xxxx where x is a digit.
If you add .* to the end of your regex, it will match any line that does not start with a phone number in the 123-4567 format. The lookahead on its own is zero-width, so it consumes no characters; the trailing .* is what actually matches the rest of the line.
Demo on RegExr: ^(?!(\d{3}-\d{4})).*$
It's worth noting that if your input contains anything that is not a phone number at all, this will match that too.
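As a quick check, the pattern can be run against the three sample lines from the question. This is a small Python sketch; the variable names are illustrative only.

```python
import re

# Negative lookahead rejects lines that start with ddd-dddd;
# the trailing .* consumes the rest of any line that passes.
NOT_PHONE = re.compile(r"^(?!(\d{3}-\d{4})).*$")

lines = ["123-4567", "NOW-4-WAX", "12 345 67 89"]
kept = [line for line in lines if NOT_PHONE.match(line)]
print(kept)  # ['NOW-4-WAX', '12 345 67 89']
```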

How to make regex that matches all possible episode numbers from a tv show file format? [duplicate]

This question already has answers here:
Regex for matching season and episode
(5 answers)
Closed 7 months ago.
I would like to create a regex that matches all possible episode numbering formats in TV show file names.
I currently have this regex, which matches most but not all of my examples:
(?:(?<=e)|(?<=episode)|(?<=episode[\.\s]))(\d{1,2})|((?<=-)\d{1,2})
The case it does not handle is two episode numbers directly after one another: e0102 should match 01 and 02.
You can find the regex example with test cases here
As per your comment, I went by the following assumptions:
Episode numbers are never more than three digits long;
Episode strings will therefore have either 1-3 digits, or 4 or 6 digits when a range of episodes is meant;
There is never a run of 5 digits, assuming the same padding is used for both numbers in a range of episodes;
This means that a run of either 4 or 6 digits needs to be split evenly.
Therefore, try the following:
e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)
Here is an online demo. You'll notice I added some more samples to showcase the above assumptions.
e(?:pisode)?\s* - Match either 'e' or 'episode' with 0+ trailing whitespace characters;
(\d{1,3}(?!\d)|\d\d\d??) - A 1st capture group that catches 1-3 digits if no further digit follows; otherwise two digits, or three only if the rest of the pattern cannot match with two;
(?:-?e?(\d{1,3}))? - An optional non-capture group with a nested 2nd capture group looking for optional hyphen and literal 'e' with trailing digits (1-3);
(?!\d) - There is no trailing digit left.
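The behaviour of the pattern can be tried out with a short Python sketch. The helper name episode_numbers, the re.IGNORECASE flag and the sample strings are assumptions added for illustration. The pattern is the one given above.

```python
import re

EPISODE = re.compile(
    r"e(?:pisode)?\s*(\d{1,3}(?!\d)|\d\d\d??)(?:-?e?(\d{1,3}))?(?!\d)",
    re.IGNORECASE,  # assumption: file names may use E or Episode
)


def episode_numbers(name):
    """Return the captured episode numbers of the first match as strings."""
    match = EPISODE.search(name)
    if match is None:
        return []
    # Group 2 is None when there is no second episode number.
    return [g for g in match.groups() if g is not None]


print(episode_numbers("e0102"))      # ['01', '02']
print(episode_numbers("e01-02"))     # ['01', '02']
print(episode_numbers("e010203"))    # ['010', '203']
print(episode_numbers("episode 5"))  # ['5']
```

The 4- and 6-digit cases are split evenly because the lazy \d?? first tries two digits and only grows to three when the trailing (?!\d) would otherwise fail.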

regex: Numbers and spaces (10 or 14 numbers)

How can I write a regex which accepts 10 or 14 digits separated by single spaces in groups of 1, 2 or 3 digits?
examples:
123 45 6 789 1 is valid
1234 567 8 9 1 is not valid (group of 4 digits)
123 45 6 789 109 123 8374 is not valid (not 10 or 14 digits)
EDIT
This is what I have tried so far
[0-9 ]{10,14}+
But it also validates 11, 12 and 13 digits, and doesn't check the size of the groups.
You may use this regex with lookahead assertion:
^(?=(?:\d ?){10}(?:(?:\d ?){4})?$)\d{1,3}(?: \d{1,3})+$
RegEx Demo
Here (?=...) is a lookahead assertion that enforces the presence of exactly 10 or 14 digits in the input.
\d{1,3}(?: \d{1,3})+ matches groups of 1 to 3 digits separated by a single space, with no space allowed at the start or end.
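The three examples from the question can be checked with a small Python sketch. The helper name is_valid and the 14-digit sample are made up for illustration.

```python
import re

# Lookahead counts the digits (10, or 10 + 4); the main body checks group sizes.
GROUPED_DIGITS = re.compile(
    r"^(?=(?:\d ?){10}(?:(?:\d ?){4})?$)\d{1,3}(?: \d{1,3})+$"
)


def is_valid(text):
    """True if text has 10 or 14 digits in space-separated groups of 1-3."""
    return GROUPED_DIGITS.match(text) is not None


print(is_valid("123 45 6 789 1"))             # True
print(is_valid("1234 567 8 9 1"))             # False (group of 4 digits)
print(is_valid("123 45 6 789 109 123 8374"))  # False (not 10 or 14 digits)
print(is_valid("123 456 789 012 34"))         # True (14 digits)
```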
aggtr,
You can match your use case with the following:
^(?:\d\s?){10}$|^(?:\d\s?){14}$
^ means the beginning of the string and $ means the end of the string.
(?:...) is a non-capturing group. Thus, the part before the | matches a string that consists of exactly 10 repetitions of a digit followed by an optional whitespace character. The | allows for either 10 or 14 repetitions of that group.
Edit: I missed the part of your requirement to have the digits grouped by 1, 2, or 3 digits.
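The effect of that omission can be seen in a short Python sketch. The name has_10_or_14_digits is hypothetical. The pattern only counts digits, so the group of 4 digits from the question is accepted.

```python
import re

# Counts digits only; it does not limit how many digits sit between spaces.
COUNT_ONLY = re.compile(r"^(?:\d\s?){10}$|^(?:\d\s?){14}$")


def has_10_or_14_digits(text):
    return COUNT_ONLY.match(text) is not None


print(has_10_or_14_digits("123 45 6 789 1"))             # True
print(has_10_or_14_digits("1234 567 8 9 1"))             # True, despite the group of 4
print(has_10_or_14_digits("123 45 6 789 109 123 8374"))  # False
```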

Iranian postal code validation

Please, I need to validate Iranian postal codes using regex.
I wrote this regex for the purpose: \b([^02\n\D]){4}[^5](\d){5} but it does not work for rules 5 and 7.
Please help me fix it.
These are the rules:
It's all numeric
It has 10 digits
No 0 in the first 5 digits
No 2 anywhere in the postal code
The first 4 digits are not all the same
The 5th digit cannot be 5
The digits are not all the same
The following regex satisfies your conditions:
\b(?!(\d)\1{3})[13-9]{4}[1346-9][013-9]{5}\b
Click for Demo
Explanation:
\b - a word boundary
(?!(\d)\1{3}) - negative lookahead to make sure that the 1st 4 digits are not the same.
[13-9]{4} - matches 4 digits, none of which is 0 or 2
[1346-9] - matches a single digit that is not 0, 2 or 5
[013-9]{5} - matches 5 digits, none of which is 2
\b - a word boundary
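A few checks in Python illustrate how each part of the pattern rejects a code. The sample codes below are invented to exercise the rules and are not real postal codes. The helper name is_valid_postal_code is hypothetical.

```python
import re

POSTAL_CODE = re.compile(r"\b(?!(\d)\1{3})[13-9]{4}[1346-9][013-9]{5}\b")


def is_valid_postal_code(code):
    """True if the whole string satisfies the pattern."""
    return POSTAL_CODE.fullmatch(code) is not None


print(is_valid_postal_code("1345678901"))  # True
print(is_valid_postal_code("1111678901"))  # False: first 4 digits identical
print(is_valid_postal_code("1345578901"))  # False: 5th digit is 5
print(is_valid_postal_code("1305678901"))  # False: 0 in the first 5 digits
print(is_valid_postal_code("1345678921"))  # False: contains a 2
```

Since the first 4 digits may not all be the same, the rule that all 10 digits are not the same is covered automatically.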

regex + capturing groups with varying conditions

I am working on the regex here: https://regex101.com/r/wI2cG1/1
This is the data:
K'1234567
K'123456789
K'123456
I am interested in the digits after K'.
I would like to do this using regex, but I am not sure it can be done. What I want is:
if the number has 6 digits return the first 2 digits e.g. 12
if the number has 7 digits return the first 3 digits e.g. 123
if the number has 9 digits return the first 4 digits e.g. 1234
also
if the number has 10 or 11 digits return the first 3 digits e.g. 123
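and I want to return these in differently named capturing groups or, if possible, in the same capturing group.
It's possible to keep the results in one group by using the branch reset feature. Inside (?|...), every alternative numbers its capture groups from the same starting point, so whichever branch matches, the result ends up in group 1: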
K'(?|(\d{2,3})\d{4}|(\d{4})\d{5}|(\d{3})\d{7,8})\b
Regex Demo
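Python's built-in re module does not support the branch reset syntax (?|...), so the sketch below is not the pattern above but an approximation of it. It gives the three alternatives separate capture groups, and a helper (the hypothetical leading_digits) returns whichever group took part in the match.

```python
import re

# Same alternatives as above, but with a plain non-capturing group,
# so each branch fills its own capture group (1, 2 or 3).
K_NUMBER = re.compile(r"K'(?:(\d{2,3})\d{4}|(\d{4})\d{5}|(\d{3})\d{7,8})\b")


def leading_digits(text):
    """Return the leading digits selected by the matching branch, or None."""
    match = K_NUMBER.search(text)
    if match is None:
        return None
    # Exactly one of the three groups is set when the pattern matches.
    return next(g for g in match.groups() if g is not None)


print(leading_digits("K'123456"))      # 12
print(leading_digits("K'1234567"))     # 123
print(leading_digits("K'123456789"))   # 1234
print(leading_digits("K'1234567890"))  # 123
```

A number with 8 digits matches none of the branches, so the helper returns None for it.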