A protein-coding gene regular expression (regex)

I am trying to write a regex that matches strings following these rules:
A sequence of characters with the “AT” prefix, followed by “nG”, where n is a digit from 1 through 5, and lastly a suffix of 5 numeric digits.
Note: I need a plain regular expression, not a language-specific one.
An example of a matching string is “AT1G01040”.
Here is what I have constructed so far: AT[1-5]G(d\{1,5}), but I am not sure it is correct.
Any help would be appreciated. Thanks.

If the number of digits at the end may be from 1 to 5, you may use
^AT[1-5]G[0-9]{1,5}$
Note that if the number of digits at the end must be exactly 5, remove the 1, from the quantifier:
^AT[1-5]G[0-9]{5}$
Details
^ - start of string
AT - a sequence of chars AT
[1-5] - 1, 2, 3, 4 or 5
G - a G char
[0-9]{1,5} - any 1 to 5 consecutive occurrences of an ASCII digit (or - if you use {5} - exactly 5 occurrences)
$ - end of string.
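A quick way to see the difference between the two quantifiers is to run both patterns over a few IDs. This is a minimal sketch assuming Python's re module (the question asks for an engine-neutral regex, and these patterns use only features common to most engines); apart from AT1G01040, the sample strings are hypothetical.

```python
import re

# {1,5}: one to five trailing digits.
FLEXIBLE = re.compile(r"^AT[1-5]G[0-9]{1,5}$")
# {5}: exactly five trailing digits, as the question describes.
EXACT = re.compile(r"^AT[1-5]G[0-9]{5}$")

# AT1G01040 comes from the question; the others are made-up test inputs.
samples = ["AT1G01040", "AT5G1", "AT6G01040", "AT1G010400", "at1G01040"]
for s in samples:
    print(f"{s}: flexible={bool(FLEXIBLE.match(s))}, exact={bool(EXACT.match(s))}")
```

Only AT1G01040 passes both patterns; AT5G1 passes only the {1,5} version. Matching is case-sensitive, so at1G01040 fails. In Python, $ also matches just before a single trailing newline, so re.fullmatch (or \Z) is the stricter choice if that matters.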

Related

Is there any way I can avoid matching a 9-digit number under the following rules with the regex I have?

I have the regex below, which follows these rules:
(?<!x)(?=(?:[._ –-]*\d){9})\d{2,}[._ –-]*\d{2,}[._ –-]*\d{2,}
Rules:
9-digit order numbers should not be detected if 'X' or 'x' precedes the number. (WORKING FINE)
9-digit numbers with up to 3 non-numeric or whitespace characters between the digits should also be matched. (WORKING FINE)
The regex demo below shows that it matches numbers according to the above rules.
https://regex101.com/r/mrGcvp/1
Now, the pattern should not match such 9-digit numbers if they fall under any of the exclusion rules below.
Exclusion rules:
The number should not be matched at all if:
It begins with the digit “9”.
It has “666” in positions 1 – 3.
It has “000” in positions 1 – 3.
It has “00” in positions 4 – 5.
It has “0000” in positions 6 – 9.
You can use
(?<!x)(?=(?:[._ –-]*\d){9})(?!9|66\D*6|00\D*0|(?:\d\D*){3}0\D*0|(?:\d\D*){5}0(?:\D*0){3})\d{2,}[._ –-]*\d{2,}[._ –-]*\d{2,}
The added part is (?!9|66\D*6|00\D*0|(?:\d\D*){3}0\D*0|(?:\d\D*){5}0(?:\D*0){3}). It fails the match if, immediately to the right of the current location (that is, right after the (?<!x) and (?=(?:[._ –-]*\d){9}) checks), there is:
9| - the digit 9, or
66\D*6| - 66, zero or more non-digits, 6, or
00\D*0| - 00, zero or more non-digits, 0, or
(?:\d\D*){3}0\D*0| - three occurrences of a digit followed by zero or more non-digits, then a 0, zero or more non-digits, and a 0, or
(?:\d\D*){5}0(?:\D*0){3}) - five occurrences of a digit followed by zero or more non-digits, then a 0, and then three occurrences of zero or more non-digits followed by a 0 char.
Note that I used \D* instead of [._ –-]*, which should be enough here; if you want to be more precise, replace each \D* with [._ –-]*.
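To check the exclusion lookahead against each rule, here is a small sketch assuming Python's re module, which supports the fixed-width lookbehind and lookaheads used here. The sample numbers are hypothetical, one per rule. The pattern keeps the question's (?<!x), which only rejects a lowercase x.

```python
import re

# The answer's pattern, split across lines with implicit string concatenation.
PATTERN = re.compile(
    r"(?<!x)(?=(?:[._ –-]*\d){9})"  # not after "x"; at least 9 digits ahead
    r"(?!9|66\D*6|00\D*0|(?:\d\D*){3}0\D*0|(?:\d\D*){5}0(?:\D*0){3})"  # exclusions
    r"\d{2,}[._ –-]*\d{2,}[._ –-]*\d{2,}"  # the digit groups themselves
)

# Hypothetical samples, one per rule.
samples = [
    "123456789",    # plain 9 digits: matched
    "123-45-6789",  # separators between digit groups: matched
    "987654321",    # starts with 9
    "666123456",    # 666 in positions 1-3
    "000123456",    # 000 in positions 1-3
    "123-00-6789",  # 00 in digit positions 4-5
    "12345-0000",   # 0000 in digit positions 6-9
    "x123456789",   # preceded by x
]
for s in samples:
    print(f"{s!r}: {PATTERN.findall(s)}")
```

Because \D* skips separators, the position rules count digits rather than characters, so 123-00-6789 is rejected just like 123006789 would be.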
You may use this regex:
(?<![xX])(?=(?:[._ –-]*\d){9})(?!9|666|000|.{3}00|.{5}0000)\d{2,}[._ –-]*\d{2,}[._ –-]*\d{2,}
To enforce all the rules, it uses a negative lookahead:
(?!9|666|000|.{3}00|.{5}0000)
which does the following:
9: Doesn't start with 9
666: Doesn't start with 666
000: Doesn't start with 000
.{3}00: Doesn't allow 00 in positions 4-5
.{5}0000: Doesn't allow 0000 in positions 6-9
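The same kind of check for this shorter lookahead, again a sketch assuming Python's re module with hypothetical samples. Because .{3} and .{5} count characters rather than digits, the position rules only line up when the digits are not separated; 123-00-6789 below is still matched, whereas the \D*-based pattern in the previous answer rejects it.

```python
import re

PATTERN = re.compile(
    r"(?<![xX])(?=(?:[._ –-]*\d){9})"  # not after x/X; at least 9 digits ahead
    r"(?!9|666|000|.{3}00|.{5}0000)"  # exclusions, counted in characters
    r"\d{2,}[._ –-]*\d{2,}[._ –-]*\d{2,}"
)

samples = {
    "123456789": "allowed",
    "987654321": "starts with 9",
    "123006789": "00 in positions 4-5",
    "123450000": "0000 in positions 6-9",
    "X123456789": "preceded by X",
    "123-00-6789": "00 in digits 4-5, but .{3} also counts the '-'",
}
for s, note in samples.items():
    print(f"{s!r} ({note}): {PATTERN.findall(s)}")
```

If the numbers can contain separators, replacing each . in the lookahead with a digit-aware construct such as (?:\d\D*) keeps the positions tied to digits.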

Regex to match all permutations of {1,2,3} with repetition in the middle. Ex: 122223

I need a regular expression that finds all permutations of the digits 1, 2 and 3, where the digit in the middle occurs one or more times.
For example:
123
133332
21111113
312
13333332
I've tried this expression:
([1][2]+[3])|([1][3]+[2])|([2][1]+[3])|([2][3]+[1])|([3][2]+[1])|([3][1]+[2])
Unfortunately, it is slow. Is there any way to make it more efficient?
You may use
([1-3])(?!\1)([1-3])\2*(?!\1|\2)[1-3]
Details
([1-3]) - Group 1: 1, 2 or 3
(?!\1)([1-3])\2* - Group 2: a digit from 1 to 3 that is not equal to the Group 1 value, and then zero or more further occurrences of that same digit
(?!\1|\2)[1-3] - a digit from 1 to 3 that is equal to neither the Group 1 nor the Group 2 value
In case you need to match the whole string, add ^ at the start and $ at the end of the pattern.
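A minimal sketch of the backreference pattern in use, assuming Python's re module; re.fullmatch plays the role of the ^ and $ anchors. The matching examples come from the question, and the non-matching ones are hypothetical.

```python
import re

# The answer's pattern, unchanged.
PERMUTATION = re.compile(r"([1-3])(?!\1)([1-3])\2*(?!\1|\2)[1-3]")

should_match = ["123", "133332", "21111113", "312", "13333332"]
should_not_match = ["112", "121", "1232", "1111"]  # hypothetical

for s in should_match + should_not_match:
    print(s, PERMUTATION.fullmatch(s) is not None)
```

To find such runs inside a longer string, iterate with re.finditer and read m.group(0); re.findall would return the two capture groups instead of the whole match.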

RegEx: How can I match optional comma separated integers between 5 to 100?

I'm trying to match a comma-separated string that can be empty. The following strings would pass:
""
"12"
"5,100"
"5,34,55,12"
"5,8,15,9,94"
The following would fail:
"2" - Because 2 is less than 5
"4,15" - Because 4 is less than 5
"87, 3" - Because 3 is less than 5
"39,7,23,62, 1" - Because 1 is less than 5
"25," - Because no number comes after the comma
Currently I have the following regex: ^(\d+(,\d+)*)?$, which can match comma-separated integers. What I'm not able to do is check that all the integers are between 5 and 100.
This regex: ^((?:[5-9]|([1-9][0-9])|100)(,(?:[5-9]|([1-9][0-9])|100))*)?$ should check for numbers between 5 and 100.
I was able to solve it using this regex:
^(([5-9]|[1-9][0-9]|100)(,([5-9]|[1-9][0-9]|100))*)?$
Breakdown:
^ marks the start.
$ marks the end of the string.
The final ? makes the whole list optional, i.e. it may occur once or not at all, so the empty string also matches.
([5-9]|[1-9][0-9]|100) matches the numbers 5 to 9, 10 to 99, or 100.
(,([5-9]|[1-9][0-9]|100))* means that there can be zero or more occurrences of a comma followed by a number between 5 - 100.
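To confirm the pattern against the question's own examples, here is a minimal sketch assuming Python's re module (the question does not name an engine). Since the pattern already carries ^ and $, re.match is enough.

```python
import re

# The pattern from this answer, unchanged.
LIST_5_TO_100 = re.compile(r"^(([5-9]|[1-9][0-9]|100)(,([5-9]|[1-9][0-9]|100))*)?$")

passing = ["", "12", "5,100", "5,34,55,12", "5,8,15,9,94"]
failing = ["2", "4,15", "87, 3", "39,7,23,62, 1", "25,"]

for s in passing + failing:
    print(repr(s), LIST_5_TO_100.match(s) is not None)
```

Each number is validated by its shape (one digit 5-9, two digits 10-99, or exactly 100), so values such as 0, 101 or 05 are rejected without any numeric comparison.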
This regex ^(?!.*(?:\b[1-4]\b|^\s*,|,\s*,|,\s*$)).*$ should exclude strings containing the numbers 1, 2, 3 or 4, as well as strings with a leading, doubled or trailing comma.
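As a hedged sanity check (assuming Python's re module), the sketch below runs this lookahead pattern on a few inputs. It rejects the 1-4 cases and the malformed commas from the question, but because the body is just .*, it does not enforce the upper bound or require digits at all: the hypothetical inputs 500, 0 and abc are accepted.

```python
import re

# The lookahead-only pattern from this answer, unchanged.
EXCLUDE_1_TO_4 = re.compile(r"^(?!.*(?:\b[1-4]\b|^\s*,|,\s*,|,\s*$)).*$")

# Cases from the question, followed by hypothetical inputs outside 5-100.
for s in ["", "5,100", "4,15", "87, 3", "25,", "500", "0", "abc"]:
    print(repr(s), EXCLUDE_1_TO_4.match(s) is not None)
```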

Regex limiting a number string

I am trying to figure out how to use a regex to validate a 6-digit number string. My trouble is that the string can be any 6 digits unless it starts with 12. So the first digit can be 1, but not if the second digit is 2; the second digit can be 2, but not if the first is 1.
I tried ([^1])([^2])(\d{4}), but that does not take both digits into account together, so it blocks anything with a 2 in the second spot (and anything with a 1 in the first).
Thank you for any help.
You may use
^([02-9][0-9]|[0-9][013-9])[0-9]{4}$
Details:
^ - start of string
([02-9][0-9]|[0-9][013-9]) - either of the two alternatives:
[02-9][0-9] - any digit but 1 and then any digit
| - or
[0-9][013-9] - any digit and then any digit but 2
[0-9]{4} - any 4 digits
$ - end of string.
Another way is to use a negative lookahead:
^(?!12)[0-9]{6}$
Here, (?!12) fails the match if the first two digits are 12, and [0-9]{6} then matches exactly 6 digits.
Depending on the regex library or method, the ^ and $ anchors may not be required. Also, not every engine supports lookaheads.
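To check that the two answers behave identically, here is a small sketch assuming Python's re module. It tries every two-digit prefix with a fixed four-digit tail, plus a few hypothetical strings of the wrong length or with a non-digit.

```python
import re

# Character-class alternation from the answer.
ALTERNATION = re.compile(r"^([02-9][0-9]|[0-9][013-9])[0-9]{4}$")
# Negative-lookahead variant from the answer.
LOOKAHEAD = re.compile(r"^(?!12)[0-9]{6}$")


def accepts(pattern, s):
    """Return True if the anchored pattern matches the whole string s."""
    return pattern.match(s) is not None


# Every prefix 00..99 with the same tail: only 12xxxx should be rejected.
for prefix in range(100):
    s = f"{prefix:02d}3456"
    expected = not s.startswith("12")
    assert accepts(ALTERNATION, s) == expected, s
    assert accepts(LOOKAHEAD, s) == expected, s

# Wrong lengths and non-digits are rejected by both.
for s in ["34567", "3456789", "a23456"]:
    assert not accepts(ALTERNATION, s) and not accepts(LOOKAHEAD, s), s

print("both patterns agree")
```

The lookahead version states the rule directly ("not 12"), while the alternation spells out the allowed prefixes, which is the option to use on engines without lookahead support.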

RegExp Quantifier to Match Pattern Exactly n or m Times Instead of n to m Times

I want to match 2-digit and 4-digit numbers. This RegExp is an obvious choice:
/[0-9]{2,4}/
However, it also matches 3-digit numbers. Is there a way around this in a regexp?
You can use ([0-9]{2}){1,2}:
/\b([0-9]{2}){1,2}\b/
The above regular expression is not general; it only works because 4 = 2 * 2.
A more general solution is:
/\b[0-9]{2}\b|\b[0-9]{4}\b/
NOTE: \b (word boundary) is used to prevent matching 2 digits inside a 3-digit (or 5-, 6-, ...-digit) string.
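A minimal sketch of both word-boundary patterns, assuming Python's re module instead of the /.../ literal syntax used above. The capturing group in the first pattern is made non-capturing here so that findall returns whole matches; the sample text is hypothetical.

```python
import re

# A 2-digit block repeated once or twice, bounded by \b.
PAIRS = re.compile(r"\b(?:[0-9]{2}){1,2}\b")
# Explicit alternation; works for any pair of lengths.
ALTERNATION = re.compile(r"\b[0-9]{2}\b|\b[0-9]{4}\b")

text = "12 123 1234 12345"
print(PAIRS.findall(text))        # ['12', '1234']
print(ALTERNATION.findall(text))  # ['12', '1234']
```

Neither pattern picks up a fragment of 123 or 12345, because there is no word boundary between two adjacent digits.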
You can also use a negative lookbehind and lookahead to accomplish this. Here's one possible way:
/(?<!\d)(\d{2}|\d{4})(?!\d)/
This says: find a 2- or 4-digit number that is not preceded or followed by another digit. This differs from the answer above in that it matches ALL 2- and 4-digit numbers, including those directly adjacent to letters rather than separated by spaces or punctuation, such as the "12" in the string "abc12def".
Which way you choose will depend on what in particular you are looking for.
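To make the difference concrete, here is a sketch assuming Python's re module (lookbehind is supported there; the original uses a /.../ literal). The group is made non-capturing so findall returns whole matches, and the sample text is hypothetical.

```python
import re

# Digit-based guards: only neighbouring digits block a match.
LOOKAROUND = re.compile(r"(?<!\d)(?:\d{2}|\d{4})(?!\d)")
# Word-boundary guards: neighbouring letters also block a match.
WORD_BOUNDARY = re.compile(r"\b\d{2}\b|\b\d{4}\b")

text = "abc12def 123 1234"
print(LOOKAROUND.findall(text))     # ['12', '1234']
print(WORD_BOUNDARY.findall(text))  # ['1234']
```

So the lookaround version suits digits embedded in identifiers, while the \b version suits numbers that stand as separate words.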