Ruby regex to match a pattern followed by another pattern - regex

I already have a regex that matches only single digits in a comma-delimited string. I need to update it to match strings like the following:
5|5,4,3
2|1,2 , 3
The constraints are:
The string should start with a single digit in the range 1-5, followed by a pipe character (|).
The part after the pipe character should be a single digit in the range 1-7, optionally followed by a comma, and this pattern can repeat. For example, the following strings are considered valid after the pipe character:
"6"
"1,7"
"1,2,3, 4,6"
"1, 4,5,7"
However, the following strings are considered invalid:
"8"
"8, 9,10"
I tried the following (and other variations):
\A[1-5]\|[1-7](?=(,|[1-7]))*
but it doesn't work as expected. For example, for the sample string
5|5,4, 3, 10,5
it only matches
5|5
I need to capture the digit before the pipe character and all the matching digits that follow the pipe character. For example, for the sample string 5|5,4, 3, 2, 1 the regex should capture
5
[5, 4, 3, 2, 1]
Note: I am using Ruby 2.2.1
Also, would you mind letting me know what mistake in my regex pattern kept it from working as expected?
Thanks.

You could try the below regex.
^([1-5])\|([1-7]\s*(?:,\s*[1-7])*)$
Example:
> "5|5,4, 3, 2, 1".scan(/^([1-5])\|([1-7]\s*(?:,\s*[1-7])*)$/)
=> [["5", "5,4, 3, 2, 1"]]
OR
> "5|5,4, 3, 2, 1".scan(/([1-5])\|([1-7] ?(?:, ?[1-7])*)$/)
=> [["5", "5,4, 3, 2, 1"]]
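To get the leading digit and the list of digits as separate values, one option is to match first and then split the second capture group. The following is only a minimal sketch: RATING_PATTERN and parse_rating are hypothetical names, and the pattern is a slight variation of the regex above that also allows white space before a comma (as in 2|1,2 , 3) and uses \A and \z, because in Ruby ^ and $ match at line boundaries rather than at the ends of the string.

```ruby
# Hypothetical names; the pattern is adapted from the answer above.
RATING_PATTERN = /\A([1-5])\|([1-7]\s*(?:,\s*[1-7]\s*)*)\z/

# Returns [leading_digit, [digits...]], or nil when the input is invalid.
def parse_rating(input)
  m = RATING_PATTERN.match(input)
  return nil unless m

  # Split the second capture group on commas and drop surrounding spaces.
  digits = m[2].split(",").map { |d| d.strip.to_i }
  [m[1].to_i, digits]
end

first, rest = parse_rating("5|5,4, 3, 2, 1")
# first is 5, rest is [5, 4, 3, 2, 1]
```

An input such as 5|5,4, 3, 10,5 makes parse_rating return nil, because \z forces the whole string to fit the pattern.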

You can try the following regex that will match digits and a group of comma/space separated digits after a pipe:
^[1-5]\|(?:(?:[1-7]\s*,\s*)+\s*[1-7]?|[1-7])\b
Here is a demo.
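As for why the original attempt stops at 5|5: a lookahead such as (?=...) is zero-width, so it only checks what comes next without consuming it, and the match ends right after the first digit. The sketch below illustrates the difference using a simplified lookahead (a plain (?=,) stands in for the quantified lookahead from the question, and the constant names are made up for the example).

```ruby
SAMPLE = "5|5,4, 3, 2, 1"

# Zero-width lookahead: asserts that a comma follows, but does not consume it.
WITH_LOOKAHEAD = /\A[1-5]\|[1-7](?=,)/

# Non-capturing group: actually consumes each ", digit" repetition.
WITH_GROUP = /\A[1-5]\|[1-7](?:\s*,\s*[1-7])*/

lookahead_match = SAMPLE[WITH_LOOKAHEAD] # "5|5"
group_match = SAMPLE[WITH_GROUP]         # "5|5,4, 3, 2, 1"
```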

Related

Regex to Match All Numbers Except Those in the First Word

I am having trouble crafting a regex. For example, in the string A123 4HEL5P6 789 I want to match all the numbers 4, 5, 6, 7, 8, 9 but not 1, 2, 3.
I have tried using a negative lookbehind with the regex (?<!^\w)\d+ but this still matches the numbers in the first word.
Edit: Any numbers in the first continuous sequence of characters should not be matched, the first continuous sequence being from start (^) to a whitespace (\s). In 09B8A HE1LP only 1 should be matched, not 0, 9, or 8, as these digits are in the first word.
If your dialect supports variable-length negative lookbehinds, then this should work:
r = /(?<!^\w*)\d/g
console.log(...'A123 4HEL5P6 789'.match(r))
Otherwise, you could use /^\w*|\d/g and discard the first match.
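The second suggestion carries over to engines without variable-length lookbehind. As far as I know, Ruby's engine is one of them, since it does not accept an unbounded lookbehind such as (?<!^\w*). Here is a minimal sketch in Ruby, with a hypothetical helper name. It follows the answer in treating the first word as a run of \w characters. Swap in \S* if the first word should run up to the first white space, as in the question's edit.

```ruby
# Hypothetical helper: returns the digits that are not part of the first word.
def digits_after_first_word(text)
  # The first alternative can only match at the very start of the string,
  # so the first element is always the leading word (possibly empty).
  text.scan(/\A\w*|\d/).drop(1)
end

digits_after_first_word("A123 4HEL5P6 789") # ["4", "5", "6", "7", "8", "9"]
digits_after_first_word("09B8A HE1LP")      # ["1"]
```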

A protein-coded gene Regular Expression

I am trying to write a regex that can match the following instructions
A sequence of characters with the “AT” prefix, followed by “nG” where n is a digit from 1 through 5, and lastly a suffix of 5 numeric digits.
Note: just an ordinary regular expression, not language-specific.
An example of a matching string is this: “AT1G01040”
Here is what I could construct: AT[1-5]G(d\{1,5}) but I am not sure it is correct.
I would appreciate any help with this, thanks.
If the number of digits at the end may be from 1 to 5, you may use
^AT[1-5]G[0-9]{1,5}$
Note that if the number of digits at the end must be exactly 5, you must remove 1,:
^AT[1-5]G[0-9]{5}$
Details
^ - start of string
AT - a sequence of chars AT
[1-5] - 1, 2, 3, 4 or 5
G - a G char
[0-9]{1,5} - any 1 to 5 consecutive occurrences of an ASCII digit (or - if you use {5} - exactly 5 occurrences)
$ - end of string.
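Although the question is language-agnostic, a quick check in a concrete language can help confirm the pattern. Below is a small sketch in Ruby for the exactly-5-digits variant. The names GENE_ID and gene_id? are made up for illustration, and \A and \z replace ^ and $ because those match at line boundaries in Ruby.

```ruby
# Hypothetical constant and helper names, for illustration only.
GENE_ID = /\AAT[1-5]G[0-9]{5}\z/

# True when the whole string is AT, a digit 1-5, G, then exactly 5 digits.
def gene_id?(text)
  !GENE_ID.match(text).nil?
end

gene_id?("AT1G01040") # true
gene_id?("AT6G01040") # false, 6 is outside 1-5
```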

Skip White spaces in a comma separated string when using regular expression to split the string

I'm using the regular expression [[:alnum:]]{0,}, to split the string 3,5,7 test, by commas, and I'm getting the following results:
Match 1 : 3,
Match 2 : 5,
Match 3 : test,
But Match 3 should be '7 test,'.
How can I change this expression to handle the white space and fetch the correct values?
select regexp_substr('3,5,7 test,','[^,]+', 1, level) from dual
connect by regexp_substr('3,5,7 test,', '[^,]+', 1, level) is not null;
This regex matches any run of characters other than a comma, followed by the comma itself, so white space inside a value is kept:
[^,]*,
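For comparison, here is a sketch of the same two patterns outside of SQL, written in Ruby purely for illustration (the constant names are made up). A negated character class keeps the white space inside a field, whereas [[:alnum:]] stops at it.

```ruby
INPUT = "3,5,7 test,"

# [^,]+ matches each run of non-comma characters, spaces included.
FIELDS = INPUT.scan(/[^,]+/)            # ["3", "5", "7 test"]

# [^,]*, matches the same runs together with the trailing comma.
FIELDS_WITH_COMMA = INPUT.scan(/[^,]*,/) # ["3,", "5,", "7 test,"]
```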

Regex turn to negative validation

I have the following regex that checks whether the string contains 8 characters (letters or numbers) followed by a space, a number and a comma:
^.*[a-zA-Z0-9]{8,} \d*,.*$
This regex does not match the following:
Hello 23, abc 2, me 5,
But it matches the following:
My8Chara 12, abc 2,
I would like to reverse the regex. I want it to match if the string does NOT contain 8 characters followed by a space, a number and a comma.
Does anyone know how to reverse a regex? I cannot use something like !Regex.IsMatch because I use a generic validator. I must write it as a regular expression.
The desired outputs are:
"" -> match
"abc 123, def 234," -> match
"my8chara 123, only5 12" -> does not match -> it contains 8 characters followed by a space a number and a comma
Thanks in advance,
Raphaël
You could maybe use a negative lookahead like this:
^(?!.*[a-zA-Z0-9]{8,} \d*).+$
A negative lookahead has the format (?! ... ). If what's inside it matched, then the whole match will fail.
So, if there is .*[a-zA-Z0-9]{8,} \d* matched, the whole match fails.
EDIT: If you still want to match sentences with the structure Hello 23, abc 2, me 5,, then I would suggest this:
^(?!.*[a-zA-Z0-9]{8,} \d*).*(?:[a-zA-Z0-9]+ \d*,)?.*$
^(?!.*[a-zA-Z0-9]{8} \d+,)\w.*$
To match empty strings too:
^(?!.*[a-zA-Z0-9]{8} \d+,).*$
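To see the negative lookahead at work on the sample inputs, here is a minimal sketch in Ruby. The language is an assumption made for illustration, since the question appears to target a different regex engine, and the names NO_LONG_TOKEN and valid_entry? are hypothetical. The pattern is the last one shown above, with \A and \z in place of ^ and $ so that it anchors to the whole string in Ruby.

```ruby
# Matches only when no "8 alphanumerics, space, digits, comma" sequence exists.
NO_LONG_TOKEN = /\A(?!.*[a-zA-Z0-9]{8} \d+,).*\z/

# Hypothetical validator: true when the forbidden sequence is absent.
def valid_entry?(text)
  !NO_LONG_TOKEN.match(text).nil?
end

valid_entry?("")                       # true
valid_entry?("abc 123, def 234,")      # true
valid_entry?("my8chara 123, only5 12") # false
```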

Match letter followed by specific numeric range

I am writing a regular expression for strings that are 2-3 characters long.
The first character has to be a capital letter between A and H. It has to be followed by a number between 1 and 12.
I wrote
[A-H]{1}[1-12]{1,2}
This works when I key in A12 but not when I key in A6.
Please suggest.
You can't specify a range of digits like that because it is implemented as a range between characters, so [1-12] is equivalent to [12], which would only match either a 1 or a 2. Instead, try the following:
[A-H](?:1[012]|[1-9])
Here is an explanation:
[A-H] # one letter from A to H
(?: # start non-capturing group
1[012] # 1 followed by 0, 1, or 2 (10, 11, 12)
| # OR
[1-9] # one digit from 1 to 9
) # end non-capturing group
Note that the {1} after [A-H] in your original regex is unnecessary, [A-H]{1} and [A-H] are equivalent.
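The character-class behaviour can be checked directly. Here is a small sketch in Ruby (an assumption, since the question does not name a language). The constant names are made up, and both patterns are anchored with \A and \z so that only whole-string matches count.

```ruby
# [1-12] is the range 1-1 plus the literal 2, i.e. the same set as [12].
ORIGINAL = /\A[A-H][1-12]{1,2}\z/
FIXED = /\A[A-H](?:1[012]|[1-9])\z/

SAMPLES = %w[A12 A6 A21]

ORIGINAL_RESULTS = SAMPLES.map { |s| !ORIGINAL.match(s).nil? } # [true, false, true]
FIXED_RESULTS = SAMPLES.map { |s| !FIXED.match(s).nil? }       # [true, true, false]
```

The original pattern rejects A6 and accepts A21, because it only ever looks for the characters 1 and 2.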
You may want to consider adding anchors to the regex, otherwise you would also get a partial match on a string like A20. If you are trying to match an entire string then you should use the following:
\A[A-H](?:1[012]|[1-9])\z
If it is within a larger text you could use word boundaries instead:
\b[A-H](?:1[012]|[1-9])\b
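The effect of the anchors and word boundaries can be sketched as follows, again in Ruby as an illustrative assumption, with made-up constant names and a made-up sample sentence.

```ruby
UNANCHORED = /[A-H](?:1[012]|[1-9])/
WHOLE_STRING = /\A[A-H](?:1[012]|[1-9])\z/
IN_TEXT = /\b[A-H](?:1[012]|[1-9])\b/

# Without anchors, A20 yields a partial match on its first two characters.
PARTIAL = "A20"[UNANCHORED]  # "A2"

# With \A and \z the whole string must fit, so A20 is rejected.
WHOLE = "A20"[WHOLE_STRING]  # nil

# Word boundaries pick out valid codes inside a longer text.
FOUND = "seats A12, A20 and B7".scan(IN_TEXT) # ["A12", "B7"]
```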
Here you go:
^[A-H]([1-9]|1[0-2])$
No need for the {1} in your question.
The regex is anchored with ^ and $, meaning it can be the only thing on your line.
It will not match A60 for example