Multiple capture groups - all optional - regex

I know this (or similar) has been asked a hundred times - but I really need help now :D
The strings the regex should match.
Note: n is in the range of INTEGER_MIN - INTEGER_MAX
{number}
{number(1-n)}
{number(1-n,-n-n)}
{number(1-n,-n-n,0-n)}
If the pattern matches, it should produce 3 separate capture groups with the results listed below.
All groups should be optional, so that when a group did not participate in the match, requesting it (for example in Java) returns null.
1: 1-n
2: -n-n
3: 0-n
What I've tried:
\{number(?:\(([1-9])(?:(?:,)([0-9])){0,2}\))?\}
This obviously isn't right, and it contains only 2 groups (according to m.groupCount()).

Okay, from what I deduced, I would do this:
\{number(?:\((\-?\d+)(?:\,(\-?\d+))?(?:\,(\-?\d+))?\))?\}
Then carry out operations on the captured groups to validate the range of the integers, such as...
[Pseudocode, since I don't know what language you are using]
captured integers = capture1, capture2, capture3
if ((capture1 < capture2 && capture1 > capture3) ||
    (capture1 > capture2 && capture1 < capture3)) {
    Do something
} else {
    Do something else, like reject or throw an error
}
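As a rough illustration of how the three optional groups behave, here is a minimal Python sketch (the question mentions Java; Python is used here only for brevity). The names NUMBER_TAG and parse_number_tag are made up for this example. A group that did not take part in the match comes back as None, which corresponds to null in Java.

```python
import re

# Same idea as the pattern above: three optional integer groups.
# NUMBER_TAG and parse_number_tag are hypothetical names for this sketch.
NUMBER_TAG = re.compile(r"\{number(?:\((-?\d+)(?:,(-?\d+))?(?:,(-?\d+))?\))?\}")


def parse_number_tag(text):
    """Return a 3-tuple of captured strings (None for absent groups),
    or None if the text is not a number tag at all."""
    match = NUMBER_TAG.fullmatch(text)
    if match is None:
        return None
    return match.groups()


if __name__ == "__main__":
    for sample in ["{number}", "{number(1)}", "{number(1,-5)}", "{number(1,-5,0)}"]:
        print(sample, "->", parse_number_tag(sample))
```

Range checks on the integers would still be done in ordinary code after the match, as described above.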

Related

RegEx for matching 3 alphabets and 1-2 digits

I am trying to write a regular expression to find matches in a text of at least 100 characters. A match is any substring that starts with 3 letters, followed by at least 1 and at most 2 digits.
Examples -
abcjkhklfdpdn24hjkk - In this case I want to extract pdn24
hjdksfkpdf1lkjk - In this case I want to extract pdf1
hjgjdkspdg34kjfs dhj khk678jkfhlds1 - In this case I want both pdg34 and lds1
How do I write a regex for this ? The length of the starting letters for a match is always 3 and the digits length can be either 1 or 2 (not more not less)
This is what works if there are 2 digits after the 3 letter string.
[A-Za-z]{3}[0-9]{2}
But the length of the digits can vary between 1 and 2. How do I include the varying length in the regex?
We can start from your original expression, change the digit quantifier to {1,2}, wrap it in a capturing group, and then think about the left and right boundaries around it. For instance, on the right we might require a non-digit with \D (note that this needs an actual character after the digits, so it will not match at the very end of the string):
([A-Za-z]{3}[0-9]{1,2})\D
A stricter expression is certainly possible, but this one may be enough.
Based on Cary Swoveland's advice, we can also use this expression, which is much better:
\p{L}{3}\d{1,2}(?!\d)
Test
re = /([A-Za-z]{3}[0-9]{1,2})\D/m
str = 'abcjkhklfdpdn24hjkk
hjdksfkpdf1lkjk
hjgjdkspdg34kjfs dhj khk678jkfhlds1 '
# Print the match result
str.scan(re) do |match|
  puts match.to_s
end
This script shows how the capturing group works:
const regex = /([A-Za-z]{3}[0-9]{1,2})\D/gm;
const str = `abcjkhklfdpdn24hjkk
hjdksfkpdf1lkjk
hjgjdkspdg34kjfs dhj khk678jkfhlds1 `;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
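For comparison, here is a small Python sketch of the lookahead variant, using the sample strings from the question. It is only an illustration: Python's standard re module does not support \p{L}, so the ASCII class [A-Za-z] is used instead, and the names CODE_RE and find_codes are invented for this example. Because (?!\d) is a zero-width check, it also works when the digits are the last characters of the input, which the \D version cannot match.

```python
import re

# Three letters, then one or two digits, not followed by another digit.
# CODE_RE and find_codes are hypothetical names for this sketch.
CODE_RE = re.compile(r"[A-Za-z]{3}\d{1,2}(?!\d)")


def find_codes(text):
    """Return every 3-letter + 1-2 digit substring found in text."""
    return CODE_RE.findall(text)


if __name__ == "__main__":
    samples = [
        "abcjkhklfdpdn24hjkk",
        "hjdksfkpdf1lkjk",
        "hjgjdkspdg34kjfs dhj khk678jkfhlds1",
    ]
    for sample in samples:
        print(sample, "->", find_codes(sample))
```

Note that khk678 is skipped entirely, since neither khk67 nor khk6 is followed by a non-digit.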
At least 3 letters: [a-zA-Z]{3,}
1 or 2 digits (not more not less): [0-9]{1,2}
This gives us:
/[a-zA-Z]{3,}[0-9]{1,2}/

How do you find 3 UNIQUE digits in a string of digits?

I am trying to write a regex that is very specific. I want to find 3 digits in a list. The issue comes because I do not care about repeating digits (5, 555, and 55555555555555 are seen as 5). Also, within the 3 digits, they need to be 3 different digits (123 = good, 311 = bad).
Here is what I have so far to find 3 digits, ignoring repeats but it does not specify 3 unique digits.
^(?:([0]{1,}|[1]{1,}|[2]{1,}|[3]{1,}|[4]{1,}|[5]{1,}|[6]{1,}|[7]{1,}|[8]{1,}|[9]{1,}|[0]{1,})(?!.*\\1)){3}$
Here is an example of the types of data I see.
Matching:
458
3333335555111
2222555111
222255558888
111147
9533333333
And not matching:
999999999
222252
888887
Right now my regex will find all of these. How can I ignore any that do not have 3 unique digits?
If your regex tool of choice supports lookaheads, backreferences and possessive quantifiers, you could use
^(\d)\1*+(?!.*\1)(\d)\2*+(\d)\3*+$
^ and $ are anchors that ensure we check the whole string.
(\d) captures a digit into the first group; \1*+ possessively matches any further occurrences of that digit, and the negative lookahead (?!.*\1) ensures that the digit does not appear again later in the string.
(\d)\2*+ then matches the next, different digit, again matching any following occurrences possessively (try 122 without the possessive quantifier to see why I use it here).
(\d)\3*+ matches the last digit and any following occurrences of it.
Without possessive matching you could rely more on lookaheads, like ^(\d)\1*(?!.*\1)(\d)\2*(?!.*\2)(\d)\3*$
See https://regex101.com/r/pV2tB2/2 for a demo.
Side note: regex might not be the best tool for this, but as you specifically asked for it, here you are.
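To make the behaviour concrete, here is a minimal Python sketch that runs a lookahead-only pattern (no possessive quantifiers, so it does not depend on engine support for them) against the sample data from the question. The names THREE_RUNS and has_three_distinct_runs are invented for this example.

```python
import re

# Three runs of digits; after the first and second run, a negative lookahead
# forbids that run's digit from appearing again later in the string.
# THREE_RUNS and has_three_distinct_runs are hypothetical names.
THREE_RUNS = re.compile(r"^(\d)\1*(?!.*\1)(\d)\2*(?!.*\2)(\d)\3*$")


def has_three_distinct_runs(text):
    """True if text is exactly three runs of three different digits."""
    return THREE_RUNS.match(text) is not None


if __name__ == "__main__":
    matching = ["458", "3333335555111", "2222555111",
                "222255558888", "111147", "9533333333"]
    not_matching = ["999999999", "222252", "888887"]
    for value in matching + not_matching:
        print(value, "->", has_three_distinct_runs(value))
```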
This can be done with regex, but it's not the best tool for your work.
Instead of a regex-only approach, you can easily achieve this using Python.
Example:
strings = ['458', '3333335555111', '2222555111', '222255558888', '111147', '9533333333', '955555555', '12222211']
for s in strings:
    # A set keeps one copy of each character, so its size is the number of distinct digits
    if len(set(list(s))) == 3:
        print("Ok :", s)
    else:
        print("Error :", s)
Output:
>> Ok : 458
>> Ok : 3333335555111
>> Ok : 2222555111
>> Ok : 222255558888
>> Ok : 111147
>> Ok : 9533333333
>> Error : 955555555
>> Error : 12222211
I've used the following commands while iterating over the strings inside that list:
list()
set()
len()
Using negative lookahead, this should match any string of digits that contains at least 3 unique digits /^(\d)\1*(?!\1)(\d)(?:\2|\1)*(?!\2|\1)(\d)+$/
(\d) - Match a digit
\1* - Allow that digit to repeat
(?!\1) - Make sure that's followed by a digit that does not match the first match
(\d) - Match the new digit
(?:\2|\1)* - Allow repeats of either the first or second digit
(?!\2|\1) - Make sure that's followed by a digit that does not match the first or second match
(\d)+ - Capture the third unique digit, then allow any number of digits of any kind to follow
I'm not sure if an awk script will do it for you, but here it goes:
awk '
{
    # Reset the per-line state so earlier lines do not leak into this one
    split("", seen)
    distinct = 0
    # substr() is 1-based in awk, so iterate from 1 to length($1)
    for (i = 1; i <= length($1); i++) {
        digit = substr($1, i, 1)
        if (!(digit in seen)) {
            seen[digit] = 1
            distinct++
        }
    }
    if (distinct == 3)
        print $1
}'

Regular expression for comma separated name search with wild card

Right now I am using multiple if conditions to validate the input for a search by name with a wildcard (*). Since I have multiple 'if' statements with inner 'if' statements, I am trying to use a regular expression to validate my input instead. I want to use this expression in both the front end and the back end.
I would appreciate it if anyone could help.
The validation rules are as follows:
Input is last name, first name i.e. separated by comma.
Must have at least two characters while using wild card search.
Valid wildcard character is '*' only.
At most two wildcard characters can be used.
No consecutive wild cards.
If no wild card used no constraint on length of characters in both last and first name.
Some of the valid inputs are:
- hopkins, johns
- h, j
- ho*, jp*
- *ins, johns
- *op*, john*
Some of the invalid inputs are:
- hopkins johns
- h*, johns
- hop**, joh*
- h*pk*n*
If it does not make the regular expression too complex, the following can be treated as valid; otherwise it is OK to treat it as invalid:
- ho*in*, jo*
In short general name format is
[*]XX[*], [*]XX[*]
where [] ==> Optional
X ==> A-Z, a-z
XX ==> length 2 or more if wild card used
You can use this regex
\*?[a-zA-Z]{2,}\*?, \*?[a-zA-Z]{2,}\*?
Before validating with the above regex, match the * characters with the regex /\*/g and make sure the number of matches is between 0 and 2.
With the help of @Amit_Joki's answer I wrote the following code, and it's working fine.
var nameArray = [...];
var re = /\*?[a-zA-Z]{2,}\*?, \*?[a-zA-Z]{2,}\*?/;
for (var i = 0; i < nameArray.length; i++) {
    if (nameArray[i].indexOf(',') < 0 ||
        (nameArray[i].indexOf('*') >= 0 && !re.test(nameArray[i]))) {
        console.log(nameArray[i] + ": Invalid");
    } else {
        console.log(nameArray[i] + ": Valid");
    }
}
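One possible way to express the rules is to split on the comma first and validate each name separately. The Python sketch below does that, under two assumptions that are not stated outright in the question: the "at most two wildcards" limit applies per name (the valid example "*op*, john*" has three in total), and a wildcard may only appear at the start or end of a name. The names is_valid_part and is_valid_search are made up for this example.

```python
import re

# A name that uses wildcards: optional leading *, 2+ letters, optional trailing *.
WILDCARD_NAME = re.compile(r"\*?[A-Za-z]{2,}\*?")
# A name without wildcards: letters only, any length (assumed to be non-empty).
PLAIN_NAME = re.compile(r"[A-Za-z]+")


def is_valid_part(part):
    """Validate a single last name or first name."""
    if "*" in part:
        return WILDCARD_NAME.fullmatch(part) is not None
    return PLAIN_NAME.fullmatch(part) is not None


def is_valid_search(text):
    """Validate 'last, first' input; exactly one comma is required."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 2:
        return False
    return all(is_valid_part(part) for part in parts)


if __name__ == "__main__":
    for sample in ["hopkins, johns", "h, j", "*op*, john*", "h*, johns", "hop**, joh*"]:
        print(sample, "->", is_valid_search(sample))
```

With these assumptions the optional case "ho*in*, jo*" is rejected, which the question allows.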

use regular expression to find and replace but only every 3 characters for DNA sequence

Is it possible to do a find/replace using regular expressions on a string of DNA such that it only considers 3 characters (one codon of DNA) at a time?
For example, I would like the regular expression to see this:
dna="AAACCCTTTGGG"
as this:
AAA CCC TTT GGG
If I use the regular expressions right now and the expression was
Regex.Replace(dna,"ACC","AAA") it would find a match, but in this case of looking at 3 characters at a time there would be no match.
Is this possible?
Why use a regex? Try this instead, which is probably more efficient to boot:
// Requires: using System; using System.Text;
public string DnaReplaceCodon(string input, string match, string replace) {
    if (match.Length != 3 || replace.Length != 3)
        throw new ArgumentOutOfRangeException();
    var output = new StringBuilder(input.Length);
    int i = 0;
    // Walk the input one codon (3 characters) at a time.
    while (i + 2 < input.Length) {
        if (input[i] == match[0] && input[i + 1] == match[1] && input[i + 2] == match[2]) {
            output.Append(replace);
        } else {
            output.Append(input[i]);
            output.Append(input[i + 1]);
            output.Append(input[i + 2]);
        }
        i += 3;
    }
    // Pick up trailing letters (fewer than 3 left over).
    while (i < input.Length) output.Append(input[i++]);
    return output.ToString();
}
Solution
It is possible to do this with regex. Assuming the input is valid (contains only A, T, G, C):
Regex.Replace(input, @"\G((?:.{3})*?)" + codon, "$1" + replacement);
If the input is not guaranteed to be valid, you can just do a check with the regex ^[ATCG]*$ (allow non-multiple of 3) or ^([ATCG]{3})*$ (sequence must be multiple of 3). It doesn't make sense to operate on invalid input anyway.
Explanation
The construction above works for any codon. For the sake of explanation, let the codon be AAA. The regex will be \G((?:.{3})*?)AAA.
The whole regex actually matches the shortest substring that ends with the codon to be replaced.
\G # Must be at beginning of the string, or where last match left off
((?:.{3})*?) # Match any number of codons, lazily. The text is also captured.
AAA # The codon we want to replace
We make sure that matches only start at positions whose index is a multiple of 3 with:
\G, which asserts that the match starts where the previous match left off (or at the beginning of the string)
And the fact that the pattern ((?:.{3})*?)AAA can only match a sequence whose length is a multiple of 3.
Due to the lazy quantifier, we can be sure that in each match, the part before the codon to be replaced (matched by the ((?:.{3})*?) part) does not contain the codon at a codon boundary.
In the replacement, we put back the part before the codon (which is captured in capturing group 1 and can be referred to with $1), followed by the replacement codon.
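For readers on an engine without \G (Python's standard re module does not have it), the same codon-aligned replacement can be sketched with a different technique: match every 3-character chunk and decide in a callback whether to replace it. This is not the \G approach itself, just an alternative with the same result on valid input. The function name replace_codon is invented for this example.

```python
import re


def replace_codon(dna, codon, replacement):
    """Replace codon with replacement, looking only at aligned 3-letter chunks.

    re.sub scans left to right without overlapping, so ".{3}" always matches
    at offsets 0, 3, 6, ... Trailing letters that do not form a full codon
    are left unchanged.
    """
    def swap(match):
        chunk = match.group(0)
        return replacement if chunk == codon else chunk

    return re.sub(r".{3}", swap, dna)


if __name__ == "__main__":
    dna = "AAACCCTTTGGG"
    print(replace_codon(dna, "ACC", "AAA"))  # ACC straddles two codons: unchanged
    print(replace_codon(dna, "CCC", "AAA"))  # CCC is an aligned codon: replaced
```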
NOTE
As explained in the comment, the following is not a good solution! I leave it in so that others will not fall for the same mistake
You can usually find out where a match starts and ends via m.start() and m.end(). If m.start() % 3 == 0 you found a relevant match.

UK Date Regular Expression [duplicate]

I'm trying to create a regular expression that validates UK date format. I have the following:
(0[1-9]|[12][0-9]|3[01])[- /.](0[1-9]|1[012])[- /.](19|20)\d\d
This works great for validating: 09/12/2011. But if the date is: 9/12/2011 it will not validate correctly. Is there a regular expression that allows me to use a single number and two numbers for the day section? For example "09" and "9".
Just make the leading 0 optional:
(0?[1-9]|[12][0-9]|3[01])[- /.](0?[1-9]|1[012])[- /.](19|20)\d\d
You will need an additional validation step, though - this regex of course won't check for invalid dates like 31-02-2000 etc. While it's possible to do this in regex, it's not recommended because it's much easier to do this programmatically, and that regex is going to be monstrous. Here is a date validating regex (that uses the mmddyyyy format, though) to show what I mean.
My preference goes to a combination of the simple regex, (\d{1,2})[-/.](\d{1,2})[-/.](\d{4}), with some code that validates that this is indeed a correct date. You will have to have that code anyways, unless you want to make a monstrous regex that rejects "29-02-2011" but not "29-02-2008".
Anyway, here's a breakdown of that regex so you can see what's going on:
\d{1,2}: this part matches one or two ({1,2}) digits (\d), making up the day portion of the date.
[-/.]: this matches one of the characters inside the brackets, i.e, either a ., a /, or a -.
\d{1,2}: again, this matches one or two digits from the month.
[-/.]: another separator...
\d{4}: this matches exactly four ({4}) digits for the year portion.
Note that the day, month, and year portion of the regular expression are inside parentheses. This is to create groups. Each of those three portions will be captured into a group that you can retrieve from the match. Groups are identified with a number, starting with 1, from left to right. This means that the day will be group 1, the month group 2, and the year group 3. There is also a group 0 that always contains the entire text matched.
You can use the groups to perform the second part of the validation and reject invalid dates like "30-02-2011", "31-4-2011", or "32-13-2011".
If you want to reject inputs that use two different separators, like "31-12.2011", you can use a slightly more advanced feature called backreferences:
(\d{1,2})([-/.])(\d{1,2})\2(\d{4})
Note that now I placed the first separator inside a group. This changes the month to group 3, and the year to group 4. The separator is captured by group 2. The backreference is that \2 part between the month and the year. It matches whatever text was matched by group 2, the separator. If that group matched a ., the backreference will match only a . as well; if it matched a -, the backreference will match only a -; and so on.
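As a sketch of the two-step approach (simple regex, then a real date check), here is a short Python example. It uses the backreference pattern above and lets the standard datetime.date constructor reject impossible dates; the names UK_DATE and parse_uk_date are invented for this example.

```python
import re
from datetime import date

# day, separator, month, same separator again (\2), year
UK_DATE = re.compile(r"(\d{1,2})([-/.])(\d{1,2})\2(\d{4})")


def parse_uk_date(text):
    """Return a datetime.date for a valid d/m/yyyy string, otherwise None."""
    match = UK_DATE.fullmatch(text)
    if match is None:
        return None
    day = int(match.group(1))
    month = int(match.group(3))  # group 2 is the separator
    year = int(match.group(4))
    try:
        # date() raises ValueError for things like 30 February or month 13.
        return date(year, month, day)
    except ValueError:
        return None


if __name__ == "__main__":
    for sample in ["9/12/2011", "09/12/2011", "31-12.2011", "29-02-2011", "29-02-2008"]:
        print(sample, "->", parse_uk_date(sample))
```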
What is "the UK date format" anyway?
Officially, it's 2011-02-21 today, see BS EN 28601 / ISO 8601.
On the web, you should all be using the format defined in RFC 3339.
The correct way to check the day is to also reject two-digit values that start with 4-9 (that is, anything matching [4-9].).
Something like 0[0-9]|[12][0-9]|3[01]|[^0-9][0-9]|^[0-9]
Yes. {n,m} is the quantifier that says "at least n elements, at most m elements". So you can write \d{1,2} (matches 1 or 2 digits). Complete date: \d{1,2}/\d{1,2}/\d{4}
Alternative: Make the leading zero optional:
0?\d/0?\d/\d{4}
The question mark says, that the element before the question mark is optional.
Use this code, which validates every part of the date (note that it expects yyyy/MM/dd order):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FinalDateValidator {

    // Format: yyyy/M/d or yyyy/MM/dd
    private static final Pattern DATE_PATTERN = Pattern.compile(
            "([0-9]{4})/(0?[1-9]|1[012])/(0[1-9]|[12][0-9]|3[01]|[1-9])");

    public boolean isValidDate(final String date) {
        Matcher matcher = DATE_PATTERN.matcher(date);
        // matches() requires the whole input to fit the pattern
        if (!matcher.matches()) {
            return false;
        }
        int year = Integer.parseInt(matcher.group(1));
        int month = Integer.parseInt(matcher.group(2));
        int day = Integer.parseInt(matcher.group(3));
        System.out.println("__________________________________________________");
        System.out.println("year : " + matcher.group(1) + " month : " + matcher.group(2)
                + " day : " + matcher.group(3));
        if (day == 31 && (month == 4 || month == 6 || month == 9 || month == 11)) {
            return false; // only months 1, 3, 5, 7, 8, 10 and 12 have 31 days
        }
        if (month == 2) {
            // Gregorian leap year: divisible by 4, except centuries not divisible by 400
            boolean leapYear = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
            return day <= (leapYear ? 29 : 28);
        }
        return true;
    }

    public static void main(String[] args) {
        FinalDateValidator vs = new FinalDateValidator();
        System.out.println("1: 1910/12/10---" + vs.isValidDate("1910/12/10"));
        System.out.println("2: 2010/02/29---" + vs.isValidDate("2010/02/29"));
        System.out.println("3: 2011/02/29---" + vs.isValidDate("2011/02/29"));
        System.out.println("4: 2011/02/30---" + vs.isValidDate("2011/02/30"));
        System.out.println("5: 2011/02/31---" + vs.isValidDate("2011/02/31"));
        System.out.println("6: 2010/08/31---" + vs.isValidDate("2010/08/31"));
        System.out.println("7: 2010/03/10---" + vs.isValidDate("2010/03/10"));
        System.out.println("8: 2010/03/33---" + vs.isValidDate("2010/03/33"));
        System.out.println("9: 2010/03/09---" + vs.isValidDate("2010/03/09"));
        System.out.println("10: 2010/03/9---" + vs.isValidDate("2010/03/9"));
        System.out.println("11: 1910/12/00---" + vs.isValidDate("1910/12/00"));
        System.out.println("12: 2010/02/01---" + vs.isValidDate("2010/02/01"));
        System.out.println("13: 2011/02/03---" + vs.isValidDate("2011/02/03"));
        System.out.println("14: 2010/08/31---" + vs.isValidDate("2010/08/31"));
        System.out.println("15: 2010/03/39---" + vs.isValidDate("2010/03/39"));
        System.out.println("16: 201011/03/31---" + vs.isValidDate("201011/03/31"));
        System.out.println("17: 2010/032/09---" + vs.isValidDate("2010/032/09"));
        System.out.println("18: 2010/03/922---" + vs.isValidDate("2010/03/922"));
    }
}
Enjoy...
I ran into similar requirements.
Here is the complete regular expression along with Leap Year validation.
Format: dd/MM/yyyy
(3[01]|[12]\d|0[1-9])/(0[13578]|10|12)/((?!0000)\d{4})|(30|[12]\d|0[1-9])/(0[469]|11)/((?!0000)\d{4})|(2[0-8]|[01]\d|0[1-9])/(02)/((?!0000)\d{4})|
29/(02)/(1600|2000|2400|2800|00)|29/(02)/(\d\d)(0[48]|[2468][048]|[13579][26])
It can be easily modified to US format or other EU formats.