Regex match between n and m numbers but as much as possible - regex

I have a set of strings that contain some letters, occasionally a single digit, and then somewhere a run of 2 or 3 digits. I need to match that run of 2 or 3 digits.
I have this:
\w*(\d{2,3})\w*
but then for strings like
AAA1AAA12A
AAA2AA123A
it matches '12' and '23' respectively, i.e. it fails to pick the three digits in the second case.
How do I get those 3 digits?

Here is how you would do it in Java.
The regex matches a group of 2 or 3 digits that has a non-digit on each side.
The while loop calls find() repeatedly and prints each captured group. The 1 and the 1223 are ignored.
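import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk";
String regex = "\\D(\\d{2,3})\\D";
Matcher m = Pattern.compile(regex).matcher(s);
while (m.find()) {
    System.out.println(m.group(1)); // group 1 is the 2- or 3-digit run
}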
prints
12
21
123
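The Java pattern above needs a non-digit character on both sides of the run, so a run at the very start or end of the string would be missed. A minimal Python sketch of the same idea, assuming lookarounds are acceptable (the helper name digit_runs is made up for illustration):

```python
import re

# Lookarounds assert "no digit on either side" without consuming a character,
# so runs at the start or end of the string are found as well.
DIGIT_RUN = re.compile(r"(?<!\d)\d{2,3}(?!\d)")

def digit_runs(text: str) -> list[str]:
    """Return every run of exactly 2 or 3 digits in text."""
    return DIGIT_RUN.findall(text)

print(digit_runs("AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk"))  # ['12', '21', '123']
print(digit_runs("12A345"))  # ['12', '345']
```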

Looks like the correct answer would be:
\w*?(\d{2,3})\w*
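Making the leading \w* lazy does the job: a greedy \w* also consumes the first digit of a three-digit run, because \d{2,3} is already satisfied by the two digits that remain.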
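To see the difference between the two patterns, here is a small Python comparison on the two strings from the question. It is only an illustration; the names GREEDY and LAZY are not from the original posts.

```python
import re

GREEDY = re.compile(r"\w*(\d{2,3})\w*")   # pattern from the question
LAZY = re.compile(r"\w*?(\d{2,3})\w*")    # same pattern with a lazy prefix

for s in ("AAA1AAA12A", "AAA2AA123A"):
    print(s, GREEDY.search(s).group(1), LAZY.search(s).group(1))
# AAA1AAA12A 12 12
# AAA2AA123A 23 123
```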

Related

Regex - Match n occurences of substring within any m-lettered window

I am facing some issues forming a regex that matches at least n times a given pattern within m characters of the input string.
For example imagine that my input string is:
00000001100000001110111100000000000000000000000000000000000000000000000000110000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100
I want to detect all cases where a 1 appears at least 7 times (not necessarily consecutively) in the input string, but within a window of up to 20 characters.
So far I have built this expression:
(1[^1]*?){7,}
which detects all cases where a 1 appears at least 7 times in the input string, but this now matches both the:
11000000011101111
and the
1100000001110000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011
parts whereas I want only the first one to be kept, as it is within a substring composed of less than 20 characters.
I tried to combine the aforementioned regex with:
(?=(^[01]{0,20}))
to also match only parts of the string that consist of '1' and '0' characters and are up to 20 characters long, but when I do that it stops working.
Does anyone have an idea how to accomplish this?
I have put this example in regex101 as a quick reference.
Thank you very much!
This is not something that can be done with regex without listing out every possible string. You would need to iterate over the string instead.
You could also iterate over the matches. Example in Python:
import re
matches = re.finditer(r'(?=((1[^1]*?){7}))', string)
matches = [match.group(1) for match in matches if len(match.group(1)) <= 20]
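The suggestion to iterate over the string can also be sketched without any regex. The version below is an assumption about what such an iteration might look like, not the answerer's code: it records the positions of the 1s and checks every group of n consecutive positions. Any window with at least 7 ones contains a window with exactly 7, so this is enough for detection. The function name windows_with_ones is hypothetical.

```python
def windows_with_ones(s: str, n: int = 7, max_len: int = 20) -> list[str]:
    """Return every substring that starts and ends on a '1', contains
    exactly n ones, and is at most max_len characters long."""
    ones = [i for i, ch in enumerate(s) if ch == "1"]
    result = []
    for k in range(len(ones) - n + 1):
        start, end = ones[k], ones[k + n - 1]
        if end - start + 1 <= max_len:
            result.append(s[start:end + 1])
    return result

print(windows_with_ones("00000001100000001110111100"))
# ['110000000111011', '100000001110111', '11101111']
```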
The next Python snippet is an attempt to get the desired sequences using only the regular expression.
import re
# Inline flags must be the first thing in the pattern (Python 3.11+ raises an error otherwise).
r = r'''(?mx)
( # the 1st capturing group will contain the desired sequence
1 # this sequence should begin with 1
(?=(?:[01]{6,19}) # let's see that there are enough 0s and 1s in a line
(.*$)) # the 2nd capturing group will contain all characters to the end of a line
(?:0*1){6}) # there must be six more 1s in the sequence
(?=.{0,13} # complement the 1st capturing group to 20 characters
\2) # the rest of a line should be 2nd capturing group
'''
s = '''
0000000
101010101010111111100000000000001
00000001100000001110111100000000000000000000000000000000000000000000000000110000000111000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001100
1111111
111111
'''
print([m.group(1) for m in re.finditer(r, s)])
Output:
['1010101010101', '11111100000000000001', '110000000111011', '1111111']
You can find an exhaustive explanation of this regular expression on RegEx101.

Regular Expression - Two Digit Range (23-79)?

I have been reading the regex questions on this site, but my issue seems to be a bit different. I need to match a 2-digit number in a range, such as 23 through 75. I am doing this on an HP-UX Unix system. I found examples for ranges like 3-44 or for numbers with any number of digits, but nothing for a fixed length, which is a bit surprising; perhaps I am not understanding the variable-length example answers.
Since you're not indicating whether this is in addition to any other characters (or in the middle of a larger string), I've included the logic you would need to match the number portion of a string. This should get you there. For each allowed first digit, a character class lists the second digits that may follow it; the alternatives are then combined with | (or):
(2[3456789]|[3456][0-9]|7[012345])
As oded noted you can do this as well since sub ranges are also accepted (depends on the implementation of REGEX in the application you're using):
(2[3-9]|[3-6][0-9]|7[0-5])
Based on the title, you would change the last 5 to a 9 to raise the upper bound from 75 to 79:
(2[3-9]|[3-6][0-9]|7[0-9])
If you are trying to match these numbers specifically as a string (from start to end) then you would use the modifiers ^ and $ to indicate the beginning and end of the string.
There is an excellent technical reference of Regex ranges here:
http://www.regular-expressions.info/numericranges.html
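If you're using something like grep and trying to match lines that contain the number alongside other content, you need extended syntax (-E) for the grouping and alternation, and the number must not touch another digit. For ranges through 79 you might do something like this:
grep -E '(^|[^0-9])(2[3-9]|[3-6][0-9]|7[0-9])([^0-9]|$)' file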
This tool is exactly what you need: Regex_For_Range
From 23 to 79: \b(2[3-9]|[3-7][0-9])\b
From 29 to 75: \b(29|[3-6][0-9]|7[0-5])\b
And just for fun, from 192 to 1742: \b(19[2-9]|[2-9][0-9]{2}|1[0-6][0-9]{2}|17[0-3][0-9]|174[0-2])\b :)
For a range of 0-63 (one or two digits):
/^([0-9]|[0-5][0-9]|6[0-3])$/
[0-9] allows a single digit from 0 to 9
[0-5][0-9] allows 00 to 59
6[0-3] allows 60 to 63
The parentheses are needed so that ^ and $ apply to every alternative, not just the first and the last.
The same approach gives a regular expression for any two-digit range.
You have two classes of numbers you want to match:
the digit 2, followed by one of the digits between 3 and 9
one of the digits between 3 and 7, followed by any digit
Edit: Well, that's the title's range (23-79). Within your question (23-75), you have three:
the digit 2, followed by one of the digits between 3 and 9
one of the digits between 3 and 6, followed by any digit
the digit 7, followed by one of the digits between 0 and 5
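The three classes for 23-75 can be checked mechanically. The following Python sketch is only a sanity check and assumes the input is a bare number with nothing around it (the helper name in_range is made up):

```python
import re

# One alternative per class: 2 then 3-9, 3-6 then any digit, 7 then 0-5.
PATTERN = re.compile(r"2[3-9]|[3-6][0-9]|7[0-5]")

def in_range(text: str) -> bool:
    """Return True if text is exactly a two-digit number from 23 to 75."""
    return PATTERN.fullmatch(text) is not None

matched = [n for n in range(0, 200) if in_range(str(n))]
print(matched == list(range(23, 76)))  # True
```

fullmatch is used instead of ^ and $ so that a longer number such as 234 is rejected.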
Just to add to this, here is a JavaScript function that generates the pattern from the accepted answer for arbitrary bounds. Run in a browser, it prompts for your own bounds and shows the resulting pattern.
function regexRangeString(lower, upper) {
    let current = lower;
    // Emit the pattern for the largest block of numbers that starts at
    // `current` and does not exceed `upper`, then advance `current` past it.
    let nextRange = function () {
        let currentString = String(current);
        let len = currentString.length;
        let string = "";
        // Walk the digits from right to left, widening each one as far as possible.
        for (let digit = 0; digit < len; digit++) {
            let index = len - digit - 1;
            let lower = Number(currentString[index]);
            let thisString = "";
            for (let u = 9; u >= lower; u--) {
                let us = currentString.substring(0, index) + u + currentString.substring(index + 1, len);
                if (Number(us) <= upper) {
                    thisString = lower == u ? String(lower) : `[${lower}-${u}]`;
                    currentString = us;
                    break;
                }
            }
            if (thisString != "[0-9]") {
                // This digit could not be widened fully, so the block ends here.
                string = currentString.substring(0, index) + thisString + string;
                break;
            } else {
                string = thisString + string;
            }
        }
        current = Number(currentString) + 1;
        return string;
    };
    let string = "";
    // <= so that the upper bound itself is covered when it starts a new block.
    while (current <= upper) {
        string += "|" + nextRange();
    }
    return "(" + string.slice(1) + ")";
}
// prompt() returns strings; convert them so the comparisons are numeric.
let lower = Number(prompt("Enter Lower Bound"));
let upper = Number(prompt("Enter Upper Bound"));
alert(regexRangeString(lower, upper));
For example:
regexRangeString(72,189)
generates the following output string:
(7[2-9]|[8-9][0-9]|1[0-8][0-9])
This should do it:
/^([2][3-9]|[3-6][0-9]|[7][0-5])$/
^ and $ anchor the pattern to the whole string, so only a two-digit number matches; an input such as 234 is rejected.

concatenating regex pattern in C#

I have a C# project that requires me to capture a string value from an HTML stream.
The pattern I need to match is:
XXXX-abc
Where:
XXXX = a 4 character integer
followed by a -
abc = a 3 character alphanumeric.
I looked at txt2re.com and got
string re1="(\\d)"; // Any Single Digit 1
string re2="(\\d)"; // Any Single Digit 2
string re3="(\\d)"; // Any Single Digit 3
string re4="(\\d)"; // Any Single Digit 4
string re5="(-)"; // Any Single Character 1
string re6="((?:[a-z][a-z]*[0-9]+[a-z0-9]*))"; // Alphanum 1
The thing I am having difficulty with is combining it into one expression instead of 6.
I know I can do:
Regex r = new Regex(re1+re2+re3+re4+re5+re6,RegexOptions.IgnoreCase|RegexOptions.Singleline);
However, my OCD cringes at this method :)
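You can use the expression \d{4}-\w{3}: 4 digits, followed by -, followed by 3 word characters. Note that \w also matches the underscore; use [A-Za-z0-9]{3} if you need strictly alphanumeric characters. Here is a good site to test and learn about regular expressions.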
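As a quick illustration of the single-expression approach, here is a Python sketch (the question is about C#, so treat this only as a way to try the pattern). The sample HTML string is made up, and the stricter [A-Za-z0-9] class and the \b boundaries are assumptions about what the asker wants.

```python
import re

# 4 digits, a hyphen, then exactly 3 ASCII letters or digits, as a whole token.
CODE = re.compile(r"\b\d{4}-[A-Za-z0-9]{3}\b")

html = "<td>Order 1234-a1B shipped</td><td>99999-xyz</td>"  # made-up sample input
print(CODE.findall(html))  # ['1234-a1B']
```

The second cell is skipped because its numeric part has five digits.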

Word between 9-10 characters, of which 0-2 are numbers

http://regexr.com?32uvo
What I currently have:
\b(?=[A-Z\d]{10})(?:[A-Z]*\d){0,2}[A-Z]*\b
This would only match a string with a length of 10. I would like to change it to between 9 and 10 characters, where 2 can be numbers. Why doesn't this work?
\b(?=[A-Z\d]{9,10})(?:[A-Z]*\d){0,2}[A-Z]*\b
AFAIK, {9,10} should be the length interval.
You were close
\b(?=[A-Z\d]{9,10}\b)(?:[A-Z]*\d){0,2}[A-Z]*\b
The part you missed is the \b at the end of the lookahead; it forces the word to end after 9 or 10 characters.
So this regex matches a word of 9 to 10 characters (uppercase letters and digits) that contains 0 to 2 digits.
If you want to match the whole string, use ^ (start of the string) and $ (end of the string) instead.
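A small Python check of the corrected pattern against a few hand-picked words. This is a sketch that assumes each word is tested on its own; the helper name is_valid is hypothetical.

```python
import re

WORD = re.compile(r"\b(?=[A-Z\d]{9,10}\b)(?:[A-Z]*\d){0,2}[A-Z]*\b")

def is_valid(word: str) -> bool:
    """True if word is 9-10 uppercase letters/digits with at most 2 digits."""
    return WORD.fullmatch(word) is not None

for w in ("ABCDEFGHI", "ABCDEFGH1J", "A1B2CDEFGH",
          "ABCDEFGH", "ABCDEFGHIJK", "A1B2C3DEFG"):
    print(w, is_valid(w))
# The first three print True, the last three print False.
```

fullmatch is used deliberately: with search, every part after the lookahead may match nothing, so the pattern can succeed with an empty match at the start of a word that has too many digits.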

How to match a one of a set of numbers?

I am trying to match a group of numbers in regex that consist of one of the following:
1,2,3,4,5,6,7,8,9,10,11
But I am having trouble figuring out the regex.
For single digits this pattern worked fine "0|1|2|3|4|5|6|7|8|9" but it fails on double digit numbers. For example 12 passes as ok due to the regex finding the 1 in 12.
You can use begin and end anchors to force the whole string to be matched:
^(0|1|2|3|4|5|6|7|8|9|10|11)$
Which can be shortened to:
^(\d|10|11)$
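The effect of anchoring can be verified with a short Python check. This is a sketch only: it uses fullmatch in place of ^ and $, and the helper name is_0_to_11 is made up.

```python
import re

ZERO_TO_ELEVEN = re.compile(r"\d|10|11")

def is_0_to_11(text: str) -> bool:
    """True only if the entire text is one of 0..11."""
    return ZERO_TO_ELEVEN.fullmatch(text) is not None

accepted = [n for n in range(0, 30) if is_0_to_11(str(n))]
print(accepted == list(range(12)))  # True
print(is_0_to_11("12"))             # False
```

Note that this accepts 0 as well, just like the patterns above; the question's list starts at 1.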
This will work if you want to check if just one number is between 0 and 11.
^[0-9]$|^1?[0-1]$
If you want to match a string like:
1,2,3,12,32,5,1,6,8, 11
and match 0-11 then you can use the following:
(?<=,|^)([0-9]|1?[0-1])(?=,|$)
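Python's re module only accepts fixed-width look-behind, so (?<=,|^) is rejected there. A possible equivalent, offered as a sketch, uses negative lookarounds that say "not next to a non-comma character". The sample list below omits the space before 11, since a space would stop that item from matching.

```python
import re

# (?<![^,]) : preceded by a comma or the start of the string
# (?![^,])  : followed by a comma or the end of the string
ITEM = re.compile(r"(?<![^,])(?:[0-9]|1[01])(?![^,])")

print(ITEM.findall("1,2,3,12,32,5,1,6,8,11"))
# ['1', '2', '3', '5', '1', '6', '8', '11']
```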
Use this regex: ^(0|1|2|3|4|5|6|7|8|9|(10)|(11))$