make regex more specific in getting consecutive digits

import pandas as pd

df = pd.DataFrame({
    'Data': ['123456A122 119999 This 1234522261 1A1619 BL171111 A-1-24',
             '134456 dont 12-23-34-45-5-6 Z112 NOT 01-22-2001',
             'mix: 1A25629Q88 or A13B ok'],
    'IDs': ['A11', 'B22', 'C33'],
})
I have the df shown above. I am using the following to get only consecutive digits:
reg = r'((?:[\d]-?){6,})'
df['new'] = df['Data'].str.findall(reg)
   new
0  [123456, 119999, 1234522261, 171111]
1  [134456, 12-23-34-45-5-6, 01-22-2001]
2  []
This picks up many things I don't want, like 171111 from BL171111 and 123456 from 123456A122.
I would like the following output, which only picks up runs of exactly 6 consecutive digits that stand on their own:
   new
0  [119999]
1  [134456]
2  []
How do I change my regex to do so?

Change your regex to use word boundaries (\b) and limit the number of digits to exactly 6, like this:
reg = r'(\b\d{6}\b)'
This looks for a word boundary, exactly 6 digits, and another word boundary.
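A minimal sketch to check both patterns against the three sample strings. It uses only the standard `re` module rather than pandas; `Series.str.findall` applies `re.findall` to each element, so the per-row results should be the same. The names `rows`, `old` and `new` are just for illustration.

```python
import re

# The three sample strings from the question's 'Data' column
rows = [
    '123456A122 119999 This 1234522261 1A1619 BL171111 A-1-24',
    '134456 dont 12-23-34-45-5-6 Z112 NOT 01-22-2001',
    'mix: 1A25629Q88 or A13B ok',
]

old = re.compile(r'((?:[\d]-?){6,})')  # original: 6+ digits, hyphens allowed
new = re.compile(r'(\b\d{6}\b)')       # answer: exactly 6 digits between word boundaries

old_results = [old.findall(r) for r in rows]
new_results = [new.findall(r) for r in rows]
for before, after in zip(old_results, new_results):
    print(before, '->', after)
```

Note that `\b` sits between a word character (letter, digit or underscore) and a non-word character. So a run glued to letters, such as BL171111, is rejected, while a run next to a hyphen, e.g. A-123456, would still be matched.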

Related

Regex match between n and m numbers but as much as possible

I have a set of strings that contain some letters, occasionally a single digit, and then somewhere a run of 2 or 3 digits. I need to match those 2 or 3 digits.
I have this:
\w*(\d{2,3})\w*
but then for strings like
AAA1AAA12A
AAA2AA123A
it matches '12' and '23' respectively, i.e. it fails to pick up all three digits in the second case.
How do I get those 3 digits?
Here is how you would do it in Java.
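The regex matches a run of 2 or 3 digits that has a non-digit character (\D) on each side, and captures just the digits in group 1.
The while loop calls find() repeatedly to get each successive match and prints the captured group. The 1 and the 1223 are ignored: the 1 is a single digit, and no 2- or 3-digit run inside 1223 is bordered by non-digits on both sides.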
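import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk";
// \D = any non-digit; group 1 captures the 2 or 3 digits between them
String regex = "\\D(\\d{2,3})\\D";
Matcher m = Pattern.compile(regex).matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}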
prints
12
21
123
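Because the \D on each side is consumed as part of the match, this pattern cannot find a run at the very start or end of the string, or a second run that shares its separator with the first. A sketch in Python of an alternative that swaps in lookarounds, which check the neighbouring characters without consuming them; the `edge` string is a hypothetical input chosen to show the difference.

```python
import re

s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk"

consuming = re.compile(r"\D(\d{2,3})\D")          # the Java answer's pattern
lookaround = re.compile(r"(?<!\d)\d{2,3}(?!\d)")  # neighbours checked, not consumed

print(consuming.findall(s))   # ['12', '21', '123']
print(lookaround.findall(s))  # ['12', '21', '123']

# Hypothetical edge case: runs at both ends of the string and runs sharing one separator
edge = "12a34b56c"
print(consuming.findall(edge))   # ['34']
print(lookaround.findall(edge))  # ['12', '34', '56']
```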
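It looks like the correct answer would be:
\w*?(\d{2,3})\w*
Basically, making the preceding \w* lazy does the job. The greedy \w* first consumes the whole string and then backtracks only until \d{2,3} can match, which leaves just the last two digits of a three-digit run (23 out of 123). The lazy \w*? instead stops at the first position followed by at least two digits, so \d{2,3} starts at the first digit of the run and greedily takes all three.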
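A short Python sketch comparing the greedy and lazy prefixes on the two sample strings from the question:

```python
import re

samples = ["AAA1AAA12A", "AAA2AA123A"]
greedy = re.compile(r"\w*(\d{2,3})\w*")  # question's pattern
lazy = re.compile(r"\w*?(\d{2,3})\w*")   # lazy prefix from the answer

for text in samples:
    # match() anchors at the start of the string, which the leading \w* allows
    print(text, greedy.match(text).group(1), lazy.match(text).group(1))
# AAA1AAA12A 12 12
# AAA2AA123A 23 123
```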

getting consecutive digits regex

import re
s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'
I have the following string and I want to extract 7 consecutive digits, followed by /, followed by 9 consecutive digits, e.g. 1311374/104813603. To do so, I have tried the following:
reg = r'(?:^|(?<=\s))\d{7,9}(?=\s|$)'
r1 = re.findall(reg,s)
But this gives me an empty list []. How do I tweak my reg to get my desired output?
Desired output:
['1311374/104813603', '2302374/544863603', '0100374/104563603']
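I think you're overcomplicating it. Your pattern returns [] because \d{7,9} matches only digits, so it stops at the /, and the lookahead (?=\s|$) then demands whitespace or the end of the string right there; the 9-digit part can't match either, because it is preceded by / rather than whitespace. Since you want 7 digits, then /, then 9 digits, you may just use: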
\b\d{7}/\d{9}\b
Code:
>>> import re
>>> s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'
>>> print(re.findall(r'\b\d{7}/\d{9}\b', s))
['1311374/104813603', '2302374/544863603', '0100374/104563603']
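The \b anchors are what keep the pattern from matching inside longer digit runs. A small sketch with a hypothetical test string containing one valid ID and two near-misses (one extra digit before the /, one extra digit after it):

```python
import re

pattern = re.compile(r'\b\d{7}/\d{9}\b')

# Hypothetical test string: one valid ID plus two near-misses
s = 'ok 1311374/104813603 too long 91311374/104813603 too long 1311374/1048136030'
print(pattern.findall(s))  # ['1311374/104813603']
```

Without the anchors, \d{7}/\d{9} would still find a match inside the third ID by taking just its first 9 digits after the /.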

RegEx for matching group in multiline texts

I have this multi-line text, and I want to extract the number just before the 'Next' line (in this case 13). The numbers will change, but their position stays the same; the value indicates the total number of pages on the website. I am having trouble writing the correct regex to return this value:
Previous
1
2
3
...
13
Next
Showing 1 - 100 of 1227 Results[EXTRACT]
pattern = re.compile(r'(\d{1,2})\r\nNext', re.M)
result = pattern.match(text)
The expected return value is 13.
import re
t = """Previous
1
2
3
...
13
Next
Showing 1 - 100 of 1227 Results[EXTRACT]"""
re.search(r"\d+(?=\s+Next)", t).group(0)
Returns: '13'
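The regular expression matches one or more digits (\d+) and then uses a lookahead assertion, (?=\s+Next), to check that they are followed by one or more whitespace characters and then the word Next. The lookahead is not part of the match, so group(0) contains only the digits. Two things differ from the original attempt: re.search scans the whole string, whereas match only tries at the very beginning, and \s+ accepts both \n and \r\n line endings.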
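A minimal Python sketch contrasting match() and search() on the question's text. The Windows-style copy with \r\n line endings is an assumption added to show that \s+ covers both line-ending styles.

```python
import re

# The page list from the question, with Unix and (assumed) Windows line endings
t_unix = "Previous\n1\n2\n3\n...\n13\nNext\nShowing 1 - 100 of 1227 Results[EXTRACT]"
t_windows = t_unix.replace("\n", "\r\n")

pattern = re.compile(r"\d+(?=\s+Next)")

# match() only tries position 0, where the text starts with "Previous"
print(pattern.match(t_unix))  # None

# search() scans forward; \s+ covers both "\n" and "\r\n" before "Next"
results = [pattern.search(t).group(0) for t in (t_unix, t_windows)]
print(results)  # ['13', '13']
```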

Retrieve a certain text in a string

I'd like a way to retrieve a piece of text from a string in a C# script.
The format of the text is 4 digits, then _, then 1 to 2 digits, for example:
test_p_2008_1_Annexe_1_prix
test_p_2008_100_Annexe_1_prix
test_p_2008_1
test_p_2008_100
For these 4 examples, I need to get:
2008_1
2008_100
2008_1
2008_100
Maybe a regex would do it, but I'm not good enough with them.
I think you're trying to retrieve text in the format of 4 digits, then _, then 1 to 3 digits (2008_100 has 3 digits after the underscore):
@"\d{4}_\d{1,3}"
Code:
using System;
using System.Text.RegularExpressions;

String input = @"test_p_2008_1_Annexe_1_prix
test_p_2008_100_Annexe_1_prix
test_p_2008_1
test_p_2008_100";
// 4 digits, an underscore, then 1 to 3 digits
Regex rgx = new Regex(@"\d{4}_\d{1,3}");
foreach (Match m in rgx.Matches(input))
    Console.WriteLine(m.Groups[0].Value);
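The same pattern can be checked quickly in Python, with the `re` module standing in for .NET's `Regex`. The pattern is not anchored, so a longer suffix would be silently cut to 3 digits. The `strict` variant with `(?<!\d)` and `(?!\d)` lookarounds is an addition beyond the original answer, and the `test_p_2008_1000` input is hypothetical.

```python
import re

text = """test_p_2008_1_Annexe_1_prix
test_p_2008_100_Annexe_1_prix
test_p_2008_1
test_p_2008_100"""

loose = re.compile(r"\d{4}_\d{1,3}")                # same pattern as the C# answer
strict = re.compile(r"(?<!\d)\d{4}_\d{1,3}(?!\d)")  # also rejects longer digit runs

print(loose.findall(text))   # ['2008_1', '2008_100', '2008_1', '2008_100']
print(strict.findall(text))  # same four values

# Hypothetical input with a 4-digit suffix
print(loose.findall("test_p_2008_1000"))   # ['2008_100'], silently truncated
print(strict.findall("test_p_2008_1000"))  # []
```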

REGEX: Extract a group of digits when there are more than 3 of them

Hi, I have a question regarding regex.
This sounds very simple and I remember doing it, but somehow it got deleted and I am finding it hard to get it back.
I want to extract a group of digits from one line.
If the count of digits is > 3, select that group.
E.g.:
ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk
This line can be different every time, but there will be only one group with more than 2 digits.
OUTPUT: 540063
Thank you in advance
You can use \d{3,}, where 3 is the minimum number of digits (use \d{4,} if you need strictly more than 3). Take a look at the following Python code:
import re

var = "ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk"
# \d{3,} matches a run of at least 3 consecutive digits
pattern = re.compile(r'\d{3,}')
for match in pattern.findall(var):
    print(match)
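Since the question says there will be only one such group per line, `re.search` can return it directly instead of looping over `findall`. A minimal sketch; `extract_digit_run` and its `min_digits` parameter are hypothetical names, not part of any library.

```python
import re


def extract_digit_run(line, min_digits=3):
    """Return the first run of at least min_digits consecutive digits, or None.

    extract_digit_run is a hypothetical helper, not part of any library.
    """
    match = re.search(r"\d{%d,}" % min_digits, line)
    return match.group(0) if match else None


line = "ga3rdparty/phpMyAdmin/i0ndex.php?&t0oken=abf540063shakk"
print(extract_digit_run(line))                # 540063
print(extract_digit_run(line, min_digits=4))  # 540063 (strictly more than 3)
print(extract_digit_run("no digits here"))    # None
```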