Getting consecutive digits with regex

import re
s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'
I have the following string, and I want to extract 7 consecutive digits followed by / and then 9 consecutive digits, e.g. 1311374/104813603. To do so, I have tried the following:
reg = r'(?:^|(?<=\s))\d{7,9}(?=\s|$)'
r1 = re.findall(reg,s)
But this gives me an empty []. How do I tweak my reg to get my desired output?
Desired output:
['1311374/104813603', '2302374/544863603', '0100374/104563603']

I want to extract 7 consecutive digits followed by / and followed by 9 consecutive digits
I think you're overcomplicating it. You can just use:
\b\d{7}/\d{9}\b
Code:
>>> import re
>>> s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'
>>> print (re.findall(r'\b\d{7}/\d{9}\b', s))
['1311374/104813603', '2302374/544863603', '0100374/104563603']
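The original attempt returns [] because `\d{7,9}` only matches a single run of digits that must be followed by whitespace or the end of the string, and every run here touches a `/`. A small sketch of the fix, assuming you may also want the two halves separately: adding one capturing group per half makes `re.findall` return tuples instead of whole matches, and the `\b` anchors reject runs that are longer than 7 or 9 digits.

```python
import re

s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'

# Whole matches: exactly 7 digits, a slash, exactly 9 digits.
whole = re.findall(r'\b\d{7}/\d{9}\b', s)
print(whole)

# With two capturing groups, findall returns (left, right) tuples instead.
parts = re.findall(r'\b(\d{7})/(\d{9})\b', s)
print(parts)

# \b rejects an 8-digit left half: no boundary exists inside a digit run.
too_long = re.findall(r'\b\d{7}/\d{9}\b', '12345678/123456789')
print(too_long)  # []
```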

Related

Regex match between n and m numbers but as much as possible

I have a set of strings that contain some letters, occasionally a single digit, and then somewhere a run of 2 or 3 digits. I need to match those 2 or 3 digits.
I have this:
\w*(\d{2,3})\w*
but then for strings like
AAA1AAA12A
AAA2AA123A
it matches '12' and '23' respectively, i.e. it fails to pick up all three digits in the second case.
How do I get those 3 digits?
Here is how you would do it in Java.
The regex matches a group of 2 or 3 digits with a non-digit on each side.
The while loop uses find() to keep finding matches and prints the captured group each time. The 1 and the 1223 are ignored.
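import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Digits {
    public static void main(String[] args) {
        String s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk";
        String regex = "\\D(\\d{2,3})\\D"; // 2-3 digits with a non-digit on each side
        Matcher m = Pattern.compile(regex).matcher(s);
        while (m.find()) {
            System.out.println(m.group(1)); // print only the captured digits
        }
    }
}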
This prints:
12
21
123
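One caveat with `\D(\d{2,3})\D`: each match consumes the non-digit on either side, so two runs separated by a single character (for example `A12A34A`) only yield the first run, and a run at the very start or end of the string is missed. Here is a Python sketch that swaps in lookarounds instead (a different technique from the Java answer), so only the digits themselves are consumed:

```python
import re

s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk"

# (?<!\d) and (?!\d) check that no digit is adjacent without consuming a
# character, so a 2-3 digit run only matches when it is not part of a longer run.
pattern = re.compile(r'(?<!\d)\d{2,3}(?!\d)')

print(pattern.findall(s))          # ['12', '21', '123']
print(pattern.findall("A12A34A"))  # ['12', '34']
print(pattern.findall("12A345"))   # ['12', '345']
```

The 1 and the 1223 are still skipped, as in the Java version.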
Looks like the correct answer would be:
\w*?(\d{2,3})\w*
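Basically, making the preceding expression lazy does the job. Because \w also matches digits, the greedy \w* in the original pattern swallows the leading 1 of 123 and leaves only 23 for the group.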
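A quick check comparing both patterns with `re.match` on the two example strings, as a sketch that assumes each string is matched from its start:

```python
import re

results = {}
for s in ['AAA1AAA12A', 'AAA2AA123A']:
    # Greedy prefix: \w* grabs as much as possible, then gives back only enough for 2 digits.
    greedy = re.match(r'\w*(\d{2,3})\w*', s).group(1)
    # Lazy prefix: \w*? grows one character at a time until 2-3 digits can follow.
    lazy = re.match(r'\w*?(\d{2,3})\w*', s).group(1)
    results[s] = (greedy, lazy)
    print(s, greedy, lazy)
# AAA1AAA12A 12 12
# AAA2AA123A 23 123
```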

Make regex more specific in getting consecutive digits

import pandas as pd
df= pd.DataFrame({'Data':['123456A122 119999 This 1234522261 1A1619 BL171111 A-1-24',
'134456 dont 12-23-34-45-5-6 Z112 NOT 01-22-2001',
'mix: 1A25629Q88 or A13B ok'],
'IDs': ['A11','B22','C33'],
})
I have the following df, as seen above. I am using the following to get only consecutive digits:
reg = r'((?:[\d]-?){6,})'
df['new'] = df['Data'].str.findall(reg)
   new
0  [123456, 119999, 1234522261, 171111]
1  [134456, 12-23-34-45-5-6, 01-22-2001]
2  []
This picks up many things I don't want, like 171111 from BL171111 and 123456 from 123456A122, etc.
I would like the following output, which only picks up standalone runs of exactly 6 consecutive digits:
   new
0  [119999]
1  [134456]
2  []
How do I change my regex to do so?
reg = r'((?:[\d]-?){6,})'
Change your regex to use word boundaries (\b) and limit the number of digits to exactly 6, like this:
reg = r'(\b\d{6}\b)'
This looks for a word boundary, exactly 6 digits, and another word boundary.
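To check the pattern without pandas, here is a sketch that applies the same regex with plain `re.findall` to the three `Data` strings from the question (with the DataFrame it would go into `df['Data'].str.findall(...)` as in the answer). Note that `\b` also treats `-` as a boundary; if only whitespace-delimited tokens should count, one alternative (not from the original answer) is to use lookarounds on `\S`:

```python
import re

# The three 'Data' strings from the question's DataFrame
data = [
    '123456A122 119999 This 1234522261 1A1619 BL171111 A-1-24',
    '134456 dont 12-23-34-45-5-6 Z112 NOT 01-22-2001',
    'mix: 1A25629Q88 or A13B ok',
]

# The answer's pattern: exactly 6 digits between word boundaries.
boundary = [re.findall(r'\b\d{6}\b', d) for d in data]
print(boundary)  # [['119999'], ['134456'], []]

# Stricter variant: the 6 digits must be delimited by whitespace or the
# start/end of the string, so '-' no longer counts as a boundary.
spaced = [re.findall(r'(?<!\S)\d{6}(?!\S)', d) for d in data]
print(spaced)  # [['119999'], ['134456'], []]

# The two only differ when punctuation touches the digits:
hyphen_boundary = re.findall(r'\b\d{6}\b', 'A-123456')
hyphen_spaced = re.findall(r'(?<!\S)\d{6}(?!\S)', 'A-123456')
print(hyphen_boundary, hyphen_spaced)  # ['123456'] []
```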

How do I use regular expressions to separate white spaces in a phone number?

I have 3 phone numbers in different formats.
(123) 456 7890
234-567-9999
345  569 2411 # notice there are two spaces after 345
I need to extract only the digits, ignoring the spaces and the parentheses, and I need the output in the format xxx-xxx-xxxx, stored in a dictionary.
So far, I have tried this:
if re.search(r'\d{3}.*\d{3}.*\d{4}', line):
    Phone = re.findall(r'\d{3}.*\d{3}.*\d{4}', line)
    Phone = ''.join(Phone)
    PhoneLst.append(Phone)
You can use re.findall with a pattern that matches just the numbers:
PhoneLst.append(''.join(re.findall(r'\d+', line)))
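This yields only the bare digits (for example '1234567890'). To get the xxx-xxx-xxxx format the question asks for, a minimal sketch can slice the joined string. It assumes each line holds exactly one 10-digit number (other lines are skipped) and uses the line index as the dictionary key, which is an assumption since the question doesn't say what the keys should be:

```python
import re

lines = ["(123) 456 7890", "234-567-9999", "345  569 2411"]

phones = {}
for i, line in enumerate(lines):
    digits = ''.join(re.findall(r'\d+', line))  # e.g. '1234567890'
    if len(digits) == 10:  # skip lines that are not 10-digit numbers
        phones[i] = f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

print(phones)
# {0: '123-456-7890', 1: '234-567-9999', 2: '345-569-2411'}
```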
The issue is that you're matching the whole part of the phone number starting with the first digit and ending with the last digit, including any spaces, dashes, or parentheses in between.
To fix this, match only the digit groups. You can do this with capturing groups, one for each digit group, i.e. 3 digits, 3 digits and 4 digits.
For example:
import re

phone_list = []
lines = ["(123) 456 7890", "234-567-9999", "345 569 2411"]
for line in lines:
    re_match = re.search(r"(\d{3}).*(\d{3}).*(\d{4})", line)
    if re_match:
        formatted_number = "".join(re_match.groups())
        phone_list.append(formatted_number)
This gives the following phone_list:
['1234567890', '2345679999', '3455692411']
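Since the three groups are already separated, joining them with "-" instead of "" gives the xxx-xxx-xxxx format directly. A sketch along those lines that also narrows `.*` to `\D*` (so only non-digits may sit between the groups) and stores the results in a dictionary keyed by line index (the keying is an assumption, not from the question):

```python
import re

lines = ["(123) 456 7890", "234-567-9999", "345  569 2411"]

phone_dict = {}
for i, line in enumerate(lines):
    # \D* allows only spaces, dashes, parentheses, etc. between the digit groups
    re_match = re.search(r"(\d{3})\D*(\d{3})\D*(\d{4})", line)
    if re_match:
        phone_dict[i] = "-".join(re_match.groups())

print(phone_dict)
# {0: '123-456-7890', 1: '234-567-9999', 2: '345-569-2411'}
```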
Here's another answer that uses list comprehension.
import re
# List of possible phone numbers
possible_numbers = ['(123) 456 7890', '234-567-9999', '345 569 2411']
# Use a list comprehension to look for the phone number pattern (3, 3 and 4 digits)
# numbers is a list
numbers = [n for n in possible_numbers if re.search(r'(\d{3}.*\d{3}.*\d{4})', n)]
# Use a list comprehension to reformat the numbers based on your requirements:
# drop the parentheses, then turn each run of whitespace into a single dash
# formatted_number is a list
formatted_number = [re.sub(r'\s+', '-', x.replace('(', '').replace(')', '')) for x in numbers]
# You mentioned in your question that you needed the output in a dictionary.
# This code will convert the formatted_number list to a dictionary.
phoneNumbersDictionary = {i: formatted_number[i] for i in range(0, len(formatted_number))}
print(phoneNumbersDictionary)
# output
{0: '123-456-7890', 1: '234-567-9999', 2: '345-569-2411'}

Regular expression to join group of 5 digits

I am trying to extract 10-digit phone numbers from a string. In some cases the numbers are split by a space after 2 or 5 digits. How do I merge such numbers to get the full 10 digits?
mystr='(R) 98198 38466 (some Text) 9702977470'
import re
re.findall('\d+' , mystr)
Close, but not correct:
['98198', '38466', '9702977470']
Expected Results:
['9819838466', '9702977470']
I can write Python code to concatenate '98198' and '38466', but I would like to know whether a regular expression can be used for this.
You could remove the non-digits first.
>>> mydigits = re.sub(r'\D', '', mystr)
>>> mydigits
'98198384669702977470'
>>> re.findall(r'.{10}', mydigits)
['9819838466', '9702977470']
If all the separators are one character long, this would work.
>>> re.findall(r'(?:\d.?)+\d', mystr)
['98198 38466', '9702977470']
Of course, this includes the non-digit separators in the match. A regex findall can only return some number of slices of the input string. It cannot modify them.
These are easy to remove afterwards if that's a problem.
>>> [re.sub(r'\D', '', s) for s in _]
['9819838466', '9702977470']
In some cases numbers are separated by space after 2 or 5 digits.
You can use the regex:
\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b
For example, this regular expression will match all of these:
01 23456789
01234 56789
0123456789
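Applied with `re.findall` to the string from the question, the matches still contain the separating space, so a short sketch strips whitespace afterwards (the clean-up step is an addition, not part of the answer):

```python
import re

mystr = '(R) 98198 38466 (some Text) 9702977470'
pattern = r'\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b'

matches = re.findall(pattern, mystr)
print(matches)  # ['98198 38466', '9702977470']

# Remove the space that the regex allowed inside the number.
numbers = [re.sub(r'\s', '', m) for m in matches]
print(numbers)  # ['9819838466', '9702977470']
```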
I doubt you can achieve this with a regex pattern alone. Maybe just use a pattern to match runs of 10 or more digits and spaces, and then strip the spaces programmatically. The pattern below should work as long as you are sure there is some text between the phone numbers.
[\d ]{10,}
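A sketch of that two-step approach on the string from the question, under the same assumption that some non-digit text separates the numbers. The raw matches can carry leading or trailing spaces, which the clean-up step removes along with the inner ones:

```python
import re

mystr = '(R) 98198 38466 (some Text) 9702977470'

# Runs of 10 or more digits/spaces; these may include leading or trailing spaces.
runs = re.findall(r'[\d ]{10,}', mystr)
print(runs)  # [' 98198 38466 ', ' 9702977470']

# Clean-up step done in Python: drop every space.
numbers = [r.replace(' ', '') for r in runs]
print(numbers)  # ['9819838466', '9702977470']
```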
Credit goes to commenter jsonharper:
\d{2} ?\d{3} ?\d{5}

Regexp matched values subpattern as subarray

My regular expression: https://regex101.com/r/oF7pM8/1
I get http://joxi.ru/J2b54KaI40bbwm
But I need to get all the "num" values (all the numbers), collected in an array "num".
I need to get this:
name = house
num = [3 4 5 6 7 8 9]
What am I doing wrong?
P.S.: this is a Python regular expression.
The pattern must find all the numbers separately (array).
Does (?P<name>house)(?:\s(?P<num>(\d\s+)+)\d?)+? do the job?
My additions to your original in bold: (?P<name>house)(?:\s(?P<num>(\d\s+)+)\d?)+?
With that, only the last number is found, not all of them. I need all of them.
re.match does match every repetition, but a repeated capturing group only keeps the last one. Since you have to post-process the matches anyway in order to assign them to the Python variables name and num, make the pattern simple:
import re
test_string = 'house 3 44 555 6666 777 88 9'
m = re.match(r'(house)((\s\d+)+)', test_string)
name = m.group(1)
num = [int(s) for s in m.group(2).split()]
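To see the repeated-group behaviour directly, and one alternative that skips the split step (a sketch, assuming the name always comes first and every number after it belongs to num):

```python
import re

test_string = 'house 3 44 555 6666 777 88 9'

# A repeated capturing group only remembers its last repetition.
m = re.match(r'(house)(?:\s(\d+))+', test_string)
last_only = m.group(2)
print(last_only)  # prints 9

# Alternative (not from the answer): match the name, then collect every
# number after it with re.findall, which returns all non-overlapping matches.
m = re.match(r'(?P<name>house)\b', test_string)
name = m.group('name')
num = [int(x) for x in re.findall(r'\d+', test_string[m.end():])]
print(name, num)  # prints house [3, 44, 555, 6666, 777, 88, 9]
```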