Get the last 10 digits of a string with Python regex?

I have some strings like:
1. "07870 622103"
2. "(0) 07543 876545"
3. "07321 786543 - not working"
I want to get the last 10 digits of these strings, like:
1. "07870622103"
2. "07543876545"
3. "07321786543"
So far I have tried:
a = re.findall(r"\d+${10}", mobilePhone)
Please help.

It's easier to just filter your string for digits and pick out the last 10:
''.join([c for c in mobilePhone if c.isdigit()][-10:])
Result:
>>> mobilePhone = "07870 622103"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7870622103'
>>> mobilePhone = "(0) 07543 876545"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7543876545'
>>> mobilePhone = "07321 786543 - not working"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7321786543'
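The regular expression approach (stripping everything but digits) appears to be faster, though:
$ python -m timeit -s "mobilenum='07321 786543 - not working'" "''.join([c for c in mobilenum if c.isdigit()][-10:])"
100000 loops, best of 3: 6.68 usec per loop
$ python -m timeit -s "import re; notnum=re.compile(r'\D'); mobilenum='07321 786543 - not working'" "notnum.sub('', mobilenum)[-10:]"
1000000 loops, best of 3: 0.472 usec per loop
Note that the 0.472 usec figure was measured with the arguments to sub() swapped (notnum.sub(mobilenum, '')), which substitutes into an empty string and does almost no work. The corrected command above performs the real substitution, so expect a higher number when you rerun it.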
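To rerun the comparison from inside Python, here is a minimal benchmark sketch. The function names by_comprehension and by_regex and the run count N are hypothetical; the sketch first checks that both approaches agree, then reports the best of three runs per call. Absolute timings depend on your machine and Python version, so none are shown here.

```python
import functools
import re
import timeit

mobilenum = "07321 786543 - not working"  # sample input from the question
notnum = re.compile(r"\D")  # matches any single non-digit character
N = 20_000  # calls per timing run (hypothetical; kept small so it finishes quickly)


def by_comprehension(s):
    # Keep only the digit characters, then take the last ten
    return "".join([c for c in s if c.isdigit()][-10:])


def by_regex(s):
    # Pattern.sub(repl, string): the replacement comes first, then the input
    return notnum.sub("", s)[-10:]


# Only compare timings once both functions return the same answer
assert by_comprehension(mobilenum) == by_regex(mobilenum)

for func in (by_comprehension, by_regex):
    # timeit.repeat accepts a callable; take the best of 3 runs
    best = min(timeit.repeat(functools.partial(func, mobilenum), number=N, repeat=3))
    print(f"{func.__name__}: {best / N * 1e6:.3f} usec per call")
```

Taking the minimum of several runs, as the timeit command line does, reduces noise from other processes.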

I suggest using a regex to throw away all non-digits, like so:
newstring = re.compile(r'\D').sub('', yourstring)
The regex is very simple: \D means non-digit. The code above uses sub to replace every non-digit character with an empty string, so newstring contains only the digits.
To take the last ten characters, use newstring[-10:].
That is the regex answer; Martijn Pieters' answer may be more Pythonic.
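The two steps above (strip non-digits with sub, then slice) can be wrapped in a small helper. This is a sketch: the name last_ten_digits is hypothetical, and returning None when fewer than ten digits remain is an assumed design choice, since a plain [-10:] slice would silently return a shorter string.

```python
import re

NON_DIGIT = re.compile(r"\D")  # any single character that is not a digit


def last_ten_digits(phone):
    """Return the last ten digits of phone, or None if it has fewer than ten."""
    digits = NON_DIGIT.sub("", phone)  # "(0) 07543 876545" -> "007543876545"
    if len(digits) < 10:
        return None
    return digits[-10:]


print(last_ten_digits("(0) 07543 876545"))  # 7543876545
print(last_ten_digits("12 34"))             # None
```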

Related

Julia - Extract number from string using regex

I have a list of strings each telling me after how many iterations an algorithm converged.
string_list = [
"Converged after 1 iteration",
"Converged after 20 iterations",
"Converged after 7 iterations"
]
How can I extract the number of iterations? The result would be [1, 20, 7]. I tried with regex. Apparently (?<=after )(.*)(?= iteration*) gives me anything between "after" and "iteration", but then this doesn't work:
occursin(string_list[1], r"(?<=after )(.*)(?= iteration*)")
There's a great little Julia package called ReadableRegex that makes creating regexes easier, and as luck would have it, the first example in its README finds every integer in a string:
julia> using ReadableRegex
julia> reg = @compile look_for(
           maybe(char_in("+-")) * one_or_more(DIGIT),
           not_after = ".",
           not_before = NON_SEPARATOR)
r"(?:(?<!\.)(?:(?:[+\-])?(?:\d)+))(?!\P{Z})"
That regex can now be broadcast over your list of strings:
julia> collect.(eachmatch.(reg, string_list))
3-element Vector{Vector{RegexMatch}}:
[RegexMatch("1")]
[RegexMatch("20")]
[RegexMatch("7")]
To extract information out of a regex, you want to use match and captures:
julia> convergeregex = r"Converged after (\d+) iteration"
r"Converged after (\d+) iteration"
julia> match(convergeregex, string_list[2]).captures[1]
"20"
julia> parse.(Int, [match(convergeregex, s).captures[1] for s in string_list])
3-element Vector{Int64}:
1
20
7
\d+ matches a run of digits (here, the number of iterations), and the parentheses around it indicate that you want the part of the string it matched to be placed in the result's captures array.
You don't need the lookbehind and lookahead operators (?<=, ?=) here.
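For comparison with the Python examples elsewhere on this page, here is a sketch of the same capture-group idea using Python's re module (an analogue of the Julia match/captures code, not part of the original answer). group(1) plays the role of captures[1].

```python
import re

string_list = [
    "Converged after 1 iteration",
    "Converged after 20 iterations",
    "Converged after 7 iterations",
]

# (\d+) is the capture group; no lookbehind or lookahead is needed
converge_regex = re.compile(r"Converged after (\d+) iteration")

# group(1) returns only the digits matched inside the parentheses
iterations = [int(converge_regex.search(s).group(1)) for s in string_list]
print(iterations)  # [1, 20, 7]
```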

Regex substitution: getting half of the string

I have another question about regex. The requirement is quite simple:
Given a string whose length is an even number:
12
1234
123456
12345678
abcdef
Write a substitution regex to get the first half of the string:
After substitution:
1
12
123
1234
abc
I'm using PCRE; it supports recursion and control verbs.
I tried something like this but it's not working :(
s/^(?=(.))(?:((?1))(?1))+$/$2/mg
Here's the test subject on regex101
Is it possible? How can I achieve this?
I'm pretty sure this is not the most elegant solution, but it does work:
>>> import re
>>> def half(string):
...     # Capture exactly len(string) // 2 characters from the start
...     regex = re.compile(r"(.{%d})" % (len(string) // 2))
...     return regex.search(string).group(1)
...
>>> half("12")
'1'
>>> half("1234")
'12'
>>> half("123456")
'123'
>>> half("12345678")
'1234'
>>> half("abcdef")
'abc'
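Python's built-in re module does not support PCRE-style recursion such as (?1), so the answer above computes the length outside the regex. If you want a single substitution over the whole multi-line test subject (like the /mg flags in the question), one option is to pass a function as the replacement. This is a sketch, not a pure-regex solution; the helper name first_half is hypothetical.

```python
import re

# Test subject from the question, one even-length string per line
text = "12\n1234\n123456\n12345678\nabcdef"


def first_half(m):
    line = m.group(0)
    return line[: len(line) // 2]  # keep the first half of the matched line


# (?m) makes ^ and $ match at every line boundary, like the /m flag
result = re.sub(r"(?m)^.+$", first_half, text)
print(result)
```

Each line is matched separately because . does not match a newline, so every line is halved independently.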

getting consecutive digits regex

import re
s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'
I have the above string, and I want to extract 7 consecutive digits, followed by /, followed by 9 consecutive digits, e.g. 1311374/104813603. To do so, I have tried the following:
reg = r'(?:^|(?<=\s))\d{7,9}(?=\s|$)'
r1 = re.findall(reg,s)
But this gives me an empty []. How do I tweak my reg to get my desired output?
Desired output:
['1311374/104813603', '2302374/544863603', '0100374/104563603']
I want to extract 7 consecutive digits followed by / and followed by 9 consecutive digits
I think you're overcomplicating it. You can just use:
\b\d{7}/\d{9}\b
Code:
>>> import re
>>> s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'
>>> print (re.findall(r'\b\d{7}/\d{9}\b', s))
['1311374/104813603', '2302374/544863603', '0100374/104563603']
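The original pattern returned [] because \d{7,9} matches digits only, and the lookahead (?=\s|$) then requires whitespace or the end of the string, but each 7-digit run is followed by /, and each 9-digit run is preceded by / rather than whitespace. If you also need the two halves separately, a sketch with capture groups follows; note that findall returns tuples of the groups rather than whole matches once the pattern contains groups.

```python
import re

s = 'words here and a num 1311374/104813603 and 2302374/544863603 and 0100374/104563603'

# Two capture groups: findall now returns (left, right) tuples
pairs = re.findall(r'\b(\d{7})/(\d{9})\b', s)
print(pairs)
# [('1311374', '104813603'), ('2302374', '544863603'), ('0100374', '104563603')]
```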

Regular expression to join group of 5 digits

I am trying to extract 10-digit phone numbers from a string. In some cases the digits are separated by a space after 2 or 5 digits. How do I merge such numbers to get the full 10 digits?
mystr='(R) 98198 38466 (some Text) 9702977470'
import re
re.findall(r'\d+', mystr)
Close, but not correct:
['98198', '38466', '9702977470']
Expected Results:
['9819838466', '9702977470']
I can write Python code to concatenate '98198' and '38466', but I would like to know if a regular expression can be used for this.
You could remove the non-digits first.
>>> mydigits = re.sub(r'\D', '', mystr)
>>> mydigits
'98198384669702977470'
>>> re.findall(r'.{10}', mydigits)
['9819838466', '9702977470']
If all the separators are one character long, this would work.
>>> re.findall(r'(?:\d.?)+\d', mystr)
['98198 38466', '9702977470']
Of course, this includes the non-digit separators in the match. A regex findall can only return some number of slices of the input string. It cannot modify them.
These are easy to remove afterwards if that's a problem.
>>> [re.sub(r'\D', '', s) for s in _]
['9819838466', '9702977470']
In some cases numbers are separated by space after 2 or 5 digits.
You can use the regex:
\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b
For example, this regular expression will match all of these:
01 23456789
01234 56789
0123456789
I doubt you can achieve this with a regex pattern alone. Maybe just use a pattern to get runs of 10 or more digits and spaces, and then strip the spaces programmatically. The pattern below should work as long as you are sure there is some text between the phone numbers.
[\d ]{10,}
Credit goes to commenter jsonharper for this pattern:
\d{2} ?\d{3} ?\d{5}
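Since a findall match can only be a slice of the input, the 2-or-5 pattern from the earlier answer still has to be followed by a cleanup step. A sketch combining the two on the question's string (the variable name phone_re is hypothetical):

```python
import re

mystr = '(R) 98198 38466 (some Text) 9702977470'

# Ten digits, optionally split by one space after the 2nd or 5th digit
phone_re = re.compile(r'\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b')

# findall returns the matched slices, spaces included; strip whitespace afterwards
numbers = [re.sub(r'\s', '', m) for m in phone_re.findall(mystr)]
print(numbers)  # ['9819838466', '9702977470']
```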

Regular Expression to remove numbers ending with "," or "."

I am trying to write a regular expression in Java to parse financial entities from strings. The regex should remove numbers that end with "." or ",", like
15,
15.
whereas values like
15,303 (currency)
15.55 (rate)
should be taken.
This should do it:
/^\d+[,.]$/
You might be looking for something like:
(\d+)[\.,][^\d]
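Here the group captures digits that are followed by . or , and then by a non-digit character. (Because [^\d] must consume a character, a number at the very end of the string is not matched.)
And \d+(\.|,)\d+ matches the values that should be taken.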
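In Java, Matcher.group() with no argument returns the whole match, so the capturing group in \d+(\.|,)\d+ is harmless there. If you port the pattern to Python's re.findall, as in the other examples here, findall returns only the captured group. A sketch showing the difference and the non-capturing fix:

```python
import re

text = "15,303 (currency) 15.55 (rate) 15, 15."

# With a capturing group, findall returns only the group: the separators
separators = re.findall(r"\d+(\.|,)\d+", text)
print(separators)  # [',', '.']

# A non-capturing group (?:...) makes findall return the whole number
values = re.findall(r"\d+(?:\.|,)\d+", text)
print(values)  # ['15,303', '15.55']
```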
To remove such numbers (example in Python, but should work in nearly any regex flavor):
>>> import re
>>> regex = re.compile(r"\d+[.,](?!\d)")
>>> regex.sub("", "15 15,0 15, 15. 15.0 15")
'15 15,0   15.0 15'
To find only "correct" numbers:
>>> regex = re.compile(r"\d+(?:[.,]\d+)?(?![\d.,])\b")
>>> regex.findall("15 15,0 15, 15. 15.0 15")
['15', '15,0', '15.0', '15']