I have another question about regex. The requirement is quite simple:
Given a string whose length is an even number:
12
1234
123456
12345678
abcdef
Write a substitution regex to get the first half of the string.
After substitution:
1
12
123
1234
abc
I'm using PCRE, which supports recursion and control verbs.
I tried something like this but it's not working :(
s/^(?=(.))(?:((?1))(?1))+$/$2/mg
Here's the test subject on regex101
Is it possible? How can I achieve this?
I'm pretty sure this is not the most elegant solution, but it does work:
>>> import re
>>> def half(string):
...     regex = re.compile(r"(.{%d})" % (len(string) // 2))
...     return regex.search(string).group(1)
>>> half("12")
'1'
>>> half("1234")
'12'
>>> half("123456")
'123'
>>> half("12345678")
'1234'
>>> half("abcdef")
'abc'
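The same idea can be applied to a whole multi-line subject in one substitution by passing a function as the replacement. This is only a sketch, assuming Python's re module rather than PCRE; first_half_lines and keep_first_half are hypothetical names, not part of the original answer.

```python
import re

def first_half_lines(text):
    """Replace every non-empty line with its first half (hypothetical helper)."""
    def keep_first_half(match):
        line = match.group(0)
        # Integer division: for an odd length the shorter half is kept.
        return line[:len(line) // 2]
    # re.MULTILINE makes ^ and $ match at the start and end of each line.
    return re.sub(r"^.+$", keep_first_half, text, flags=re.MULTILINE)

subject = "12\n1234\n123456\n12345678\nabcdef"
print(first_half_lines(subject))
```

The length arithmetic happens in Python, not in the pattern, so this sidesteps the question of whether a pure regex can count to half the string.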
I am trying to extract 10-digit phone numbers from a string. In some cases the numbers are separated by a space after 2 or 5 digits. How do I merge such numbers to get the final count of 10 digits?
mystr='(R) 98198 38466 (some Text) 9702977470'
import re
re.findall(r'\d+', mystr)
Close, but not correct:
['98198', '38466', '9702977470']
Expected Results:
['9819838466', '9702977470']
I can write Python code to concatenate '98198' and '38466', but I would like to know whether a regular expression can be used for this.
You could remove the non-digits first.
>>> mydigits = re.sub(r'\D', '', mystr)
>>> mydigits
'98198384669702977470'
>>> re.findall(r'.{10}', mydigits)
['9819838466', '9702977470']
If all the separators are one character long, this would work.
>>> re.findall(r'(?:\d.?)+\d', mystr)
['98198 38466', '9702977470']
Of course, this includes the non-digit separators in the match. A regex findall can only return some number of slices of the input string. It cannot modify them.
These are easy to remove afterwards if that's a problem.
>>> [re.sub(r'\D', '', s) for s in _]
['9819838466', '9702977470']
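The two steps above (match digits with optional separators, then strip the separators) can be combined into one helper. This is a sketch under the assumption that a phone number is exactly ten digits with at most one whitespace character between neighbouring digits; PHONE_RE and extract_phone_numbers are hypothetical names.

```python
import re

# Ten digits with at most one whitespace character between neighbours.
# The lookarounds reject matches that are part of a longer run of digits.
PHONE_RE = re.compile(r"(?<!\d)\d(?:\s?\d){9}(?!\d)")

def extract_phone_numbers(text):
    """Return every ten-digit number in text with inner whitespace removed."""
    return [re.sub(r"\s", "", match) for match in PHONE_RE.findall(text)]

mystr = '(R) 98198 38466 (some Text) 9702977470'
print(extract_phone_numbers(mystr))
```

Note that this pattern allows a space anywhere inside the number, which is looser than "after 2 or 5 digits".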
In some cases numbers are separated by space after 2 or 5 digits.
You can use the regex:
\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b
For example, this regular expression will match all of these:
01 23456789
01234 56789
0123456789
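A quick way to check that claim is to run the pattern against the three sample strings. The sketch below assumes Python's re module; PATTERN and samples are names introduced only for this illustration.

```python
import re

# The pattern from the answer above, unchanged.
PATTERN = re.compile(r"\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b")

samples = ["01 23456789", "01234 56789", "0123456789"]
matched = [bool(PATTERN.fullmatch(s)) for s in samples]
print(matched)

# A number split in two places is not covered by this pattern.
print(bool(PATTERN.fullmatch("01 234 56789")))
```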
I doubt you can achieve this with a regex pattern alone. Maybe just use a pattern to get 10 or more digits and spaces, and then strip the spaces programmatically. The pattern below should work as long as you are sure there is some text between the phone numbers.
[\d ]{10,}
Credit goes to commenter jsonharper:
\d{2} ?\d{3} ?\d{5}
I want to delete any numbers that have three or fewer digits. Can someone please help me with a regex that does this?
Currently, my code removes all the numbers it finds.
import re

# Cleans Numbers
def cleanNumbers(stringToClean):
    stringToClean = re.sub(r'[0-9]*', r'', stringToClean)
    print('String after cleaning : %s' % stringToClean)
    return stringToClean
Numbers will be surrounded by spaces. An example string I pass into the function:
connection breaks on Win8 client after a while. [persistence] 123 1 22 333 4444 554665 645fdgf45 ds3434 457870978934787843 345342kl
I call the above function as follows:
# Main function, calls other functions
def main():
    # Parsing the input query
    searchQuery = open('input.txt', 'r').read()
    print('Input query : %s' % searchQuery)
    # Cleaning the input query
    string = CleanUpText.cleanNumbers(searchQuery)
\b[0-9]{1,3}\b finds blocks of digits that have up to three digits.
re.sub(r'\b[0-9]{1,3}\b', r'', stringToClean)
I have corrected the question, '3 or less than 3'
Given that, it should be as simple as: \b\d{1,3}\b
You could use a regex like this:
r'\b[0-9]{1,3}\b'
Edit: Sorry, I wrote my answer too fast without really thinking. You have to use boundaries so you don't capture part of 3456, for example.
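Putting the word-boundary pattern into the cleaning function might look like the sketch below. It assumes Python's re module; SHORT_NUMBER_RE and clean_short_numbers are hypothetical names, and collapsing the leftover whitespace is an extra step the question did not ask for.

```python
import re

# One to three digits standing alone as a word.
SHORT_NUMBER_RE = re.compile(r"\b\d{1,3}\b")

def clean_short_numbers(text):
    """Remove standalone numbers of up to three digits from text."""
    without_numbers = SHORT_NUMBER_RE.sub("", text)
    # Collapse the runs of whitespace left where numbers were removed.
    return " ".join(without_numbers.split())

query = ("connection breaks on Win8 client after a while. [persistence] "
         "123 1 22 333 4444 554665 645fdgf45 ds3434 457870978934787843 345342kl")
print(clean_short_numbers(query))
```

Digits that touch letters (Win8, ds3434, 345342kl) are left alone, because \b only matches between a word character and a non-word character.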
I have some strings like -
1. "07870 622103"
2. "(0) 07543 876545"
3. "07321 786543 - not working"
I want to get the last 10 digits of these strings. like -
1. "07870622103"
2. "07543876545"
3. "07321786543"
So far I have tried-
a = re.findall(r"\d+${10}", mobilePhone)
Please help.
It'll be easier just to filter your string for digits and pick out the last 10:
''.join([c for c in mobilePhone if c.isdigit()][-10:])
Result:
>>> mobilePhone = "07870 622103"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7870622103'
>>> mobilePhone = "(0) 07543 876545"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7543876545'
>>> mobilePhone = "07321 786543 - not working"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7321786543'
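The regular expression approach (removing everything but digits) may be faster, though. Treat the second timing below with care: Pattern.sub takes the replacement first and the string second, so notnum.sub(mobilenum, '') substitutes into an empty string and measures almost no work; the intended call is notnum.sub('', mobilenum)[-10:]: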
$ python -m timeit -s "mobilenum='07321 786543 - not working'" "''.join([c for c in mobilenum if c.isdigit()][-10:])"
100000 loops, best of 3: 6.68 usec per loop
$ python -m timeit -s "import re; notnum=re.compile(r'\D'); mobilenum='07321 786543 - not working'" "notnum.sub(mobilenum, '')[-10:]"
1000000 loops, best of 3: 0.472 usec per loop
I suggest using a regex to throw away all non-digits, like so:
newstring = re.compile(r'\D').sub('', yourstring)
The regex is very simple: \D means non-digit. The code above uses sub to replace every non-digit character with an empty string, so you get what you want in newstring.
Oh, and to take the last ten characters, use newstring[-10:]
That was a regex answer. Martijn Pieters's answer may be more Pythonic.
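The two steps described above (drop non-digits, then slice) fit in a small helper. This is a sketch; NON_DIGIT_RE and last_ten_digits are hypothetical names.

```python
import re

NON_DIGIT_RE = re.compile(r"\D")

def last_ten_digits(phone):
    """Return the last ten digits found in phone, ignoring everything else."""
    # Pattern.sub takes the replacement first and the input string second.
    digits_only = NON_DIGIT_RE.sub("", phone)
    return digits_only[-10:]

for phone in ["07870 622103", "(0) 07543 876545", "07321 786543 - not working"]:
    print(last_ten_digits(phone))
```

If the input contains fewer than ten digits, the slice simply returns all of them.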
I am trying to write a regular expression in Java to parse financial entities from strings. I need to write the regex in such a way that numbers ending with "." or "," are removed, like
15,
15.
whereas values like
15,303(currency )
15.55(rate)
should be taken.
This should do it:
/^\d+[,.]$/
You can play with it here.
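Because that pattern is anchored at both ends, it is meant to be tested against one token at a time. Here is a sketch in Python (the question is about Java, but this pattern uses no flavor-specific syntax); TRAILING_SEPARATOR_RE and keep_financial_values are hypothetical names.

```python
import re

# Digits followed by a single trailing "," or "." and nothing else.
TRAILING_SEPARATOR_RE = re.compile(r"^\d+[,.]$")

def keep_financial_values(tokens):
    """Drop tokens such as '15,' or '15.' and keep everything else."""
    return [t for t in tokens if not TRAILING_SEPARATOR_RE.match(t)]

print(keep_financial_values(["15,", "15.", "15,303", "15.55"]))
```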
You might be looking for something like:
(\d+)[\.,][^\d]
Here the group captures digits that are followed by . or , and then by a character that is not a digit (so it does not match at the very end of the input).
\d+(\.|,)\d+ matches the values that should be taken.
To remove such numbers (example in Python, but should work in nearly any regex flavor):
>>> import re
>>> regex = re.compile(r"\d+[.,](?!\d)")
>>> regex.sub("", "15 15,0 15, 15. 15.0 15")
'15 15,0   15.0 15'
To find only "correct" numbers:
>>> regex = re.compile(r"\d+(?:[.,]\d+)?(?![\d.,])\b")
>>> regex.findall("15 15,0 15, 15. 15.0 15")
['15', '15,0', '15.0', '15']
I need a regular expression that will find all the numbers in a sentence.
For example:
"I have 3 bananas and 37 balloons"
I will get:
3
37
"The time is 20:00 and I have 7 tanks"
I will get:
20
00
7
Split your string by [^0-9]+.
Java: String[] numbers = "yourString".split("[^0-9]+");
JavaScript: var numbers = "yourString".split(/[^0-9]+/);
PHP: $numbers = preg_split("/[^0-9]+/", "yourString");
The regex itself is as simple as \d+, but you will also need to set a flag to match it globally, the syntax of which depends on the programming language or software you are using.
EDIT: Some examples:
Python:
import re
re.findall(r"\d+", my_string)
JavaScript:
myString.match(/\d+/g)
The regex you are looking for is [0-9]+ or \d+. You should then get multiple matches for the sentence.
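The matching and splitting approaches behave slightly differently at the edges of the string, which the sketch below illustrates. It assumes Python's re module; find_numbers and split_numbers are hypothetical names.

```python
import re

def find_numbers(sentence):
    """Collect every run of digits by matching."""
    return re.findall(r"\d+", sentence)

def split_numbers(sentence):
    """Collect every run of digits by splitting on non-digits."""
    parts = re.split(r"[^0-9]+", sentence)
    # re.split yields empty strings when the sentence starts or ends
    # with non-digits, so those are filtered out here.
    return [part for part in parts if part]

print(find_numbers("The time is 20:00 and I have 7 tanks"))
print(re.split(r"[^0-9]+", "I have 3 bananas and 37 balloons"))
print(split_numbers("I have 3 bananas and 37 balloons"))
```

In Python 3, \d also matches non-ASCII decimal digits in str patterns, whereas [0-9] matches ASCII digits only.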