Regex substitution: getting half of the string

I have another question about regex. The requirement is quite simple:
Given a string whose length is an even number:
12
1234
123456
12345678
abcdef
Write a substitution regex that keeps only the first half of the string:
After substitution:
1
12
123
1234
abc
I'm using PCRE, which supports recursion and control verbs.
I tried something like this but it's not working :(
s/^(?=(.))(?:((?1))(?1))+$/$2/mg
Here's the test subject on regex101
Is it possible? How can I achieve this?

I'm pretty sure this is not the most elegant solution, but it does work:
>>> import re
>>> def half(string):
...     # Match exactly len(string) // 2 characters from the start.
...     regex = re.compile(r"(.{%d})" % (len(string) // 2))
...     return regex.search(string).group(1)
...
>>> half("12")
'1'
>>> half("1234")
'12'
>>> half("123456")
'123'
>>> half("12345678")
'1234'
>>> half("abcdef")
'abc'
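Since the question asks for one substitution over a multi-line subject (the /mg flags), here is a hedged sketch of the same result as a single substitution call in Python: re.sub with a callable replacement halves every line. This is not a pure-regex answer (Python's built-in re module has no recursion like PCRE's (?1)); the function name halve_lines is hypothetical.

```python
import re

def halve_lines(text):
    # ^.+$ with re.MULTILINE matches each non-empty line on its own
    # ('.' does not match '\n'); the callback keeps the first half of it.
    return re.sub(
        r"^.+$",
        lambda m: m.group(0)[: len(m.group(0)) // 2],
        text,
        flags=re.MULTILINE,
    )

subject = "12\n1234\n123456\n12345678\nabcdef"
print(halve_lines(subject))
```

The length check happens in the callback rather than in the pattern, which is what keeps the regex itself trivial.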

Related

Regular expression to join group of 5 digits

I am trying to extract 10-digit phone numbers from a string. In some cases the numbers are split by a space after 2 or 5 digits. How do I merge such numbers to get the full 10 digits?
mystr='(R) 98198 38466 (some Text) 9702977470'
import re
re.findall(r'\d+', mystr)
Close, but not correct:
['98198', '38466', '9702977470']
Expected Results:
['9819838466', '9702977470']
I can write Python code to concatenate '98198' and '38466', but I would like to know whether a regular expression can do this.
You could remove the non-digits first.
>>> mydigits = re.sub(r'\D', '', mystr)
>>> mydigits
'98198384669702977470'
>>> re.findall(r'.{10}', mydigits)
['9819838466', '9702977470']
If all the separators are one character long, this would work.
>>> re.findall(r'(?:\d.?)+\d', mystr)
['98198 38466', '9702977470']
Of course, this includes the non-digit separators in the match: findall can only return substrings (slices) of the input string; it cannot modify them.
These are easy to remove afterwards if that's a problem.
>>> [re.sub(r'\D', '', s) for s in _]
['9819838466', '9702977470']
In some cases numbers are separated by space after 2 or 5 digits.
You can use the regex:
\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b
For example, this regular expression will match all of these:
01 23456789
01234 56789
0123456789
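A hedged Python sketch of using this pattern on the question's string: findall returns the matches with their spaces, so non-digits are stripped afterwards (PHONE_RE and extract_phone_numbers are hypothetical names).

```python
import re

# Pattern from the answer: 10 digits, optionally split by a space after digit 2 or 5.
PHONE_RE = re.compile(r"\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b")

def extract_phone_numbers(text):
    # The pattern has no capturing groups, so findall returns whole matches;
    # remove the separator space from each one.
    return [re.sub(r"\D", "", m) for m in PHONE_RE.findall(text)]

print(extract_phone_numbers("(R) 98198 38466 (some Text) 9702977470"))
```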
I doubt you can achieve this with a regex pattern alone. Maybe just use a pattern to grab runs of 10 or more digits and spaces, then strip the spaces programmatically. The pattern below should work as long as you are sure there is some text between the phone numbers.
[\d ]{10,}
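A minimal Python sketch of that two-step approach, assuming (as the answer says) that some non-space text separates the phone numbers; extract_with_spaces is a hypothetical name.

```python
import re

def extract_with_spaces(text):
    # Runs of 10 or more digits/spaces; leading and trailing blanks are included.
    runs = re.findall(r"[\d ]{10,}", text)
    # Drop the spaces, and skip any run that consisted only of blanks.
    digits = (run.replace(" ", "") for run in runs)
    return [d for d in digits if d]

print(extract_with_spaces("(R) 98198 38466 (some Text) 9702977470"))
```

If two numbers are separated only by spaces, they end up in one run and get merged, which is exactly the caveat above.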
Credit goes to commenter jsonharper:
\d{2} ?\d{3} ?\d{5}

How to create a regex that matches numbers that have 3 or less than 3 digits?

I want to delete any numbers that have 3 or fewer digits. Can someone please help me with a regex that does this?
Currently, my code removes all the numbers it finds.
import re

# Cleans numbers
def cleanNumbers(stringToClean):
    stringToClean = re.sub(r'[0-9]*', r'', stringToClean)
    print('String after cleaning : %s' % stringToClean)
    return stringToClean
Numbers will be surrounded by spaces. Example string I pass into the function:
connection breaks on Win8 client after a while. [persistence] 123 1 22 333 4444 554665 645fdgf45 ds3434 457870978934787843 345342kl
I call the above function as follows:
# Main function, calls other functions
def main():
    # Parsing the input query
    with open('input.txt', 'r') as f:
        searchQuery = f.read()
    print('Input query : %s' % searchQuery)
    # Cleaning the input query
    string = CleanUpText.cleanNumbers(searchQuery)
\b[0-9]{1,3}\b finds blocks of digits that have up to three digits.
re.sub(r'\b[0-9]{1,3}\b', r'', stringToClean)
I have corrected the question: '3 or less than 3'.
Given that, it should be as simple as: \b\d{1,3}\b
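A hedged Python sketch applying \b\d{1,3}\b to the question's example input (SHORT_NUMBER_RE and remove_short_numbers are hypothetical names). Collapsing the leftover runs of spaces is an extra step assumed here; drop it if the spacing doesn't matter. The word boundaries are what leave 4444, Win8 and 645fdgf45 untouched.

```python
import re

# Standalone runs of 1 to 3 digits (word boundaries on both sides).
SHORT_NUMBER_RE = re.compile(r"\b\d{1,3}\b")

def remove_short_numbers(text):
    cleaned = SHORT_NUMBER_RE.sub("", text)
    # Assumption: collapse the double spaces left behind by the deletions.
    return re.sub(r" {2,}", " ", cleaned).strip()

sample = ("connection breaks on Win8 client after a while. [persistence] "
          "123 1 22 333 4444 554665 645fdgf45 ds3434 457870978934787843 345342kl")
print(remove_short_numbers(sample))
```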
You could use a regex like this
r'\b[0-9]{1,2}\b'
Edit: Sorry, I wrote my answer too fast without really thinking. You have to use word boundaries so you don't match part of 3456, for example.

Get 10 numbers from the end of the string with Python regex?

I have some strings like:
1. "07870 622103"
2. "(0) 07543 876545"
3. "07321 786543 - not working"
I want to get the last 10 digits of these strings, like:
1. "07870622103"
2. "07543876545"
3. "07321786543"
So far I have tried:
a = re.findall(r"\d+${10}", mobilePhone)
Please help.
It's easier just to filter your string for digits and pick out the last 10:
''.join([c for c in mobilePhone if c.isdigit()][-10:])
Result:
>>> mobilePhone = "07870 622103"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7870622103'
>>> mobilePhone = "(0) 07543 876545"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7543876545'
>>> mobilePhone = "07321 786543 - not working"
>>> ''.join([c for c in mobilePhone if c.isdigit()][-10:])
'7321786543'
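The regular expression approach (removing everything but digits) was reported as faster, though:
$ python -m timeit -s "mobilenum='07321 786543 - not working'" "''.join([c for c in mobilenum if c.isdigit()][-10:])"
100000 loops, best of 3: 6.68 usec per loop
$ python -m timeit -s "import re; notnum=re.compile(r'\D'); mobilenum='07321 786543 - not working'" "notnum.sub(mobilenum, '')[-10:]"
1000000 loops, best of 3: 0.472 usec per loop
Be careful with this comparison: the second command passes the arguments to sub() in the wrong order. Pattern.sub(repl, string) takes the replacement first, so notnum.sub(mobilenum, '') runs on an empty string and returns ''. The correct call is notnum.sub('', mobilenum)[-10:], which does real work and will take longer than the 0.472 usec shown.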
I suggest using a regex to throw away all non-digits, like so:
newstring = re.compile(r'\D').sub('', yourstring)
The regex is very simple: \D means non-digit. The code above uses sub to replace every non-digit character with an empty string, so newstring holds only the digits.
Oh, and to take the last ten characters, use newstring[-10:].
That was a regex answer; Martijn Pieters's answer may be more Pythonic.
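A minimal sketch of that regex answer as a reusable function (NON_DIGIT_RE and last_ten_digits are hypothetical names). Note the argument order of Pattern.sub: replacement first, input string second.

```python
import re

NON_DIGIT_RE = re.compile(r"\D")

def last_ten_digits(phone):
    # Remove every non-digit character, then keep the last 10 digits.
    digits = NON_DIGIT_RE.sub("", phone)
    return digits[-10:]

for s in ["07870 622103", "(0) 07543 876545", "07321 786543 - not working"]:
    print(last_ten_digits(s))
```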

Regular Expression to remove numbers ending with "," or "."

I am trying to write a regular expression in Java to parse financial entities from strings. Numbers ending with "." or "," should be removed, like
15,
15.
whereas values like
15,303(currency )
15.55(rate)
should be taken.
This should do it:
/^\d+[,.]$/
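The pattern above is anchored with ^ and $, so it only matches when the number makes up the whole line. A hedged Python sketch, assuming one value per line: with re.MULTILINE (Java's equivalent is Pattern.MULTILINE), ^ and $ match at every line boundary, so findall picks out the values to drop. TRAILING_SEP_LINE and values_to_remove are hypothetical names.

```python
import re

# ^\d+[,.]$ from the answer; re.MULTILINE makes ^ and $ match at each line boundary.
TRAILING_SEP_LINE = re.compile(r"^\d+[,.]$", re.MULTILINE)

def values_to_remove(text):
    # Assumes each value sits on its own line.
    return TRAILING_SEP_LINE.findall(text)

print(values_to_remove("15,\n15.\n15,303\n15.55"))
```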
You might be looking for something like:
(\d+)[\.,][^\d]
Here the group captures digits followed by . or , that are not followed by another digit.
Use \d+(\.|,)\d+ for the values that should be kept.
To remove such numbers (example in Python, but should work in nearly any regex flavor):
>>> import re
>>> regex = re.compile(r"\d+[.,](?!\d)")
>>> regex.sub("", "15 15,0 15, 15. 15.0 15")
'15 15,0   15.0 15'
To find only "correct" numbers:
>>> regex = re.compile(r"\d+(?:[.,]\d+)?(?![\d.,])\b")
>>> regex.findall("15 15,0 15, 15. 15.0 15")
['15', '15,0', '15.0', '15']

Find numbers in a sentence by regex

I need a regular expression that will find all the numbers in a sentence.
For example:
"I have 3 bananas and 37 balloons"
I will get:
3
37
"The time is 20:00 and I have 7 tanks"
I will get:
20
00
7
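Split your string by [^0-9]+.
Java: String[] numbers = "yourString".split("[^0-9]+");
JavaScript: var numbers = "yourString".split(/[^0-9]+/);
PHP: $numbers = preg_split("/[^0-9]+/", "yourString", -1, PREG_SPLIT_NO_EMPTY);
If the string starts with a non-digit, the Java and JavaScript results begin with an empty string (JavaScript also keeps a trailing one when the string ends with a non-digit), so filter out empty entries; in PHP the PREG_SPLIT_NO_EMPTY flag does this for you.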
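The same split approach in Python, as a sketch: re.split also yields empty strings at the edges when the sentence begins or ends with a non-digit, so they are filtered out here (split_numbers is a hypothetical name).

```python
import re

def split_numbers(sentence):
    # Split on runs of non-digits and drop the empty pieces at the edges.
    return [part for part in re.split(r"[^0-9]+", sentence) if part]

print(split_numbers("I have 3 bananas and 37 balloons"))
print(split_numbers("The time is 20:00 and I have 7 tanks"))
```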
The regex itself is as simple as \d+, but you will also need to set a flag to match it globally, the syntax of which depends on the programming language or software you are using.
EDIT: Some examples:
Python:
import re
re.findall(r"\d+", my_string)
JavaScript:
myString.match(/\d+/g)
The regex you are looking for is [0-9]+ or \d+. You should then get multiple matches for the sentence.