I am trying to extract 10-digit phone numbers from a string. In some cases the digits are separated by a space after 2 or 5 digits. How do I merge such numbers to get the full 10 digits?
mystr='(R) 98198 38466 (some Text) 9702977470'
import re
re.findall(r'\d+', mystr)
Close, but not correct:
['98198', '38466', '9702977470']
Expected Results:
['9819838466', '9702977470']
I can write Python code to concatenate '98198' and '38466', but I would like to know whether a regular expression can be used for this.
You could remove the non-digits first.
>>> mydigits = re.sub(r'\D', '', mystr)
>>> mydigits
'98198384669702977470'
>>> re.findall(r'.{10}', mydigits)
['9819838466', '9702977470']
If all the separators are one character long, this would work.
>>> re.findall(r'(?:\d.?)+\d', mystr)
['98198 38466', '9702977470']
Of course, this includes the non-digit separators in the match. A regex findall can only return some number of slices of the input string. It cannot modify them.
These are easy to remove afterwards if that's a problem.
>>> [re.sub(r'\D', '', s) for s in _]
['9819838466', '9702977470']
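The two steps above (match digit runs joined by single-character separators, then strip the non-digits) can be wrapped in one helper. This is only a minimal sketch: the name extract_numbers and the filter on exactly 10 digits are assumptions added for illustration, not part of the answer above.

```python
import re

def extract_numbers(text, digits=10):
    """Match digit runs joined by single-character separators, strip the
    separators, and keep candidates with exactly `digits` digits."""
    candidates = re.findall(r'(?:\d.?)+\d', text)
    cleaned = [re.sub(r'\D', '', c) for c in candidates]
    return [c for c in cleaned if len(c) == digits]

print(extract_numbers('(R) 98198 38466 (some Text) 9702977470'))
# ['9819838466', '9702977470']
```

The length filter drops short digit runs such as a stray '12' that the pattern would otherwise return.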
In some cases numbers are separated by space after 2 or 5 digits.
You can use the regex:
\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b
For example, this regular expression will match all of these:
01 23456789
01234 56789
0123456789
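To check that pattern from Python, something like the following should do. It is a sketch; find_numbers is a hypothetical helper name, and the whitespace is removed in a second step because findall can only return slices of the input.

```python
import re

PATTERN = re.compile(r'\b(?:\d{2}\s?\d{3}|\d{5}\s)\d{5}\b')

def find_numbers(text):
    # Each match may still contain one whitespace character; drop it.
    return [re.sub(r'\s', '', m) for m in PATTERN.findall(text)]

for sample in ['01 23456789', '01234 56789', '0123456789']:
    print(find_numbers(sample))  # ['0123456789'] each time

print(find_numbers('(R) 98198 38466 (some Text) 9702977470'))
# ['9819838466', '9702977470']
```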
I doubt you can achieve this with a regex pattern alone. Maybe just use a pattern to capture runs of 10 or more digits and spaces, and then strip the spaces programmatically. The pattern below should work as long as you are sure there is some text between the phone numbers.
[\d ]{10,}
Credit goes to commenter jsonharper:
\d{2} ?\d{3} ?\d{5}
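Both patterns can be tried against the string from the question. A rough sketch; the variable names loose and strict are made up for this example.

```python
import re

mystr = '(R) 98198 38466 (some Text) 9702977470'

# Approach 1: grab runs of digits and spaces, then drop the spaces.
loose = [m.replace(' ', '') for m in re.findall(r'[\d ]{10,}', mystr)]

# Approach 2: allow an optional space after 2 digits and after 5 digits.
strict = [m.replace(' ', '') for m in re.findall(r'\d{2} ?\d{3} ?\d{5}', mystr)]

print(loose)   # ['9819838466', '9702977470']
print(strict)  # ['9819838466', '9702977470']
```

The first pattern accepts any mix of digits and spaces, so it relies on other text separating the numbers. The second accepts a space only in the two stated positions.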
I have a set of strings that contain some letters, occasionally a single digit, and then somewhere a run of 2 or 3 digits. I need to match those 2 or 3 digits.
I have this:
\w*(\d{2,3})\w*
but then for strings like
AAA1AAA12A
AAA2AA123A
it matches '12' and '23' respectively, i.e. it fails to pick the three digits in the second case.
How do I get those 3 digits?
Here is how you would do it in Java.
The regex matches a group of 2 or 3 digits with a non-digit on either side.
The while loop uses find() to keep finding matches and prints each captured group. The lone 1, the lone 2, and the 1223 are ignored.
String s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk";
String regex = "\\D(\\d{2,3})\\D";
Matcher m = Pattern.compile(regex).matcher(s);
while (m.find()) {
    System.out.println(m.group(1));
}
prints
12
21
123
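The same idea can be sketched in Python. This variant swaps the consuming \D on each side for lookarounds, which is a substitution on my part rather than what the Java answer does. The point is that a run at the very start or end of the string, or two runs separated by a single character, can still match.

```python
import re

s = "AAA1AAA12Aksk2ksksk21sksksk123ksk1223sk"

# (?<!\d) and (?!\d) assert "no digit here" without consuming a character.
print(re.findall(r'(?<!\d)\d{2,3}(?!\d)', s))        # ['12', '21', '123']

# Edge case: runs at the string boundaries, one character apart.
print(re.findall(r'(?<!\d)\d{2,3}(?!\d)', '12A34'))  # ['12', '34']
print(re.findall(r'\D(\d{2,3})\D', '12A34'))         # []
```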
It looks like the correct answer would be:
\w*?(\d{2,3})\w*
Making the preceding \w* lazy does the job: it stops at the first position where \d{2,3} can match, so the group can take all three digits.
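A quick way to see the difference between the greedy and the lazy prefix is to run both on the two sample strings. This is a sketch using Python's re module, which is an assumption, since the question does not name a regex engine.

```python
import re

for s in ['AAA1AAA12A', 'AAA2AA123A']:
    greedy = re.search(r'\w*(\d{2,3})\w*', s).group(1)
    lazy = re.search(r'\w*?(\d{2,3})\w*', s).group(1)
    print(s, greedy, lazy)
# AAA1AAA12A 12 12
# AAA2AA123A 23 123
```

The greedy \w* first swallows the whole string and backtracks only as far as needed, which leaves just two digits for the group.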
I'm trying to read specific values from a test string using a Perl-style regex and can't seem to get where I need to.
The first value starts at the beginning of the string and ends just before the two digits immediately to the left of the decimal point; it should be saved to value1. The cut has to be made relative to the decimal point because the number of leading digits may be 4, 3, or 2 (e.g. 123420.78616 or 3320.78616).
So with the example below, I'm looking to save "133" to value1 using RegExMatch in AutoHotkey.
The second RegExMatch needs to save the rest of the number to value2. Value2 starts two digits to the left of the decimal point and runs to the end of the string, so I need "20.78616" to be saved as value2.
With the regex below I can only capture the full number, and I've been trying combinations for hours on regex101.com to no avail.
Hoping someone could help me.
TestString := "13320.78616"
RegExMatch (TestString, "(([\w\.]+)$)", value1)
RegExMatch (TestString, "(([\w\.]+)$)", value2)
msgbox, %value1%
msgbox, %value2%
Suggest the following regex:
(\d+)(\d\d\.\d*)
Three things to note:
use \d instead of \w if you want to capture just digits and not letters;
the (\d+) captures a leading string of at least one digit, and ends two digits before the decimal because of the next part:
the (\d\d\.\d*) captures exactly two digits, the decimal point, and any following digits.
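The backtracking described in these notes can be checked with a short Python sketch. Python is used here only for illustration, since the question itself uses AutoHotkey, and split_value is a hypothetical helper name.

```python
import re

def split_value(text):
    """Return (leading digits, last two digits plus fraction), or None."""
    m = re.fullmatch(r'(\d+)(\d\d\.\d*)', text)
    return m.groups() if m else None

print(split_value('13320.78616'))   # ('133', '20.78616')
print(split_value('123420.78616'))  # ('1234', '20.78616')
print(split_value('3320.78616'))    # ('33', '20.78616')
print(split_value('20.78616'))      # None: no digit left for the first group
```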
I am looking for a regex string to match a set of numbers:
9.50 (numbers without spaces that have 2 to 4 decimal places)
1 9 . 5 0 (numbers with spaces that have 2 to 4 decimal places)
10 (numbers without spaces and without decimal points)
So far I have come up with the regex [0-9\s\.]+, but it is not doing what I want. Are there any cleaner solutions out there?
Many Thanks
Try this:
[\d\s]+(?:\.(?:\s*\d){2,4})?
This makes the decimal point and the digits after it optional. If there is a decimal point, {2,4} requires 2 to 4 digits after it, each optionally preceded by spaces.
If this should only match the whole string, you can anchor it.
^[\d\s]+(?:\.(?:\s*\d){2,4})?\s*$
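The anchored pattern can be exercised against the three cases from the question plus two inputs that should be rejected. A minimal sketch in Python; the sample list is my own choice.

```python
import re

NUMBER = re.compile(r'^[\d\s]+(?:\.(?:\s*\d){2,4})?\s*$')

for s in ['9.50', '1 9 . 5 0', '10', '9.5', '127.0.0.1']:
    print(repr(s), bool(NUMBER.match(s)))
# '9.50' True
# '1 9 . 5 0' True
# '10' True
# '9.5' False
# '127.0.0.1' False
```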
The problem with your regex is that it will also match 127.0.0.1, which is an IPv4 address, not a number.
The following regex should do the trick:
[0-9]+[0-9\s]*(\.(\s*[0-9]){2,4})?
Assumption I've made: there must be at least one digit before the decimal point.
(\d+[\d\s]*\.((\s*\d){2,4})?|\d+)
With the third example, 10, I was still getting trailing spaces included in the match.
This pattern eliminated them.
Wouldn't '[^. 0-9]' work as well?
My full PostgreSQL query looks like this:
split_part(regexp_replace(columnyoudoregexon , '[^. 0-9]', '', 'g'), ' ', 1)
It does the following:
Everything in the column value except digits, spaces, and the point (for decimals) is replaced with an empty string.
split_part() then splits the resulting string on spaces and returns the element you ask for (here the first).
I was stuck on this for a while. I hope it helps.
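For readers without a database at hand, the same two steps can be imitated in Python. This is only a sketch of the logic, not the PostgreSQL query itself, and first_number_field is a hypothetical name.

```python
import re

def first_number_field(value):
    # Step 1: drop everything except digits, spaces and the point.
    cleaned = re.sub(r'[^. 0-9]', '', value)
    # Step 2: split on single spaces and take the first field.
    return cleaned.split(' ')[0]

print(repr(first_number_field('12.50 USD')))     # '12.50'
print(repr(first_number_field('Price: 12.50')))  # ''
```

In this sketch the first field is empty whenever a space precedes the number, so a different field index is needed in that case.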
I need a regex that matches strings like 123,123 only. My regex is
var regX = /^\d*[0-9](|.\d*[0-9]|,\d*[0-9])*$/;
but it currently matches both 123,123 and 123, (with a trailing comma).
Valid case: 123,123 or 123,000 or 000,000
Invalid case: 123.123 or 123?123 or '123'.'123'
You should use this regex: \d+(,\d+)+
You might want to use the {x,y} quantifier. It matches at least x and at most y repetitions of the item. If you leave out the upper bound ({x,}), there is no upper limit. If you give just one number with no comma, it matches exactly that many repetitions.
Exactly three digits:
(\d{3}),(\d{3})
Three or more
(\d{3,}),(\d{3,})
Between 2 and 7 digits:
(\d{2,7}),(\d{2,7})
And so on...
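The "exactly three digits" variant can be checked against the valid and invalid cases from the question. A small sketch in Python rather than the JavaScript of the question; fullmatch plays the role of the ^ and $ anchors.

```python
import re

pattern = re.compile(r'\d{3},\d{3}')

for s in ['123,123', '123,000', '000,000', '123,', '123.123', '123?123']:
    print(s, bool(pattern.fullmatch(s)))
# 123,123 True
# 123,000 True
# 000,000 True
# 123, False
# 123.123 False
# 123?123 False
```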
It looks like you're actually trying to match a number with thousand separators.
Try this: /\d{1,3}(?:,\d{3})*/
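Without anchors this pattern also finds a match inside longer strings, so a whole-string check is probably what is wanted. A sketch in Python, where is_grouped is a hypothetical helper name:

```python
import re

THOUSANDS = re.compile(r'\d{1,3}(?:,\d{3})*')

def is_grouped(s):
    # fullmatch requires the whole string to fit the pattern.
    return THOUSANDS.fullmatch(s) is not None

print(is_grouped('1,234,567'))  # True
print(is_grouped('123,123'))    # True
print(is_grouped('123,'))       # False
print(is_grouped('12,34'))      # False
```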
If your numbers are positive integers you can use: \d+,\d+
If you want floating point numbers as well: (\d|\.)+,(\d|\.)+
although this will also match malformed numbers with multiple or misplaced decimal points, including .,. and the like. (The dot must be escaped; an unescaped . matches any character.)
Probably an easy regex question.
How do I remove all non-digits except a leading + from a phone number?
i.e.
012-3456 => 0123456
+1 (234) 56789 => +123456789
s/(?<!^)\+|[^\d+]+//g
will remove all non-digits and leave a leading + alone. Note that leading whitespace will cause the "leave + alone" bit to fail. In .NET languages, this can be handled inside the regex; in others, strip the whitespace before passing the string to this regex.
Explanation:
(?<!^)\+: Match a + unless it's at the start of the string. (In .NET, use (?<!^\s*)\+ to allow for leading whitespace).
| or
[^\d+]+: match any run of characters that are neither numbers nor +.
Before (using (?<!^\s*)\+|[^\d+]+):
+49 (123) 234 5678
+1 (555) 234-5678
+7 (23) 45/6789+10
(0123) 345/5678, ext. 666
After:
+491232345678
+15552345678
+72345678910
01233455678666
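The before/after table can be reproduced in Python, whose re module accepts the fixed-width (?<!^) lookbehind but not the variable-width .NET form. A minimal sketch; the helper name tidy is an assumption, and strip() stands in for the "strip whitespace first" advice.

```python
import re

def tidy(number):
    # strip() first: the lookbehind only protects a + at index 0.
    return re.sub(r'(?<!^)\+|[^\d+]+', '', number.strip())

for raw in ['+49 (123) 234 5678', '+1 (555) 234-5678',
            '+7 (23) 45/6789+10', '(0123) 345/5678, ext. 666']:
    print(tidy(raw))
# +491232345678
# +15552345678
# +72345678910
# 01233455678666
```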
In Java, you can do
public static String trimmed(String phoneNumber) {
    return phoneNumber.replaceAll("[^+\\d]", "");
}
This will keep all +, even if it's in the middle of phoneNumber. If you want to remove any + in the middle, then do something like this:
return phoneNumber.replaceAll("[^+\\d]|(?<=.)\\+", "");
(?<=.) is a lookbehind to see if there was a preceding character before the +.
System.out.println("[" + trimmed("+1 (234)++56789 ") + "]");
// prints "[+123456789]"
How do I remove all non-digits except leading + from a phone number?
Removing ( and ) and spaces from +44 (0) 20 3000 9000 results in the invalid number +4402030009000. It should be +442030009000.
The tidying routine needs several steps to deal with country code (with or without access code or +) and/or trunk code and/or punctuation either singly or in any combination.
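One of those steps, dropping a parenthesised trunk zero after a country code, might look like this in Python. It is a sketch under a loud assumption: that "(0)" in a number starting with + always marks an optional trunk prefix. The function name tidy_international is hypothetical.

```python
import re

def tidy_international(number):
    number = number.strip()
    if number.startswith('+'):
        # Assumption: "(0)" after a country code is an optional trunk prefix.
        number = re.sub(r'\(\s*0\s*\)', '', number)
        return '+' + re.sub(r'\D', '', number)
    # No leading +: just keep the digits.
    return re.sub(r'\D', '', number)

print(tidy_international('+44 (0) 20 3000 9000'))  # +442030009000
print(tidy_international('012-3456'))              # 0123456
```

Access codes and trunk codes written without parentheses are not handled here; they would need the additional steps the answer mentions.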
It is certainly possible to do all of that in one regex, but I prefer simpler regexes that handle the leading plus and the leading and trailing whitespace correctly:
#!/usr/bin/perl
while (<DATA>) {
print "DATA Read: \$_=$_"; #\n already there...
s/\s*(.*)\s*/$1/g;
$s=s/(^\+){0,1}//?$1:'';
s/[^\d]//g;
print "Formatted: $s$_\n====\n";
}
__DATA__
012-3456
+1 (234) 56789
 +1 (234) 56789
1234-56789 |
+12345+6789
Output:
DATA Read: $_=012-3456
Formatted: 0123456
====
DATA Read: $_=+1 (234) 56789
Formatted: +123456789
====
DATA Read: $_= +1 (234) 56789
Formatted: +123456789
====
DATA Read: $_=1234-56789 |
Formatted: 123456789
====
DATA Read: $_=+12345+6789
Formatted: +123456789
If global regular expressions are supported you could simply replace all characters that are not a digit or plus symbol:
s/[^0-9+]//g
If global regular expressions are not supported you could match as many possible number groups as might be valid in your given phone number format:
s/([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)[^0-9+]*([0-9+]*)/\1\2\3\4/
Using Perl:
my $number = '+1 (234) 56789';  # set this to the phone number
$number =~ s/[^\d+]//g;
This still allows a plus sign anywhere. If you want to allow a plus sign only at the beginning, I will leave that part up to you; you won't learn if the entire answer is given to you.
Essentially, this replaces anything in $number that is not a digit or a plus sign with an empty string.
Just replace everything except digits and + with ''
/[^\d+]/
In Python,
>>> import re
>>> re.sub(r"[^\d+]", "", "+1 (234) 56789")
'+123456789'
>>>
You cannot simply remove the '+' symbol. It has to be treated like '00' and belongs to the country code: '+xx' is the same as '00xx'.
Anyway, handling phone numbers with regex is like parsing HTML with regex: nearly impossible, because there are so many (correct) ways to write them.
My advice would be to write a custom class for handling phone numbers and not to use regex.
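As a very rough idea of what such a class could look like, here is a sketch in Python. Everything in it is hypothetical: the class name, the attributes, and above all the assumption that '00' is the international access prefix, which holds in many countries but not all.

```python
class PhoneNumber:
    """Hypothetical minimal wrapper; real numbering plans need far more rules."""

    def __init__(self, raw):
        raw = raw.strip()
        digits = ''.join(ch for ch in raw if ch.isdigit())
        if raw.startswith('+'):
            self.international, self.digits = True, digits
        elif digits.startswith('00'):
            # Assumption: a leading 00 means the same as a leading +.
            self.international, self.digits = True, digits[2:]
        else:
            self.international, self.digits = False, digits

    def __str__(self):
        return ('+' if self.international else '') + self.digits

print(PhoneNumber('+49 (123) 234 5678'))  # +491232345678
print(PhoneNumber('0049 123 2345678'))    # +491232345678
print(PhoneNumber('012-3456'))            # 0123456
```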