Trying to extract 1st match string between numbers:
For example:
testsfa13.4extractthis8488.9090testssffwwww
ajfafs-sss133.6extractthis887878.222testtest522252.9thismore
So far I have the following:
[\d](.*?)[\d]
However, the match includes the numbers at the end of capture group? Any suggestions appreciated. Thank you.
If you want to extract the first match, you could anchor the pattern at the start of the string with ^, match any chars except digits with \D*, and then match a number with an optional decimal part before capturing the non-digits that follow.
^\D*\d+(?:[.,]\d+)*(\D+)\d
^ Start of string
\D* Match 0+ times any char except a digit
\d+(?:[.,]\d+)* Match 1+ digits and optionally repeat a . or , and 1+ digits
(\D+) Capture group 1, match 1+ times any char except a digit
\d Match a digit
To prevent crossing newline boundaries:
^[^\d\n\r]*\d+(?:[,.]\d+)*([^\d\n\r]+)\d
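As a quick check, the first pattern can be run against the two sample strings from the question with Python's re module. This is only a minimal sketch; the helper name first_between_numbers is made up for illustration.

```python
import re

# Pattern from the answer above: skip leading non-digits, skip the first
# number (with optional decimal parts), then capture the non-digits that
# follow, up to the next digit.
PATTERN = re.compile(r"^\D*\d+(?:[.,]\d+)*(\D+)\d")

def first_between_numbers(text):
    """Return the first run of non-digits between two numbers, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

samples = [
    "testsfa13.4extractthis8488.9090testssffwwww",
    "ajfafs-sss133.6extractthis887878.222testtest522252.9thismore",
]
for sample in samples:
    print(first_between_numbers(sample))  # extractthis (for both samples)
```

Because the pattern is anchored with ^, only the first occurrence in each string is considered.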
Try \d([A-Za-z]+)\d and take the first capture group from the returned match.
https://regex101.com/r/v61exp/1
Related
Is it possible to match only the number between a string and other number?
RO41 RNCB 0089 0957 6044 0001 FPS21098343 RO17 BTRL 0470 1202 W949 45XX
What I want: 21098343
What I'm trying: [0-9]{4}\s*\S+\s+(\S+)
What I get: FPS21098343
Any help is much appreciated! Thanks.
If the digits are at the end of the string, you could use the $ anchor:
\b\d{4}\s+[A-Z]+(\d+)$
Or, if the digits are not at the end of the string, you can match the uppercase chars preceding the digits and capture the digits in group 1, as long as they are not followed by whitespace and a digit:
\b\d{4}\s+[A-Z]+(\d+)\b(?!\s+\d)
\b\d{4}\s+ A word boundary, match 4 digits and 1+ whitespace chars
[A-Z]+(\d+)\b Match 1+ uppercase chars and capture 1+ digits in group 1
(?!\s+\d) Assert not whitespaces followed by a digit to the right
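A minimal sketch of how this pattern behaves on the sample line from the question, using Python's re module (the variable names are illustrative only):

```python
import re

# 4 digits, whitespace, uppercase letters, then the digits to capture.
# The lookahead rejects candidates that are followed by whitespace and a digit.
PATTERN = re.compile(r"\b\d{4}\s+[A-Z]+(\d+)\b(?!\s+\d)")

line = "RO41 RNCB 0089 0957 6044 0001 FPS21098343 RO17 BTRL 0470 1202 W949 45XX"

match = PATTERN.search(line)
number = match.group(1) if match else None
print(number)  # 21098343
```

Note that "1202 W949" is skipped because 949 is followed by a space and the digit 4, which the negative lookahead rules out.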
Match the number only if it is preceded by 3 capital letters.
(?<=[A-Z]{3})([\d]+)
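For illustration, the lookbehind version could be tried in Python like this (a sketch on the sample line from the question; re.findall returns the content of the single capture group):

```python
import re

line = "RO41 RNCB 0089 0957 6044 0001 FPS21098343 RO17 BTRL 0470 1202 W949 45XX"

# The lookbehind requires exactly three uppercase letters directly
# before the digits, without making them part of the match.
numbers = re.findall(r"(?<=[A-Z]{3})([\d]+)", line)
print(numbers)  # ['21098343']
```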
I have a string that has the following structure:
digit-word(s)-digit.
For example:
2029 AG.IZTAPALAPA 2
I want to extract the word(s) in the middle, and the digit at the end of the string.
I want to extract AG.IZTAPALAPA and 2 in the same capture group to extract like:
AG.IZTAPALAPA 2
I managed to capture them as individual capture groups but not as a single:
town_state['municipality'] = town_state['Town'].str.extract(r'(\D+)', expand=False)
town_state['number'] = town_state['Town'].str.extract(r'(\d+)$', expand=False)
Thank you for your help!
You can use a single capturing group for the example string. It matches a single "word" that consists of uppercase chars A-Z with optional dots in the middle (not at the start or end), followed by a space and 1 or more digits.
\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b
Explanation
\b A word boundary
\d+  Match 1+ digits followed by a space
( Capture group 1
[A-Z]+ Match 1+ occurrences of an uppercase char A-Z
(?:\.[A-Z]+)* \d+ Repeat 0+ times matching a dot and 1+ chars A-Z, then match a space and 1+ digits
) Close group 1
\b A word boundary
Or you can make the pattern a bit broader matching either a dot or a word character
\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b
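The two patterns can be compared with a short Python sketch. The helper extract and the second sample string "2029 SAN JUAN 2" are assumptions added for illustration; only "2029 AG.IZTAPALAPA 2" comes from the question.

```python
import re

# Strict: a single uppercase "word" with optional inner dots, then a number.
STRICT = re.compile(r"\b\d+ ([A-Z]+(?:\.[A-Z]+)* \d+)\b")
# Broad: one or more space-separated words of word chars or dots, then a number.
BROAD = re.compile(r"\b\d+ ([\w.]+(?: [\w.]+)* \d+)\b")

def extract(pattern, text):
    """Return capture group 1 of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(extract(STRICT, "2029 AG.IZTAPALAPA 2"))  # AG.IZTAPALAPA 2
print(extract(BROAD, "2029 AG.IZTAPALAPA 2"))   # AG.IZTAPALAPA 2
print(extract(STRICT, "2029 SAN JUAN 2"))       # None
print(extract(BROAD, "2029 SAN JUAN 2"))        # SAN JUAN 2
```

Since each pattern has exactly one capture group, it should also be usable with the str.extract(..., expand=False) call from the question.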
You can use the following simple regex:
[0-9]+\s([A-Z]+\.[A-Z]+(?: [0-9]+)*)
Note:
(?: [0-9]+)* makes the trailing number optional.
I'm using the following expression to validate a house number:
^\d{1,4}([a-zA-Z]{1,2}\d{1,3}|[a-zA-Z]{1,2}|)$
Now the requirement has changed to the following constraints:
one number (25)
one number w/ one letter (25A)
one number w/ a second one divided by a hyphen (25-32)
one number w/ a second one divided by a hyphen and one letter w/ blank (25-32 A)
How do I validate these w/ changes to the regex above?
If you only want to match those values, you might use a pattern to match 1 or more digits followed by an optional part that matches either A-Z OR a hyphen and 1+ digits optionally followed by a space and a char A-Z
^\d+(?:[A-Z]|-\d+(?: [A-Z])?)?$
^ Start of string
\d+ Match 1+ digits
(?: Non capture group
[A-Z] Match a char A-Z
| Or
-\d+ Match a hyphen and 1+ digits
(?: [A-Z])? Optionally match a space and a char A-Z
)? Close group and make it optional
$ End of string
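A small Python sketch to check the pattern against the four required formats. The function name is_valid_house_number and the rejected examples are assumptions added for illustration.

```python
import re

HOUSE_NUMBER = re.compile(r"^\d+(?:[A-Z]|-\d+(?: [A-Z])?)?$")

def is_valid_house_number(value):
    """True if value is 25, 25A, 25-32 or 25-32 A style."""
    # fullmatch also rejects a trailing newline, which $ alone would allow.
    return HOUSE_NUMBER.fullmatch(value) is not None

for candidate in ["25", "25A", "25-32", "25-32 A", "25-32A", "25 A", "A25"]:
    print(candidate, is_valid_house_number(candidate))
```

The first four candidates are accepted; "25-32A", "25 A" and "A25" are rejected.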
I am trying to validate a decimal number with up to 13 digits before and up to 4 digits after the dot, excluding commas, i.e. a comma shouldn't be counted as a digit.
Valid Cases
1,234,567,890,123.1234
1234567890123.1234
123456789012.1234
1234567890123.123
12345.123
1.2
0
Invalid Cases
12345abc.23 // string or special characters not allowed
1,234,567,890,1231.1234
1,234,567,890,123.12341
12345678901231.1234
1234567890123.12341
Current Regex
^[0-9]{1,13}(\.[0-9]{0,4})?$
The current Regex is counting comma as a digit.
Any help would be great.
You could use a negative lookahead to assert that there are not 14 digits before the dot:
^(?!(?:[^.\s\d]*\d){14})-?\d+(?:,\d{1,3})*(?:\.\d{1,4})?$
Explanation
^ Start of string
(?! Negative lookahead, assert what follows is not
(?:[^.\s\d]*\d){14} 14 repetitions of 0+ chars other than a digit, whitespace char or dot, followed by a digit
) Close lookahead
-? Optional hyphen
\d+ Match 1+ digits
(?:,\d{1,3})* Match comma, 1-3 digits and repeat 0+ times (Or use \d+)
(?:\.\d{1,4})? Optional part, match a dot and 1-4 digits
$ End of the string
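To see the lookahead at work, the pattern can be run against the valid and invalid cases listed in the question. This is a sketch in Python; the function name is_valid_decimal is an assumption for illustration.

```python
import re

# The lookahead fails the match as soon as 14 digits can be counted
# before the first dot; commas are skipped while counting.
DECIMAL = re.compile(r"^(?!(?:[^.\s\d]*\d){14})-?\d+(?:,\d{1,3})*(?:\.\d{1,4})?$")

def is_valid_decimal(value):
    return DECIMAL.fullmatch(value) is not None

valid = ["1,234,567,890,123.1234", "1234567890123.1234", "123456789012.1234",
         "1234567890123.123", "12345.123", "1.2", "0"]
invalid = ["12345abc.23", "1,234,567,890,1231.1234", "1,234,567,890,123.12341",
           "12345678901231.1234", "1234567890123.12341"]

for value in valid + invalid:
    print(value, is_valid_decimal(value))
```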
You could just make each comma optional between the digit groups, like:
^[0-9]{0,1}([,])?[0-9]{0,3}([,])?[0-9]{0,3}([,])?[0-9]{0,3}([,])?[0-9]{1,3}(\.[0-9]{0,4})?$
I need to detect the last digits in a string, as they are the indexes for my strings. They may be as large as 2^64, so it's not convenient to check only the last character of the string, then the second to last, etc.
A string may look like asdgaf1_hsg534, i.e. there may be other digits in the string too, but they are somewhere in the middle and are not adjacent to the index I want to get.
Here is a method using re.sub:
import re
strings = ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86']
for s in strings:
    # Replace the whole string with the trailing digits captured in group 1
    print(re.sub(r'.*?([0-9]*)$', r'\1', s))
Output:
534
12
86
Explanation:
The function takes a regular expression, a replacement string, and the string you want to do the replacement on: re.sub(regex,replace,string)
The regex '.*?([0-9]*)$' matches the whole string and captures the number that precedes the end of the string. Parentheses are used to capture the parts of the match we are interested in; \1 refers to the first capture group, \2 to the second, etc.
.*? # Matches anything (non-greedy)
([0-9]*) # Zero or more digits (captured)
$ # Followed by the end-of-string identifier
So we are replacing the whole string with just the captured number we are interested in. In Python we should use a raw string for the replacement: r'\1'. If the string doesn't end with digits, an empty string will be returned.
twosixfour = "get_the_numb3r_2_^_64__18446744073709551615"
print(re.sub(r'.*?([0-9]*)$', r'\1', twosixfour))
>>> 18446744073709551615
A simple regex can detect digits at the end of the string:
'\d+$'
$ matches the end of the string. \d+ matches one or more digits. The + operator is greedy by default, meaning it matches as many digits as possible. So this will match all of the digits at the end of the string.
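A minimal sketch of this approach with re.search, assuming the index should be returned as an integer. The function name trailing_index is made up for illustration; Python integers have arbitrary precision, so values around 2^64 are not a problem.

```python
import re

def trailing_index(text):
    """Return the digits at the end of text as an int, or None if there are none."""
    match = re.search(r"\d+$", text)
    return int(match.group()) if match else None

for s in ["asdgaf1_hsg534", "asdfh23_hsjd12", "dgshg_jhfsd86", "test"]:
    print(trailing_index(s))
```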
If you want to use re.sub and make sure that there is at least a single digit at the end of the line, you can use the + quantifier to match 1 or more digits (\d+). That way the line is left unchanged, instead of being removed entirely, when it contains no digits or does not end with digits.
^.*?(\d+)$
^ Start of line
.*? Match any char except a newline, as few as possible (non greedy)
(\d+) Capture group 1, match 1+ digits
$ End of line
Or using a negative lookbehind
^.*(?<!\d)(\d+)$
^ Start of line
.* Match any char except a newline as much as possible
(?<!\d)(\d+) Assert no digits directly to the left, then capture 1+ digits in group 1
$ End of line
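The following sketch shows why the greedy variant needs the lookbehind. The plain greedy pattern ^.*(\d+)$ is not from the answer; it is included here only as a counterexample.

```python
import re

text = "asdgaf1_hsg534"

# Greedy .* without a guard gives back only one char, so group 1 is just "4".
greedy = re.sub(r"^.*(\d+)$", r"\1", text)
# Lazy .*? stops as soon as the rest of the pattern can match.
lazy = re.sub(r"^.*?(\d+)$", r"\1", text)
# Greedy .* with (?<!\d) must backtrack to the start of the digit run.
guarded = re.sub(r"^.*(?<!\d)(\d+)$", r"\1", text)

print(greedy, lazy, guarded)  # 4 534 534
```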
When using re.match, you can omit the ^ anchor and you might also use \A and \Z to assert the start and the end of the string.
import re
strings = ['asdgaf1_hsg534', 'asdfh23_hsjd12', 'dgshg_jhfsd86', 'test']
for s in strings:
    print(re.sub(r".*?(\d+)$", r'\1', s))
Output
534
12
86
test
If there should be a non-digit present before the digits, you could use a negated character class with a single capture group.
^.*[^\d\r\n](\d+)
^ Start of line
.* Match any char except a newline as much as possible
[^\d\r\n] Negated character class, match any char except a digit or a newline
(\d+) Capture group 1, match 1+ digits
To get the last digits in the string (not necessarily at the end of the string)
^.*?(\d+)[^\r\n\d]*$
^ Start of line
.*? Match any char except a newline, as few as possible (non greedy)
(\d+) Capture group 1, match 1+ digits
[^\r\n\d]* Negated character class, match 0+ times any char except a newline or digit
$ End of line
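A short Python sketch of this last pattern. The helper last_digits and the sample "abc12def345ghi" are assumptions added for illustration.

```python
import re

# Capture a run of digits that is followed only by non-digits up to the end.
LAST_DIGITS = re.compile(r"^.*?(\d+)[^\r\n\d]*$")

def last_digits(text):
    """Return the last run of digits in text, or None if there is none."""
    match = LAST_DIGITS.search(text)
    return match.group(1) if match else None

print(last_digits("abc12def345ghi"))   # 345
print(last_digits("asdgaf1_hsg534"))   # 534
print(last_digits("no digits"))        # None
```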