Extracting floating point number [duplicate] - regex

Assuming I have the following string:
str = """
HELLO 1 Stop #$**& 5.02‼️ 16.1
regex
5 ,#2.3222
"""
I want to extract all numbers, whether int or float, that appear after the word "stop" (case-insensitive). The expected result is:
[5.02, 16.1, 5, 2.3222]
The furthest I have got so far is with the PyPI regex module, based on another post here:
regex.compile(r'(?<=stop.*)\d+(?:\.\d+)?', regex.I)
but this expression gives me only [5.02, 16.1]

Yet another one, albeit with the newer regex module:
(?:\G(?!\A)|Stop)\D+\K\d+(?:\.\d+)?
See a demo on regex101.com.
In Python, this could be
import regex as re
string = """
HELLO 1 Stop #$**& 5.02‼️ 16.1
regex
5 ,#2.3222
"""
pattern = re.compile(r'(?:\G(?!\A)|Stop)\D+\K\d+(?:\.\d+)?')
numbers = pattern.findall(string)
print(numbers)
And would yield
['5.02', '16.1', '5', '2.3222']
Don't name your variables after built-ins such as str, list, or dict.
If you need to go further and limit your search within some bounds (e.g. all numbers between Stop and end), you could as well use
(?:\G(?!\A)|Stop)(?:(?!end)\D)+\K\d+(?:\.\d+)?
#                ^^^^^^^^^^  ^
See another demo on regex101.com.

You get only the first 2 numbers, as .* does not match a newline.
You can update the flags to regex.I | regex.S to have the dot match a newline.
import regex
text = """
HELLO 1 Stop #$**& 5.02‼️ 16.1
regex
5 ,#2.3222
"""
pattern = regex.compile(r'(?<=\bstop\b.*)\d+(?:\.\d+)?', regex.I | regex.S)
print(regex.findall(pattern, text))
Output
['5.02', '16.1', '5', '2.3222']
See a Python demo
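The effect of the flag can also be checked with the standard re module alone. This is a minimal sketch: the sample text is the one from the question with the emoji left out, and the variable names are only illustrative.

```python
import re

text = "HELLO 1 Stop #$**& 5.02 16.1\nregex\n5 ,#2.3222\n"

# Without re.S the dot stops at the first newline after "Stop".
one_line = re.search(r"stop.*", text, re.I).group(0)

# With re.S (DOTALL) the dot also consumes newlines, so .* runs to the end.
all_lines = re.search(r"stop.*", text, re.I | re.S).group(0)

print(repr(one_line))
print(repr(all_lines))
```

The first result ends before the newline, which is why a lookbehind built on .* only ever sees numbers on the same line as "Stop".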
If you want to print the numbers after the word "stop", you can also use python re and match stop, and then capture in a group all that follows.
Then you can take that group 1 value, and find all the numbers.
import re
text = """
HELLO 1 Stop #$**& 5.02‼️ 16.1
regex
5 ,#2.3222
"""
pattern = r"\bStop\b(.+)"
m = re.search(pattern, text, re.S|re.I)
if m:
    print(re.findall(r"\d+(?:\.\d+)*", m.group(1)))
Output
['5.02', '16.1', '5', '2.3222']

You could use:
import re

inp = """
HELLO 1 Stop #$**& 5.02‼️ 16.1
regex
5 ,#2.3222"""
nums = []
if re.search(r'\bstop\b', inp, flags=re.I):
    # drop everything up to and including the first "stop"
    inp = re.sub(r'^.*?\bstop\b', '', inp, flags=re.S|re.I)
    nums = re.findall(r'\d+(?:\.\d+)?', inp)
print(nums) # ['5.02', '16.1', '5', '2.3222']
The if logic above ensures that we only attempt to populate the array of numbers if we are certain that Stop appears in the input text. Otherwise, the default output is just an empty array. If Stop does appear, then we strip off that leading portion of the string before using re.findall to find all numbers appearing afterwards.
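The question asks for numbers rather than strings, so the same idea can be wrapped in a small helper that also converts the matches. This is only a sketch: the name numbers_after and its word parameter are made up for illustration, and the sample text uses "!!" in place of the emoji.

```python
import re

def numbers_after(text, word="stop"):
    """Return ints/floats found after the first case-insensitive `word`."""
    # Split once on the keyword; if it is absent there is only one part.
    parts = re.split(r"\b%s\b" % re.escape(word), text, maxsplit=1, flags=re.I)
    if len(parts) < 2:
        return []
    found = re.findall(r"\d+(?:\.\d+)?", parts[1])
    # Keep integers as int and anything with a decimal point as float.
    return [float(n) if "." in n else int(n) for n in found]

sample = "HELLO 1 Stop #$**& 5.02!! 16.1\nregex\n5 ,#2.3222\n"
print(numbers_after(sample))
print(numbers_after("no keyword here 1 2 3"))
```

Splitting with maxsplit=1 plays the same role as the if check: when the keyword is missing, the result is an empty list.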

import re
_string = """
HELLO 1 Stop #$**& 5.02‼️ 16.1
regex
5 ,#2.3222
"""
# case-sensitive; assumes "Stop" is present (str.find() returns -1 otherwise)
start = _string.find("Stop") + len("Stop")
print(re.findall(r"[-+]?\d*\.?\d+", _string[start:])) # ['5.02', '16.1', '5', '2.3222']

Related

Python re to retrieve pattern plus x number of characters after the pattern

I want to use python re to search for a string, and then print out that string and the next 4 characters after the string. I can not work out how to do it.
I've tried using the .{4} parameter when I print the pattern, but nothing is displayed (see my code example)
import re
sequence="I want to know if there are some available 123"
pattern="available"
re.search(pattern, sequence):
print(pattern{.4})
else:
print ("it's not there")
Whatever the next 4 characters after the search string 'available' are, I would like to print the search string and those 4 characters, so in the code example it would print 'available 123'.
You have to concatenate the .{4} to the pattern when searching:
import re
sequence="I want to know if there are some available 123"
pattern="available"
res = re.search(pattern + '.{4}', sequence)
if res:
    print(res.group(0))
else:
    print("it's not there")
Output:
available 123
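If the search string may contain regex metacharacters, it is probably safer to escape it before appending the quantifier. A minimal sketch, where match_with_tail and its parameters are hypothetical names introduced only for this example:

```python
import re

def match_with_tail(needle, text, n=4):
    """Return `needle` plus the next `n` characters, or None if not found."""
    # re.escape makes the needle literal; ".{n}" then takes exactly n characters.
    pattern = re.escape(needle) + ".{%d}" % n
    m = re.search(pattern, text)
    return m.group(0) if m else None

sequence = "I want to know if there are some available 123"
print(match_with_tail("available", sequence))
print(match_with_tail("missing", sequence))
```

Note that .{4} needs all four characters to be present; if fewer follow the search string, there is no match at all.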

In python how can I convert regex match type (sre.SRE_Match) to FLOATS? [duplicate]

I am trying to use a regular expression to extract words inside of a pattern.
I have some string that looks like this
someline abc
someother line
name my_user_name is valid
some more lines
I want to extract the word my_user_name. I do something like
import re
s = #that big string
p = re.compile("name .* is valid", re.flags)
p.match(s) # this gives me <_sre.SRE_Match object at 0x026B6838>
How do I extract my_user_name now?
You need a capture group in the regex. Search for the pattern and, if it is found, retrieve the captured string using group(index). Assuming the necessary validity checks are performed:
>>> p = re.compile("name (.*) is valid")
>>> result = p.search(s)
>>> result
<_sre.SRE_Match object at 0x10555e738>
>>> result.group(1) # group(1) returns the first capture (the text matched inside the parentheses).
# group(0) returns the entire matched text.
'my_user_name'
You can use matching groups:
p = re.compile('name (.*) is valid')
e.g.
>>> import re
>>> p = re.compile('name (.*) is valid')
>>> s = """
... someline abc
... someother line
... name my_user_name is valid
... some more lines"""
>>> p.findall(s)
['my_user_name']
Here I use re.findall rather than re.search to get all instances of my_user_name. Using re.search, you'd need to get the data from the group on the match object:
>>> p.search(s) #gives a match object or None if no match is found
<_sre.SRE_Match object at 0xf5c60>
>>> p.search(s).group() #entire string that matched
'name my_user_name is valid'
>>> p.search(s).group(1) #first group that match in the string that matched
'my_user_name'
As mentioned in the comments, you might want to make your regex non-greedy:
p = re.compile('name (.*?) is valid')
to only pick up the stuff between 'name ' and the next ' is valid' (rather than allowing your regex to pick up other ' is valid' occurrences in your group).
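The difference only shows up when ' is valid' occurs more than once in the searched text. A short sketch with a made-up input line:

```python
import re

# Hypothetical line containing two "name ... is valid" phrases.
line = "name alice is valid and name bob is valid"

# Greedy: .* runs as far as it can, then backs up to the LAST " is valid".
greedy = re.search(r"name (.*) is valid", line).group(1)

# Non-greedy: .*? stops at the FIRST " is valid" it can reach.
lazy = re.search(r"name (.*?) is valid", line).group(1)

print(greedy)
print(lazy)
```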
You could use something like this:
import re
s = """someline abc
someother line
name my_user_name is valid
some more lines"""
# the parentheses create a group with what was matched
# and '\w' matches only word characters (letters, digits, underscore)
p = re.compile(r"name +(\w+) +is valid")
# use search(), so the match doesn't have to happen
# at the beginning of "big string"
m = p.search(s)
# search() returns a Match object with information about what was matched
if m:
    name = m.group(1)
else:
    raise Exception('name not found')
You can use groups (indicated with '(' and ')') to capture parts of the string. The match object's group() method then gives you the group's contents:
>>> import re
>>> s = 'name my_user_name is valid'
>>> match = re.search('name (.*) is valid', s)
>>> match.group(0) # the entire match
'name my_user_name is valid'
>>> match.group(1) # the first parenthesized subgroup
'my_user_name'
In Python 3.6+ you can also index into a match object instead of using group():
>>> match[0] # the entire match
'name my_user_name is valid'
>>> match[1] # the first parenthesized subgroup
'my_user_name'
Maybe that's a bit shorter and easier to understand:
>>> import re
>>> text = '... someline abc... someother line... name my_user_name is valid.. some more lines'
>>> re.search('name (.*) is valid', text).group(1)
'my_user_name'
You want a capture group.
p = re.compile(r"name (.*) is valid") # parentheses for capture groups
print(p.search(s).groups()) # This gives you a tuple of the captured groups.
Here's a way to do it without using groups (Python 3.6 or above):
>>> re.search(r'2\d\d\d[01]\d[0-3]\d', 'report_20191207.xml')[0]
'20191207'
You can also use a named capture group (?P<user>pattern) and access the group like a dictionary entry: match['user'].
import re
string = '''someline abc
someother line
name my_user_name is valid
some more lines
'''
pattern = r'name (?P<user>.*) is valid'
matches = re.search(pattern, string)
print(matches['user'])
# my_user_name
I found this answer via Google because I wanted to unpack a re.search() result with multiple groups directly into multiple variables. While this might be obvious to some, it was not to me because I had always used group() in the past, so maybe it helps someone in the future who also did not know about groups().
s = "2020:12:30"
year, month, day = re.search(r"(\d+):(\d+):(\d+)", s).groups()
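Unpacking groups() directly raises an AttributeError when nothing matches, because re.search() then returns None. One possible variation is sketched below; it uses named groups and groupdict(), and the function name parse_date is invented for the example.

```python
import re

def parse_date(s):
    """Return {'year': ..., 'month': ..., 'day': ...} as ints, or None."""
    m = re.search(r"(?P<year>\d+):(?P<month>\d+):(?P<day>\d+)", s)
    if m is None:
        # No match: re.search() returned None, so there is nothing to unpack.
        return None
    # groupdict() maps each group name to the text it captured.
    return {name: int(value) for name, value in m.groupdict().items()}

print(parse_date("2020:12:30"))
print(parse_date("no date here"))
```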
It seems like you're actually trying to extract a name rather than simply find a match. If so, having span indexes for your match is helpful, and I'd recommend using re.finditer. As a shortcut, you know the 'name ' part of your regex has length 5 and ' is valid' has length 9, so you can slice the matching text to extract the name.
Note - In your example, it looks like s is string with line breaks, so that's what's assumed below.
import re
## convert s to a list of strings separated by line:
s2 = s.splitlines()
## find matches by line:
for i, j in enumerate(s2):
    ## finditer yields nothing for lines without a match
    for k in re.finditer("name (.*) is valid", j):
        ## get text
        match_txt = k.group(0)
        ## get line span
        match_span = k.span(0)
        ## extract username
        my_user_name = match_txt[5:-9]
        ## compare with original text
        print(f'Extracted Username: {my_user_name} - found on line {i}')
        print('Match Text:', match_txt)

Python regex negative lookbehind embedded numeric number

I am trying to pull a certain number from various strings. The number has to be standalone, before ', or before (. The regex I came up with was:
\b(?<!\()(x)\b(,|\(|'|$) <- x is the numeric number.
If x is 2, this pulls the following string (almost) fine, except it also pulls 2'abd'. Any advice what I did wrong here?
2(2'Abf',3),212,2'abc',2(1,2'abd',3)
As I understand it, your actual question is how to get that specific number everywhere except inside parentheses.
To do so, I suggest using the skip_what_to_avoid|what_i_want pattern, like this:
(\((?>[^()\\]++|\\.|(?1))*+\))
|\b(2)(?=\b(?:,|\(|'|$))
The idea is to disregard the overall match. The first group, which uses a recursive pattern to capture everything between parentheses, (\((?>[^()\\]++|\\.|(?1))*+\)), is the trash bin. We only need to check capture group 2, which, when set, contains the number found outside of parentheses.
Demo
Sample Code:
import regex as re
regex = r"(\((?>[^()\\]++|\\.|(?1))*+\))|\b(2)(?=\b(?:,|\(|'|$))"
test_str = "2(2'Abf',3),212,2'abc',2(1,2'abd',3)"
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1
    if match.groups()[1] is not None:
        print ("Found at {start}-{end}: {group}".format(start = match.start(2), end = match.end(2), group = match.group(2)))
Output:
Found at 0-1: 2
Found at 16-17: 2
Found at 23-24: 2
This solution requires the alternative Python regex package.
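If installing the regex package is not an option, a similar result can be approximated with the standard re module by tracking the nesting depth by hand instead of using a recursive pattern. This is a rough sketch rather than an equivalent of the pattern above: it assumes parentheses are balanced and never backslash-escaped, and the function name top_level_hits is made up for the example.

```python
import re

def top_level_hits(text, number="2"):
    """Return (start, end) spans of `number` found outside any parentheses."""
    # One token is either a parenthesis or the standalone number followed by
    # a comma, an opening parenthesis, a single quote, or the end of the string.
    token = re.compile(r"[()]|\b%s\b(?=[,(']|$)" % re.escape(number))
    hits = []
    depth = 0
    for m in token.finditer(text):
        tok = m.group(0)
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth = max(depth - 1, 0)
        elif depth == 0:
            # The number was seen while not inside any parentheses.
            hits.append((m.start(), m.end()))
    return hits

test_str = "2(2'Abf',3),212,2'abc',2(1,2'abd',3)"
print(top_level_hits(test_str))
```

For the sample string this reports the same three positions as the output listed above.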

python3: regex need to character to match but dont want in output

I have a string named
Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/
I am trying to extract 839518730.47873.0000 from it. For this exact string my regex works, but if there is a digit before the first =, it all goes wrong.
No Digit
>>> m=re.search('[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/')
>>> m.group()
'839518730.47873.0000'
With Digit
>>> m=re.search('[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
>>> m.group()
'2'
Is there any way I can extract only '839518730.47873.0000', no matter what else is in the string?
I tried
>>> m=re.search('=[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/')
>>> m.group()
'=839518730.47873.0000'
as well, but then the output starts with '=' and I don't want that.
Any ideas.
Thank you.
If your substring always comes after the first =, you can just use a capture group with the pattern =([\d.]+):
import re
result = ""
m = re.search(r'=([0-9.]+)','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
if m:
    result = m.group(1) # Get Group 1 value only
print(result)
See the IDEONE demo
The main point is that you match anything you do not need and match and capture (with the unescaped round brackets) the part of pattern you need. The value you need is in Group 1.
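An alternative that keeps the '=' out of the match itself is a lookbehind, which Python's re module supports as long as it has a fixed width. The sketch below compares both approaches on the header from the question; the variable names are illustrative.

```python
import re

header = "Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/"

# Capture group: '=' is part of the match, but only group 1 is returned.
with_group = re.search(r"=([0-9.]+)", header).group(1)

# Lookbehind: '=' must precede the match but is not part of it,
# so group(0) is already the bare value.
with_lookbehind = re.search(r"(?<==)[0-9.]+", header).group(0)

print(with_group)
print(with_lookbehind)
```

Either way, the digit in SSL2 is skipped because it is not directly preceded by '='.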
You can use word boundaries:
\b[\d.]+
RegEx Demo
Or to make match more targeted use lookahead for next semi-colon after your matched text:
\b[\d.]+(?=\s*;)
RegEx Demo2
Update:
>>> m=re.search(r'\b[\d.]+','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
>>> m.group(0)
'839518730.47873.0000'

Regex to catch a string without () in 3 patterns like abc(ef) ,(ef)abc and (ef)abc(gh)

I have tested this Regex
(?<=\))(.+?)(?=\()|(?<=\))(.+?)\b|(.+?)(?=\()
but it doesn't work for strings like this pattern (ef)abc(gh).
I got a result like this "(ef)abc".
But these 3 regexes (?<=\))(.+?)(?=\() , (?<=\))(.+?)\b, (.+?)(?=\()
do work separately for "(ef)abc(gh)", "(ef)abc" ,"abc(ef)" .
Can anyone tell me where the problem is, or how I can get the expected result?
Assuming you are looking to match the text from between the elements in parenthesis, try this:
^(?:\(\w*\))?([\w]*)(?:\(\w*\))?$
^ - beginning of string
(?:\(\w*\))? - non-capturing group, matches 0 or more word characters (letters, digits, underscore) within parens; the whole group is optional
([\w]*) - capturing group, matches 0 or more word characters
(?:\(\w*\))? - non-capturing group, matches 0 or more word characters within parens; the whole group is optional
$ - end of string
You haven't specified what language you might be using, but here is an example in Python:
>>> import re
>>> string = "(ef)abc(gh)"
>>> string2 = "(ef)abc"
>>> string3 = "abc(gh)"
>>> p = re.compile(r'^(?:\(\w*\))?([\w]*)(?:\(\w*\))?$')
>>> m = re.search(p, string)
>>> m2 = re.search(p, string2)
>>> m3 = re.search(p, string3)
>>> m.group(1)
'abc'
>>> m2.group(1)
'abc'
>>> m3.group(1)
'abc'
\([^)]+\)|([^()\n]+)
Try this. Just grab the captured group. See the demo:
https://regex101.com/r/tX2bH4/6
Your problem is that (.+?)(?=\() matches "(ef)abc" in "(ef)abc(gh)".
The easiest solution to this problem is to be more explicit about what you are looking for, in this case by replacing "any character" (.) with "any character that is not a parenthesis" ([^\(\)]).
(?<=\))([^\(\)]+?)(?=\()|(?<=\))([^\(\)]+?)\b|([^\(\)]+?)(?=\()
A cleaner regexp would be
(?:(?<=^)|(?<=\)))([^\(\)]+)(?:(?=\()|(?=$))
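The question does not name a language, so the following is only a Python sketch of how the shorter pattern \([^)]+\)|([^()\n]+) suggested earlier in this thread might be used: parenthesized parts are matched but not captured, and only non-empty captures are kept. The helper name outside_parens is invented for the example.

```python
import re

def outside_parens(s):
    """Return the pieces of `s` that are not inside parentheses."""
    # With one capture group, findall returns that group for every match;
    # matches of the parenthesized alternative yield an empty string.
    captures = re.findall(r"\([^)]+\)|([^()\n]+)", s)
    return [c for c in captures if c]

for sample in ("(ef)abc(gh)", "(ef)abc", "abc(ef)"):
    print(sample, outside_parens(sample))
```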