I have a regex pattern:
import regex as re
re.sub(r'(.*)\bHello (.*) BGC$\b', "OTR", 'Hello People BGC')
This returns OTR, but how do I find out which characters each (.*) matched?
Using regex==2016.1.10, Python 3.5.1
Compile the pattern and then call match() and sub() separately:
>>> pattern = re.compile(r'^Hello (.*?) BGC$')
>>> s = 'Hello People BGC'
>>> pattern.match(s).group(1)
'People'
>>> pattern.sub("OTR", s)
'OTR'
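If you would rather capture and substitute in a single pass, one option is to give sub() a function as the replacement. This is only a sketch: it uses the standard re module rather than regex, and the names captured and record_and_replace are made up for illustration.

```python
import re

pattern = re.compile(r'^Hello (.*?) BGC$')
captured = []  # hypothetical list that collects what group 1 matched

def record_and_replace(match):
    # Remember the text matched by (.*?) before replacing the whole match.
    captured.append(match.group(1))
    return "OTR"

result = pattern.sub(record_and_replace, 'Hello People BGC')
print(result)
print(captured)
```

The function is called once per match, so captured ends up with one entry for every substitution that was made.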
So for example I have this string
var = 'column1;column2;column3\r\nval1;val2;val3\r\n;val4;val5;val6\r\n'
I want to find every \r\n and replace it with temp\r\n, but I want to skip the one after column3.
I tried ^(?!.*column3).*$\r\n, but the \r\n part does not work.
You want a negative lookbehind, which makes the substitution only when \r\n is not preceded by column3:
re.sub(r'(?<!column3)\r\n', r'temp\r\n', var)
For example:
>>> import re
>>>
>>> var = 'column1;column2;column3\r\nval1;val2;val3\r\n;val4;val5;val6\r\n'
>>> new_text = re.sub(r'(?<!column3)\r\n', r'temp\r\n', var)
>>> new_text
'column1;column2;column3\r\nval1;val2;val3temp\r\n;val4;val5;val6temp\r\n'
>>>
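If the header text is not known in advance, a regex-free alternative is to split off the first line and only rewrite the rest. This is a different technique from the lookbehind above, shown as a sketch that assumes the line to skip is always the first one.

```python
import re

var = 'column1;column2;column3\r\nval1;val2;val3\r\n;val4;val5;val6\r\n'

# Split at the first \r\n only: header, the separator itself, and the rest.
header, sep, body = var.partition('\r\n')
new_text = header + sep + body.replace('\r\n', 'temp\r\n')

# For this input it agrees with the lookbehind version.
lookbehind_text = re.sub(r'(?<!column3)\r\n', 'temp\r\n', var)
print(new_text == lookbehind_text)
```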
I am trying to use a regular expression to extract words inside of a pattern.
I have some string that looks like this
someline abc
someother line
name my_user_name is valid
some more lines
I want to extract the word my_user_name. I do something like
import re
s = #that big string
p = re.compile("name .* is valid", re.flags)
p.match(s) # this gives me <_sre.SRE_Match object at 0x026B6838>
How do I extract my_user_name now?
You need a capture group in the regex. Search for the pattern and, if it is found, retrieve the captured string with group(index). Assuming you have checked that the search succeeded:
>>> p = re.compile("name (.*) is valid")
>>> result = p.search(s)
>>> result
<_sre.SRE_Match object at 0x10555e738>
>>> result.group(1) # group(1) returns the first capture (the part inside the parentheses).
# group(0) returns the entire matched text.
'my_user_name'
You can use matching groups:
p = re.compile('name (.*) is valid')
e.g.
>>> import re
>>> p = re.compile('name (.*) is valid')
>>> s = """
... someline abc
... someother line
... name my_user_name is valid
... some more lines"""
>>> p.findall(s)
['my_user_name']
Here I use re.findall rather than re.search to get all instances of my_user_name. Using re.search, you'd need to get the data from the group on the match object:
>>> p.search(s) #gives a match object or None if no match is found
<_sre.SRE_Match object at 0xf5c60>
>>> p.search(s).group() #entire string that matched
'name my_user_name is valid'
>>> p.search(s).group(1) #first group that match in the string that matched
'my_user_name'
As mentioned in the comments, you might want to make your regex non-greedy:
p = re.compile('name (.*?) is valid')
to only pick up the text between 'name ' and the next ' is valid' (rather than allowing the group to swallow a later ' is valid').
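The difference only shows up when ' is valid' occurs more than once on a line. A small sketch, using a made-up input line with two names:

```python
import re

# Hypothetical input containing ' is valid' twice.
line = 'name alice is valid and name bob is valid'

greedy = re.search(r'name (.*) is valid', line).group(1)
lazy = re.search(r'name (.*?) is valid', line).group(1)

print(repr(greedy))  # runs up to the last ' is valid'
print(repr(lazy))    # stops at the first ' is valid'
```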
You could use something like this:
import re
s = "someline abc\nsomeother line\nname my_user_name is valid\nsome more lines"  # that big string
# the parentheses create a group holding what was matched,
# and '\w' matches only word characters (letters, digits, underscore)
p = re.compile(r"name +(\w+) +is valid")
# use search(), so the match doesn't have to happen
# at the beginning of "big string"
m = p.search(s)
# search() returns a Match object with information about what was matched,
# or None if nothing matched
if m:
    name = m.group(1)
else:
    raise Exception('name not found')
You can use groups (indicated with '(' and ')') to capture parts of the string. The match object's group() method then gives you the group's contents:
>>> import re
>>> s = 'name my_user_name is valid'
>>> match = re.search('name (.*) is valid', s)
>>> match.group(0) # the entire match
'name my_user_name is valid'
>>> match.group(1) # the first parenthesized subgroup
'my_user_name'
In Python 3.6+ you can also index into a match object instead of using group():
>>> match[0] # the entire match
'name my_user_name is valid'
>>> match[1] # the first parenthesized subgroup
'my_user_name'
Maybe that's a bit shorter and easier to understand:
import re
text = '... someline abc... someother line... name my_user_name is valid.. some more lines'
>>> re.search('name (.*) is valid', text).group(1)
'my_user_name'
You want a capture group.
p = re.compile("name (.*) is valid") # parentheses for capture groups
print(p.search(s).groups()) # search(), not match(), because the line is not at the start of s; this gives you a tuple of the captured groups.
Here's a way to do it without using groups (Python 3.6 or above):
>>> re.search(r'2\d\d\d[01]\d[0-3]\d', 'report_20191207.xml')[0]
'20191207'
You can also use a named capture group, (?P<user>pattern), and access it by name like a dictionary key: match['user'] (Python 3.6+).
string = '''someline abc\n
someother line\n
name my_user_name is valid\n
some more lines\n'''
pattern = r'name (?P<user>.*) is valid'
matches = re.search(pattern, str(string), re.DOTALL)
print(matches['user'])
# my_user_name
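Named groups can also be read with group('user') or collected all at once with groupdict(), both of which work on older Python 3 versions too. A minimal sketch, assuming the name consists of word characters only (hence \w+ instead of .*):

```python
import re

text = "someline abc\nname my_user_name is valid\nsome more lines"
m = re.search(r"name (?P<user>\w+) is valid", text)
if m:
    print(m.group("user"))  # access by group name
    print(m.groupdict())    # all named groups as a dict
```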
I found this answer via google because I wanted to unpack a re.search() result with multiple groups directly into multiple variables. While this might be obvious for some, it was not for me because I always used group() in the past, so maybe it helps someone in the future who also did not know about group*s*().
s = "2020:12:30"
year, month, day = re.search(r"(\d+):(\d+):(\d+)", s).groups()
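One caveat: re.search() returns None when nothing matches, and calling .groups() on None raises AttributeError. A hedged sketch of the same unpacking with a guard (parse_date is a hypothetical helper name, and converting to int is an extra step not in the original):

```python
import re

def parse_date(s):
    """Return (year, month, day) as ints, or None if s has no such pattern."""
    m = re.search(r"(\d+):(\d+):(\d+)", s)
    if m is None:
        return None
    year, month, day = (int(part) for part in m.groups())
    return year, month, day

print(parse_date("2020:12:30"))
print(parse_date("no date here"))
```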
It seems like you're actually trying to extract a name rather than simply find a match. If so, having span indexes for your match is helpful, and I'd recommend re.finditer. As a shortcut, you know the 'name ' part of your regex has length 5 and ' is valid' has length 9, so you can slice the matched text to extract the name.
Note - In your example, it looks like s is a string with line breaks, so that's what's assumed below.
import re
## convert s to a list of lines:
s2 = s.splitlines()
## find matches line by line:
for i, j in enumerate(s2):
    ## finditer yields nothing for lines without a match
    for k in re.finditer("name (.*) is valid", j):
        ## get text
        match_txt = k.group(0)
        ## get span of the match within the line
        match_span = k.span(0)
        ## extract username
        my_user_name = match_txt[5:-9]
        ## compare with original text
        print(f'Extracted Username: {my_user_name} - found on line {i}')
        print('Match Text:', match_txt)
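If you'd rather not hard-code the lengths 5 and 9, the capture group already carries its own text and span. A sketch of that variant, assuming the same multi-line input as in the question (the list name found is made up):

```python
import re

s = "someline abc\nsomeother line\nname my_user_name is valid\nsome more lines"

found = []  # (line index, username, span of the username within the line)
for line_no, line in enumerate(s.splitlines()):
    for m in re.finditer(r"name (.*?) is valid", line):
        # group(1) and span(1) refer to the parenthesized part only.
        found.append((line_no, m.group(1), m.span(1)))

print(found)
```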
I have a string named
Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/
I am trying to extract 839518730.47873.0000 from it. My regex works for this exact string, but if there is any digit before the first =, it goes wrong.
No Digit
>>> m=re.search('[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/')
>>> m.group()
'839518730.47873.0000'
With Digit
>>> m=re.search('[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
>>> m.group()
'2'
Is there any way I can extract only '839518730.47873.0000', no matter what else is in the string?
I tried
>>> m=re.search('=[0-9.]+','Set-Cookie: BIGipServerApp_Pool_SSL=839518730.47873.0000; path=/')
>>> m.group()
'=839518730.47873.0000'
as well, but the output starts with '=' and I don't want that.
Any ideas.
Thank you.
If your substring always comes after the first =, you can just use a capture group with the =([\d.]+) pattern:
import re
result = ""
m = re.search(r'=([0-9.]+)','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
if m:
    result = m.group(1) # Get Group 1 value only
print(result)
The main point is that you match the parts you do not need and capture (with unescaped parentheses) the part of the pattern you do need. The value you want is then in Group 1.
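A possible alternative, if you want group() itself to return the value, is a lookbehind for the = sign so that it is checked but not included in the match. This is a sketch of a different technique, not a replacement for the capture group:

```python
import re

header = 'Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/'

# (?<==) asserts that '=' precedes the match without consuming it.
m = re.search(r'(?<==)[0-9.]+', header)
value = m.group() if m else ""
print(value)
```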
You can use word boundaries:
\b[\d.]+
Or, to make the match more targeted, use a lookahead for the next semicolon after your matched text:
\b[\d.]+(?=\s*;)
Update:
>>> m=re.search(r'\b[\d.]+','Set-Cookie: BIGipServerApp_Pool_SSL2=839518730.47873.0000; path=/')
>>> m.group(0)
'839518730.47873.0000'
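The two patterns behave differently when a digit in the cookie name follows a non-word character, because \b then matches in front of that digit. A sketch with a made-up cookie name (Pool-2) to show the difference:

```python
import re

# Hypothetical header: the digit in the name is preceded by '-'.
header = 'Set-Cookie: Pool-2=839518730.47873.0000; path=/'

boundary_only = re.search(r'\b[\d.]+', header).group(0)
with_lookahead = re.search(r'\b[\d.]+(?=\s*;)', header).group(0)

print(boundary_only)   # picks up the digit from the cookie name
print(with_lookahead)  # requires a following semicolon, so skips it
```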
With Python (regex module), I am trying to substitute 'x' for each letter 'c' in those substrings of a text that are:
delimited by 'a', at the left, and 'b' at the right, and
with no more 'a's and 'b's in them.
Example:
cuacducucibcl -> cuaxduxuxibcl
How can I do this?
Thank you.
With the standard re module in Python, you can use a[^ab]+b to match a substring that starts with a, ends with b, and has no occurrence of a or b in between, then supply a replacement function that takes care of replacing the c's:
>>> import re
>>> re.sub('a[^ab]+b', lambda m: m.group(0).replace('c', 'x'), 'cuacducucibcl')
'cuaxduxuxibcl'
See the documentation of re.sub for reference.
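The same idea written out with a named function may be easier to follow. This is a sketch, with replace_c and mask as made-up names, plus a second made-up input showing that a 'c' outside an a...b span is left alone:

```python
import re

def replace_c(match):
    # match.group(0) is the whole 'a...b' span; swap c for x inside it only.
    return match.group(0).replace('c', 'x')

def mask(text):
    return re.sub(r'a[^ab]+b', replace_c, text)

print(mask('cuacducucibcl'))
print(mask('acbcacab'))  # only the leading 'acb' is a valid a...b span
```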
Use the regex below to replace the matched c's with x. This requires the external regex module.
>>> import regex
>>> s = 'cuacducucibcl'
>>> regex.sub(r'((?:a|(?<!^)\G)[^abc\n]*)c', r'\1x', s)
'cuaxduxuxibcl'
I have tested this regex
(?<=\))(.+?)(?=\()|(?<=\))(.+?)\b|(.+?)(?=\()
but it doesn't work for strings like (ef)abc(gh).
I got the result "(ef)abc".
But these three regexes, (?<=\))(.+?)(?=\(), (?<=\))(.+?)\b and (.+?)(?=\(),
do work separately for "(ef)abc(gh)", "(ef)abc" and "abc(ef)".
Can anyone tell me where the problem is, or how I can get the expected result?
Assuming you are looking to match the text between the parenthesized elements, try this:
^(?:\(\w*\))?([\w]*)(?:\(\w*\))?$
^ - beginning of string
(?:\(\w*\))? - optional non-capturing group, matches 0 or more word characters (letters, digits, underscore) within parens
([\w]*) - capturing group, matches 0 or more word characters
(?:\(\w*\))? - optional non-capturing group, matches 0 or more word characters within parens
$ - end of string
You haven't specified what language you might be using, but here is an example in Python:
>>> import re
>>> string = "(ef)abc(gh)"
>>> string2 = "(ef)abc"
>>> string3 = "abc(gh)"
>>> p = re.compile(r'^(?:\(\w*\))?([\w]*)(?:\(\w*\))?$')
>>> m = re.search(p, string)
>>> m2 = re.search(p, string2)
>>> m3 = re.search(p, string3)
>>> m.groups()[0]
'abc'
>>> m2.groups()[0]
'abc'
>>> m3.groups()[0]
'abc'
\([^)]+\)|([^()\n]+)
Try this. Just grab the capture group. See the demo:
https://regex101.com/r/tX2bH4/6
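In Python, "grabbing the capture group" could look like the sketch below. With one capture group, re.findall returns an empty string for every match of the parenthesized alternative, so those are filtered out (outside_parens is a made-up helper name):

```python
import re

pattern = re.compile(r'\([^)]+\)|([^()\n]+)')

def outside_parens(text):
    # Empty strings come from matches of the \([^)]+\) branch; drop them.
    return [group for group in pattern.findall(text) if group]

for sample in ['(ef)abc(gh)', '(ef)abc', 'abc(ef)']:
    print(sample, outside_parens(sample))
```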
Your problem is that (.+?)(?=\() matches "(ef)abc" in "(ef)abc(gh)".
The easiest solution is to be more explicit about what you are looking for, in this case by replacing "any character" (.) with "any character that is not a parenthesis" ([^\(\)]).
(?<=\))([^\(\)]+?)(?=\()|(?<=\))([^\(\)]+?)\b|([^\(\)]+?)(?=\()
A cleaner regexp would be
(?:(?<=^)|(?<=\)))([^\(\)]+)(?:(?=\()|(?=$))
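A quick way to check this kind of pattern against the three inputs from the question is shown below. Note that the sketch uses a slightly simplified spelling of the same idea (a plain ^ instead of a lookbehind for it, and one combined lookahead), which is an assumption on my part rather than the exact regex above:

```python
import re

# Simplified variant: start of string or a preceding ')', then text without
# parentheses, then either '(' or the end of the string.
pattern = re.compile(r'(?:^|(?<=\)))([^()]+)(?=\(|$)')

results = {s: pattern.findall(s) for s in ['(ef)abc(gh)', '(ef)abc', 'abc(ef)']}
for sample, found in results.items():
    print(sample, found)
```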