Python reg exp - match number - regex

So I have this code that extracts the integer from a string of the form Dir.<int>:
import re

def MatchDir(s):
    RegExp = re.compile(r'Dir.([0-9]+)')
    result = RegExp.match(s)
    try:
        return int(result.group(1))
    except AttributeError:  # result is None when there is no match
        return None
The problem is that it also matches strings such as Dir.123_test, which is not desired.
How can I make it match only strings of the form Dir.<int>, with no characters allowed before or after that form?

Use ^ and $ to match the start and end of the string, and escape the dot so it matches only a literal ".":
RegExp = re.compile(r'^Dir\.([0-9]+)$')
This won't allow anything other than Dir. followed by a number.
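One subtlety: $ also matches just before a trailing newline, so the anchored pattern above still accepts "Dir.123\n". On Python 3.4 and later, re.fullmatch requires the whole string to match. A minimal sketch, assuming Python 3; match_dir and DIR_RE are hypothetical names standing in for the question's function:

```python
import re

# The escaped dot matches only a literal "."; fullmatch anchors both ends of the string.
DIR_RE = re.compile(r'Dir\.([0-9]+)')

def match_dir(s):
    """Return the integer from a string of the exact form Dir.<int>, else None."""
    m = DIR_RE.fullmatch(s)
    return int(m.group(1)) if m else None

print(match_dir('Dir.123'))       # 123
print(match_dir('Dir.123_test'))  # None
print(match_dir('Dir.123\n'))     # None ($ alone would have allowed this)
```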


Replace a certain part of the string which matches a pattern in python 2.7 and save in the same file

I am trying to achieve something of this sort
My input file has entries like these:
art.range.field = 100
art.net.cap = 200
art.net.ht = 1000
art.net.dep = 8000
Wherever art.range.field appears, I want its value changed to 500, so the output of the code should be:
art.range.field = 500
art.net.cap = 200
art.net.ht = 1000
art.net.dep = 8000
Here is my attempt at solving this problem:
import re

file_path = "/tmp/dimension"
with open(file_path, "r") as file:
    file_content = file.read()
    new_content = re.sub(r"^.*" + parameter + ".*$", parameter + " = %s" % value, file_content)
    file.seek(0)
    file.write(new_content)
    file.truncate()
Here parameter = 'art.range.field' and value = 500.
But the file remains unchanged, because new_content never differs from the original content.
Where am I going wrong, and how can I fix it?
You can get what you want with:
import re

file_path = "/tmp/dimension"
parameter = 'art.range.field'
value = 500
with open(file_path, "r+") as file:
    new_content = re.sub(r"^(" + re.escape(parameter) + r"\s*=\s*).*", r"\g<1>%d" % value, file.read(), flags=re.M)
    file.seek(0)
    file.write(new_content)
    file.truncate()
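Notes:
Use r+ to actually read from and write to the file; a file opened with "r" is read-only.
Use re.M so that ^ matches at the start of every line, not only at the start of the string.
Use re.escape to escape special characters (such as the dots) in the parameter variable.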
Regex details:
^ - start of a line
(art\.range\.field\s*=\s*) - Group 1, referenced as \g<1> in the replacement pattern. The unambiguous \g<1> form is required because the value placed right after it starts with a digit, so \1 followed by 500 would not be read as group 1 followed by 500:
art\.range\.field - the literal string art.range.field
\s*=\s* - an = sign surrounded by zero or more whitespace characters
.* - zero or more characters other than line breaks, as many as possible (the old value)
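To see the effect of re.M without touching a file, the same substitution can be run on an in-memory string. A minimal sketch, assuming Python 3; set_value and the sample config text are hypothetical, and the file handling from the answer is left out:

```python
import re

def set_value(text, parameter, value):
    """Replace the value of `parameter` in "key = value" lines (hypothetical helper)."""
    pattern = r"^(" + re.escape(parameter) + r"\s*=\s*).*"
    # re.M makes ^ match at the start of every line, not just the start of the string
    return re.sub(pattern, r"\g<1>%d" % value, text, flags=re.M)

config = (
    "art.range.field = 100\n"
    "art.net.cap = 200\n"
    "art.net.ht = 1000\n"
    "art.net.dep = 8000\n"
)
print(set_value(config, 'art.net.ht', 500))
```

Because .* stops at a line break, each line's newline is preserved and only the value after the = sign changes.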
You are not changing anything because you are not in multiline mode. If you prepend (?m) to your regex (before the ^), it should work.
Furthermore:
You don't need $: since you are not in single-line (DOTALL) mode, .* already stops at the end of the line.
To avoid false positives, I would also make sure that parameter is followed by an equals sign.
You should escape parameter (using re.escape) if you want to use it as part of a regex.
So you should use this line of code (str(value) is needed because value is an integer):
new_content = re.sub(r"(?m)^\s*" + re.escape(parameter) + r"\s*=.*", parameter + " = " + str(value), file_content)
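A quick check of that line on an in-memory string, assuming Python 3 and an integer value; the two-line sample is made up, with the target key deliberately on the second line so that (?m), the inline form of re.M, actually matters:

```python
import re

file_content = (
    "art.net.cap = 200\n"
    "art.range.field = 100\n"
)
parameter = 'art.range.field'
value = 500

# (?m) at the very start of the pattern turns on multiline mode for the whole regex
new_content = re.sub(r"(?m)^\s*" + re.escape(parameter) + r"\s*=.*",
                     parameter + " = " + str(value), file_content)
print(new_content)
```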

convert string to regex pattern

I want to derive a regular expression pattern from a character string, so that I can reuse the pattern to find strings with the same structure in another context.
From the string "1example4whatitry2do",
I want to get a pattern like: [0-9]{1}[a-z]{7}[0-9]{1}[a-z]{8}[0-9]{1}[a-z]{2}
so I can reuse it to find other strings such as 2eytmpxe8wsdtmdry1uo.
I could loop over each character, but I hope there is a faster way.
Thanks for your help!
You can puzzle this out:
Go over your string character by character.
If the character is a lowercase letter, add a 't' to a list.
If the character is a digit, add a 'd' to a list.
If the character is anything else, add the character itself to the list.
Use itertools.groupby to group consecutive identical entries.
Build the pattern from each group's key and length using f-string formatting.
Code:
from itertools import groupby
from string import ascii_lowercase

lower_case = set(ascii_lowercase)  # set for faster lookup

def find_regex(p):
    cum = []
    for c in p:
        if c.isdigit():
            cum.append("d")
        elif c in lower_case:
            cum.append("t")
        else:
            cum.append(c)
    grp = groupby(cum)
    return ''.join(f'\\{what}{{{how_many}}}'
                   if how_many > 1 else f'\\{what}'
                   for what, how_many in ((g[0], len(list(g[1]))) for g in grp))

pattern = "1example4...whatit.ry2do"
print(find_regex(pattern))
Output:
\d\t{7}\d\.{3}\t{6}\.\t{2}\d\t{2}
The ternary in the formatting removes the unneeded {1} quantifiers from the pattern.
See:
str.isdigit()
If you now replace '\t' with '[a-z]', your regex should fit. You could also replace the isdigit() check with a regex test for r'\d' or a membership test against set(string.digits).
pattern = "1example4...whatit.ry2do"
pat = find_regex(pattern).replace(r"\t","[a-z]")
print(pat) # \d[a-z]{7}\d\.{3}[a-z]{6}\.[a-z]{2}\d[a-z]{2}
See
string module for ascii_lowercase and digits
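A variant that emits \d and [a-z] directly and passes every other character through re.escape, so that, for example, an uppercase A is not turned into the \A anchor as it would be by the '\\' + key formatting above. A sketch assuming Python 3.4+ (for re.fullmatch); char_class and shape_regex are hypothetical names:

```python
import re
from itertools import groupby
from string import ascii_lowercase, digits

def char_class(c):
    """Map one character to a regex fragment (hypothetical helper)."""
    if c in digits:
        return r'\d'
    if c in ascii_lowercase:
        return '[a-z]'
    return re.escape(c)  # escape anything else, e.g. '.' becomes '\.'

def shape_regex(s):
    """Collapse runs of identical fragments into fragment{count}."""
    parts = []
    for frag, run in groupby(char_class(c) for c in s):
        n = len(list(run))
        parts.append(frag if n == 1 else '%s{%d}' % (frag, n))
    return ''.join(parts)

pat = shape_regex("1example4whatitry2do")
print(pat)  # \d[a-z]{7}\d[a-z]{8}\d[a-z]{2}
print(bool(re.fullmatch(pat, "2eytmpxe8wsdtmdry1uo")))  # True
```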

I want to match two URLs in Python using regular expressions, but the following code gives an error. I cannot figure out why

input = 'susaya https://sousfs#sls.sus.uk/de/sekd/sho/project1/first_project'
url_match = re.match("\s*susaya\s+([^ ]+)", input)
When I try to print url_match, I get the memory location.
print url_match
None
<_sre.SRE_Match object at 0x5f630cs48e40>
What does the regular expression "\s*susaya\s+([^ ]+)" match?
Do I get None when I print because url_match doesn't match?
I am using Python 2.7. Thanks.
re.match doesn't return a string, it returns a match object. Calling group(i) on the match object returns the i'th capture group, with the 0th capture group being the entire match.
>>> input = "susaya https://sousfs#sls.sus.uk/de/sekd/sho/project1/first_project"
>>> url_match = re.match(r"\s*susaya\s+([^ ]+)", input)
>>> url_match.group(0)
'susaya https://sousfs#sls.sus.uk/de/sekd/sho/project1/first_project'
The pattern "\s*susaya\s+([^ ]+)" matches zero or more whitespace characters, followed by "susaya", followed by one or more whitespace characters, followed by a capture group of one or more characters that are not spaces.
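Since re.match returns None when the pattern does not match, it is worth checking the result before calling group(). A sketch in Python 3 syntax; the variable is renamed from input to text to avoid shadowing the built-in input function:

```python
import re

text = 'susaya https://sousfs#sls.sus.uk/de/sekd/sho/project1/first_project'
url_match = re.match(r"\s*susaya\s+([^ ]+)", text)

# re.match returns None when there is no match, so check before calling group()
if url_match:
    print(url_match.group(0))  # the whole match
    print(url_match.group(1))  # only the captured URL
else:
    print('no match')
```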

Regex for random string plus optional dash and number

I would like to validate my string and need help with the regex. How do I express a string like this:
anything, optionally followed by a dash and some digits? Empty input should be OK too, if possible.
So:
[nothing] = valid
astring = valid
astring- = invalid
astring-1 = valid
astring-a = invalid
astring-a1 = invalid
You can use this regex:
^[^-]*(-[0-9]+)?$
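Checking that regex against the cases from the question with Python's re. Note that "anything" here means anything without a dash, so an input such as a-b-1 is rejected; the VALID and cases names are just for illustration:

```python
import re

# Any run of non-dash characters, optionally followed by "-" and one or more digits
VALID = re.compile(r'^[^-]*(-[0-9]+)?$')

cases = {'': True, 'astring': True, 'astring-': False,
         'astring-1': True, 'astring-a': False, 'astring-a1': False}
for s, expected in cases.items():
    assert bool(VALID.match(s)) == expected, s
```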
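Try this simple regex:
^*(-[0-9]+)?$
Note that Python's re module rejects ^* with a "nothing to repeat" error, because an anchor cannot be quantified.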

Regexp matching except

I'm trying to match some paths, but not others, via regexp. I want to match anything that starts with "/profile/" and is NOT one of the following:
/profile/attributes
/profile/essays
/profile/edit
Here is the regex I'm trying to use, which doesn't seem to work:
^/profile/(?!attributes|essays|edit)$
For example, none of these URLs match the pattern above, although they should:
/profile/matt
/profile/127
/profile/-591m!40v81,ma/asdf?foo=bar#page1
You need to allow any characters up to the end of the string:
^/profile/(?!attributes|essays|edit).*$
Removing the end-of-string anchor would also work:
^/profile/(?!attributes|essays|edit)
And you may want to be more restrictive in your negative lookahead to avoid excluding /profile/editor:
^/profile/(?!(?:attributes|essays|edit)$)
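Checking the anchor-free pattern and the stricter lookahead against the question's example URLs, plus /profile/edit and /profile/editor, which are added here for comparison. A sketch using Python's re:

```python
import re

paths = ['/profile/matt', '/profile/127',
         '/profile/-591m!40v81,ma/asdf?foo=bar#page1',
         '/profile/edit', '/profile/editor']

# Excludes anything whose segment merely starts with an ignored word
loose = re.compile(r'^/profile/(?!attributes|essays|edit)')
# Excludes only the exact ignored words
strict = re.compile(r'^/profile/(?!(?:attributes|essays|edit)$)')

for p in paths:
    print(p, bool(loose.match(p)), bool(strict.match(p)))
```

Only /profile/editor differs: the loose pattern rejects it, while the strict one accepts it.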
Code is hard to read in comments, so here is my answer formatted properly:
import re

def mpath(path, ignore_str='attributes|essays|edit', anything=True):
    # With anything=True, also exclude names that merely start with an ignored word (e.g. "editor")
    suffix = '.*?' if anything else ''
    m = re.compile(r"^/profile/(?!(?:%s)%s($|/)).*$" % (ignore_str, suffix))
    match = m.search(path)
    if match:
        return match.group(0)
    else:
        return ''