I have a problem with a regular expression.
My code is:
self.file = re.sub(r'([^;{}]{1}\s*)[\n]|([;{}]\s*[\n])',r'\1\2',self.file)
I need to replace this:
TJumpMatchArray *skipTableMatch
);
void computeCharJumps(string *str
with this:
TJumpMatchArray *skipTableMatch );
void computeCharJumps(string *str
I need to preserve whitespace, and I need to replace every newline '\n' that does not follow {, } or ; with ''.
I suspect the problem is that Python (I am using Python 3.2.3) does not evaluate the two alternatives in parallel, so when the first group does not match, it fails with this:
File "cha.py", line 142, in <module>
maker.editFileContent()
File "cha.py", line 129, in editFileContent
self.file = re.sub(r'([^;{}]{1}\s*)[\n]|([;{}]\s*[\n])',r'\1|\2',self.file)
File "/usr/local/lib/python3.2/re.py", line 167, in sub
return _compile(pattern, flags).sub(repl, string, count)
File "/usr/local/lib/python3.2/re.py", line 286, in filter
return sre_parse.expand_template(template, match)
File "/usr/local/lib/python3.2/sre_parse.py", line 813, in expand_template
raise error("unmatched group")
It works in this online regex tool: Example here
The reason I use:
|([;{}]\s*[\n])
is that if I have:
'; \n'
it would otherwise replace the:
' \n'
with '', and I need to keep the formatting after {, } and ; unchanged.
Is there any way to fix this?
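The problem is that for each match only one of the two groups is non-empty. The other group did not take part in the match, and in Python 3.2 referring to such a group in the replacement template raises the "unmatched group" error.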
Consider this simplified example:
>>> import re
>>>
>>> def replace(match):
...     print(match.groups())
...     return "X"
...
>>> re.sub("(a)|(b)", replace, "-ab-")
('a', None)
(None, 'b')
'-XX-'
As you can see, the replacement function is called twice, once with the second group set to None, and once with the first.
If you use a function to produce the replacement (as in my example), you can easily check which of the groups matched.
Example:
re.sub(r'([^;{}]{1}\s*)[\n]|([;{}]\s*[\n])', lambda m: m.group(1) or m.group(2), self.file)
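As a rough check, the callback approach can be run against the sample from the question. This is only a sketch: the helper name join_lines and the sample string are made up for illustration, and the sample has no leading whitespace before ");", so no space appears there in the result. Note also that on Python 3.5 and later an unmatched group in a replacement template is substituted with an empty string, so the original r'\1\2' template no longer raises there.

```python
import re

# Same pattern as in the question: group 1 drops the newline, group 2 keeps it.
PATTERN = re.compile(r'([^;{}]{1}\s*)[\n]|([;{}]\s*[\n])')

def join_lines(source):
    """Hypothetical helper: remove newlines that do not follow ';', '{' or '}'."""
    # Exactly one of the two groups took part in each match; the other is None.
    return PATTERN.sub(lambda m: m.group(1) or m.group(2), source)

# Made-up sample modelled on the snippet in the question.
sample = "TJumpMatchArray *skipTableMatch\n);\nvoid computeCharJumps(string *str\n"
result = join_lines(sample)
print(result)
```

The `or` is safe here because group 1, when it matches, always contains at least one character and is therefore never an empty string.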
I would like to transform the following text:
some text
% comment line 1
% comment line 2
% comment line 3
some more text
into
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
AND in the same file, when there is only one line commented, I would like it to go from
some text
% a single commented line
some more text
to
some text
# a single commented line
some more text
So, when the two cases are in the same file, I would like to go from:
some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text
to
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text
# a single commented line
some more text
What I have tried so far works for the second (single-line) case:
re.sub(r'(\A|\r|\n|\r\n|^)% ', r'\1# ', 'some text \n% a single comment line\nsome more text')
but it also replaces % with # when more than one consecutive line is commented.
For the first (multi-line) case, my attempt fails:
re.sub(r'(\A|\r|\n|\r\n|^)(% )(.*)(?:\n^\t.*)*', r'"""\3"""', 'some text \n% comment line1\n% comment line 2\n% comment line 3\nsome more text')
It repeats the """ on every line and conflicts with the case where only one line is commented.
Is there any way to count the consecutive lines where a regular expression is found and change pattern accordingly?
Thanks in advance for the help!
While this is probably possible with a regular expression, I think this is much easier without one. You could e.g. use itertools.groupby to detect groups of consecutive commented lines, simply using str.startswith to check whether a line is a comment.
text = """some text
% comment line 1
% comment line 2
% comment line 3
some more text
some text
% a single commented line
some more text"""
import itertools
for k, grp in itertools.groupby(text.splitlines(), key=lambda s: s.startswith("%")):
    if not k:
        for s in grp:
            print(s)
    else:
        grp = list(grp)
        if len(grp) == 1:
            print("# " + grp[0].lstrip("% "))
        else:
            print('"""')
            for s in grp:
                print(s.lstrip("% "))
            print('"""')
This just prints the resulting text, but you can of course also collect it in some string variable and return it. If comments can also start in the middle of a line, you can check this in the if not k block. Here it would make sense to use re.sub to e.g. differentiate between % and \%.
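Collecting the result in a string instead of printing it could look roughly like this. It is a sketch of the same groupby idea; the function name convert_comments is an assumption made for illustration.

```python
import itertools

def convert_comments(text):
    """Hypothetical helper: return the converted text instead of printing it."""
    out = []
    groups = itertools.groupby(text.splitlines(), key=lambda s: s.startswith("%"))
    for is_comment, grp in groups:
        lines = list(grp)
        if not is_comment:
            # ordinary lines are copied unchanged
            out.extend(lines)
        elif len(lines) == 1:
            # a lone comment line becomes a "#" comment
            out.append("# " + lines[0].lstrip("% "))
        else:
            # a run of comment lines becomes a triple-quoted block
            out.append('"""')
            out.extend(s.lstrip("% ") for s in lines)
            out.append('"""')
    return "\n".join(out)

converted = convert_comments("some text\n% a\n% b\nmore\n% single\nend")
print(converted)
```

Keep in mind that lstrip("% ") strips any leading run of "%" and space characters, not just the exact prefix "% ".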
Straightforwardly:
with open('input.txt') as f:
    comments = []
    def reformat_comments(comments):
        # one comment line becomes a "#" comment, several become a """ block
        if len(comments) == 1:
            comments_str = '# ' + comments[0] + '\n'
        else:
            comments_str = '"""\n{}\n"""\n'.format('\n'.join(comments))
        return comments_str
    for line in f:
        line = line.strip()
        if line.startswith('% '):
            # drop the leading "% " so only the comment text is kept
            comments.append(line[2:])
        elif comments:
            print(reformat_comments(comments) + line)
            comments = []
        else:
            print(line)
    if comments: print(reformat_comments(comments))
Sample output:
some text
"""
comment line 1
comment line 2
comment line 3
"""
some more text
some text
# a single commented line
some more text
I'm trying to parse barely formatted text into a price list.
I store a bunch of regex patterns in a file, looking like this:
[^S](7).*(\+|(plus)|➕).*(128)
When I attempt to check whether there is a match like this:
def trMatch(line):
    for tr in trs:
        nr = re.compile(tr.nameReg, re.IGNORECASE)
        cr = re.compile(tr.colourReg, re.IGNORECASE)
        if (nr.search(line.text) is not None): doStuff()
I get an error:
File "<stdin>", line 1, in <module>
File "<stdin>", line 10, in go
File "<stdin>", line 3, in trMatch
File "/usr/lib/python3.5/re.py", line 224, in compile
return _compile(pattern, flags)
File "/usr/lib/python3.5/re.py", line 292, in _compile
raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern
I assume it can't compile the pattern because it is missing the r prefix.
Is there a proper way to make this method work?
Thanks!
The r"" syntax is not mandatory for working with regular expressions - this is just a helper syntax for escaping fewer characters, but it results in the same string. See What exactly do "u" and "r" string flags do, and what are raw string literals?
I'm not sure what trs is in your code, but it's a good guess that tr.nameReg and tr.colourReg are not strings: try to debug or print them and make sure they have the correct value.
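A small sketch of the first point: a raw literal and a normally escaped literal produce the same string, and text read from a file needs no prefix at all because it never passes through Python's literal parsing. The shortened pattern and the sample text below are made up for illustration.

```python
import re

raw = r"(\+|(plus)).*(128)"
escaped = "(\\+|(plus)).*(128)"
same = raw == escaped  # both literals denote the identical string

# Simulates a line read from a pattern file; only the trailing newline is removed.
from_file = "(\\+|(plus)).*(128)\n".strip()
found = re.search(from_file, "7 Plus 128", re.IGNORECASE)
print(same, found is not None)
```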
It turns out re.compile does not skip null (None) patterns as I had assumed. I added a simple check that there is a valid pattern and a string to search in.
Works like a charm.
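Such a check might look roughly like the following. The helper names compile_optional and matches are assumptions made for illustration, not part of the original code.

```python
import re

def compile_optional(pattern, flags=re.IGNORECASE):
    """Hypothetical helper: compile a pattern, or return None if it is missing or empty."""
    if not isinstance(pattern, str) or not pattern:
        return None
    return re.compile(pattern, flags)

def matches(pattern, text):
    """Return True only when both the pattern and the text are usable and the pattern is found."""
    regex = compile_optional(pattern)
    if regex is None or not isinstance(text, str):
        return False
    return regex.search(text) is not None

print(matches(None, "iPhone 7 plus 128"))
print(matches(r"(7).*(128)", "iPhone 7 plus 128"))
```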
I have a text file that I would like to search to see how many times certain words occur in it. I'm getting the wrong count for the words.
File is here
code:
import re
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        m = re.search('guest', line, re.M|re.I)
        if m is not None:
            m.group(0)
            print( "Found it.")
            print('Found',len(m.group()), m.group(),'s')
        break
    for line in myfile:
        n = re.search('Worm', line)
        if n is not None:
            n.group(0)
            print("\n\tNext Match.")
            print('Found', len(n.group()), n.group(), 's')
        break
    for line in myfile:
        o = re.search('anonymous', line)
        if o is not None:
            o.group(0)
            print("\n\tNext Match.")
            print('Found', len(o.group()), o.group(), 's')
        break
There is no need for a regex; you can use str.count() to make this much simpler:
with open('SysLog.txt', 'rt') as myfile:
    text = myfile.read()
    for word in ('guest', 'Worm', 'anonymous'):
        print("\n\tNext Match.")
        print('Found', text.count(word), word, 's')
To test this, I downloaded the file and ran the code above, and got the output:
Next Match.
Found 4 guest s
Next Match.
Found 91 Worm s
Next Match.
Found 18 anonymous s
which is correct if you do a find on the document in a text editor!
*As a sidenote, I'm not sure why you want to print a tab (\t) before 'Next Match' each time as it just looks weird in the output but it doesn't matter :)
There are multiple problems with your code:
re.search will only give you the first match, if any; this does not have to be a problem, though, as it seems like the word is only expected to appear once per line; otherwise, use re.findall
the line n.group(0) does not do anything without an assignment
len(n.group()) does not give you the number of matches, but the length of the matched string
you break after the first line in the file
myfile is an iterator, so once the first for line in myfile loop has finished, the other two won't have any lines left to loop (it will never finish because of the break anyway, though)
as already noted, you do not need regular expressions at all
One of many possible ways of doing this (not tested):
counts = {"worm": 0, "guest": 0, "anonymous": 0}
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        for word in counts:
            # the keys are lowercase, so compare against the lowercased line
            if word in line.lower():
                counts[word] += 1
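If every occurrence should be counted regardless of case (rather than every line that contains the word), a small variant using re.findall could be sketched as follows. The function name count_words and the sample text are made up for illustration.

```python
import re

def count_words(text, words):
    """Hypothetical helper: count case-insensitive occurrences of each word in text."""
    return {
        # re.escape keeps any special characters in the word literal
        word: len(re.findall(re.escape(word), text, flags=re.IGNORECASE))
        for word in words
    }

counts = count_words("Guest login, guest logout, Worm detected", ["guest", "worm", "anonymous"])
print(counts)
```

Like str.count, this counts substrings, so "guest" would also be found inside a longer word such as "guests".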
This is the code I ran:
fname = raw_input('Enter file name: ')
if ( len(fname) < 1 ) : fname = 'shi.txt'
fh = open(fname)
for line in fh:
    email=re.findall('^From (.*)',line)
    print len(email)
    print email[0]
    x=email[0]
This is the output and error I'm getting
Enter file name: shi.txt
1
stephen.marquard#uct.ac.za Sat Jan 5 09:14:16 2008
0
Traceback (most recent call last):
File "C:\Users\Shivam\Desktop\test1.py", line 21, in <module>
print email[0]
IndexError: list index out of range
My issue is that, as you can see in the output, email[0] should not be out of range, yet I get this error even after email[0] has actually been printed. Moreover, I don't understand why I get this 0 in the output after email[0] is printed. My code is not executed beyond that point. This is a snippet from SQLite access code. Thanks in advance.
Your code contains a for loop iterating through all the lines in the file.
The first line starts with "From " and therefore matches the ^From (.*) regex, so parsing the first line yields a list of length 1, and the captured group value is printed (re.findall returns only the captured values when capture groups are defined in the pattern).
The second line does not match your regex, so the list returned by re.findall is empty: len(email) prints 0 and email[0] raises the IndexError.
To work around that issue and check all the lines, just make sure you check the length before accessing the first item in the list:
for line in fh:
    email=re.findall('^From (.*)',line)
    if len(email) > 0:
        print email[0]
Note that there is no point in using re.findall here since the match will always be single. You may use re.search, check if there was a match, and print the contents of the match:
for line in fh:
    email=re.search(r'^From (.*)', line) # get the match object
    if email: # if the match is not none
        print email.group(0) # print Group 0 (match value)
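For reference, a Python 3 sketch of the same idea that collects the captured text (group 1, the part after "From ") instead of printing it. The function name from_fields and the sample lines are made up for illustration.

```python
import re

FROM_LINE = re.compile(r'^From (.*)')

def from_fields(lines):
    """Hypothetical helper: return the text after 'From ' for every matching line."""
    results = []
    for line in lines:
        match = FROM_LINE.search(line)
        if match:  # lines that do not match are simply skipped
            results.append(match.group(1))
    return results

# Made-up sample lines; only the first one starts with "From ".
sample = [
    "From someone@example.com Sat Jan 5 09:14:16 2008\n",
    "Subject: hello\n",
    "From: someone@example.com\n",
]
fields = from_fields(sample)
print(fields)
```

Here group(1) gives only the captured part, whereas group(0) would include the leading "From " as well.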
Another question:
I'm trying to search for a specific pattern in a file, but I have to deal with the following case:
This line is interpreted correctly:
f27 = re.findall( b'\x03\x00\x00\x27''(.*?)''\xF7\x00\xF0', s)
but this one is misinterpreted, because \x28 is the '(' character:
f28 = re.findall( b'\x03\x00\x00\x28''(.*?)''\xF7\x00\xF0', s)
Traceback (most recent call last):
File "", line 1, in
File "D:\Portable Python 2.7.2.1\App\lib\re.py", line 177, in findall
return _compile(pattern, flags).findall(string)
File "D:\Portable Python 2.7.2.1\App\lib\re.py", line 244, in _compile
raise error, v # invalid expression
error: unbalanced parenthesis
I tried several escapes with '\' and '/', but nothing worked.
Is there a solution?
Thanks
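Try using a raw bytestring. In a normal string literal, Python turns \x28 into a literal ( before the re module ever sees it, which leaves the pattern with an unbalanced parenthesis. In a raw literal the backslash sequence reaches the re module intact, and the re module itself understands such escape sequences and treats the result as a literal character.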
f28 = re.findall(br'\x03\x00\x00\x28(.*?)\xF7\x00\xF0', s)
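A quick way to try this out, written for Python 3 with a made-up payload. The second variant, which builds the pattern with re.escape, is an alternative that is not part of the answer above; it may be handy when the delimiter bytes come from a variable.

```python
import re

# Made-up sample data: header bytes, a payload, then the trailer bytes.
data = b"\x03\x00\x00\x28payload\xF7\x00\xF0"

# Raw bytes literal: the re module itself interprets \x28 as a literal "(".
raw_pattern = re.compile(br"\x03\x00\x00\x28(.*?)\xF7\x00\xF0", re.DOTALL)
found_raw = raw_pattern.findall(data)

# Alternative: escape the literal delimiters so that "(" loses its special meaning.
prefix = re.escape(b"\x03\x00\x00\x28")
suffix = re.escape(b"\xF7\x00\xF0")
found_escaped = re.findall(prefix + b"(.*?)" + suffix, data, re.DOTALL)

print(found_raw, found_escaped)
```

The re.DOTALL flag is an addition here so that a payload containing newline bytes would still be matched.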