Python 3 regex from file - regex

I'm trying to parse a barely formatted text into a price list.
I store a bunch of regex patterns in a file that looks like this:
[^S](7).*(\+|(plus)|➕).*(128)
When I attempt to check whether there is a match like this:
def trMatch(line):
    for tr in trs:
        nr = re.compile(tr.nameReg, re.IGNORECASE)
        cr = re.compile(tr.colourReg, re.IGNORECASE)
        if nr.search(line.text) is not None:
            doStuff()
I get this error:
File "<stdin>", line 1, in <module>
File "<stdin>", line 10, in go
File "<stdin>", line 3, in trMatch
File "/usr/lib/python3.5/re.py", line 224, in compile
return _compile(pattern, flags)
File "/usr/lib/python3.5/re.py", line 292, in _compile
raise TypeError("first argument must be string or compiled pattern")
TypeError: first argument must be string or compiled pattern
I assume it can't compile the pattern because it is missing the 'r' prefix.
Is there a proper way to make this method work?
Thanks!

The r"" prefix is not required for regular expressions: it is only literal syntax that lets you write backslashes without doubling them, and the result is an ordinary string. Patterns read from a file are already plain strings, so there is nothing to add. See What exactly do "u" and "r" string flags do, and what are raw string literals?
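A minimal sketch of that point, using the pattern from the question. The pattern file is simulated with io.StringIO and the sample price-list lines are made up.

```python
import io
import re

# A raw literal and a doubled-backslash literal produce the same str value.
raw = r"(\+|plus).*(128)"
escaped = "(\\+|plus).*(128)"
print(raw == escaped, type(raw).__name__)   # True str

# io.StringIO stands in for open("patterns.txt", encoding="utf-8");
# the file name is hypothetical. Each line read is an ordinary str.
pattern_file = io.StringIO(r"[^S](7).*(\+|(plus)|➕).*(128)" + "\n")
patterns = [line.rstrip("\n") for line in pattern_file if line.strip()]

compiled = re.compile(patterns[0], re.IGNORECASE)
print(bool(compiled.search("iPhone 7 Plus 128GB")))   # True
print(bool(compiled.search("Galaxy S7 plus 128")))    # False: the 7 follows an S
```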
I'm not sure what trs is in your code, but it's a good guess that tr.nameReg and tr.colourReg are not strings: debug or print them (and their types) to make sure they hold the values you expect.
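To check what actually reaches re.compile, a small diagnostic like this can help. describe_pattern is a hypothetical helper; it relies only on re.compile raising TypeError for non-string input (as in the traceback above) and re.error for malformed patterns.

```python
import re

def describe_pattern(value):
    """Return a short diagnosis of a value you are about to pass to re.compile."""
    try:
        re.compile(value)
    except TypeError:
        # Raised for anything that is not str, bytes or a compiled pattern (e.g. None).
        return "not a pattern: %s %r" % (type(value).__name__, value)
    except re.error as exc:
        # Raised when the value is a string but not valid regex syntax.
        return "bad regex %r: %s" % (value, exc)
    return "ok: %r" % (value,)

print(describe_pattern(None))          # not a pattern: NoneType None
print(describe_pattern("(7).*(128)"))  # ok: '(7).*(128)'
```

In the question's loop this would be used as print(describe_pattern(tr.nameReg)) for each tr.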

It turns out re.compile doesn't skip null (None) patterns as I had assumed; it raises the TypeError instead. I added a simple check that there is a valid pattern and a string to search in.
Works like a charm.
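A minimal sketch of such a check, assuming tr objects expose nameReg and line objects expose text as in the question. Tr, Line and tr_match are hypothetical stand-ins, and returning the matching tr replaces the question's doStuff() call.

```python
import re
from collections import namedtuple

# Hypothetical shapes: the question only shows tr.nameReg, tr.colourReg and line.text.
Tr = namedtuple("Tr", ["nameReg", "colourReg"])
Line = namedtuple("Line", ["text"])

def tr_match(line, trs):
    """Return the first tr whose nameReg matches line.text, skipping empty patterns."""
    if not line.text:          # nothing to search in
        return None
    for tr in trs:
        if not tr.nameReg:     # None or "" is skipped instead of reaching re
            continue
        if re.search(tr.nameReg, line.text, re.IGNORECASE):
            return tr
    return None

trs = [Tr(None, None), Tr(r"[^S](7).*(\+|(plus)|➕).*(128)", None)]
print(tr_match(Line("iPhone 7 Plus 128GB"), trs) is trs[1])   # True
```

Skipping empty strings as well as None is a design choice: an empty pattern compiles fine but matches every line, which is rarely what a blank line in the pattern file means.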

Related

print all lines of a file which contain a specific string

I want to search a log file for specific information and print every whole line that contains it.
For example, line 32 in a .log file is:
2019-08-07 15:21:09.783 'lineid' -> 'DEU.DTAG.NBGNE00111'
I want every line that includes the word lineid to be printed.
If someone could help me with this, I would really appreciate it.
Thanks!
I am quite new to Ruby and regex, so I tried something I found on the internet:
File.open("/Users/filip/Documents/Testcases/NG-ART_TC1054921_MKS+Results1/1054921_1042510_2242_TID-2123/TC1042510-TID2123-sequencer.log", "r") do |file|
  for line in file.readlines().include? "lineid"
    puts line
  end
end
All I get back is:
/Users/filip/Documents/cucumber/features/features/step_definitions/lineid.rb:7: syntax error, unexpected end-of-input, expecting end
In Ruby it is idiomatic to iterate with each; for loops are rarely used.
You could try this:
File.foreach("/Users/filip/Documents/Testcases/NG-ART_TC1054921_MKS+Results1/1054921_1042510_2242_TID-2123/TC1042510-TID2123-sequencer.log") do |line|
  puts line if line.include?('lineid')
end
This will execute the given block for each line in the file without slurping the entire file into memory.
See: IO::foreach
Fixing your original example would look like this:
File.open("/Users/filip/Documents/Testcases/NG-ART_TC1054921_MKS+Results1/1054921_1042510_2242_TID-2123/TC1042510-TID2123-sequencer.log", "r") do |file|
  file.readlines.each do |line|
    puts line if line.include?("lineid")
  end
end

In Python, how can I print only the first matching line from the log output?

I have input like this:
line 1: [DEBUG]...
line 2: [DEBUG]...
line 2: [DEBUG]...
From this I want to print only the first matching line, i.e. only line 1: [DEBUG]..., and then stop iterating.
I have tried the code below:
for num1, line1 in enumerate(reversed(newline), curline):
    ustr1 = "[DEBUG]"
    if ustr1 in line1:
        firstnum = num1
Can anyone help me with this?
Your question is not formatted very well, so I can't really tell what your object looks like. I'll assume it is a single string like this:
log_text = "1: [DEBUG]....\n2: [DEBUG]...\n3:...."
ustr1 = "[DEBUG]"
# now you could do e.g.:
print(log_text.split("\n")[0].strip("\r"))
# which would be the first line. As you are searching for a line containing a certain string ustr1, you could do:
for line in log_text.split("\n"):
    if ustr1 in line:
        print(line)
        break  # as you don't want more than one line
# furthermore, if you need the index of the line, do:
for i, line in enumerate(log_text.split("\n")):
    pass  # i = index, line = line
Hope I understood it right ;)
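An alternative sketch, not taken from the answer above: next() with a generator expression stops at the first hit and also returns the index the question was after. first_match is a hypothetical helper name; start mirrors the curline argument the question passes to enumerate.

```python
def first_match(lines, needle, start=0):
    """Return (index, line) for the first line containing needle, or None."""
    # next() pulls from the generator only until the first hit,
    # so later lines are never examined.
    return next(((i, line) for i, line in enumerate(lines, start) if needle in line), None)

lines = "1: [DEBUG] a\n2: [DEBUG] b\n3: info".splitlines()
print(first_match(lines, "[DEBUG]"))   # (0, '1: [DEBUG] a')
print(first_match(lines, "[ERROR]"))   # None
```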

Unicode support in regular expression during group capturing in python

I am currently using re2, re and pcre for regular expression matching in Python. When I use a regular expression such as re.compile("(?P(\S*))"), it compiles without error, but when I use a Unicode character in the group name, such as re.compile("(?P<årsag>(\S*))"), it fails to compile. Is there any Python library that fully supports Unicode?
Edit: please see my output:
>>> import regex
>>> m = regex.compile(r"(?P<årsag>(\S*))")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/site-packages/regex.py", line 331, in compile
return _compile(pattern, flags, kwargs)
File "/usr/local/lib/python2.7/site-packages/regex.py", line 499, in _compile
caught_exception.pos)
_regex_core.error: bad character in group name at position 10
You need to use the external regex module, which supports Unicode characters in the names of named capturing groups.
>>> import regex
>>> m = regex.compile(r"(?P<årsag>(\S*))")
>>> m.search('foo').group('årsag')
'foo'
>>> m.search('foo bar').group('årsag')
'foo'
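A minimal sketch, assuming Python 3: there the built-in re module also accepts non-ASCII group names, because it only requires the name to be a valid identifier (str.isidentifier()), and "årsag" is one. The failing session in the question runs Python 2.7, whose re only allows ASCII letters, digits and underscores in group names.

```python
import re

# Python 3 only: re accepts any group name that is a valid identifier.
assert "årsag".isidentifier()

m = re.compile(r"(?P<årsag>(\S*))")
print(m.search("foo bar").group("årsag"))   # foo
```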

Cannot compile 8 digit unicode regex ranges in Python 2.7 re

Using Python 2.7, re
I'm trying to compile Unicode character classes. I can get it to work with 4-digit ranges (u'\uxxxx') but not with 8-digit ones (u'\Uxxxxxxxx').
The following works:
re.compile(u'[\u0010-\u0012]')
The following does not:
re.compile(u'[\U00010000-\U00010001]')
The resultant error is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
error: bad character range
It appears to be an issue only with 8-digit ranges, as the following works:
re.compile(u'\U00010000')
Edit:
The expression I really want to compile is:
re.compile(u'[\U00010000-\U0010FFFF]')
Expanding it with list(u'[\U00010000-\U0010FFFF]') suggests that extending the suggested workaround to this range would be pretty intractable:
>>> list(u'[\U00010000-\U0010FFFF]')
[u'[', u'\ud800', u'\udc00', u'-', u'\udbff', u'\udfff', u']']
Depending on how it was built, Python 2 may store Unicode strings as UTF-16 code units (a "narrow" build, where sys.maxunicode == 0xffff, as in the standard Windows builds), and thus \U00010000 is actually a two-code-unit string:
>>> list(u'[\U00010000-\U00010001]')
[u'[', u'\ud800', u'\udc00', u'-', u'\ud800', u'\udc01', u']']
The regex parser thus sees a character class containing the range \udc00-\ud800, which is a "bad character range" because its start is greater than its end. In this setting I can't think of a solution other than matching the surrogate pairs explicitly (after ensuring sys.maxunicode == 0xffff):
>>> r = re.compile(u'\ud800[\udc00-\udc01]')
>>> r.match(u'\U00010000')
<_sre.SRE_Match object at 0x10cf6f440>
>>> r.match(u'\U00010001')
<_sre.SRE_Match object at 0x10cf4ed98>
>>> r.match(u'\U00010002')
>>> r.match(u'\U00020000')
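The explicit-surrogate approach can be generalized to any astral range, including the full U+10000 to U+10FFFF range from the edit. This is a sketch with hypothetical function names: each code point is split into its UTF-16 high/low surrogates, and the range becomes at most three alternatives (partial first lead unit, whole lead units in between, partial last lead unit). It is written against Python 3 strings that contain surrogate pairs explicitly, to mimic a narrow build; on Python 2.7 pass char=unichr. On Python 3.3+ itself every build is wide, so u'[\U00010000-\U0010FFFF]' compiles directly.

```python
import re

def to_surrogates(cp):
    """Split an astral code point (0x10000..0x10FFFF) into UTF-16 high/low surrogates."""
    cp -= 0x10000
    return 0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)

def astral_range_pattern(lo, hi, char=chr):
    """Regex source matching the surrogate-pair form of every code point in lo..hi."""
    lo_high, lo_low = to_surrogates(lo)
    hi_high, hi_low = to_surrogates(hi)
    if lo_high == hi_high:
        # Same high surrogate: one fixed lead unit plus a low-surrogate class.
        return u"%s[%s-%s]" % (char(lo_high), char(lo_low), char(hi_low))
    # First high surrogate: from lo's low unit up to the end of the low range.
    parts = [u"%s[%s-%s]" % (char(lo_high), char(lo_low), char(0xDFFF))]
    if hi_high - lo_high > 1:
        # High surrogates strictly in between accept any low surrogate.
        parts.append(u"[%s-%s][%s-%s]" % (char(lo_high + 1), char(hi_high - 1),
                                          char(0xDC00), char(0xDFFF)))
    # Last high surrogate: from the start of the low range up to hi's low unit.
    parts.append(u"%s[%s-%s]" % (char(hi_high), char(0xDC00), char(hi_low)))
    return u"(?:" + u"|".join(parts) + u")"

def pair(cp):
    """Build the two-unit string a narrow build would hold for cp."""
    high, low = to_surrogates(cp)
    return chr(high) + chr(low)

full = re.compile(astral_range_pattern(0x10000, 0x10FFFF))
print(bool(full.fullmatch(pair(0x1F600))))   # True
```

For lo=0x10000 and hi=0x10001 the function returns the same pattern as the answer, \ud800[\udc00-\udc01].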

Regular expression in Python 2.7

Another question:
I'm trying to search for a specific pattern in a file, but I have to deal with the following case.
This line is interpreted correctly:
f27 = re.findall( b'\x03\x00\x00\x27''(.*?)''\xF7\x00\xF0', s)
but this one is misinterpreted, because \x28 is the byte for the '(' parenthesis:
f28 = re.findall( b'\x03\x00\x00\x28''(.*?)''\xF7\x00\xF0', s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\Portable Python 2.7.2.1\App\lib\re.py", line 177, in findall
return _compile(pattern, flags).findall(string)
File "D:\Portable Python 2.7.2.1\App\lib\re.py", line 244, in _compile
raise error, v # invalid expression
error: unbalanced parenthesis
I tried several escapes with '\' and '/', but without success!
Any solution?
Thanks
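Try using a raw bytestring. The re module understands escape sequences such as \x28 itself: with a raw string the pattern contains the four characters \x28, which re reads as a literal '(' byte, whereas without the r prefix the literal already contains a bare '(', which re treats as the start of a group.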
f28 = re.findall(br'\x03\x00\x00\x28(.*?)\xF7\x00\xF0', s)
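A sketch of the raw-bytes approach on Python 3 sample data; find_frames and the sample bytes are hypothetical. It also builds the variable code byte with re.escape (a different technique from the answer, useful when that byte comes from a variable) and passes re.DOTALL, since without it . does not match a 0x0A byte inside the payload.

```python
import re

def find_frames(data, code):
    """Return every payload between the 4-byte header ending in `code` and the F7 00 F0 trailer."""
    # Raw bytes literal: re itself decodes the \xNN escapes, so they stay literal bytes.
    # re.escape turns a '(' byte (0x28) into an escaped '(' so it cannot open a group.
    pattern = br"\x03\x00\x00" + re.escape(bytes([code])) + br"(.*?)\xF7\x00\xF0"
    # DOTALL lets .*? cross 0x0A bytes, which binary payloads may contain.
    return re.findall(pattern, data, re.DOTALL)

s = b"\x03\x00\x00\x28hello\xF7\x00\xF0junk\x03\x00\x00\x28a\nb\xF7\x00\xF0"
print(find_frames(s, 0x28))   # [b'hello', b'a\nb']
```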