Python regex (.*?) isn't giving an output [duplicate] - regex

This question already has answers here:
Python regex, matching pattern over multiple lines.. why isn't this working?
(2 answers)
Closed 4 years ago.
I'm making a project and part of it is taking in a python file as a text file and parsing it using regular expressions.
I was able to use this fine (where program is a string containing the code with newlines):
findall(r"def (.*?)\((.*?)\)", program)
But this line just gives None when I expect it to give a Match object where .group() returns "func1(None, None)"
mainblock = search(r'if __name__ == "__main__":(.*?)#END', program)
An abbreviated version of the python file I'm parsing is below:
def func1(stuff, morestuff):
    pass
if __name__ == "__main__":
    func1(None, None)
#END
I've checked for any discrepancies in the regex itself and I can't find any. I also tried copy/pasting it directly from the code file and it still couldn't find a match.

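By default, . does not match a newline, so (.*?) cannot span the line breaks between the colon and #END. You need to either include the newline characters \n in the regular expression, like this:
r'if __name__ == "__main__":\n(.*?)\n#END'
or enable the DOTALL flag (re.DOTALL), which makes . match line breaks as well.
(MULTILINE means something else, which can be counterintuitive: it only changes where ^ and $ match.)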
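As a minimal sketch of the difference between the flags, the snippet below runs the pattern from the question against the abbreviated file, assuming the body of the main block is indented by four spaces. The names no_flag, with_multiline, with_dotall and main_body are only illustrative.

```python
import re

# The abbreviated file from the question, assuming four-space indentation.
program = (
    "def func1(stuff, morestuff):\n"
    "    pass\n"
    'if __name__ == "__main__":\n'
    "    func1(None, None)\n"
    "#END\n"
)

pattern = r'if __name__ == "__main__":(.*?)#END'

# Without flags, "." stops at the newline right after the colon.
no_flag = re.search(pattern, program)

# MULTILINE only changes ^ and $, so it does not help here.
with_multiline = re.search(pattern, program, re.MULTILINE)

# DOTALL lets "." cross line breaks, so the lazy group reaches #END.
with_dotall = re.search(pattern, program, re.DOTALL)

# Group 1 still carries the surrounding newlines and indentation.
main_body = with_dotall.group(1).strip()
print(main_body)
```

Note that the captured text is in group(1); group() with no argument returns the whole match, including the if line and #END.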

Related

Python3 regular expression not working on script, but works on pythex.org

I am writing a script in Python3 and I want to use regular expressions. I have some utf-8 encoded files used as configuration files for my main script.
I wish to change some lines on them (classic configuration changes).
My code, still at the proof-of-concept stage, is this:
regex = re.compile('^SHOW_ALL\s[^:]')
with open('./config.txt', encoding='utf-8', mode='r+') as old_file:
    for line in old_file.read():
        if regex.match(line):
            print(line)
and the config.txt is this:
#Κάτι στα ελληνικά
SHOW_ALL OFF 15
PRINT ON
SHOW_VALUES O
COM 0
PRINTER_NAME samsung_not_a_real_name
CAMERA 33
I checked my regular expression at pythex.org and it seems to work fine.
What could be going wrong?
*the link redirects to the exact regex and text I tried myself at regex.org
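Try replacing old_file.read() with old_file.readlines(). read() returns the whole file as a single string, so the for loop iterates over it one character at a time, and a single character can never match the pattern.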
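A small sketch of why the loop fails, using io.StringIO as a stand-in for the opened config.txt so that it runs without a file on disk. The helper name matching_lines is hypothetical.

```python
import io
import re

regex = re.compile(r"^SHOW_ALL\s[^:]")

# Same content as config.txt in the question.
config_text = (
    "#Κάτι στα ελληνικά\n"
    "SHOW_ALL OFF 15\n"
    "PRINT ON\n"
    "SHOW_VALUES O\n"
    "COM 0\n"
    "PRINTER_NAME samsung_not_a_real_name\n"
    "CAMERA 33\n"
)


def matching_lines(handle):
    """Return the lines of a text file object that match the pattern."""
    # Iterating over the file object yields whole lines, like readlines().
    return [line.rstrip("\n") for line in handle if regex.match(line)]


matches = matching_lines(io.StringIO(config_text))

# Iterating over the string returned by read() yields single characters,
# which is what the original loop was doing.
char_matches = [ch for ch in config_text if regex.match(ch)]

print(matches)
print(char_matches)
```

Iterating over the file object directly gives the same lines as readlines() without building the whole list first.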

Matching multiple python regexes in a line in tarfile opened tar file

Please how do I overcome the problem of
TypeError: cannot use a string pattern on a bytes-like object
when trying to run multiple regexes match against a line from the file?
The multiple match I am trying is:
re.match('|'.join('(?:{0})'.format(x) for x in (regex1, regex2, regex3)), line):
which works in plain text file matches and which I attribute to StackOverflow assistance.
I have compiled the regexes like so:
regex1 = re.compile(b'http\:\/\/ipaddress\:port\/service\?')
regex2 = re.compile(b'\_event\=new?')
regex3 = re.compile(b'askment\:')
but this TypeError still appears.
Earlier in my script I can get away with this:
match = re.search(b'something-string:\s+111+\d{2,5}', line)
So I thought prefixing the regexes with 'b' in the multiple match was sufficient.
Please what am I doing wrong?
I had to decode the line, since it's coming in as a binary stream.
re.match('|'.join('(?:{0})'.format(x) for x in (regex1, regex2, regex3)), line.decode("ascii or something else")):
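The pattern and the line have to be of the same type: both bytes or both str. As a hedged sketch, the snippet below keeps the pattern sources as uncompiled bytes so they can be joined, and shows both the bytes route and the decode route. The sources are simplified versions of the ones in the question, and the sample line and the ASCII encoding are assumptions.

```python
import re

# Simplified pattern sources, kept as bytes and left uncompiled so that
# they can be joined into one alternation.
sources = (
    rb"http://ipaddress:port/service\?",
    rb"_event=new",
    rb"askment:",
)
combined = re.compile(b"|".join(b"(?:" + src + b")" for src in sources))

# Stand-in for a line read from the tar member (bytes).
line = b"askment: something\n"

# Bytes pattern against a bytes line works without decoding.
bytes_hit = combined.match(line)

# Alternatively, decode both sides and match str against str.
text_pattern = re.compile(combined.pattern.decode("ascii"))
text_hit = text_pattern.match(line.decode("ascii"))

# Mixing the two types is what raises the TypeError from the question.
try:
    text_pattern.match(line)
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Formatting an already compiled pattern object into a string inserts its repr rather than its source, which is why the sketch joins the raw sources instead.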

Regex: Get first value from single line [duplicate]

This question already has answers here:
Can you provide some examples of why it is hard to parse XML and HTML with a regex? [closed]
(12 answers)
RegEx match open tags except XHTML self-contained tags
(35 answers)
Closed 7 years ago.
I have the XML below on a single line. I want to get the string for DBB and replace it using a regex.
<Configuration ConfiguredType="Property" Path="\Package.Connections[DBA DB].Properties[ConnectionString]" ValueType="String"><ConfiguredValue>Data Source=.\test;Initial Catalog=DBA;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;Application Name=B;</ConfiguredValue></Configuration><Configuration ConfiguredType="Property" Path="\Package.Connections[DBB DB].Properties[ConnectionString]" ValueType="String"><ConfiguredValue>Data Source=.\test;Initial Catalog=DBB;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;Application Name=C;</ConfiguredValue></Configuration></DTSConfiguration>
I have the following which works on multi line xml but not this single line example
Data Source=.+?(?=[a-z])*\;Initial Catalog=DBB;(.*?)Integrated(.*?)[^;]*;
The above regex highlights both DBA and DBB and ends there.
Could you help me find the missing piece in the regex I have created?
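Replace Data Source=.+? with Data Source=[^<]+? so that the match cannot run past the start of the next tag. On a single line, .+? is free to start at the first Data Source= and expand across tags until it reaches Initial Catalog=DBB.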
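A minimal sketch of the effect, written in Python with a shortened stand-in for the XML and a simplified pattern that keeps only the part up to Initial Catalog=DBB (both are assumptions for illustration, not the asker's full expression):

```python
import re

# Shortened stand-in for the single-line XML: two entries, fewer attributes.
xml = (
    r"<ConfiguredValue>Data Source=.\test;Initial Catalog=DBA;"
    r"Integrated Security=SSPI;Application Name=B;</ConfiguredValue>"
    r"<ConfiguredValue>Data Source=.\test;Initial Catalog=DBB;"
    r"Integrated Security=SSPI;Application Name=C;</ConfiguredValue>"
)

# ".+?" may cross tags, so the match starts at the DBA entry.
loose = re.search(r"Data Source=.+?;Initial Catalog=DBB;", xml)

# "[^<]+?" stops at the next "<", so only the DBB entry can match.
strict = re.search(r"Data Source=[^<]+?;Initial Catalog=DBB;", xml)

print(loose.group(0))
print(strict.group(0))
```

The loose match swallows the whole DBA entry before reaching DBB, while the strict one is confined to the DBB value.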

Matching both greedy, nongreedy and all others in between [duplicate]

This question already has answers here:
Parsing valid parent directories with regex
(3 answers)
Closed 8 years ago.
Given a string like "/foo/bar/baz/quux" (think of it like a path to a file on a unixy system), how could I (if at all possible) formulate a regular expression that gives me all possible paths that can be said to contain file quux?
In other words, upon running a regexp against the given string ("/foo/bar/baz/quux"), I would like to get as results:
"/foo/"
"/foo/bar/"
"/foo/bar/baz/"
I've tried the following:
'/\/.+\//g' - this is greedy by default, matches "/foo/bar/baz/"
'/\/.+?\//g' - lazy version, matches "/foo/" and also "/baz/"
P.S.: I'm using Perl-compatible regexps in PHP, via the function preg_match(), for that matter.
Felipe is not looking for /foo/bar/baz, /bar/baz, /baz but for /foo, /foo/bar, /foo/bar/baz.
One solution that builds on the regex idea in the comments but gives the right strings:
reverse the string to be matched: xuuq/zab/rab/oof/ (for instance, in PHP use strrev($string))
match with (?=((?<=/)(?:\w+/)+))
This gives you
zab/rab/oof/
rab/oof/
oof/
Then reverse each match with strrev().
This gives you
/foo/bar/baz
/foo/bar
/foo
If you had .NET rather than PCRE, you could match right to left and probably come up with the same result.
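The reverse, match, reverse steps can be sketched in Python rather than PHP, assuming Python's re module, which accepts the fixed-width lookbehind used here. The variable names are illustrative only.

```python
import re

path = "/foo/bar/baz/quux"

# Step 1: reverse the string, giving "xuuq/zab/rab/oof/".
reversed_path = path[::-1]

# Step 2: at every position preceded by "/", capture the longest run of
# "word/" segments inside a lookahead, so overlapping results are kept.
reversed_hits = re.findall(r"(?=((?<=/)(?:\w+/)+))", reversed_path)

# Step 3: reverse each captured string back.
prefixes = [hit[::-1] for hit in reversed_hits]

print(prefixes)
```

Because the capture sits inside a lookahead, each match is zero-width and the search can restart inside the previously captured text.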
This solution will not give the exact output you are expecting, but it still gives a useful result that you can post-process to get what you need:
$s = '/foo/bar/baz/quux';
if ( preg_match_all('~(?=((?:/[^/]+)+(?=/[^/]+$)))~', $s, $m) )
    print_r($m[1]); // group 1 holds the captured paths; $m[0] holds only the empty lookahead matches
OUTPUT:
Array
(
    [0] => /foo/bar/baz
    [1] => /bar/baz
    [2] => /baz
)
A completely different answer, without reversing the string:
(?<=((?:\w+(?:/|$))+(?=\w)))
This matches
foo/
foo/bar/
foo/bar/baz/
but you have to use C#, which supports variable-length lookbehind, rather than PCRE.

Python RegEX that gets each name of a function in a file

I'm trying to implement a regex script that gets the name of each function and writes them to a text file. I have the writing-to-file part working; the regex part is where I need some pointers.
# I just want to extract "name_i_want"
def name_i_want(self):
A regex for this could be:
(?<=def )(\w+)(?=\()
Working regex example:
http://regex101.com/r/qR3fE7
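As a short sketch of how that pattern could be applied with re.findall, assuming the file's contents are already in a string (the sample source and the name other_name are made up for illustration):

```python
import re

# Lookbehind for "def ", capture the name, lookahead for "(".
FUNC_NAME = re.compile(r"(?<=def )(\w+)(?=\()")

# Made-up sample standing in for the contents of the file being scanned.
source = (
    "class Thing:\n"
    "    # I just want to extract \"name_i_want\"\n"
    "    def name_i_want(self):\n"
    "        pass\n"
    "\n"
    "def other_name(a, b):\n"
    "    return a + b\n"
)

# findall returns the captured group for every match, in file order.
names = FUNC_NAME.findall(source)
print(names)
```

The comment line is skipped because the name there is not preceded by "def " and followed by "(". A plain text match like this would still pick up a def that appears inside a string or docstring.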