Regex in python trouble - regex

I have a text file that I would like to search to see how many times certain words occur in it. I'm getting the wrong count for the words.
File is here
code:
import re
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        m = re.search('guest', line, re.M|re.I)
        if m is not None:
            m.group(0)
            print( "Found it.")
            print('Found',len(m.group()), m.group(),'s')
            break
    for line in myfile:
        n = re.search('Worm', line)
        if n is not None:
            n.group(0)
            print("\n\tNext Match.")
            print('Found', len(n.group()), n.group(), 's')
            break
    for line in myfile:
        o = re.search('anonymous', line)
        if o is not None:
            o.group(0)
            print("\n\tNext Match.")
            print('Found', len(o.group()), o.group(), 's')
            break

There is no need to use a regex; you can use str.count() to make the process much simpler:
with open('SysLog.txt', 'rt') as myfile:
    text = myfile.read()
for word in ('guest', 'Worm', 'anonymous'):
    print("\n\tNext Match.")
    print('Found', text.count(word), word, 's')
To test this, I downloaded the file and ran the code above, and got the output:
Next Match.
Found 4 guest s
Next Match.
Found 91 Worm s
Next Match.
Found 18 anonymous s
which is correct if you do a find on the document in a text editor!
*As a sidenote, I'm not sure why you want to print a tab (\t) before 'Next Match' each time as it just looks weird in the output but it doesn't matter :)

There are multiple problems with your code:
- re.search will only give you the first match, if any; this does not have to be a problem, though, as it seems like the word is only expected to appear once per line; otherwise, use re.findall
- the line n.group(0) does not do anything without an assignment
- len(n.group()) does not give you the number of matches, but the length of the matched string
- you break out of each loop early, so the rest of the file is never searched by that loop
- myfile is an iterator, so once the first for line in myfile loop has finished, the other two won't have any lines left to loop over (it will never finish because of the break anyway, though)
- as already noted, you do not need regular expressions at all
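One (among many) possible ways of doing this would be the following (not tested); note that it counts matching lines, not individual occurrences:
counts = {"Worm": 0, "guest": 0, "anonymous": 0}
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        for word in counts:
            if word in line:  # case-sensitive substring test
                counts[word] += 1
print(counts)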
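If you do want to stay with regular expressions, re.findall returns every match, so its length is the count. Below is a minimal sketch, assuming whole-word and case-insensitive matching is what you are after. The function name count_words and the sample text are made up for illustration and are not taken from your log file.

```python
import re
from collections import Counter

def count_words(text, words):
    """Count whole-word, case-insensitive occurrences of each word in text."""
    counts = Counter()
    for word in words:
        # \b keeps "guest" from matching inside "guests"; re.escape makes
        # the word safe to embed in a pattern.
        pattern = r"\b" + re.escape(word) + r"\b"
        counts[word] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Hypothetical sample data, standing in for the contents of the log file.
sample = "guest login\nWorm detected; worm blocked\nanonymous GUEST\n"
for word, count in count_words(sample, ["guest", "Worm", "anonymous"]).items():
    print("Found", count, word)
```

Unlike str.count(), this ignores case and does not count a word that only appears as part of a longer word, so the numbers may differ from what a plain text search reports.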

Regex Python concatenate lines if some text is found the line below

import re
output = open("teste-out.txt","w")
input = open("teste.txt")
for line in input:
    output.write(re.sub(r"\n\r03110", r"|03110", line))
input.close()
output.close()
Why isn't this code working? Can anyone help me fix it? I want to read from a txt file and, if a line starts with 03110, merge only that line into the previous line, adding | before the merged text.
I've tried \n03110, \r03110 and other options, but none of them work. In Notepad++ I can do this with regular expressions by searching for \R++03110 and replacing it with |03110, but I want a Python solution to optimize the job.
Input
01000|0107160
02000|1446
03100|01|316,00
03110|||316,00|0|0|7|
03100|29|135,00
03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
02000|4720,905|1967,05|0
03100|31|705,26
03100|32|6073,00
03110|||6073,00|0|0|0,00|8
99999|23
Output
01000|0107160
02000|1446
03100|01|316,00|03110|||316,00|0|0|7|
03100|29|135,00|03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
02000|4720,905|1967,05|0
03100|31|705,26
03100|32|6073,00|03110|||6073,00|0|0|0,00|8
99999|23
I'm using Python on Windows.
2nd EDIT: sorry - I guess I didn't read carefully enough...
Merging lines based on how the second line begins is also possible, though perhaps not as clean:
with open('teste.txt') as fin, open('teste-out.txt', 'w') as fout:
    # write the first line without its newline; newlines are added below
    fout.write(next(fin).rstrip('\n'))
    for line in fin:
        line = line.rstrip('\n')
        if line.startswith('03110'):
            fout.write(f'|{line}')   # continuation: join to the previous line
        else:
            fout.write(f'\n{line}')  # ordinary line: start a new output line
    fout.write('\n')
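For comparison, the regex from the question cannot work line by line: each line read from the file ends with its newline, so the newline never appears in front of 03110 within a single line. A regex can do the job if the whole text is read at once. Here is a minimal sketch under that assumption; merge_continuations and the sample string are illustrative names, not part of the original code.

```python
import re

def merge_continuations(text, prefix="03110"):
    """Join every line that starts with prefix onto the previous line with '|'."""
    # The lookahead (?=...) checks for the prefix without consuming it,
    # so only the line break itself is replaced by '|'.
    return re.sub(r"\r?\n(?=" + re.escape(prefix) + r")", "|", text)

# Hypothetical sample, shortened from the input shown in the question.
sample = (
    "03100|01|316,00\n"
    "03110|||316,00|0|0|7|\n"
    "03100|31|705,26\n"
    "03100|32|6073,00\n"
    "03110|||6073,00|0|0|0,00|8\n"
    "99999|23\n"
)
print(merge_continuations(sample), end="")
```

With a real file you would pass it the result of fin.read(). That loads the whole file into memory, which the line-by-line version above avoids.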
EDIT: solution working with files:
with open('teste.txt') as fin, open('teste-out.txt', 'w') as fout:
    for line in fin:
        if line.startswith('03100'):
            # joins the next line unconditionally, even if it is not a 03110 line
            fout.write(line[:-1] + '|' + next(fin))
        else:
            fout.write(line)
Just in case it is of interest: this is not a job for re, imho:
s_in = '''01000|0107160
02000|1446
03100|01|316,00
03110|||316,00|0|0|7|
03100|29|135,00
03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489'''
from io import StringIO
with StringIO(s_in) as fin:
    for line in fin:
        if line.startswith('03100'):
            print(line[:-1] + '|' + next(fin), end='')
        else:
            print(line, end='')
which results in the requested output:
01000|0107160
02000|1446
03100|01|316,00|03110|||316,00|0|0|7|
03100|29|135,00|03110|||135,00|0|0|0|
99999|83
00000|00350235201512001|01071603100090489
For those who like sed, this is a very short solution (not that efficient, though, as it reads all lines before printing anything):
< input_file sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n03110/|03110/g'
The following sed script is a more efficient solution:
#!/usr/bin/sed -f
:h
N
s/\n03110/|03110/
t h
h
s/\n.*//
p
g
D
For the casual reader who really likes sed like I do, here's a short explanation:
the 4 lines from :h to t h are essentially a "do-while" loop in which we append a new line to the pattern space (N), and we keep doing so (t h is a "goto") as long as the substitution command (s) succeeds in changing the embedded newline \n to a |;
as soon as the s command is unsuccessful, we "save" the multiline pattern space by copying it into the hold space (h), safely delete the \n and whatever is after it (s/\n.*//), and finally print what remains (p), which is the lines that we've been successfully joining;
it's now time to get back the last line we appended, which did not start with 03110: we get (g) the multiline back from the hold space, delete \n together with whatever precedes it, and go to the top without printing (D);
we are back at the top of the script with a line which is not printed yet, just like when we started.

Python Regex pattern matching with a variable name not working

The below code does not return True for the match. I am wondering why? Any help is appreciated.
Note:
id_list = ['YYY-100', 'YYYMM1640ASS20', 'Cruzer', 'SSDSC2BA20', 'BBBPEDMD40']
'drives.txt' contains lines like this (and does contain above IDs in some lines).
'RED SSDSC2BA200G4R 200 GB 2.5 SATA 6G Class E: 30,000-100,000 writes per second'
So I would assume that id 'SSDSC2BA20' will match the second word in above line, but below match does not return True.
For double-checking, I tried 'if match: print match.group()' but that returns nothing as well. What am I missing?
import re
with open('drives.txt', 'r') as fr:
    for id in id_list:
        for line in fr:
            match = re.search(r'%s' % id, line, re.I)
            if match:
                print 'True'
Note that instead of above regex, I tried the below also, but that did not work either.
my_regex = r".?" + re.escape(id) + r".?"
match = re.search(my_regex, line, re.I)
fr is a file object. With your current approach, you're iterating over its lines multiple times, once for each ID. Don't do this. Every time you read a line, you advance the file position, and by the end of the first pass of the inner loop it has reached the end of the file, so the passes for the remaining IDs get no lines at all.
One fix for this is to do fr.seek(0, 0) after each inner loop, which I don't recommend. The other fix is to reorder your loops so that you iterate over your file only once. Here's how you do that:
with open('drives.txt', 'r') as fr:
    for line in fr:
        for id in id_list:
            match = re.search(r'%s' % id, line, re.I)
            if match:
                print id, 'matches for line:', line
Also, I should mention that using id as a variable name shadows the builtin id() function, so I recommend you change it.
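Putting those points together, here is a rough Python 3 sketch of the single-pass version. It assumes the IDs should be matched literally (hence re.escape) and renames the loop variable to avoid shadowing id(). The function name find_id_matches and the sample lines are made up for illustration.

```python
import re

def find_id_matches(lines, ids):
    """Yield (id, line) pairs for every ID found in a line, ignoring case."""
    # Compile once; re.escape makes characters such as '-' match literally.
    patterns = [(item, re.compile(re.escape(item), re.IGNORECASE)) for item in ids]
    for line in lines:  # a single pass over the input
        for item, pattern in patterns:
            if pattern.search(line):
                yield item, line.rstrip("\n")

id_list = ['YYY-100', 'YYYMM1640ASS20', 'Cruzer', 'SSDSC2BA20', 'BBBPEDMD40']
# Hypothetical stand-in for the lines of drives.txt.
sample_lines = [
    "RED SSDSC2BA200G4R 200 GB 2.5 SATA 6G Class E: 30,000-100,000 writes per second\n",
    "some unrelated line\n",
]
for drive_id, line in find_id_matches(sample_lines, id_list):
    print(drive_id, "matches line:", line)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly in place of sample_lines.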

Python Search File For Specific Word And Find Exact Match And Print Line

I wrote a script to print the lines containing a specific word from a bible txt file. The problem is that I can't match only the exact word; instead it prints lines containing any variation of the word.
For example, if I search for "am", it prints sentences with words such as "lame", "name", etc.
Instead I want it to print only the sentences that contain the word "am" itself,
i.e. "I am your saviour", "Here I am", etc.
Here is the code I use:
import re
text = raw_input("enter text to be searched:")
shakes = open("bible.txt", "r")
for line in shakes:
    if re.match('(.+)' +text+ '(.+)', line):
        print line
Here is another way to complete your task. It may be helpful, although it departs quite a bit from your current approach.
The test.txt file I fed as input had four sentences:
This is a special cat. And this is a special dog. That's an average cat. But better than that loud dog.
When you run the program, include the text file. In command line, that'd look something like:
python file.py test.txt
This is the accompanying file.py:
key = raw_input("Please enter the word you wish to search for: ")
#print "You've selected:", key, "as your key-word."
with open('test.txt') as f:
    content = f.read()  # read the whole file as one string
#print "This is the CONTENT", content
list_of_sentences = content.split(".")
for sentence in list_of_sentences:
    words = sentence.split()  # split on any whitespace
    for word in words:
        if word == key:
            print sentence.strip()
            break  # print each sentence only once
For the keyword "cat", this returns:
This is a special cat
That's an average cat
(note the periods are no longer there).
I think if you, in the strings outside text, put spaces like this:
'(.+) ' + text + ' (.+)'
That would do the trick, if I correctly understand what is going on in the code.
re.findall may be useful in this case:
print re.findall(r"([^.]*?" + text + "[^.]*\.)", shakes.read())
Or even without regex:
print [sentence + '.' for sentence in shakes.read().split('.') if text in sentence]
reading this text file:
I am your saviour. Here I am. Another sentence.
Second line.
Last line. One more sentence. I am done.
both give same results:
['I am your saviour.', ' Here I am.', ' I am done.']
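Note that the substring-based variants above would still match "name" or "lame" when searching for "am". To match the word only when it stands on its own, one option is the \b word-boundary anchor. This is a small Python 3 sketch assuming a case-sensitive, whole-word match is wanted; lines_with_word and the sample lines are hypothetical.

```python
import re

def lines_with_word(lines, word):
    """Return the lines that contain word as a whole word (case-sensitive)."""
    # \b matches only at the edge of a word, so "am" is not found in "name".
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [line.rstrip("\n") for line in lines if pattern.search(line)]

# Hypothetical sample lines standing in for bible.txt.
sample = ["I am your saviour\n", "The lame man spoke his name\n", "Here I am\n"]
for line in lines_with_word(sample, "am"):
    print(line)
```

pattern.search is used rather than re.match because re.match only matches at the beginning of the string.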

Python - using raw_input() to search a text document

I am trying to write a simple script that lets a user enter what he/she wants to search for in a specified txt file. If the word they are searching for is found, then print it to a new text file. This is what I've got so far.
import re
import os
os.chdir("C:\Python 2016 Training")
patterns = open("rtr.txt", "r")
what_directory_am_i_in = os.getcwd()
print what_directory_am_i_in
search = raw_input("What you looking for? ")
for line in patterns:
    re.findall("(.*)search(.*)", line)
    fo = open("test", "wb")
    fo.write(line)
    fo.close
This successfully creates a file called test, but the output is nothing close to what word was entered into the search variable.
Any advice appreciated.
First of all, note that
patterns = open("rtr.txt", "r")
gives you a file object and not the contents of the file. Iterating over it in a for loop does yield its lines one at a time; to get all the lines as a list you can use
patterns.readlines()
Secondly, re.findall returns a list of matched strings, so you would want to store that. Your regex is also not correct, as pointed out by Hani. It should be
matched = re.findall("(.*)" + search + "(.*)", line)
or rather, if you want the complete line,
matched = re.findall(".*" + search + ".*", line)
or simply
matched = line if search in line else None
Thirdly, you don't need to keep opening your output file in the for loop. You are overwriting your file every time through the loop, so it will capture only the last result. Also remember to actually call the close method on the files (fo.close without parentheses does not call it).
Hope this helps
Here you are searching for all lines that contain the literal word "search".
You need to get the lines that contain the text you entered in the shell,
so change this line
re.findall("(.*)search(.*)", line)
to
re.findall("(.*)"+search+"(.*)", line)

How do I print specific lines of a file in python?

I'm trying to print everything in a file with Python. But whenever I use Python's built-in readline() method, it only prints the first line of my text file. Here's my code:
File = open("test.txt", 'r', 0)
line = File.readline()[:]
print line
Thank you to everyone who answers.
To make my question clearer: every time I run the code, it prints only "word list food".
Is this what you are looking for?
printline = 6
lineCounter = 0
with open('anyTxtFile.txt','r') as f:
    for line in f:
        lineCounter += 1
        if lineCounter == printline:
            print(line, end='')
This opens a text file in the working directory and prints line number printline.
File.readlines()
will, as emre. said, return a list of all the lines in your file. If you'd like to produce a similar result using the readline() command, you could write:
s=File.readline()
while s!="":
    print s
    s=File.readline()
Both methods above leave a newline at the end of each string, except for the last string.
Another alternative would be:
for s in File:
    print s
To search for a specific string, or a specific line number, I'd say the first method is best. Looking for a specific line number would be as simple as:
File.readlines()[i]
Where i is the (zero-based) index of the line you are interested in accessing. Looking for a string is a bit more work, but looping through the list would not be too challenging. Something like:
L=File.readlines()
s="yourStringHere"
i=0
while i<len(L):
    if L[i].find(s)!=-1:
        break
    i+=1
print i
would give you the line number that contained the string you were looking for.
Make it more pythonic (start=1 makes the counter match the 1-based line number used in the version above):
print_line = 6
with open('input_txt_file.txt', 'r') as f:
    for i, line in enumerate(f, start=1):
        if i == print_line:
            print(line, end='')
            break
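Another option for picking a single line by number is itertools.islice, which skips ahead lazily without building a list of every line. This is just a sketch; nth_line and the sample list are illustrative names, and line numbers are assumed to be 1-based.

```python
from itertools import islice

def nth_line(lines, n):
    """Return line number n (1-based) from an iterable of lines, or None."""
    # islice yields at most one item here; next() falls back to None
    # when the input has fewer than n lines.
    return next(islice(lines, n - 1, n), None)

# Hypothetical sample standing in for an open file object.
sample = ["word list\n", "food\n", "drinks\n"]
print(nth_line(sample, 2), end="")
```

An open file can be passed in directly, for example nth_line(f, 6) inside a with open(...) as f block.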