Python word search in a text file - python-2.7

I have a text file in which I need to search for three specific words using Python. For example, the words are account, online and offer, and I need a count of how many times they appear in the file.

with open('fixtures/file1.csv') as f:
    print len(filter(
        lambda line: "account" in line or "online" in line or "offer" in line,
        f.readlines()
    ))
You can also check directly whether the words are in each line.
Update
To count how many times each word appears in the file, the most effective way I have found is to iterate over the file once and check how many times each word occurs in each line. For that, try the following:
keys = ('account', 'online', 'offer')
with open('fixtures/file1.csv') as f:
    found = dict((k, 0) for k in keys)
    for line in f:
        for k in keys:
            # str.count adds every occurrence in the line, not just one per line
            found[k] += line.count(k)
found will then be a dictionary with what you are looking for.
Hope this helps!
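If substring matches are a concern (for example "offer" also matching inside "offered"), a whole-word count may be closer to what is wanted. The following is only a sketch under that assumption: the helper name count_words and the sample lines are made up for illustration, and matching is case-insensitive by choice.

```python
import re
from collections import Counter

def count_words(lines, keys):
    """Count whole-word, case-insensitive occurrences of each key."""
    wanted = set(k.lower() for k in keys)
    # Start every key at zero so missing words still show up in the result
    counts = Counter(dict.fromkeys(wanted, 0))
    for line in lines:
        # \w+ splits the line into word tokens, dropping punctuation
        for token in re.findall(r"\w+", line.lower()):
            if token in wanted:
                counts[token] += 1
    return dict(counts)

# Hypothetical sample data standing in for the lines of the file
sample = ["Open an account online", "Online offer: account, account!"]
result = count_words(sample, ("account", "online", "offer"))
print(result)
```

To run it on a real file, pass the open file object in place of sample, since a file object yields its lines when iterated.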

I am assuming it is a plain text document. In that case you would open('file.txt') as f, loop over every line with for line in f, check if 'word' in line.lower(), and then increment a counter accordingly (say wordxtotal += 1).

Related

Extrude Acc (Gene ID or accession number) from a fasta file

What does ".gb\\|(.*)\\|.*","\\1 in the function gsub mean?
If you have a single FASTA sequence in the file you can solve the problem by reading the first line of the file and then split it by the pipe character |.
If you have multiple sequences then you can read the first character for each line and look for the > character.
Here is a code example in Python. If you need another ID then you can change the index.
with open('AE004437.faa') as fh:
    header_line = fh.readline()
    ids = header_line.split('|')
    gene_ids = ids[3]
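For a file with multiple sequences, the same idea can be applied to every line that starts with the > character. This is a rough sketch, assuming each header is pipe-separated with the wanted ID at index 3; the function name fasta_ids and the sample headers are invented for illustration.

```python
def fasta_ids(lines, index=3):
    """Return one pipe-separated header field per FASTA record."""
    ids = []
    for line in lines:
        # Header lines start with '>'; sequence lines are skipped
        if line.startswith(">"):
            fields = line.rstrip("\n").split("|")
            # Ignore headers that have too few fields
            if len(fields) > index:
                ids.append(fields[index])
    return ids

# Made-up headers, only to show the assumed layout
sample = [
    ">gi|0000001|gb|EXAMPLE01.1| hypothetical protein A\n",
    "MSIITRLV\n",
    ">gi|0000002|gb|EXAMPLE02.1| hypothetical protein B\n",
    "MTDQPLAA\n",
]
ids = fasta_ids(sample)
print(ids)
```

As in the single-sequence example, change index if you need a different field.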

Know the number of times this word appears: PYTHON 2.7

I have a file.txt with about 1,000 lines that look like this:
--- Adding sections to FwLogger: [],2020-01-13 16:09:18,2020-01-13 16:09:22
--- Clearing all sections from FwLogger,2020-01-13 16:09:17,2020-01-13 16:09:22
--- (1/0) The value was discarded due to being too separated from previous value
--- (1/0) ContinueBoot#b7630fd Rebooting device due to capabilities request freeze
And I need to know how many times the word "FwLogger" appears (as a number).
There are definitely more elegant ways to do it, but in my version you replace the delimiters manually:
counter = 0  # must be initialised before the loop
with open('test.txt') as file:
    for line in (line.strip() for line in file):
        # replace every possible delimiter in your file with a space, so the line can be split on spaces afterwards
        c = line.replace(",", " ").replace(";", " ").replace("#", " ").replace(":", " ")
        for word in c.split(" "):
            if word == "FwLogger":
                # print(line)
                counter = counter + 1
print(counter)
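Read in your txt file and use the string find method in a loop, like below:
with open('file.txt') as f:
    text = f.read()
sub = "FwLogger"
count = 0
istart = text.find(sub)
while istart != -1:
    count = count + 1
    # continue searching just past the previous match
    istart = text.find(sub, istart + 1)
print(count)
istart is the position where the string you're looking for was last found; find returns -1 when there are no more matches. Python strings are indexed from 0, so the first search starts at the beginning of the text.
Each time a match is found, increment a counter, i.e. count = count + 1.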
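If reading the whole file into memory is not desirable, the count can also be accumulated line by line with str.count. A minimal sketch, assuming plain substring matching is acceptable (so "FwLoggerX" would also be counted); the function name count_substring is made up, and the sample lines are the ones from the question.

```python
def count_substring(lines, word):
    """Sum the occurrences of word over all lines (substring match)."""
    return sum(line.count(word) for line in lines)

# The four sample lines quoted in the question
sample = [
    "--- Adding sections to FwLogger: [],2020-01-13 16:09:18,2020-01-13 16:09:22",
    "--- Clearing all sections from FwLogger,2020-01-13 16:09:17,2020-01-13 16:09:22",
    "--- (1/0) The value was discarded due to being too separated from previous value",
    "--- (1/0) ContinueBoot#b7630fd Rebooting device due to capabilities request freeze",
]
total = count_substring(sample, "FwLogger")
print(total)
```

With a real file, the open file object can be passed directly as lines.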

Regex in python trouble

I have a text file that I would like to search to see how many times a certain word occurs in it. I'm getting the wrong count for the words.
File is here
code:
import re
with open('SysLog.txt', 'rt') as myfile:
    for line in myfile:
        m = re.search('guest', line, re.M|re.I)
        if m is not None:
            m.group(0)
            print( "Found it.")
            print('Found',len(m.group()), m.group(),'s')
            break
    for line in myfile:
        n = re.search('Worm', line)
        if n is not None:
            n.group(0)
            print("\n\tNext Match.")
            print('Found', len(n.group()), n.group(), 's')
            break
    for line in myfile:
        o = re.search('anonymous', line)
        if o is not None:
            o.group(0)
            print("\n\tNext Match.")
            print('Found', len(o.group()), o.group(), 's')
            break
There is no need to use a regex; you can use str.count() to make the process much simpler:
with open('SysLog.txt', 'rt') as myfile:
    text = myfile.read()
for word in ('guest', 'Worm', 'anonymous'):
    print("\n\tNext Match.")
    print('Found', text.count(word), word, 's')
To test this, I downloaded the file and ran the code above, and got the output:
Next Match.
Found 4 guest s
Next Match.
Found 91 Worm s
Next Match.
Found 18 anonymous s
which is correct if you do a find on the document in a text editor!
*As a sidenote, I'm not sure why you want to print a tab (\t) before 'Next Match' each time as it just looks weird in the output but it doesn't matter :)
There are multiple problems with your code:
re.search will only give you the first match, if any; this does not have to be a problem, though, as it seems like the word is only expected to appear once per line; otherwise, use re.findall
the line n.group(0) does not do anything without an assignment
len(n.group()) does not give you the number of matches, but the length of the matched string
you break after the first line in the file
myfile is an iterator, so once the first for line in myfile loop has finished, the other two won't have any lines left to loop (it will never finish because of the break anyway, though)
as already noted, you do not need regular expressions at all
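One (among many) possible way of doing this would be the following (not tested):
counts = {"worm": 0, "guest": 0, "anonymous": 0}
with open('SysLog.txt') as myfile:
    for line in myfile:
        for word in counts:
            # the keys are lowercase, so compare against the lowercased line
            if word in line.lower():
                counts[word] += 1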
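If a word can appear more than once per line, the re.findall route mentioned above would look roughly like this. It is only a sketch: the function name count_matches and the sample text are made up, and whether matching should ignore case is an assumption left to the ignore_case parameter.

```python
import re

def count_matches(text, words, ignore_case=False):
    """Count every occurrence of each word in text using re.findall."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps any regex metacharacters in a word literal
    return {w: len(re.findall(re.escape(w), text, flags)) for w in words}

# Hypothetical log text, not the real SysLog.txt
sample = "guest login\nGuest login failed\nWorm detected, worm removed\n"
words = ("guest", "Worm", "anonymous")
exact = count_matches(sample, words)
loose = count_matches(sample, words, ignore_case=True)
print(exact)
print(loose)
```

len(re.findall(...)) gives the number of matches, which is what len(m.group()) in the question was presumably meant to be.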

Python - using raw_input() to search a text document

I am trying to write a simple script that lets a user enter what he/she wants to search for in a specified txt file. If the word they are searching for is found, it should be printed to a new text file. This is what I have so far.
import re
import os
os.chdir("C:\Python 2016 Training")
patterns = open("rtr.txt", "r")
what_directory_am_i_in = os.getcwd()
print what_directory_am_i_in
search = raw_input("What you looking for? ")
for line in patterns:
    re.findall("(.*)search(.*)", line)
    fo = open("test", "wb")
    fo.write(line)
    fo.close
This successfully creates a file called test, but the output is nothing close to what word was entered into the search variable.
Any advice appreciated.
First of all, you have not read a file
patterns = open("rtr.txt", "r")
This is a file object and not the content of the file. To read the file contents into a list of lines you need to use
patterns.readlines()
Secondly, re.findall returns a list of matched strings, so you would want to store that. Your regex is also not correct, as pointed out by Hani. It should be
matched = re.findall("(.*)" + search + "(.*)", line)
or rather, if you want the complete line, it should be:
matched = re.findall(".*" + search + ".*", line)
or simply
matched = line if search in line else None
Thirdly, you don't need to keep opening your output file inside the for loop. You are overwriting the file every time through the loop, so it will capture only the last result. Also remember to actually call the close method on the files: fo.close without parentheses does not call it.
Hope this helps
Here you are searching for all lines that contain the literal word "search".
You need to get the lines that contain the text you entered in the shell,
so change this line
re.findall("(.*)search(.*)", line)
to
re.findall("(.*)"+search+"(.*)", line)
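One caveat when building a pattern from user input: characters such as ( or . have a special meaning in a regex, so it may be safer to pass the input through re.escape first. A small sketch under that assumption; the function name matching_lines and the sample lines are invented for illustration.

```python
import re

def matching_lines(lines, search):
    """Return the lines that contain the user's text, taken literally."""
    # re.escape turns characters like '(' into literals instead of regex syntax
    pattern = re.compile(re.escape(search))
    return [line for line in lines if pattern.search(line)]

# Hypothetical contents standing in for rtr.txt
sample = ["router rtr-01 up\n", "switch sw-01 up\n", "router rtr-02 (down)\n"]
hits = matching_lines(sample, "rtr-02 (")
print(hits)
```

The resulting list can then be written once, after the loop, for example with writelines on an output file opened a single time.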

How do I print specific lines of a file in python?

I'm trying to print everything in a file with Python. But whenever I use Python's built-in readline() function, it only prints the first line of my text file. Here's my code:
File = open("test.txt", 'r', 0)
line = File.readline()[:]
print line
Thank you to everyone who answers.
To make my question clearer: every time I run the code it prints only "word list food".
Is this what you are looking for?
printline = 6
lineCounter = 0
with open('anyTxtFile.txt','r') as f:
    for line in f:
        lineCounter += 1
        if lineCounter == printline:
            print(line, end='')
This opens the text file in the working directory and prints line number printline.
File.readlines()
will, as emre. said, return a list of all the lines in your file. If you'd like to produce a similar result using the readline() command,
s=File.readline()
while s!="":
    print s
    s=File.readline()
Both methods above leave a newline at the end of each string, except for the last string.
Another alternative would be:
for s in File:
    print s
To search for a specific string, or a specific line number, I'd say the first method is best. Looking for a specific line number would be as simple as:
File.readlines()[i]
Where i is the zero-based index of the line you are interested in (the first line is index 0). Looking for a string is a bit more work, but looping through the list would not be too challenging. Something like:
L=File.readlines()
s="yourStringHere"
i=0
while i<len(L):
    if L[i].find(s)!=-1:
        break
    i+=1
print i
would give you the line number that contained the string you were looking for.
Make it more pythonic.
print_line = 6
with open('input_txt_file.txt', 'r') as f:
    # start=1 so that i is a 1-based line number
    for i, line in enumerate(f, start=1):
        if i == print_line:
            print(line, end='')
            break
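Another option along the same lines is itertools.islice, which skips ahead to the wanted line without an explicit counter. A minimal sketch, assuming 1-based line numbers; the helper name nth_line and the sample list are made up for illustration.

```python
from itertools import islice

def nth_line(lines, n):
    """Return line number n (1-based), or None if the input is shorter."""
    # islice(lines, n - 1, n) yields at most one item: the n-th line
    return next(islice(lines, n - 1, n), None)

# Hypothetical ten-line input standing in for an open file
sample = ["line %d\n" % i for i in range(1, 11)]
sixth = nth_line(sample, 6)
print(repr(sixth))
```

An open file object can be passed as lines; only the lines up to the requested one are read.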