Reading specific characters from a file in Python - python-2.7

Suppose I want to read a file in this format:
2
300 234 2 3
23444
If I use readline(), it returns the entire line as one string. I want to read only the numbers and nothing else. How should I do this?

You can use the re module.
import re
numbers = re.findall(r'[0-9]+', yourFile.readline()) # yourFile is the open file object
It will return all the numbers on that line as a list of strings.
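Because re.findall returns strings, a conversion step is usually needed before doing arithmetic on the results. The following is a minimal sketch, assuming the numbers are non-negative integers; the helper name extract_ints is made up for illustration.

```python
import re

def extract_ints(line):
    """Return every run of digits in `line` as an int (hypothetical helper)."""
    # re.findall returns a list of strings such as ['300', '234', '2', '3'].
    return [int(token) for token in re.findall(r'[0-9]+', line)]

sample = "300 234 2 3\n"
numbers = extract_ints(sample)
```

Note that this pattern ignores minus signs and splits decimals at the point, so it is only suitable for input like the example in the question.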

Use readline() to get the entire line as a string, then split the string using split(), which will return a list of strings (in your case, numbers) in the line.
Example:
line = yourFile.readline()
numList = line.split()
Now numList contains the numbers that were on that line.
Source: https://docs.python.org/2/library/stdtypes.html#str.split
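To apply the split() approach to every line of the file rather than a single line, something like the following sketch could work. It assumes every whitespace-separated field is an integer; the function name read_number_rows and the file name in the comment are placeholders, not part of the original answer.

```python
def read_number_rows(lines):
    """Turn an iterable of text lines into a list of lists of ints.

    `lines` can be an open file object or any iterable of strings.
    Blank lines are skipped. Hypothetical helper for illustration.
    """
    rows = []
    for line in lines:
        fields = line.split()  # splits on any whitespace and drops the newline
        if fields:
            rows.append([int(field) for field in fields])
    return rows

# With a real file this would be:
#     with open('numbers.txt') as f:   # 'numbers.txt' is a placeholder name
#         rows = read_number_rows(f)
rows = read_number_rows(["2\n", "300 234 2 3\n", "23444\n"])
```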

Related

Extract Acc (Gene ID or accession number) from a fasta file

What does ".gb\\|(.*)\\|.*","\\1" in the function gsub mean?
If the file contains a single FASTA sequence, you can solve the problem by reading the first line of the file and splitting it on the pipe character |.
If the file contains multiple sequences, you can check the first character of each line and look for the > character, which marks a header line.
Here is a code example in Python. If you need a different ID, change the index.
with open('AE004437.faa') as fh:
    header_line = fh.readline()
    ids = header_line.split('|')
    gene_ids = ids[3]
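For the multi-sequence case, the same idea can be applied to every line that starts with >. This is only a sketch: it assumes headers of the form '>gi|...|gb|ACCESSION|description', the function name accession_ids is hypothetical, and the sample records are invented purely to show the shape of the input.

```python
def accession_ids(lines, index=3):
    """Collect one pipe-separated field from every FASTA header line.

    Assumes headers look like '>gi|1|gb|XX000001.1| description', so that
    index=3 selects the accession. Hypothetical helper for illustration.
    """
    ids = []
    for line in lines:
        if line.startswith('>'):  # header lines start with '>'
            fields = line.rstrip('\n').split('|')
            if len(fields) > index:  # skip headers with too few fields
                ids.append(fields[index])
    return ids

# Invented sample data, not taken from a real FASTA file.
sample = [
    ">gi|1|gb|XX000001.1| made-up record one\n",
    "ATGACTCGGCGGTCTCGCTGT\n",
    ">gi|2|gb|XX000002.1| made-up record two\n",
    "GTTCGCAACCTGTCGG\n",
]
ids = accession_ids(sample)
```

Passing an open file object instead of the list works the same way, since both are iterables of lines.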

Python word search in a text file

I have a text file in which I need to search for three specific words using Python. For example, the words are account, online and offer, and I need a count of how many times each of them appears in the file.
with open('fixtures/file1.csv') as f:
    print len(filter(
        lambda line: "account" in line or "online" in line or "offer" in line,
        f.readlines()
    ))
You can also check directly whether the words are in each line.
Update
To count how many times each word appears in a file, the most effective way I have found is to iterate over the file once and check, for each line, which of the words it contains. For that, try the following:
keys = ('account', 'online', 'offer')
with open('fixtures/file1.csv') as f:
    found = dict((k, 0) for k in keys)
    for line in f.readlines():
        for k in keys:
            found[k] += 1 if k in line else 0
found will then be a dictionary mapping each word to the number of lines that contain it.
Hope this helps!
I am assuming it is a plain text document. In that case, you would use open('file.txt') as f, loop over the file with for line in f, check if 'word' in line.lower(), and then increment a counter accordingly (say wordxtotal += 1).
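The approaches above count lines that contain a word as a substring. If what is wanted is the number of whole-word occurrences, ignoring case, a sketch along the following lines might be closer. The function name count_words and the sample lines are assumptions made for illustration.

```python
import re
from collections import Counter

def count_words(lines, keys):
    """Count whole-word, case-insensitive occurrences of each key."""
    wanted = set(k.lower() for k in keys)
    found = Counter(dict((k, 0) for k in wanted))
    for line in lines:
        # \w+ picks out runs of letters, digits and underscores.
        for word in re.findall(r'\w+', line.lower()):
            if word in wanted:
                found[word] += 1
    return dict(found)

counts = count_words(
    ["Your account offer: an Online offer\n", "no match here\n"],
    ('account', 'online', 'offer'),
)
```

With this definition "accounts" does not count as a match for "account", which differs from the substring test used in the answers above.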

Change values in Python file (tab-delimited list)

I have read a *.INP file into Python. Here is the code I used:
import csv
r = csv.reader(open('T_JAC.INP')) # Here your csv file
lines = [l for l in r]
print lines[23]
print lines[26]
The first print statement produces ['7E21\t\texthere (text) text alphabets text alphanumeric'].
The second print statement produces ['4E15\t\texthere (text) text alphabets text alphanumeric'].
I need to change the numbers 7E21 and 4E15 to values from a list, fil_replace = [9E21, 6E15]; i.e., I need to replace 7E21 with 9E21 and 4E15 with 6E15.
Is there a way to replace these numbers?
Something with str.replace should work (as long as you read the file in as a string rather than through csv.reader), albeit not the most efficient solution:
r = open('T_JAC.INP').read()
r = r.replace('7E21', '9E21') # str.replace returns a new string, so reassign it
file = open('YAC.IN', 'w')
file.write(r)
file.close()
If you're looking for a way to replace the values 'in place' in the file, unfortunately that's not possible in general. The entire file has to be read in, modified, and then rewritten.
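The read, modify, rewrite cycle can be sketched as follows for several values at once. The function names replace_values and rewrite_file are hypothetical, and the sample text only imitates the layout described in the question.

```python
def replace_values(text, replacements):
    """Return `text` with each old value swapped for its new value."""
    for old, new in replacements.items():
        text = text.replace(old, new)  # str.replace returns a new string
    return text

def rewrite_file(src_path, dst_path, replacements):
    """Read src_path, apply the replacements, and write the result to dst_path."""
    with open(src_path) as src:
        text = src.read()
    with open(dst_path, 'w') as dst:
        dst.write(replace_values(text, replacements))

# Invented sample text that mimics two tab-delimited lines.
sample = "7E21\t\tfirst line\n4E15\t\tsecond line\n"
updated = replace_values(sample, {'7E21': '9E21', '4E15': '6E15'})
```

This replaces every occurrence in the whole text, not only those on particular lines, and the replacements are applied one after another, so it assumes no new value is itself one of the old values.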

Applying a regular expression to a text file Python 3

#returns same result i.e. only the first line as many times as 'draws'
infile = open("results_from_url.txt",'r')
file =infile.read() # essential to get correct formatting
for line in islice(file, 0, draws): # allows you to limit number of draws
    for line in re.split(r"Wins",file)[1].split('\n'):
        mains.append(line[23:38]) # slices first five numbers from line
        stars.append(line[39:44]) # slices last two numbers from line
infile.close()
I am trying to use the above code to iterate through a list of numbers and extract the bits of interest. In this attempt to learn how to use regular expressions in Python 3, I am using lottery results downloaded from the internet. All the code does is read one line and return it as many times as I specify in the value of 'draws'. Could someone tell me what I have done incorrectly, please? Does re 'terminate' somehow? The strange thing is that if I copy the file into a string and run this routine, it works. I am at a loss: is the problem in 'reading' the file or in my use of the regular expression?
I can't tell you why your code doesn't work, because I cannot reproduce the result you're getting. I'm also not sure what the purpose of
for line in islice(file, 0, draws):
is, because you never use the line variable after that, you immediately overwrite it with
for line in re.split(r"Wins",file)[1].split('\n'):
Plus, you could have used file.split('Wins') instead of re.split(r"Wins",file), so you aren't really using regex at all.
Regex is a tool to find data of a certain format. Why do you use it to split the input text, when you could use it to find the data you're looking for?
What is it you're looking for? A sequence of seven numbers, separated by commas. Translated into regex:
(?:\d+,){7}
However, we want to group the first 5 numbers - the "mains" - and the last 2 numbers - the "stars". So we'll add two named capture groups, named "mains" and "stars":
(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})
This pattern will find all numbers you're looking for.
import re
data = open("infile.txt", 'r').read()
mains = []
stars = []
pattern = r'(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})'
iterator = re.finditer(pattern, data)
for count in range(int(input('Enter number of draws to examine: '))):
    try:
        match = next(iterator)
    except StopIteration:
        print('no more matches')
        break
    mains.append(match.group('mains'))
    stars.append(match.group('stars'))
print(mains, stars)
This will print something like ['01,03,31,42,46,'] ['04,11,']. You may want to remove the commas and convert the numbers to ints, but in essence, this is how you would use regex.
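Removing the commas and converting to ints could look roughly like this. It is a sketch that reuses the pattern above; the helper names to_ints and parse_draws and the limit parameter are assumptions introduced for illustration.

```python
import re

PATTERN = re.compile(r'(?P<mains>(?:\d+,){5})(?P<stars>(?:\d+,){2})')

def to_ints(group_text):
    """Convert a string like '01,03,31,42,46,' into [1, 3, 31, 42, 46]."""
    # The trailing comma yields an empty last piece, which is filtered out.
    return [int(piece) for piece in group_text.split(',') if piece]

def parse_draws(data, limit=None):
    """Return up to `limit` (mains, stars) pairs found in `data`."""
    draws = []
    for match in PATTERN.finditer(data):
        if limit is not None and len(draws) >= limit:
            break
        draws.append((to_ints(match.group('mains')),
                      to_ints(match.group('stars'))))
    return draws

draws = parse_draws("01,03,31,42,46,04,11,\n")
```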

How do I print specific lines of a file in python?

I'm trying to print everything in a file with Python. But whenever I use Python's built-in readline() method, it only prints the first line of my text file. Here's my code:
File = open("test.txt", 'r', 0)
line = File.readline()[:]
print line
Thank you to everyone who answers.
To make my question clearer: every time I run the code, it prints only "word list food".
Is this what you are looking for?
printline = 6
lineCounter = 0
with open('anyTxtFile.txt','r') as f:
    for line in f:
        lineCounter += 1
        if lineCounter == printline:
            print(line, end='')
This opens the text file in the working directory and prints line number printline.
File.readlines()
will, as emre. said, return a list of all the lines in your file. If you'd like to produce a similar result using the readline() method:
s=File.readline()
while s!="":
    print s
    s=File.readline()
Both methods above leave a newline at the end of each string, except possibly the last one, which has none if the file does not end with a newline.
Another alternative would be:
for s in File:
    print s
To search for a specific string, or a specific line number, I'd say the first method is best. Looking for a specific line number would be as simple as:
File.readlines()[i]
where i is the zero-based index of the line you are interested in (so the first line is i = 0). Looking for a string is a bit more work, but looping through the list would not be too challenging. Something like:
L=File.readlines()
s="yourStringHere"
i=0
while i<len(L):
    if L[i].find(s)!=-1:
        break
    i+=1
print i
would give you the zero-based index of the first line that contains the string you are looking for (or len(L) if no line contains it).
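The same search can be written more compactly with enumerate and the in operator. This is a sketch, not a drop-in replacement: the function name find_line is made up, it counts lines from 1 rather than 0, and it returns None when nothing matches.

```python
def find_line(lines, needle):
    """Return the 1-based number of the first line containing `needle`, or None."""
    for number, line in enumerate(lines, 1):
        if needle in line:
            return number
    return None

line_number = find_line(["word list\n", "food\n", "drink\n"], "food")
```

An open file object can be passed as lines, in which case the file is read only up to the first match.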
Make it more Pythonic:
print_line = 6
with open('input_txt_file.txt', 'r') as f:
    for i, line in enumerate(f, 1): # count from 1 so that print_line = 6 means the sixth line
        if i == print_line:
            print(line, end='')
            break