extracting data from two different files to produce a fasta file - python-2.7

I have two different files: one is a fasta file, and the other a txt file produced from a dictionary with json.
file_A looks like this:
{
"gene_1005 ['gene description_B']":2,
"gene_1009 ['gene description_C']":1,
"gene_104 ['gene description_D']":2,
"gene_1046 ['gene description_A']":1,
}
file_B looks like this:
gene_1005 ['gen description_B'] ATGTGGATCCGCCCGTTGCAGGCGGAACTGAGCGATAACACGCTGGCTTTGTATGCGCCAAACCGTTTTGTGCTCGA
gene_2 ['gene description_C'] ATGAAATTTACCGTTGAACGTGAACATTTATTAAAACCGCTGCAACAGGTGAGTGGCCCATTAGGTGGCCGCCCAAC
What I would like to create is a new fasta file containing only those genes that have the value 2 in file_A. I have tried the code below, but I am quite lost. It will print word[0], which is the name of the gene, but it will not print word[1], which should be the number. It throws the error
'out of range'
import json
def readlines():
    input_file = open('file_A.txt')
    lines = input_file.readlines()
    print lines[1]
    for line in lines:
        words = line.split(':')
        print words[0]
        print words[1]
        #print line
    input_file.close()
readlines()
Could anyone kindly give a hand with this, please?
Thanks

I see people like giving negative votes without explaining why or offering a suggestion, and a suggestion was what this post asked for. But since the downvoter has not bothered with one, I will post the answer to it myself.
# my_dict holds the gene IDs kept from file_A (those whose value is 2)
input_file = open('file.fa', 'r')
output_file = open('wanted_genes.fa', 'w')
skip = 1
for line in input_file:
    if line[0] == '>':
        geneID = line[1:-1]
        if geneID in my_dict:
            output_file.write(line)
            skip = 0
        else:
            skip = 1
    else:
        if not skip:
            output_file.write(line)
input_file.close()
output_file.close()
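For completeness, my_dict above could be built from file_A before the loop runs. The sketch below assumes file_A.txt is valid JSON (as written by json.dump) and that only the genes with value 2 should be kept; the file name and key format are taken from the question.
import json

# Load the {"gene_... ['description']": count} mapping and keep the value-2 entries;
# the kept keys are expected to match the fasta header lines after the '>'
with open('file_A.txt') as f:
    counts = json.load(f)
my_dict = set(name for name, value in counts.items() if value == 2)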

Related

How to concatenate in Python from .txt?

I have a server.txt file with 3 names listed:
server.txt
CFMPAPP1
CFMPAPP2
CFMPAPP3
I am looking to read these names from that server.txt file and produce the output.txt file shown below.
output.txt
CI_Name like 'CFMPAPP1%' or
CI_Name like 'CFMPAPP2%' or
CI_Name like 'CFMPAPP3%' or
Any idea how to do this?
This can be easily done in three lines:
with open('server.txt', 'r') as file:
    s = file.read()
amended_string = "\n".join(["CI_Name like '{}%' or".format(a) for a in s.split('\n')])
And then you just need to save amended_string to output.txt. I hope that helps.
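If it helps, saving amended_string could look like this (a short sketch, using the output.txt name from the question):
with open('output.txt', 'w') as out:
    out.write(amended_string + '\n')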
This solution only keeps one line at a time in memory:
with open('server.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            outfile.write("CI_Name like '{}%' or\n".format(line.rstrip()))

How best to display the content of a text file in python

OK, I am a bit of a Python beginner, so forgive me if this question sounds silly.
I have a directory that contains some .txt files.
The 1.txt file contains :
Lo! I am lost.
I want to write a program that goes through each file in the shakespeare directory and prints out the content of each .txt file. Below is a program I have written, but all it prints is the name of each file. How do I actually print out the content of each file?
import os, glob
def readFromCorpus(path):
    os.chdir(path)
    for fu in glob.glob("*.txt"):
        print fu
readFromCorpus('./trainingData/shakespeare')
I am sorry if this is really a silly question. I just need a pointer to what I am doing wrong.
Thanks
Try this:
def readFromCorpus(path):
    os.chdir(path)
    for fu in glob.glob("*.txt"):
        print('\n\n' + fu)
        with open(fu, 'r') as f:
            data = f.readlines()
            for line in data:
                print(line.replace('\n', ''))
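A small variation, in case changing the working directory with os.chdir is undesirable: build the pattern with os.path.join instead. This is only a sketch, reusing the directory name from the question.
import glob
import os

def readFromCorpus(path):
    # Walk the .txt files without changing the working directory
    for fu in glob.glob(os.path.join(path, "*.txt")):
        print('\n\n' + fu)
        with open(fu, 'r') as f:
            print(f.read())

readFromCorpus('./trainingData/shakespeare')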

Python: Returning a filename for matching a specific condition

import sys, hashlib
import os
inputFile = 'C:\Users\User\Desktop\hashes.txt'
sourceDir = 'C:\Users\User\Desktop\Test Directory'
hashMatch = False
for root, dirs, files in os.walk(sourceDir):
    for filename in files:
        sourceDirHashes = hashlib.md5(filename)
        for digest in inputFile:
            if sourceDirHashes.hexdigest() == digest:
                hashMatch = True
                break
        if hashMatch:
            print str(filename)
        else:
            print 'hash not found'
Contents of inputFile =
2899ebdb5f7a90a216e97b3187851fc1
54c177418615a90a6424cb945f7a6aec
dd18bf3a8e0a2a3e53e2661c7fb53534
Contents of sourceDir files =
test
test 1
test 2
I almost have the code working; I'm just tripping up somewhere. The code I have posted always hits the else statement, saying the hash hasn't been found, even though the hashes do match, as I have verified. I have provided the content of my sourceDir so that someone can try this: the file names are test, test 1 and test 2, and the same content is in the files.
I must add, however, that I am not looking for the script to print the actual file content, but rather the name of the file.
Could anyone suggest where I am going wrong and why it says the condition is false?
You need to open the inputFile using open(inputFile, 'rt'); then you can read the hashes. Also, when you do read the hashes, make sure you strip them first to get rid of the newline characters \n at the end of the lines.
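Applied to the code above, that advice might look like the sketch below. It keeps the hashlib.md5(filename) call from the question (which hashes the file name, not the file contents) and only changes how the expected hashes are read; swap in hashing of the file contents if that is what was intended.
import hashlib
import os

# raw strings avoid backslash escapes in the Windows paths
inputFile = r'C:\Users\User\Desktop\hashes.txt'
sourceDir = r'C:\Users\User\Desktop\Test Directory'

# Read the expected hashes once, stripping the trailing newlines
with open(inputFile, 'rt') as f:
    digests = set(line.strip() for line in f)

for root, dirs, files in os.walk(sourceDir):
    for filename in files:
        if hashlib.md5(filename).hexdigest() in digests:
            print filename
        else:
            print 'hash not found'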

Python read and write in same function

My code currently takes in a csv file and outputs to a text file. The piece I am having trouble with is this: from the csv I am searching for a keyword like 'issues', and I want every row containing that word written out to a text file. Currently I have it printing to a JSON file, but it is all on one line, like this:
"something,something1,something2,something3,something4,something5,something6,something7\r\n""something,something1,something2,something3,something4,something5,something6,something7\r\n"
But i want it to print out like this:
"something,something1,something2,something3,something4,something5,something6,something7"
"something,something1,something2,something3,something4,something5,something6,something7"
Here is the code I have so far:
def search(self, filename):
    with open(filename, 'rb') as searchfile, open("weekly_test.txt", 'w') as text_file:
        for line in searchfile:
            if 'PBI 43125' in line:
                #print (line)
                json.dump(line, text_file, sort_keys=True, indent = 4)
So again I just need a little guidance on how to get my json file to be formatted the way I want.
Just replace print line with print >>file, line
def search(self, filename):
    with open('test.csv', 'r') as searchfile, open('weekly_test.txt', 'w') as search_results_file:
        for line in searchfile:
            if 'issue' in line:
                print >>search_results_file, line
# At this point, both the files will be closed automatically
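Each line read from the csv already ends with a line break (the sample output shows \r\n), so print >> adds a second newline and leaves a blank line between matches. If that matters, here is a variant that strips the line ending first (a sketch, same file names as above):
def search(self, filename):
    with open('test.csv', 'r') as searchfile, open('weekly_test.txt', 'w') as search_results_file:
        for line in searchfile:
            if 'issue' in line:
                # rstrip drops the \r\n so each match lands on exactly one output line
                search_results_file.write(line.rstrip('\r\n') + '\n')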

django file upload doesn't work: f.read() returns ''

I'm trying to upload and parse json files using django. Everything works great up until the moment I need to parse the json. Then I get this error:
No JSON object could be decoded: line 1 column 0 (char 0)
Here's my code. (I'm following the instructions here, and overwriting the handle_uploaded_file method.)
def handle_uploaded_file(f, collection):
    # assert False, [f.name, f.size, f.read()[:50]]
    t = f.read()
    for j in serializers.deserialize("json", t):
        add_item_to_database(j)
The weird thing is that when I uncomment the "assert" line, I get this:
[u'myfile.json', 59478, '']
So it looks like my file is getting uploaded with the right size (I've verified this on the server), but the read command seems to be failing entirely.
Any ideas?
I've seen this before. Your file has a size, but reading it returns nothing. I'm wondering if it has already been read earlier... try this:
f.seek(0)
f.read()
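Applied to the handler above, the suggestion would look like this sketch: rewind the uploaded file before reading it.
def handle_uploaded_file(f, collection):
    # Rewind in case the file object was already read earlier in the request
    f.seek(0)
    t = f.read()
    for j in serializers.deserialize("json", t):
        add_item_to_database(j)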