How to get a file whose name ends with a specific suffix as input to a program in Python - python-2.7

I have an output file from one program whose name ends in "_x.txt", and I want to connect two programs so that the second one uses this file as input and adds more data to it. Finally, the result should end up in "blabla_x_f.txt".
I am trying to work it out as below, but it does not seem correct and I could not solve it. Please help:
inf = str(raw_input(*+"_x.txt"))
with open(inf+'_x.txt') as fin, open(inf+'_x_f.txt','w') as fout:
....(other operations)
The main problem is that the "blabla" part of the filename can change every time and may be a random string, so the code needs to be flexible and simply search for whatever ends with "_x.txt".

Have a look at Python's glob module:
import glob
files = glob.glob('*_x.txt')
gives you a list of all files ending in _x.txt. Continue with
for path in files:
    newpath = path[:-4] + '_f.txt'
    with open(path) as fin:
        with open(newpath, 'w') as fout:
            # do something
(Note that in is a reserved word in Python and cannot be used as a variable name, so the file handles are named fin and fout here.)
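Tying this back to the original question, a minimal sketch of the whole flow could look like this; the actual processing is a placeholder, since those operations were not shown:
import glob

for path in glob.glob('*_x.txt'):
    newpath = path[:-4] + '_f.txt'     # 'blabla_x.txt' -> 'blabla_x_f.txt'
    with open(path) as fin:
        with open(newpath, 'w') as fout:
            for line in fin:
                fout.write(line)       # copy the original contents
            fout.write('more data\n')  # placeholder for the data the second code adds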

Related

The glob.glob function to extract data from files

I am trying to run the script below. The intention of the script is to open different fasta files one after the other and extract the geneID. The script works well if I don't use the glob.glob function; with it, I get this message: TypeError: coercing to Unicode: need string or buffer, list found
files='/home/pathtofiles/files'
#print files
#sys.exit()
for file in files:
    fastas=sorted(glob.glob(files + '/*.fasta'))
    #print fastas[0]
    output_handle=(open(fastas, 'r+'))
    genes_files=list(SeqIO.parse(output_handle, 'fasta'))
    geneID=genes_files[0].id
    print geneID
I am running out of ideas on how to direct the script to open one file after another to give me the required information.
I see what you are trying to do, but let me first explain why your current approach is not working.
You have a path to a directory with fasta files and you want to loop over the files in that directory. But observe what happens if we do:
>>> files='/home/pathtofiles/files'
>>> for file in files:
...     print file
/
h
o
m
e
/
p
a
t
h
t
o
f
i
l
e
s
/
f
i
l
e
s
Not the list of filenames you expected! files is a string and when you apply a for loop on a string you simply iterate over the characters in that string.
Also, as doctorlove correctly observed, in your code fastas is a list and open expects a path to a file as first argument. That's why you get the TypeError: ... need string, ... list found.
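For example, passing a list to open in Python 2 reproduces exactly that error:
>>> open(['a.fasta', 'b.fasta'])
Traceback (most recent call last):
  ...
TypeError: coercing to Unicode: need string or buffer, list found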
As an aside (and this is more a problem on Windows than on Linux or Mac), it is good practice to always use raw string literals (prefix the string with an r) when working with pathnames, to prevent unwanted expansion of backslash escape sequences like \n and \t into newline and tab.
>>> path = 'C:\Users\norah\temp'
>>> print path
C:\Users
orah emp
>>> path = r'C:\Users\norah\temp'
>>> print path
C:\Users\norah\temp
Another good practice is to use os.path.join() when combining directory names and filenames. This prevents subtle bugs where your script works on your machine but gives an error on the machine of a colleague who uses a different operating system.
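For example, on Linux or Mac (the filename here is just an illustration):
>>> import os
>>> os.path.join('/home/pathtofiles/files', 'genes.fasta')
'/home/pathtofiles/files/genes.fasta'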
I would also recommend using the with statement when opening files. This ensures that the file handle gets properly closed when you're done with it.
As a final remark, file is a built-in name in Python 2, and it is bad practice to use a variable with the same name as a built-in because that can cause bugs or confusion later on.
Combining all of the above, I would rewrite your code like this:
import os
import glob
from Bio import SeqIO

path = r'/home/pathtofiles/files'
pattern = os.path.join(path, '*.fasta')

for fasta_path in sorted(glob.glob(pattern)):
    print fasta_path
    with open(fasta_path, 'r+') as output_handle:
        genes_records = SeqIO.parse(output_handle, 'fasta')
        for gene_record in genes_records:
            print gene_record.id
This is the way I solved the problem, and this script works.
import os, sys
import glob
from Bio import SeqIO

def extracting_information_gene_id():
    #to extract geneID information and add the reference gene to each different file
    files = sorted(glob.glob('/home/path_to_files/files/*.fasta'))
    #print files
    #sys.exit()
    for file in files:
        #print file
        output_handle = open(file, 'r+')
        ref_genes = list(SeqIO.parse(output_handle, 'fasta'))
        geneID = ref_genes[0].id
        #print geneID
        #sys.exit()
        #to extract the geneID as a reference record from the genes_files
        query_genes = SeqIO.index('/home/path_to_file/file.fa', 'fasta')
        #print query_genes[geneID].format('fasta') #check point
        #sys.exit()
        ref_gene = query_genes[geneID].format('fasta')
        #print ref_gene #check point
        #sys.exit()
        output_handle.write(str(ref_gene))
        output_handle.close()
        query_genes.close()

extracting_information_gene_id()
print 'Reference gene sequence has been added'

How to read through multiple files in a folder searching for a word + python 2.7

I'm building a little program that reads every line in a log file and, if it finds a match, prints that line. The problem is, I have about 20 different log files and they are all in the same folder. Is there a way I can parse every single log file in a folder and print out the lines that match the searched word? Below is an example of what I have so far, but it prints nothing. The script needs to be able to incorporate readlines() and split().
What I have below doesn't work, but this is what I would expect it to look like. Any advice welcome.
def Preview():
    path = ('C:Users/kev/Desktop/test/*.log')
    files = glob.glob(path)
    files.readlines()
    for line in files:
        if "test_word" in line:
            print line

Preview()
This is how your code should look:
import glob

def Preview():
    path = 'C:/Users/kev/Desktop/test/*.log'
    files = glob.glob(path)
    for name in files:
        f = open(name)              # open each matching log file in turn
        for line in f.readlines():
            if "test_word" in line:
                print line
        f.close()

Preview()
Note the / after C: in the path; without it, glob will not find any files.
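A slightly more idiomatic variant uses the with statement so each file is closed automatically; this is just a sketch with the same (hypothetical) path:
import glob

def Preview():
    for name in glob.glob('C:/Users/kev/Desktop/test/*.log'):
        with open(name) as f:
            for line in f.readlines():
                if "test_word" in line:
                    print line

Preview()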

Python: only run command once in for loop

I have a for loop which creates a CSV of values of several files in a directory.
Within this loop I only want to create the file and write the header once; currently I am doing this:
#name & path to table file
test = tablefile + "/" + str(cell[:-10]) + "_Table.csv"
#write file
if not os.path.isfile(test):
    csv.writer(open(test, "wt"))
    with open(test, 'w') as output:
        wr = csv.writer(output, lineterminator=',')
        for val in header_note:
            wr.writerow([val])
and to append data I have:
with open(test, 'a') as output:
    wr = csv.writer(output, lineterminator=',')
    for val in table_all:
        wr.writerow([val])
This works well. However, when I run the script again it appends more data to the bottom of that same .csv. What I want is for the first time through the for loop to overwrite any existing .csv with a new one containing the header, then continue appending data, and to overwrite/re-write the header once the script is run again. Thanks!
It looks like you may have some code problems other than file handling, but here goes: your problem is basically that opening a file in 'w' mode will overwrite everything in the file, while opening in 'a' mode will not allow you to change the header line.
To get around this, you will have to read the contents of the file (if it already exists), then overwrite the file, writing back those lines that were there to begin with.
You will want something along the lines of:
if os.path.exists(file_name): # if file already exists
with open(file_name, 'r') as in_file: # open it
old_lines = in_file.readlines()[1:] # read all lines from file EXCEPT header line
with open(file_name, 'w') as out_file: # open file again, with 'w' to create/overwrite
out_file.write(new_header_line) # write new header line to file
for line in old_lines:
out_file.write(line) # write all preexisting lines back into file
# continue writing whatever you want.
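If the goal is simply "overwrite the file and write the header on the first pass of this run, then append", a simpler sketch is to remember which files this run has already recreated; this assumes the test, header_note and table_all names from the question:
import csv

written_this_run = set()    # paths already (re)created during this run

# inside the for loop:
if test not in written_this_run:
    with open(test, 'w') as output:              # overwrite and write the header
        wr = csv.writer(output, lineterminator=',')
        for val in header_note:
            wr.writerow([val])
    written_this_run.add(test)
with open(test, 'a') as output:                  # then append the data rows
    wr = csv.writer(output, lineterminator=',')
    for val in table_all:
        wr.writerow([val])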

Python: Returning a filename matching a specific condition

import sys, hashlib
import os

inputFile = 'C:\Users\User\Desktop\hashes.txt'
sourceDir = 'C:\Users\User\Desktop\Test Directory'
hashMatch = False

for root, dirs, files in os.walk(sourceDir):
    for filename in files:
        sourceDirHashes = hashlib.md5(filename)
        for digest in inputFile:
            if sourceDirHashes.hexdigest() == digest:
                hashMatch = True
                break
        if hashMatch:
            print str(filename)
        else:
            print 'hash not found'
Contents of inputFile =
2899ebdb5f7a90a216e97b3187851fc1
54c177418615a90a6424cb945f7a6aec
dd18bf3a8e0a2a3e53e2661c7fb53534
Contents of sourceDir files =
test
test 1
test 2
I almost have the code working; I'm just tripping up somewhere. My current code, as posted, always hits the else statement and reports that the hash hasn't been found, even though the hashes do match, as I have verified. I have provided the contents of my sourceDir so that someone can try this: the file names are test, test 1 and test 2, and the files have the same content.
I must add, however, that I am not looking for the script to print the actual file content, but rather the name of the file.
Could anyone suggest where I am going wrong and why it says the condition is false?
You need to open the inputFile using open(inputFile, 'rt'); then you can read the hashes from it. Also, when you do read the hashes, make sure you strip them first to get rid of the newline characters \n at the end of the lines.
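A minimal sketch of that fix, keeping the question's approach of hashing the filename itself (paths as in the question):
import os, hashlib

inputFile = r'C:\Users\User\Desktop\hashes.txt'
sourceDir = r'C:\Users\User\Desktop\Test Directory'

with open(inputFile, 'rt') as f:
    digests = set(line.strip() for line in f)   # strip the trailing '\n'

for root, dirs, files in os.walk(sourceDir):
    for filename in files:
        if hashlib.md5(filename).hexdigest() in digests:
            print filename
        else:
            print 'hash not found'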

Python: copy line, conditional criteria

I have been searching for a Python solution to selectively copy lines from one txt file to another. I can copy the whole file, but with only a few lines I get an error.
My code:
f = open(from_file, "r")
g = open(to_file, "w")
#copy = open(to_file, "w") # this instruction copies whole file
rowcond2 = 'xxxx' # look for this string sequence in every line
for line in f:
    if rowcond2 in f:
        copy.write(line,"w") in g # write every corresponding line to destination
f.close()
# copy.close() # code receive error to close destination
g.close()
So without the rowcond2 condition I can copy the whole file, yet with the condition nothing is written to the destination file.
Thank you for your help.
Test the condition against each line inside the for loop (not against the file object f), and write the matching lines to g directly:
for line in f:
    if rowcond2 in line:   # test the line, not the file object
        g.write(line)
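Putting it together, a minimal self-contained sketch (assuming from_file and to_file hold the two paths) could be:
rowcond2 = 'xxxx'   # look for this string sequence in every line

with open(from_file, 'r') as f, open(to_file, 'w') as g:
    for line in f:
        if rowcond2 in line:
            g.write(line)   # copy only the matching lines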
I have been able to solve this case by searching on SO:
Using python to write specific lines from one file to another file
@Lukas Graf: thank you for your detailed step-wise explanation.