How do I deal with output files when using argparse? - python-2.7

I have a program where, if the user specifies names for the two output files, those files should be named as the user chooses.
For example:
-o file1.txt file2.txt
If the output files aren't specified then the script will automatically generate the files with default names.

You can use the argparse module, which provides clean ways of handling input arguments. For your task, you can use something like:
import argparse
arguments = argparse.ArgumentParser()
arguments.add_argument('fileNames', nargs='*', help='Output file names', default=['val1', 'val2'])
inputArgs = arguments.parse_args()
# The user may decide to give just one file name
outFileName1 = inputArgs.fileNames[0]
outFileName2 = 'val2' if len(inputArgs.fileNames) == 1 else inputArgs.fileNames[1]
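The question asks for an -o flag rather than positional names. A minimal sketch of that variant, assuming both names are always given together, uses an optional argument with nargs=2. The helper name output_names is hypothetical, and val1/val2 are the placeholder defaults from the answer above.

```python
import argparse

# Placeholder defaults, taken from the answer above
DEFAULT_NAMES = ['val1', 'val2']


def build_parser():
    parser = argparse.ArgumentParser(description='Write two output files')
    # '-o' must be followed by exactly two values; if '-o' is omitted,
    # argparse falls back to the default list
    parser.add_argument('-o', '--output', nargs=2, metavar=('FILE1', 'FILE2'),
                        default=DEFAULT_NAMES,
                        help='names of the two output files')
    return parser


def output_names(argv=None):
    """Return the two output file names (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    return args.output[0], args.output[1]
```

Calling output_names() with no -o returns ('val1', 'val2'), while -o file1.txt file2.txt returns the user's names. With nargs=2, argparse rejects -o file1.txt on its own with a usage error; if a single name should be allowed, keep nargs='*' and fill in the missing default as the answer does.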

Related

How do I list files and directories those are not hidden in current directory using crystal language?

I wrote my own minimal version of "ls" command (Linux) using crystal language and here is my code:
require "dir"
require "file"
def main()
  pwd = Dir.current
  list_dir = Dir.children(pwd)
  puts("[+] location: #{pwd}")
  puts("------------------------------------------")
  list_dir.each do |line|
    check = File.file?(line)
    if check == true
      puts("[+] file : #{line}")
    elsif check == false
      puts("[+] directory: #{line}")
    else
      puts("[+] unknown : #{line}")
    end
  end
end
main
It works, but it also lists all hidden files and directories (names starting with a dot), which I do not want to show. I want the result to look more like "ls -l" than "ls -la".
So, what do I need to implement to stop showing hidden files and directories?
There is nothing special about "hidden" files. It's just a convention to hide file names starting with a dot in some contexts by default. Dir.children does not follow that convention and expects the user to apply appropriate filtering.
The recommended way to check if a file name starts with a dot is file_name.starts_with?(".").
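The same filtering idea carries over to Python, the language used elsewhere on this page: like Crystal's Dir.children, os.listdir omits "." and ".." but still returns dotfiles, so the caller has to skip names starting with a dot. A minimal sketch; visible_entries is a hypothetical helper name, and anything that is not a directory is reported as a file for simplicity.

```python
import os


def visible_entries(path='.'):
    """Return sorted (kind, name) pairs for non-hidden entries in path."""
    result = []
    for name in sorted(os.listdir(path)):
        # Skip dotfiles and dot-directories, as `ls` does by default
        if name.startswith('.'):
            continue
        full_path = os.path.join(path, name)
        kind = 'directory' if os.path.isdir(full_path) else 'file'
        result.append((kind, name))
    return result
```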

Django not recognizing files deleted/added

I have the following function that gives me the list of files (complete paths) in a given list of directories:
from os import walk
from os.path import join
# Returns a list of all the files in the list of directories passed
def get_files(directories = get_template_directories()):
    files = []
    for directory in directories:
        for dir, dirnames, filenames in walk(directory):
            for filename in filenames:
                file_name = join(dir, filename)
                files.append(file_name)
    return files
I'm adding some files to the template directories in Django, but this function always returns the same list of files even though some are added or deleted at run time. The changes show up only after I restart the server. Is that because of some caching that os.walk() performs, or do I need to restart the server after adding or removing files?
This is not a Django problem; the behaviour you see comes from how the Python interpreter handles default arguments:
Default arguments may be provided as plain values or as the result of a function call, but the latter technique needs a very big warning: default values are evaluated only once, when the def statement is executed (typically when the module is first imported), and never again.
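A small sketch makes the one-time evaluation visible. current_value is a hypothetical stand-in for a call like get_template_directories() that would return something different on each call; the two functions contrast a call used directly as the default with the None-sentinel pattern used in the fix.

```python
import itertools

_counter = itertools.count()


def current_value():
    # Hypothetical stand-in: returns a new value on every call
    return next(_counter)


def with_call_default(value=current_value()):
    # current_value() ran once, when this `def` was executed
    return value


def with_none_default(value=None):
    # The default is computed inside the body, so on every call
    if value is None:
        value = current_value()
    return value
```

Here with_call_default() keeps returning the value computed when the function was defined, while each call to with_none_default() gets a fresh one.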
I'm sure this code will solve your problem:
def get_files(directories=None):
    # Compute the default on every call instead of once at definition time;
    # `is None` still lets a caller pass an explicit empty list
    if directories is None:
        directories = get_template_directories()
    files = []
    for directory in directories:
        for dir, dirnames, filenames in walk(directory):
            for filename in filenames:
                file_name = join(dir, filename)
                files.append(file_name)
    return files
You can find similar questions on Stack Overflow, e.g. "Default Values for function parameters in Python".

The glob.glob function to extract data from files

I am trying to run the script below. The intention of the script is to open different fasta files one after the other and extract the geneID. The script works well if I don't use the glob.glob function; with it, I get this error: TypeError: coercing to Unicode: need string or buffer, list found
files='/home/pathtofiles/files'
#print files
#sys.exit()
for file in files:
    fastas=sorted(glob.glob(files + '/*.fasta'))
    #print fastas[0]
    output_handle=(open(fastas, 'r+'))
    genes_files=list(SeqIO.parse(output_handle, 'fasta'))
    geneID=genes_files[0].id
    print geneID
I am running out of ideas on how to make the script open one file after another to give me the required information.
I see what you are trying to do, but let me first explain why your current approach is not working.
You have a path to a directory with fasta files and you want to loop over the files in that directory. But observe what happens if we do:
>>> files='/home/pathtofiles/files'
>>> for file in files:
...     print file
...
/
h
o
m
e
/
p
a
t
h
t
o
f
i
l
e
s
/
f
i
l
e
s
Not the list of filenames you expected! files is a string and when you apply a for loop on a string you simply iterate over the characters in that string.
Also, as doctorlove correctly observed, in your code fastas is a list and open expects a path to a file as first argument. That's why you get the TypeError: ... need string, ... list found.
As an aside (this is more of a problem on Windows than on Linux or Mac), it is good practice to always use raw string literals (prefix the string with an r) when working with pathnames, to prevent the unwanted expansion of backslash escape sequences like \n and \t into newline and tab.
>>> path = 'C:\Users\norah\temp'
>>> print path
C:\Users
orah emp
>>> path = r'C:\Users\norah\temp'
>>> print path
C:\Users\norah\temp
Another good practice is to use os.path.join() when combining pathnames and filenames. This prevents subtle bugs where your script works on your machine but gives an error on the machine of a colleague who has a different operating system.
I would also recommend using the with statement when opening files. This ensures that the file handle gets properly closed when you're done with it.
As a final remark, file is a built-in in Python 2, and it is bad practice to give a variable the same name as a built-in because that can cause bugs or confusion later on.
Combining all of the above, I would rewrite your code like this:
import os
import glob
from Bio import SeqIO
path = r'/home/pathtofiles/files'
pattern = os.path.join(path, '*.fasta')
for fasta_path in sorted(glob.glob(pattern)):
    print fasta_path
    with open(fasta_path, 'r+') as output_handle:
        genes_records = SeqIO.parse(output_handle, 'fasta')
        for gene_record in genes_records:
            print gene_record.id
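If Biopython is not available, the same glob loop can be tried with the standard library only. This sketch swaps in a hand-rolled header check instead of SeqIO, and assumes, as SeqIO's FASTA parser does, that a record's ID is the text after ">" up to the first whitespace. first_record_ids is a hypothetical helper name.

```python
import glob
import os


def first_record_ids(directory):
    """Map each *.fasta file name in `directory` to the ID of its first record.

    Assumption: the ID is the first whitespace-separated word of the
    first '>' header line (no Biopython involved).
    """
    ids = {}
    pattern = os.path.join(directory, '*.fasta')
    for fasta_path in sorted(glob.glob(pattern)):
        # One file at a time; `with` closes each handle after use
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith('>'):
                    words = line[1:].split()
                    ids[os.path.basename(fasta_path)] = words[0] if words else ''
                    break  # only the first record is needed
    return ids
```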
This is how I solved the problem, and this script works:
import os,sys
import glob
from Bio import SeqIO
def extracting_information_gene_id():
    #to extract geneID information and add the reference gene to each different file
    files=sorted(glob.glob('/home/path_to_files/files/*.fasta'))
    #print file
    #sys.exit()
    for file in files:
        #print file
        output_handle=open(file, 'r+')
        ref_genes=list(SeqIO.parse(output_handle, 'fasta'))
        geneID=ref_genes[0].id
        #print geneID
        #sys.exit()
        #to extract the geneID as a reference record from the genes_files
        query_genes=(SeqIO.index('/home/path_to_file/file.fa', 'fasta'))
        #print query_genes[geneID].format('fasta') #check point
        #sys.exit()
        ref_gene=query_genes[geneID].format('fasta')
        #print ref_gene #check point
        #sys.exit()
        output_handle.write(str(ref_gene))
        output_handle.close()
        query_genes.close()
extracting_information_gene_id()
print 'Reference gene sequence have been added'

How to get a file to be used as input of the program that ends with special character in python

I have an output file from one program whose name ends in "_x.txt". I want to connect it to a second program that uses this file as input and adds more data to it, so that the final file is named "blabla_x_f.txt".
I am trying to work it out as below, but it does not seem to be correct and I could not solve it. Please help:
inf = str(raw_input(*+"_x.txt"))
with open(inf+'_x.txt') as fin, open(inf+'_x_f.txt','w') as fout:
....(other operations)
The main problem is that the "blabla" part of the file name can be anything each time (a random string), so the code needs to be flexible and just look for whatever ends with "_x.txt".
Have a look at Python's glob module:
import glob
files = glob.glob('*_x.txt')
gives you a list of all files ending in _x.txt. Continue with
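for path in files:
    newpath = path[:-4] + '_f.txt'  # 'blabla_x.txt' -> 'blabla_x_f.txt'
    # 'in' is a reserved word, so use other names for the handles
    with open(path) as fin:
        with open(newpath, 'w') as fout:
            # do something, e.g. copy the input, then write the extra data
            fout.write(fin.read())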
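Put together as a function, the loop might look like the sketch below. It uses os.path.splitext instead of slicing off four characters, and extra_text is a hypothetical stand-in for whatever the second program actually adds. Because the pattern '*_x.txt' does not match names ending in '_x_f.txt', running it twice does not pick up its own output.

```python
import glob
import os


def process_x_files(directory, extra_text):
    """For every '*_x.txt' in directory, write '<name>_x_f.txt'.

    The new file holds the original content followed by `extra_text`
    (a hypothetical placeholder for the real additional data).
    """
    written = []
    for path in sorted(glob.glob(os.path.join(directory, '*_x.txt'))):
        root, ext = os.path.splitext(path)   # ('.../blabla_x', '.txt')
        new_path = root + '_f' + ext          # '.../blabla_x_f.txt'
        with open(path) as fin, open(new_path, 'w') as fout:
            fout.write(fin.read())
            fout.write(extra_text)
        written.append(new_path)
    return written
```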

Multiple Command Line Arguments in Python

In my Python script, I am reading one text file. I pass the path to that file on the UNIX command line as follows:
python My_script.py --d /fruit/apple/data1.txt
I am going to read one more file in the same script, so I want to know how to pass two arguments to get the paths to two files.
I have the following code, which works perfectly for one argument:
parser=argparse.ArgumentParser()
parser.add_argument('--d', '--directory', required=True, action='store', dest='directory', default=False, help="provide directory name")
args=parser.parse_args()
file_apple=args.directory
A=open(file_apple)
file1=A.read()
So on the UNIX command line I run the following, and the script runs successfully:
python My_script.py --d /fruit/apple/data1.txt
My goal is to provide a second argument as follows and read that file just like the first one:
python My_script.py --d /fruit/apple/data1.txt --d /fruit/orange/data2.txt
I will appreciate your help on this.
You can make use of nargs.
import argparse
parser=argparse.ArgumentParser()
parser.add_argument('-d', '--directory', nargs='+', required=True, action='store', dest='directory', default=False, help="provide directory name")
args=parser.parse_args()
file_apple=args.directory
print file_apple
...
I have set nargs to '+', which means one or more arguments for that option. So you have to give at least one file path.
If you are sure you will always have exactly two (or some other fixed number of) paths, you can specify that too, e.g. nargs=2.
Now file_apple will be a list of the paths you passed.
$ python My_script.py -d /fruit/apple/data1.txt
['/fruit/apple/data1.txt']
and:
$ python My_script.py -d /fruit/apple/data1.txt /fruit/orange/data2.txt
['/fruit/apple/data1.txt', '/fruit/orange/data2.txt']
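With a fixed count such as nargs=2, each path in the resulting list can then be read just like the single file in the question. A minimal sketch; read_all is a hypothetical helper name and the help text is illustrative.

```python
import argparse


def read_all(argv=None):
    """Parse exactly two file paths after -d and return their contents."""
    parser = argparse.ArgumentParser()
    # nargs=2: '-d' must be followed by exactly two paths
    parser.add_argument('-d', '--directory', nargs=2, required=True,
                        metavar=('FILE1', 'FILE2'), help='two input files')
    args = parser.parse_args(argv)
    contents = []
    for path in args.directory:
        # Read each file the same way the question reads the first one
        with open(path) as handle:
            contents.append(handle.read())
    return contents
```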
PS: conventionally, a single dash is used for single-character flags and a double dash for multi-character ones, e.g. -d or --directory.
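The question's own command line repeats the flag (--d file1 --d file2) instead of listing several paths after one flag. argparse supports that form with action='append', which collects every occurrence into a list. A sketch using -d/--directory as in the answer; parse_paths is a hypothetical helper name.

```python
import argparse


def parse_paths(argv=None):
    """Return every path given via a repeated -d/--directory flag."""
    parser = argparse.ArgumentParser()
    # action='append' adds one list entry per occurrence of the flag
    parser.add_argument('-d', '--directory', action='append', required=True,
                        dest='directory', help='path to an input file (repeatable)')
    return parser.parse_args(argv).directory
```

With this, python My_script.py -d /fruit/apple/data1.txt -d /fruit/orange/data2.txt yields both paths in one list.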