Beginner, python - how to read a list from a file

I have a Word document that is literally a list of lists and is 8 pages long, e.g.:
[['WTCS','Dec 21'],['THWD','Mar 22']...]
I am using Linux Mint, Python 3.2 and the IDLE interface, plus my own .py programs. I need to read and reference this list frequently, and when I stored it inside my .py programs it seemed to slow down the editor window considerably as I was editing code. How can I store this information in a separate file and read it into Python? I have it in a .txt file now and tried the following code:
def readlist():
    f = open(r'/home/file.txt', 'r')
    info = list(f.read())
    return info
but I get each character as an element of a list. I also tried info = f.read() but I get a string. Thanks!

You can convert the Python list stored in a text file, read in as a string, into a real list using the ast module:
>>> import ast
>>> s = "[['WTCS','Dec 21'],['THWD','Mar 22']]"
>>> ast.literal_eval(s)
[['WTCS', 'Dec 21'], ['THWD', 'Mar 22']]
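Putting that together with the question's readlist, a minimal sketch (assuming /home/file.txt contains nothing but the list literal):
import ast

def readlist():
    # Read the whole file as one string, then parse it as a Python literal
    with open(r'/home/file.txt') as f:
        return ast.literal_eval(f.read())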

Related

The glob.glob function to extract data from files

I am trying to run the script below. The intention of the script is to open different fasta files one after the other and extract the geneID. The script works well if I don't use the glob.glob function; with it, I get this message: TypeError: coercing to Unicode: need string or buffer, list found
files = '/home/pathtofiles/files'
#print files
#sys.exit()
for file in files:
    fastas = sorted(glob.glob(files + '/*.fasta'))
    #print fastas[0]
    output_handle = (open(fastas, 'r+'))
    genes_files = list(SeqIO.parse(output_handle, 'fasta'))
    geneID = genes_files[0].id
    print geneID
I am running out of ideas on how to direct the script to open one file after another to give me the required information.
I see what you are trying to do, but let me first explain why your current approach is not working.
You have a path to a directory with fasta files and you want to loop over the files in that directory. But observe what happens if we do:
>>> files = '/home/pathtofiles/files'
>>> for file in files:
...     print file
/
h
o
m
e
/
p
a
t
h
t
o
f
i
l
e
s
/
f
i
l
e
s
Not the list of filenames you expected! files is a string, and when you apply a for loop to a string you simply iterate over the characters in that string.
Also, as doctorlove correctly observed, in your code fastas is a list and open expects a path to a file as first argument. That's why you get the TypeError: ... need string, ... list found.
As an aside (and this is more of a problem on Windows than on Linux or Mac), it is good practice to always use raw string literals (prefix the string with an r) when working with pathnames, to prevent the unwanted expansion of backslash escape sequences like \n and \t to newline and tab.
>>> path = 'C:\Users\norah\temp'
>>> print path
C:\Users
orah emp
>>> path = r'C:\Users\norah\temp'
>>> print path
C:\Users\norah\temp
Another good practice is to use os.path.join() when combining pathnames and filenames. This prevents subtle bugs where your script works on your machine but gives an error on the machine of a colleague who has a different operating system.
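For example, on the Linux box from the question:
>>> import os
>>> os.path.join('/home/pathtofiles/files', '*.fasta')
'/home/pathtofiles/files/*.fasta'
On Windows, the same call would join with that platform's backslash separator instead.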
I would also recommend using the with statement when opening files. This ensures that the file handle gets properly closed when you're done with it.
As a final remark, file is a built-in function in Python and it is bad practice to use a variable with the same name as a built-in function because that can cause bugs or confusion later on.
Combining all of the above, I would rewrite your code like this:
import os
import glob
from Bio import SeqIO

path = r'/home/pathtofiles/files'
pattern = os.path.join(path, '*.fasta')

for fasta_path in sorted(glob.glob(pattern)):
    print fasta_path
    with open(fasta_path, 'r+') as output_handle:
        genes_records = SeqIO.parse(output_handle, 'fasta')
        for gene_record in genes_records:
            print gene_record.id
This is the way I solved the problem, and this script works.
import os, sys
import glob
from Bio import SeqIO

def extracting_information_gene_id():
    # to extract geneID information and add the reference gene to each different file
    files = sorted(glob.glob('/home/path_to_files/files/*.fasta'))
    #print files
    #sys.exit()
    for file in files:
        #print file
        output_handle = open(file, 'r+')
        ref_genes = list(SeqIO.parse(output_handle, 'fasta'))
        geneID = ref_genes[0].id
        #print geneID
        #sys.exit()
        # to extract the geneID as a reference record from the genes_files
        query_genes = SeqIO.index('/home/path_to_file/file.fa', 'fasta')
        #print query_genes[geneID].format('fasta') # check point
        #sys.exit()
        ref_gene = query_genes[geneID].format('fasta')
        #print ref_gene # check point
        #sys.exit()
        output_handle.write(str(ref_gene))
        output_handle.close()
        query_genes.close()

extracting_information_gene_id()
print 'Reference gene sequence have been added'

python3 convert str to bytes-like obj without using encode

I wrote an HTTP server to serve HTML files under Python 2.7 and Python 3.5.
def do_GET(self):
    ...
    # if resource is api
    data = json.dumps({'message': ['thanks for your answer']})
    # if resource is file name
    with open(resource, 'rb') as f:
        data = f.read()
    self.send_response(response)
    self.send_header('Access-Control-Allow-Origin', '*')
    self.end_headers()
    self.wfile.write(data)  # this line raises TypeError: a bytes-like object is required, not 'str'
The code works in Python 2.7, but in Python 3 it raises the above error.
I could use bytearray(data, 'utf-8') to convert str to bytes, but then the HTML is changed in the web page.
My question:
How can I support Python 2 and Python 3 without using the 2to3 tool and without changing the file's encoding?
Is there a better way to read a file and send its content to the client the same way in Python 2 and Python 3?
Thanks in advance.
You just have to open your file in binary mode, not in text mode:
with open(resource, "rb") as f:
    data = f.read()
then data is a bytes object in Python 3 and a str in Python 2, and it works for both versions.
As a positive side effect, when this code hits a Windows box it still works (otherwise binary files like images get corrupted by the line-ending conversion applied when they are opened in text mode).
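For completeness, a minimal sketch of a handler that serves a file the same way on both versions (the filename index.html and port 8000 are placeholders, not from the question):
try:
    from http.server import BaseHTTPRequestHandler, HTTPServer  # Python 3
except ImportError:
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer  # Python 2

class FileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # binary mode: data is bytes on Python 3 and str on Python 2,
        # both of which self.wfile.write accepts on that version
        with open('index.html', 'rb') as f:
            data = f.read()
        self.send_response(200)
        self.send_header('Access-Control-Allow-Origin', '*')
        self.end_headers()
        self.wfile.write(data)

if __name__ == '__main__':
    HTTPServer(('localhost', 8000), FileHandler).serve_forever()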

os.walk - WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect:

New to Python and looking for some help with a problem I am having with os.walk. I have had a solid look around and cannot find the right solution to my problem.
What the code does:
Scans a users selected HD or folder and returns all the filenames, subdirs and size. This is then manipulated in pandas (not in code below) and exported to an excel spreadsheet in the formatting I desired.
However, in the first part of the code, in Python 2.7, I am currently experiencing the below error:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'E:\03. Work\Bre\Files\folder2\icons greyscale flatten\._Icon_18?10 Stainless Steel.psd'
I have explored using raw strings (r'') but to no avail; perhaps I am writing them wrong.
I will note that I never get this error in 3.5 or on cleanly labelled folders. Due to pandas and PyInstaller problems with 3.5, I am hoping to stick with 2.7 until they are resolved.
import pandas as pd
import xlsxwriter
import os
from io import StringIO

# Lists for pandas DataFrames
fpath = []
fname = []
fext = []
sizec = []

# START # Select file directory to scan
filed = raw_input("\nSelect a directory to scan: ")

# Scan the hard drive and add to lists for pandas DataFrames
print "\nGetting details..."
for root, dirs, files in os.walk(filed):
    for filename in files:
        f = os.path.abspath(root)            # File path
        fpath.append(f)
        fname.append(filename)               # File name
        s = os.path.splitext(filename)[1]    # File extension
        s = str(s)
        fext.append(s)
        p = os.path.join(root, filename)     # File size
        si = os.stat(p).st_size
        sizec.append(si)
print "\nDone!"
Any help would be greatly appreciated :)
In order to traverse filenames with unicode characters, you need to give os.walk a unicode path name.
Your path contains a unicode character, which is being displayed as ? in the exception.
If you pass in the unicode path, like this: os.walk(unicode(filed)), you should not get that exception.
As noted in Convert python filenames to unicode, sometimes you'll get a bytestring if the path is "undecodable" by Python 2.
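A minimal sketch of that fix on Python 2 (decoding via the filesystem encoding is an assumption about the most robust choice; plain unicode(filed) also works for ASCII input):
import os
import sys

filed = raw_input("\nSelect a directory to scan: ")
# Decode the byte string so os.walk yields unicode filenames
filed = unicode(filed, sys.getfilesystemencoding())
for root, dirs, files in os.walk(filed):
    for filename in files:
        print os.path.join(root, filename)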

How to get a file that ends with a special character to be used as input of a program in python

I have an output file from one program whose name ends in "_x.txt", and I want to connect two programs so that the second one uses this file as input and adds more data to it. Finally, it will end up as "blabla_x_f.txt".
I am trying to work it out as below, but it does not seem correct and I could not solve it. Please help:
inf = str(raw_input(*+"_x.txt"))
with open(inf+'_x.txt') as fin, open(inf+'_x_f.txt','w') as fout:
    ....(other operations)
The main problem is that the "blabla" part of the filename can change to anything and will be a random string each time, so the code needs to be flexible and just search for whatever ends with "_x.txt".
Have a look at Python's glob module:
import glob
files = glob.glob('*_x.txt')
gives you a list of all files ending in _x.txt. Continue with
for path in files:
    newpath = path[:-4] + '_f.txt'  # 'blabla_x.txt' -> 'blabla_x_f.txt'
    with open(path) as fin:
        with open(newpath, 'w') as fout:
            # do something

Formatting text file

I have a txt file that I would like to alter so I will be able to place the data into columns (see example below). The reason behind this is so I can import the data into a database / array and perform calculations on it. I tried importing/pasting the data into LibreCalc, but it either imports everything into one column or opens the file in LibreWriter; I'm using Ubuntu 10.04. Any ideas? I'm willing to use another program to work around this issue. I could also work with a comma-delimited file, but I'm not too sure how to convert the data to that format automatically.
Trying to get this:
WAVELENGTH, WAVENUMBER, INTENSITY, CLASSIFICATION, CODE,
1132.8322, 88274.326, 2300, PT II, 9356- 97630, 05,
Here's a link to the full file: pt.txt
Try this (GNU sed):
sed -r 's/(\s+)/,\1/g' pt.txt
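For instance, on a hypothetical line shaped like the data above, every run of whitespace gains a comma in front of it:
$ echo '1132.8322   88274.326   2300' | sed -r 's/(\s+)/,\1/g'
1132.8322,   88274.326,   2300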
Is this what you want?
awk 'BEGIN{OFS=","}NF>1{$1=$1;print}' pt.txt
If you want the output format to look better, and you have "column" installed, you can try this too:
awk 'BEGIN{OFS=", "}NF>1{$1=$1;print}' pt.txt|column -t
The awk and sed one-liners are cool, but I expect you'll end up needing to do more than simply splitting up the file. If you do, and if you have access to Python 2.7, the following little script will get you going.
# -*- coding: utf-8 -*-
"""Convert to comma-delimited"""
import csv
from os import path
import re
import sys

def splitline(line):
    return re.split(r'\s{2,}', line)

def main():
    srcpath = path.abspath(sys.argv[1])
    targetpath = path.splitext(srcpath)[0] + '.csv'
    with open(srcpath) as infile, open(targetpath, 'w') as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            if line.startswith(' '):
                line = line.strip()
                cols = splitline(line)
                writer.writerow(cols)

if __name__ == '__main__':
    main()
The easiest way turned out to be importing using a fixed width, like tohuwawohu suggested.
Thanks!
Without transforming it to a comma-separated file, you could access the csv import options by simply changing the file extension to .csv (maybe you should remove the "header" part manually, so that only the column heads and the data rows remain). After that, you can try to use whitespace as the column delimiter, or even easier: select "fixed width" and set the columns manually. – tohuwawohu Oct 20 at 9:23
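For completeness, a sketch of the same fixed-width idea in Python (hypothetical usage; pandas already appears in the os.walk question above, and read_fwf infers column boundaries from the data by default):
import pandas as pd

# Infer the fixed-width column boundaries and load the table;
# depending on pt.txt's header lines you may need skiprows= or names=
df = pd.read_fwf('pt.txt')
print(df.head())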