os.walk - WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: - python-2.7

I'm new to Python and looking for some help with a problem I am having with os.walk. I have had a good look around and cannot find the right solution to my problem.
What the code does:
It scans a user-selected hard drive or folder and returns all the file names, subdirectories and sizes. This is then manipulated in pandas (not in the code below) and exported to an Excel spreadsheet in the format I want.
However, in the first part of the code, in Python 2.7, I am currently getting the error below:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'E:\03. Work\Bre\Files\folder2\icons greyscale flatten\._Icon_18?10 Stainless Steel.psd'
I have tried using a raw string (r'...'), but to no avail. Perhaps I am writing it wrong.
I will note that I never get this in 3.5, or on cleanly named folders. Due to problems with pandas and PyInstaller on 3.5, I am hoping to stick with 2.7 until those issues are resolved.
import pandas as pd
import xlsxwriter
import os
from io import StringIO
#Lists for Pandas Dataframes
fpath = []
fname = []
fext = []
sizec = []
# START #Select file directory to scan
filed = raw_input("\nSelect a directory to scan: ")
#Scan the Hard-Drive and add to lists for Pandas DataFrames
print "\nGetting details..."
for root, dirs, files in os.walk(filed):
    for filename in files:
        f = os.path.abspath(root) #File path
        fpath.append(f)
        fname.append(filename) #File name
        s = os.path.splitext(filename)[1] #File extension
        s = str(s)
        fext.append(s)
        p = os.path.join(root, filename) #File size
        si = os.stat(p).st_size
        sizec.append(si)
print "\nDone!"
Any help would be greatly appreciated :)

In order to traverse filenames with unicode characters, you need to give os.walk a unicode path name.
Your path contains a non-ASCII character, which is being displayed as ? in the exception.
If you pass in a unicode path, like os.walk(unicode(filed)), you should not get that exception.
As noted in "Convert python filenames to unicode", you will sometimes still get a bytestring if a path is "undecodable" by Python 2.
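A minimal sketch of that fix, written so it runs on both Python 2.7 and 3. The helper name `scan_directory` is hypothetical, and decoding the typed path with `sys.getfilesystemencoding()` is an assumption (on Windows the console code page can differ, so names that still fail to decode come back as the bytestrings mentioned above). It also keeps the extension as returned, without the question's `str()` call: on Python 2, `str()` on a unicode extension containing non-ASCII characters raises `UnicodeEncodeError`.

```python
import os
import sys


def scan_directory(top):
    """Return (directory, filename, extension, size) for every file under top.

    Hypothetical helper: it passes a text (unicode) path to os.walk so that
    Python 2 also returns unicode file names and non-ASCII characters survive.
    """
    if isinstance(top, bytes):
        # Assumption: the typed path is in the file system encoding.
        top = top.decode(sys.getfilesystemencoding() or 'utf-8')
    rows = []
    for root, dirs, files in os.walk(top):
        for filename in files:
            full_path = os.path.join(root, filename)
            rows.append((os.path.abspath(root),          # file path
                         filename,                        # file name
                         os.path.splitext(filename)[1],   # file extension
                         os.stat(full_path).st_size))     # size in bytes
    return rows
```

Usage would be along the lines of `rows = scan_directory(raw_input("Select a directory to scan: "))`, after which the four columns can feed the pandas DataFrame.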

Related

The glob.glob function to extract data from files

I am trying to run the script below. Its purpose is to open different fasta files one after the other and extract the geneID. The script works well if I don't use the glob.glob function, but with it I get this message: TypeError: coercing to Unicode: need string or buffer, list found
files='/home/pathtofiles/files'
#print files
#sys.exit()
for file in files:
    fastas=sorted(glob.glob(files + '/*.fasta'))
    #print fastas[0]
    output_handle=(open(fastas, 'r+'))
    genes_files=list(SeqIO.parse(output_handle, 'fasta'))
    geneID=genes_files[0].id
    print geneID
I am running out of ideas on how to make the script open one file after another to give me the required information.
I see what you are trying to do, but let me first explain why your current approach is not working.
You have a path to a directory with fasta files and you want to loop over the files in that directory. But observe what happens if we do:
>>> files='/home/pathtofiles/files'
>>> for file in files:
...     print file
/
h
o
m
e
/
p
a
t
h
t
o
f
i
l
e
s
/
f
i
l
e
s
Not the list of filenames you expected! files is a string, and when you loop over a string you simply iterate over the characters in that string.
Also, as doctorlove correctly observed, in your code fastas is a list, while open expects the path of a single file as its first argument. That's why you get TypeError: ... need string or buffer, list found.
As an aside (this is more of a problem on Windows than on Linux or Mac), it is good practice to always use raw string literals (prefix the string with an r) when working with path names, to prevent backslash escape sequences like \n and \t from being turned into a newline and a tab.
>>> path = 'C:\Users\norah\temp'
>>> print path
C:\Users
orah emp
>>> path = r'C:\Users\norah\temp'
>>> print path
C:\Users\norah\temp
Another good practice is to use os.path.join() when combining directory names and filenames. This prevents subtle bugs where your script works on your machine but gives an error on the machine of a colleague who uses a different operating system.
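A small sketch of both points. It uses the `ntpath` and `posixpath` modules (the implementations behind `os.path` on Windows and on POSIX systems respectively) so that the output is the same on any machine; the example paths are made up.

```python
import ntpath     # the module os.path refers to on Windows
import posixpath  # the module os.path refers to on Linux and macOS

# Without the r prefix, "\n" inside a path literal becomes one newline character.
escaped = 'C:\norah'
raw = r'C:\norah'
print('%d %d' % (len(escaped), len(raw)))  # 7 8

# os.path.join inserts the separator that fits the operating system.
print(ntpath.join(r'C:\Users\norah', 'temp'))    # C:\Users\norah\temp
print(posixpath.join('/home/norah', 'temp'))     # /home/norah/temp
```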
I would also recommend using the with statement when opening files. This ensures that the file handle gets properly closed when you're done with it.
As a final remark, file is a built-in in Python 2, and it is bad practice to give a variable the same name as a built-in, because that can cause bugs or confusion later on.
Combining all of the above, I would rewrite your code like this:
import os
import glob
from Bio import SeqIO
path = r'/home/pathtofiles/files'
pattern = os.path.join(path, '*.fasta')
for fasta_path in sorted(glob.glob(pattern)):
    print fasta_path
    with open(fasta_path, 'r+') as output_handle:
        genes_records = SeqIO.parse(output_handle, 'fasta')
        for gene_record in genes_records:
            print gene_record.id
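For readers without Biopython installed, here is a variant of the same loop as a minimal sketch: it iterates over the list returned by glob.glob and opens one path at a time, but reads the first header line by hand instead of using Bio.SeqIO. The function name `first_gene_ids` is hypothetical, and taking the first word after '>' as the ID is an assumption that matches ordinary FASTA headers.

```python
import glob
import os


def first_gene_ids(directory):
    """Map each *.fasta file in directory to the ID of its first record.

    Hypothetical helper; it parses the '>' header line by hand instead of
    using Bio.SeqIO.
    """
    ids = {}
    pattern = os.path.join(directory, '*.fasta')
    for fasta_path in sorted(glob.glob(pattern)):  # loop over the list of paths
        with open(fasta_path) as handle:            # open one path at a time
            for line in handle:
                if line.startswith('>'):
                    words = line[1:].split()
                    # Assumption: the ID is the first word after '>'.
                    ids[fasta_path] = words[0] if words else ''
                    break
    return ids
```

Calling `first_gene_ids('/home/pathtofiles/files')` would return a dictionary from each fasta path to its first gene ID.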
This is how I solved the problem, and this script works.
import os,sys
import glob
from Bio import SeqIO
def extracting_information_gene_id():
    #to extract geneID information and add the reference gene to each different file
    files=sorted(glob.glob('/home/path_to_files/files/*.fasta'))
    #print file
    #sys.exit()
    for file in files:
        #print file
        output_handle=open(file, 'r+')
        ref_genes=list(SeqIO.parse(output_handle, 'fasta'))
        geneID=ref_genes[0].id
        #print geneID
        #sys.exit()
        #to extract the geneID as a reference record from the genes_files
        query_genes=(SeqIO.index('/home/path_to_file/file.fa', 'fasta'))
        #print query_genes[geneID].format('fasta') #check point
        #sys.exit()
        ref_gene=query_genes[geneID].format('fasta')
        #print ref_gene #check point
        #sys.exit()
        output_handle.write(str(ref_gene))
        output_handle.close()
        query_genes.close()
extracting_information_gene_id()
print 'Reference gene sequence have been added'

Read & write txt file error - 'str' object has no attribute 'name', Polish diacritical chars in path error

I use Python 2.7 on Win 7 Pro SP1.
I tried this code:
import os
path = "E:/data/keyword"
os.chdir(path)
files = os.listdir(path)
query = "{keyword} AND NOT("
result = open("query.txt", "w")
for file in files:
    if file.endswith(".txt"):
        file_path = file.name
        dane = open(file_path, "r")
        query.append(dane)
        result.append(" OR ")
result.write(query)
result.write(")")
result.close()
I get this error:
file_path = file.name
AttributeError: 'str' object has no attribute 'name'
I can't figure out why.
I have a second error when the path contains Polish diacritical characters like "ąęłńóżć". I get an error for:
path = "E:/Bieżące projekty/keyword"
I tried to fix it with:
path =u"E:/Bieżące projekty/keyword"
but it did not help. I'm just starting with Python and I can't figure out why this code is not working.
What I want:
Find all text files in the directory.
Join all the text files into one text file named "query.txt".
For example:
file 1
data1 data2
file 2
data 3 data 4
Output from "query.txt":
data1 data2 data 3 data 4
The above code works fine when the path variable has no Polish diacritical characters. When I change the path, I get this error:
SyntaxError: Non-ASCII character '\xc5' in file query.py on line 9, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
In PEP 263 in the Python docs I found the magic comment. The standard encoding for Polish characters like "ąęłńóźżć" is ISO-8859-2, so I tried adding that encoding declaration to the code. I tried UTF-8 too and got the same error. My full code (without the first 5 lines of comments describing what the code does) is:
import os
#path = r"E:/data"
# -*- coding: iso-8859-2 -*-
path = r"E:/Bieżące przedsięwzięcia"
os.chdir(path)
files = os.listdir(path)
query = "{keyword} AND NOT("
for file in files:
    if file.endswith(".txt"):
        dane = open(file, "r")
        text = dane.read()
        query += text
        print(query)
        dane.close()
        query.join(" OR ")
result = open("query.txt", "w")
result.write(query)
result.write(")")
result.close()
On a Unicode/UTF-8 character table I found that the Polish character "ż" is encoded in UTF-8 as "\xc5\xbc". Turning the line with the "ż" path into a comment (with #) still gives the error. When I remove this line:
path = r"E:/Bieżące przedsięwzięcia"
the code works fine and I get the result I want.
For editing I use Notepad++ with default settings; the only change I made is to replace tabs with four spaces.
Second Question
I tried to find in the Python docs what the r in the path variable means, but I can't find it in the Python 2.7 string documentation. Could someone tell me what this part of Python (like u or r before a string value) is called, e.g.:
path = u"somedata"
path = r"somedata"
I would like to read the documentation about it.
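A sketch of the goal described above, under these assumptions: the .txt files are UTF-8 encoded, the intended output is the `{keyword} AND NOT(... OR ...)` string the posted code tries to build, and `build_query` is a hypothetical name. A few facts apply here. PEP 263 requires the coding declaration to be on the first or second line of the file (in the posted script it sits further down, so it is ignored), and without a declaration Python 2 rejects non-ASCII bytes anywhere in the source, comments included. os.listdir returns plain name strings, so no `.name` attribute is needed. The `u` and `r` in front of a literal are called string prefixes (`u` gives a unicode literal, `r` a raw string); they are described under "String literals" in the Lexical analysis chapter of the Python language reference.

```python
# -*- coding: utf-8 -*-
# PEP 263: this coding line only takes effect on line 1 or 2 of the script.
import io
import os


def build_query(directory, out_name=u'query.txt'):
    """Join every .txt file in directory into '{keyword} AND NOT(a OR b)'."""
    parts = []
    for name in sorted(os.listdir(directory)):   # plain name strings
        if name.endswith(u'.txt') and name != out_name:
            path = os.path.join(directory, name)
            # Assumption: the input files are UTF-8 encoded.
            with io.open(path, encoding='utf-8') as source:
                parts.append(source.read().strip())
    query = u'{keyword} AND NOT(' + u' OR '.join(parts) + u')'
    with io.open(os.path.join(directory, out_name), 'w',
                 encoding='utf-8') as result:
        result.write(query)
    return query
```

On Python 2, call it with a unicode path, e.g. `build_query(u"E:/Bieżące przedsięwzięcia")`, so that os.listdir also returns unicode names; the script containing that line then needs the coding declaration at the very top.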

How to read & export certain files from a Python GUI-prompted directory?

OK guys,
I'm currently working on reading and processing files with Python, OpenCV, and a GUI feature. The feature prompts the user to select the path of a folder containing 340 JPEG images, which I labelled "frame1" to "frame340". Then I want to select several frames, process them, and save the processed ones in a different directory.
My big issue is that I'm trying to get only frame87, frame164, and frame248 from this folder of 340 images, and Python just keeps returning an error claiming "directory name is invalid", like this:
Traceback (most recent call last):
  File "C:\Users\maxwell_hamzah\Documents\Python27\imageReadBeta.py", line 25, in <module>
    imgRead = os.listdir(str(dirname) + "/frame"+ str(i) + ".jpg")
WindowsError: [Error 267] The directory name is invalid: 'C:/Users/maxwell_hamzah/Documents/Python27/toby arm framed/frame87.jpg/*.*'
To help you get familiar with the situation, here's what my code looks like:
import os
import numpy as np
import cv2
from matplotlib import pyplot as plt
from skimage import color, data, restoration
import Tkinter, tkFileDialog
# first, we setup the Tkinter features for file-reading
root = Tkinter.Tk()
root.withdraw()
# prompt user to ask about the file directory
dirname = tkFileDialog.askdirectory\
(parent=root,initialdir="/",title='Pick FRAMES directory')
X = [] # initiate an array to store read images
frameIndex = [87, 163, 248] #this index is which frames we are interested in
imgRead = ""
temp = []
# we begin to read only frame87, frame163, and frame248
for i in frameIndex:
    imgRead = os.listdir(str(dirname) + "/frame"+ str(i) + ".jpg")
    temp = cv2.imread(imgRead, -1)
    X.append(temp)
I'm totally stuck on how to fix this bug, especially in the for loop, where the error comes from. Python keeps freaking out on the imgRead variable, claiming that the directory is invalid. I'm also wondering how to "export" processed files to other directories (e.g. saving processed images from "My Pictures" to "My Music").
Really appreciate your help, guys.
Maxwell
In the last block, you call a method that lists files, which expects a directory, but you pass it a file path. That's a bug, and you don't actually need it here in the first place:
for i in frameIndex:
    imgRead = "{0}/frame{1}.jpg".format(dirname, i)
    temp = cv2.imread(imgRead, -1)
    X.append(temp)
As for moving files in Python, that's a pretty classic need, and there's plenty of documentation out there. One example.
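A sketch for the export part of the question, assuming the frames are named frameN.jpg as described. The helpers `frame_paths` and `export_files` are hypothetical; they build paths with os.path.join instead of string concatenation and copy files with shutil.copy2 (shutil.move would move them instead). For processed images you would write the new array to the same kind of destination path with `cv2.imwrite(destination, image)` rather than copying the original file.

```python
import os
import shutil


def frame_paths(dirname, frame_indices):
    """Build the full path of each wanted frame, e.g. .../frame87.jpg."""
    return [os.path.join(dirname, 'frame{0}.jpg'.format(i))
            for i in frame_indices]


def export_files(paths, out_dir):
    """Copy each file into out_dir (created if needed); return the new paths."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    copied = []
    for src in paths:
        dst = os.path.join(out_dir, os.path.basename(src))
        shutil.copy2(src, dst)  # shutil.move(src, dst) would move instead
        copied.append(dst)
    return copied
```

With OpenCV the reading loop could then become `X = [cv2.imread(p, -1) for p in frame_paths(dirname, frameIndex)]`.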

Python 2.7 Using Tkinter to concatenate a source path and filename gives error of nonetype

I am using Python 2.7 and imported Tkinter and Tk.
What I am trying to do is take a source path (a directory path) and concatenate it with a file name picked in a Windows Explorer-style dialog. This way the user does not have to type in a file name.
I realized I wasn't using a return, and I got the following error:
TypeError: cannot concatenate 'str' and 'NoneType' objects
After searching here for this error, I found I needed to add a return. I tried putting a string in the parentheses, but it doesn't work. I am definitely missing something.
Here is a sample of my code:
from Tkinter import *
from Tkinter import Tk
from tkFileDialog import askopenfilename
source = '\\\\Isfs\\data$\\GIS Carto\TTP_Draw_Count' ## this a public directory path
filename = ''
filename = getFileName() ##this part is in a different def area.
with open (os.path.join(source + filename), 'r' ) as f: ## this is where it fails.
def getFileName():
    Tk().withdraw()
    filename = askopenfilename()
    return getFileName()
I need to concatenate the source + filename to be used to process a csv file.
I didn't want to put all the code here since it is long and requires a csv file and custom dictionary to merge. All of that works. I hope I have put enough information in this question.
def getFileName():
    Tk().withdraw()
    filename = askopenfilename()
    return getFileName()
You aren't returning the filename that you get here. Change this to:
def getFileName():
    Tk().withdraw()
    filename = askopenfilename()
    return filename
Also note that askopenfilename returns the full path of the chosen file, so source+filename will evaluate to something like u'\\\\Isfs\\data$\\GIS Carto\\TTP_Draw_CountC:/Users/kevin/Desktop/myinput.txt'
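If the goal is to combine the fixed source directory with only the name of the picked file, a minimal sketch follows; `build_source_path` is a hypothetical helper. It also covers a cancelled dialog, for which askopenfilename returns an empty value. Alternatively, passing `initialdir=source` to askopenfilename opens the dialog in that folder, and the returned full path can then be used as-is.

```python
import os


def build_source_path(source_dir, chosen_path):
    """Join source_dir with the file name part of a path from askopenfilename.

    Returns None when chosen_path is empty (the user cancelled the dialog).
    """
    if not chosen_path:
        return None
    # askopenfilename returns a full path, so keep only its last component.
    return os.path.join(source_dir, os.path.basename(chosen_path))
```

In the question's code this would be used as `csv_path = build_source_path(source, getFileName())`, followed by a check for None before opening the file.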

Formatting text file

I have a txt file that I would like to alter so that the data is in columns (see the example below). The reason is so that I can import the data into a database or array and perform calculations on it. I tried importing/pasting the data into LibreOffice Calc, but it either imports everything into one column or opens the file in LibreOffice Writer. I'm using Ubuntu 10.04. Any ideas? I'm willing to use another program to work around this issue. I could also work with a comma-delimited file, but I'm not too sure how to convert the data to that format automatically.
Trying to get this:
WAVELENGTH, WAVENUMBER, INTENSITY, CLASSIFICATION, CODE,
1132.8322, 88274.326, 2300, PT II, 9356- 97630, 05,
Here's a link to the full file.
pt.txt file
Try this (it puts a comma in front of every run of whitespace; & stands for the matched text):
sed -e 's/[[:space:]]\{1,\}/,&/g' pt.txt
Is this what you want?
awk 'BEGIN{OFS=","}NF>1{$1=$1;print}' pt.txt
If you want the output to look better and you have column installed, you can try this too:
awk 'BEGIN{OFS=", "}NF>1{$1=$1;print}' pt.txt|column -t
The awk and sed one-liners are cool, but I expect you'll end up needing to do more than simply splitting up the file. If you do, and if you have access to Python 2.7, the following little script will get you going.
# -*- coding: utf-8 -*-
"""Convert to comma-delimited"""
import csv
from os import path
import re
import sys
def splitline(line):
    # Columns are separated by two or more whitespace characters.
    return re.split(r'\s{2,}', line)

def main():
    srcpath = path.abspath(sys.argv[1])
    targetpath = path.splitext(srcpath)[0] + '.csv'
    with open(srcpath) as infile, open(targetpath, 'w') as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            # Strip the padding and the newline so the first and last
            # columns come out clean; skip blank lines.
            line = line.strip()
            if not line:
                continue
            cols = splitline(line)
            writer.writerow(cols)

if __name__ == '__main__':
    main()
The easiest way turned out to be importing with a fixed width, as tohuwawohu suggested.
Thanks
Without transforming it into a comma-separated file, you could access the CSV import options by simply changing the file extension to .csv (you may want to remove the "header" part manually, so that only the column headings and the data rows remain). After that, you can try using whitespace as the column delimiter, or, even easier, select "fixed width" and set the columns manually. – tohuwawohu Oct 20 at 9:23
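The fixed-width approach can also be scripted, for example to feed the csv.writer loop above. A minimal sketch: `split_fixed_width` and `parse_fixed_width` are hypothetical helpers, the column widths have to be read off pt.txt itself, and the widths in the example below are made up. Any text beyond the last width is dropped, so give the last column a generous width.

```python
def split_fixed_width(line, widths):
    """Cut a line into fields of the given character widths, stripping padding."""
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


def parse_fixed_width(lines, widths):
    """Split every non-blank line; each result row can go to csv.writer."""
    return [split_fixed_width(line.rstrip('\n'), widths)
            for line in lines if line.strip()]
```

For instance, `split_fixed_width('abc  12   x', [5, 5, 1])` gives `['abc', '12', 'x']`.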