How to read & export certain files from a Python GUI-prompted directory? - python-2.7

OK guys,
I'm currently working on file reading and processing with Python and OpenCV, using a GUI feature. The feature prompts the user to select a directory containing 340 JPEG images, which I have labelled "frame1" to "frame340". Then I want to select several frames, process them, and save the processed ones in a different directory.
My big issue is that I'm trying to get only frame87, frame164, and frame248 from this folder of 340 images, and Python keeps returning an error claiming that the "directory name is invalid", like this:
Traceback (most recent call last):
File "C:\Users\maxwell_hamzah\Documents\Python27\", line 25, in <module>
imgRead = os.listdir(str(dirname) + "/frame"+ str(i) + ".jpg")
WindowsError: [Error 267] The directory name is invalid: 'C:/Users/maxwell_hamzah/Documents/Python27/toby arm framed/frame87.jpg/*.*'
To help you get familiar with the situation, here's what my code looks like:
import os
import numpy as np
import cv2
from matplotlib import pyplot as plt
from skimage import color, data, restoration
import Tkinter, tkFileDialog
# first, we setup the Tkinter features for file-reading
root = Tkinter.Tk()
# prompt user to ask about the file directory
dirname = tkFileDialog.askdirectory\
(parent=root,initialdir="/",title='Pick FRAMES directory')
X = [] # initiate an array to store read images
frameIndex = [87, 163, 248] #this index is which frames we are interested in
imgRead = ""
temp = []
# we begin to read only frame87, frame163, and frame248
for i in frameIndex:
    imgRead = os.listdir(str(dirname) + "/frame"+ str(i) + ".jpg")
    temp = cv2.imread(imgRead, -1)
I'm totally stuck on how to fix this bug, especially in the for loop, where the error comes from. Python keeps freaking out about the imgRead variable, claiming that the directory is invalid. I'm also wondering how to "export" processed files to other directories (e.g. saving processed images from "My Pictures" to "My Music").
Really appreciate your help, guys.

In the last block, you call os.listdir, which expects a directory, but you pass it a file path. That's the bug, and you don't actually need that call here in the first place:
for i in frameIndex:
    imgRead = "{0}/frame{1}.jpg".format(dirname, i)
    temp = cv2.imread(imgRead, -1)
As for moving files in Python, that's a pretty classic need and there is plenty of documentation out there.
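On the "export" part of the question, here is a minimal sketch that uses only the standard library. The helper names frame_path and export_frames are hypothetical, and it assumes the frames are named frame<N>.jpg as described in the question. It copies the selected frames unchanged into another directory:

```python
import os
import shutil


def frame_path(dirname, index):
    """Build the path of one frame, e.g. <dirname>/frame87.jpg."""
    return os.path.join(dirname, "frame{0}.jpg".format(index))


def export_frames(src_dir, dst_dir, frame_indices):
    """Copy the selected frames from src_dir to dst_dir; return the new paths."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    exported = []
    for i in frame_indices:
        src = frame_path(src_dir, i)
        if not os.path.isfile(src):
            # Skip frames that are missing instead of crashing mid-loop.
            continue
        dst = os.path.join(dst_dir, os.path.basename(src))
        shutil.copy2(src, dst)
        exported.append(dst)
    return exported
```

To save a processed image rather than a plain copy, the shutil.copy2 call could be swapped for cv2.imwrite(dst, processed_image), where processed_image is whatever array your processing step produced.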


How to download a snappy.parquet file from s3 using Boto in Python

I'm new to this and am trying to download a snappy.parquet file from Amazon S3 so that I can later convert it to a CSV file.
I tried working with the following example I found online, but I get an empty folder. Can anyone please help me?
import boto
import sys, os
from boto.s3.key import Key
from boto.exception import S3ResponseError
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
bucket = conn.get_bucket(BUCKET_NAME)
# go through the list of files
bucket_list = bucket.list()
for l in bucket_list:
    key_string = str(l.key)
    s3_path = DOWNLOAD_LOCATION_PATH + key_string
    try:
        print ("Current File is ", s3_path)
        l.get_contents_to_filename(s3_path)
    except (OSError, S3ResponseError) as e:
        # check if the file has been downloaded locally
        if not os.path.exists(s3_path):
            try:
                os.makedirs(s3_path)
            except OSError as exc:
                # guard against race conditions
                import errno
                if exc.errno != errno.EEXIST:
                    raise
The script you are using appears to recursively download the contents of the specified S3 bucket (BUCKET_NAME) to the specified local directory (DOWNLOAD_LOCATION_PATH). FWIW, I notice this script looks like it comes from here.
The "Current File is ..." output line should show you the progress of these files being written. One problem you might be having is due to this line:
s3_path = DOWNLOAD_LOCATION_PATH + key_string
If you had specified DOWNLOAD_LOCATION_PATH at the top as a directory without a trailing '/' character, e.g. like this:
DOWNLOAD_LOCATION_PATH = '/tmp/my_dir'
then the files being downloaded would be written not underneath the /tmp/my_dir directory, but directly in /tmp/ with a my_dir prefix on each filename! You can fix this by changing this line to:
s3_path = os.path.join(DOWNLOAD_LOCATION_PATH, key_string)
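To make the difference concrete, here is a small sketch comparing the two approaches. The key name is a made-up example, and posixpath is used instead of os.path only so that the output is the same on every platform (on Linux and macOS, os.path.join behaves identically):

```python
import posixpath

download_location = "/tmp/my_dir"             # no trailing slash
key_string = "data/part-0000.snappy.parquet"  # hypothetical S3 key

# Plain concatenation glues the directory name onto the key.
concatenated = download_location + key_string
# join() inserts the missing separator.
joined = posixpath.join(download_location, key_string)

print(concatenated)  # /tmp/my_dirdata/part-0000.snappy.parquet
print(joined)        # /tmp/my_dir/data/part-0000.snappy.parquet
```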
Other than that, the script appears to work alright. You may want to add this line at the very top:
from __future__ import print_function
if you are still using Python 2.x, otherwise the print output will look a bit odd (print will think you are printing a 2-Tuple).
Your question also makes it sound like you really only want/need to download a single file from the bucket -- if so, this isn't really a great script to be using, since it's downloading everything.
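If a single file is all you need, something along these lines may be enough. This is only a sketch: download_single_key is a hypothetical helper name, and it assumes bucket is a boto (version 2) bucket object such as the one returned by conn.get_bucket(BUCKET_NAME) in the script above.

```python
import os


def download_single_key(bucket, key_name, dest_dir):
    """Download one object from a boto (v2) bucket into dest_dir."""
    key = bucket.get_key(key_name)  # returns None if the key does not exist
    if key is None:
        raise KeyError("no such key in bucket: {0}".format(key_name))
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    # Keep only the file name part of the key for the local file.
    local_path = os.path.join(dest_dir, os.path.basename(key_name))
    key.get_contents_to_filename(local_path)
    return local_path
```

You would call it as download_single_key(bucket, "path/to/your_file.snappy.parquet", DOWNLOAD_LOCATION_PATH), where the key name is a placeholder for the actual key in your bucket.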

How to load retrained_graph.pb and retrained_label.txt using pycharm editor

Using Pete Warden's tutorials, I trained the Inception network, and the training produced two files.
Using these, I want to classify a flower image.
I installed PyCharm and linked the TensorFlow library; I also tested the sample TensorFlow code and it works fine.
Now, when I run the following program:
import tensorflow as tf, sys
image_path = sys.argv[1]
# Read in the image_data
image_data = tf.gfile.FastGFile(image_path, 'rb').read()
# Loads label file, strips off carriage return
label_lines = [line.rstrip() for line
               in tf.gfile.GFile("/tf_files/retrained_labels.txt")]
# Unpersists graph from file
with tf.gfile.FastGFile("/tf_files/retrained_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    _ = tf.import_graph_def(graph_def, name='')
with tf.Session() as sess:
    # Feed the image_data as input to the graph and get first prediction
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
    predictions = sess.run(softmax_tensor, \
                           {'DecodeJpeg/contents:0': image_data})
    # Sort to show labels of first prediction in order of confidence
    top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
    for node_id in top_k:
        human_string = label_lines[node_id]
        score = predictions[0][node_id]
        print('%s (score = %.5f)' % (human_string, score))
I am getting this error message:
/home/chandan/Tensorflow/bin/python /home/chandan/PycharmProjects/tf/tf_folder/tf_files/
Traceback (most recent call last):
File "/home/chandan/PycharmProjects/tf/tf_folder/tf_files/", line 7, in <module>
image_path = sys.argv[1]
IndexError: list index out of range
Could anyone please help me with this issue?
You are getting this error because the script expects an image name (with path) as an argument.
In PyCharm, go to View -> Tool Windows -> Terminal.
This is the same as opening a separate terminal. Then run:
python /image_path/image_name.jpg
You are reading a command-line argument with sys.argv[1], so you need to supply one when running the script. The required argument appears to be a test image, so pass its location as a parameter.
PyCharm should have a script parameters and interpreter options dialog that you can use to enter the required parameters.
Or you can call the script from a command line and enter the parameter like this:
>python my_python_parameter.jpg
According to the documentation (I don't have PyCharm installed on this computer), you should go to the Run/Debug Configurations menu and edit the configuration for your script. Add the absolute path of your file to the Script Parameters box, in quotes.
Alternatively, if you want to skip the parameter completely, get the path via raw_input (input in Python 3), or simply hard-code it: image_path = r"absolute_image_path.jpg"
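Whichever way you supply the argument, it can help to fail with a readable message instead of an IndexError when it is missing. A minimal sketch, where get_image_path is a hypothetical helper name and not part of the tutorial script:

```python
def get_image_path(argv):
    """Return the image path from argv, or exit with a usage message."""
    if len(argv) < 2:
        # argv[0] is the script name; argv[1] would be the first real argument.
        raise SystemExit("usage: python {0} /path/to/image.jpg".format(argv[0]))
    return argv[1]
```

In the script above, the line image_path = sys.argv[1] could then become image_path = get_image_path(sys.argv).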

I don't understand how this "os.join" function is working? I am getting errors constantly and no reading on os functions is helping me

Here's the code
import os
import sys
sys.path.append( "../tools/" )
from parse_out_email_text import parseOutText #(it's just another .py file that has a function I wrote)
from_sara = open("from_sara.txt", "r")
from_chris = open("from_chris.txt", "r")
from_data = []
word_data = []
temp_counter = 0
for name, from_person in [("sara", from_sara), ("chris", from_chris)]:
    for path in from_person:
        ### only look at first 200 emails when developing
        ### once everything is working, remove this line to run over full dataset
        temp_counter += 1
        if temp_counter < 200:
            path = os.path.join('..', path[:-1]) #(THIS IS THE PART I CAN'T GET MY HEAD AROUND)
            print path
            email = open(path, "r")
print "emails processed"
When I run this, it gives me an error as shown below:
Traceback (most recent call last):
File "C:/Users/AmitSingh/Desktop/Data/Udacity/Naya_attempt/", line 47, in <module>
email = open(path, "r")
IOError: [Errno 2] No such file or directory: '..\\maildir/bailey-s/deleted_items/101.'
I don't even have this '..\maildir/bailey-s/deleted_items/101.' path on my laptop. I tried replacing the '..' in the code with the actual path to the folder where I keep all the files, and nothing changes.
path = os.path.join('..', path[:-1])
This code is part of an online course on machine learning and I have been stuck at this point for 3 hours now. Any help would be really appreciated.
(P.S. This is not a homework question and there are no grades attached to this; it's a free online course.)
Your test data is not there, so it cannot be found. You should run the start-up code again and make sure the necessary maildir folders are all there.
Go to tools inside your udacity project directory and run
It is about 400 MB, so sit back and relax!
I know this is extremely late, but I found this post after having the exact same problem.
All the answers that I found here and on other sites, even the issue requests in the original GitHub repo, were just "run". I had already done that. However, it was telling me:
Traceback (most recent call last):
File "K:\documents\Udacity\Mini-Projects\ud120-projects\text_learning\", line 48, in <module>
email = open(path, "r")
FileNotFoundError: [Errno 2] No such file or directory: '..\\maildir/bailey-s/deleted_items/101.'
Just like yours. I then found where this file was located and it was indeed on my computer
I added 'tools' to the os.path.join() line as you can see here:
for name, from_person in [("sara", from_sara), ("chris", from_chris)]:
    for path in from_person:
        ### only look at first 200 emails when developing
        ### once everything is working, remove this line to run over full dataset
        temp_counter += 1
        if temp_counter < 200:
            #path = os.path.join('..', path[:-1]) <---original
            path = os.path.join('..','tools', path[:-1])
            email = open(path, "r")
This worked for me finally. So, I hope it helps anyone else that stumbles on this problem in the future.
Also, I noticed in some other repos of the lessons that the 'tools' folder was named 'utils'.
One example is a repo that someone tweaked to run the lessons in Jupyter notebooks. So, use whichever name you have.
1. In your Udacity course folder, go to the tools directory and check that the maildir folder is present and has subfolders in it. If it does, go back to text_learning/, find the line path = os.path.join('..', path[:-1]), and change it to path = os.path.join('../tools/', path[:-1]).
2. In a terminal, cd text_learning, then python; this should solve the issue.
3. If this does not solve the issue, go to tools inside your Udacity project directory and run the start-up code. Wait until the process is complete.
4. Repeat step 1.
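On the original question of what the os.path.join line actually does: path[:-1] drops the trailing newline from the line read out of from_sara.txt, and os.path.join then glues the pieces together with the platform's separator. The resulting relative path is resolved against the current working directory, which is why '..' has to point at the folder that really contains maildir. Here is a small sketch; the sample line is reconstructed from the error message above, and ntpath/posixpath are used instead of os.path only so the output is the same whatever platform you run it on:

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules, importable on any platform

# One line as it would be read from from_sara.txt (taken from the error message).
line = "maildir/bailey-s/deleted_items/101.\n"

relative = line[:-1]  # drop the trailing newline

# What the original code builds on Windows:
print(repr(ntpath.join("..", relative)))
# '..\\maildir/bailey-s/deleted_items/101.'

# What the fixed code builds on Windows:
print(repr(ntpath.join("..", "tools", relative)))
# '..\\tools\\maildir/bailey-s/deleted_items/101.'

# The same fix under POSIX rules:
print(posixpath.join("..", "tools", relative))
# ../tools/maildir/bailey-s/deleted_items/101.
```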

Os.walk - WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect:

I'm new to Python and looking for some help with a problem I am having with os.walk. I have had a solid look around and cannot find the right solution to my problem.
What the code does:
It scans a user-selected hard drive or folder and returns all the filenames, subdirectories and sizes. This is then manipulated in pandas (not in the code below) and exported to an Excel spreadsheet in the formatting I want.
However, in the first part of the code, in Python 2.7, I am currently experiencing the below error:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'E:\03. Work\Bre\Files\folder2\icons greyscale flatten\._Icon_18?10 Stainless Steel.psd'
I have explored using raw strings (r'...') but to no avail. Perhaps I am writing it wrong.
I will note that I never get this in 3.5 or on cleanly labelled selected folders. Due to pandas and PyInstaller problems with 3.5, I am hoping to stick with 2.7 until the error with 3.5 is resolved.
import pandas as pd
import xlsxwriter
import os
from io import StringIO
#Lists for Pandas Dataframes
fpath = []
fname = []
fext = []
sizec = []
# START #Select file directory to scan
filed = raw_input("\nSelect a directory to scan: ")
#Scan the Hard-Drive and add to lists for Pandas DataFrames
print "\nGetting details..."
for root, dirs, files in os.walk(filed):
    for filename in files:
        f = os.path.abspath(root) #File path
        fname.append(filename) #File name
        s = os.path.splitext(filename)[1] #File extension
        s = str(s)
        p = os.path.join(root, filename) #File size
        si = os.stat(p).st_size
print "\nDone!"
Any help would be greatly appreciated :)
In order to traverse filenames with unicode characters, you need to give os.walk a unicode path name.
Your path contains a unicode character, which is being displayed as ? in the exception.
If you pass in the unicode path, like this: os.walk(unicode(filed)), you should not get that exception.
As noted in "Convert python filenames to unicode", sometimes you'll still get a bytestring back if the path is "undecodable" by Python 2.
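Here is a minimal sketch of that idea written so that it runs on both Python 2 and Python 3. The helper names to_text and list_files are hypothetical, and it assumes the path string uses the filesystem encoding; console input on Windows may use a different encoding, in which case you would pass that encoding explicitly.

```python
import os
import sys

# Python 2 has a separate `unicode` type; on Python 3, `str` is already text.
text_type = type(u"")


def to_text(path, encoding=None):
    """Return path as a text (unicode) string, decoding bytes if needed."""
    if isinstance(path, text_type):
        return path
    return path.decode(encoding or sys.getfilesystemencoding())


def list_files(top):
    """Walk `top` using a unicode path and return (path, size) pairs."""
    results = []
    for root, dirs, files in os.walk(to_text(top)):
        for filename in files:
            full_path = os.path.join(root, filename)
            results.append((full_path, os.stat(full_path).st_size))
    return results
```

On Python 2, names that cannot be decoded still come back as bytestrings, so joining them to a unicode root can fail; those entries would need separate handling.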

PYPDF watermarking returns error

Hi, I'm trying to watermark a PDF file using PyPDF2, but I get an error and can't figure out what goes wrong.
I get the following error:
Traceback (most recent call last):
File "", line 13, in <module>
page.mergePage(watermark.getPage(0))
File "C:\Python27\site-packages\PyPDF2\", line 1594, in mergePage
self._mergePage(page2)
File "C:\Python27\site-packages\PyPDF2\", line 1651, in _mergePage
page2Content, rename, self.pdf)
File "C:Python27\site-packages\PyPDF2\", line 1547, in
op = operands[i]
KeyError: 0
I'm using Python 2.7.6 with PyPDF2 1.19 on 32-bit Windows.
Hopefully someone can tell me what I'm doing wrong.
My Python file:
from PyPDF2 import PdfFileWriter, PdfFileReader
output = PdfFileWriter()
input = PdfFileReader(open("test.pdf", "rb"))
watermark = PdfFileReader(open("watermark.pdf", "rb"))
# print how many pages input1 has:
print("test.pdf has %d pages." % input.getNumPages())
print("watermark.pdf has %d pages." % watermark.getNumPages())
# add page 0 from input, but first add a watermark from another PDF:
page = input.getPage(0)
page.mergePage(watermark.getPage(0))
output.addPage(page)
# finally, write "output" to outputs.pdf
outputStream = file("outputs.pdf", "wb")
output.write(outputStream)
outputStream.close()
Try writing to a StringIO object instead of a disk file. So, replace this:
outputStream = file("outputs.pdf", "wb")
with this (after adding import StringIO at the top of the file):
outputStream = StringIO.StringIO()
output.write(outputStream) #write merged output to the StringIO object
If the above code works, then you might be having file writing permission issues. For reference, look at the PyPDF working example in my article.
I encountered this error when attempting to use PyPDF2 to merge in a page which had been generated by reportlab, which used an inline image canvas.drawInlineImage(...), which stores the image in the object stream of the PDF. Other PDFs that use a similar technique for images might be affected in the same way -- effectively, the content stream of the PDF has a data object thrown into it where PyPDF2 doesn't expect it.
If you're able to, a solution can be to re-generate the source pdf, but to not use inline content-stream-stored images -- e.g. generate with canvas.drawImage(...) in reportlab.
Here's an issue about this on PyPDF2.