Retrieving data saved under an HDF5 group as CArray - python-2.7

I am new to the HDF5 file format and I have data (images) saved in HDF5 format. The images are saved as CArrays under a group called 'data', which is under the root group. What I want to do is retrieve a slice of the saved images, for example the first 400. The following is what I did.
h5f = h5py.File('images.h5f', 'r')
image_grp= h5f['/data/'] #the image group (data) is opened
print(image_grp[0:400])
but I am getting the following error
Traceback (most recent call last):
File "fgf.py", line 32, in <module>
print(image_grp[0:40])
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
(/feedstock_root/build_artefacts/h5py_1496410723014/work/h5py-2.7.0/h5py/_objects.c:2846)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
(/feedstock_root/build_artefacts/h5py_1496410723014/work/h5py-2.7.0/h5py/_objects.c:2804)
File "/..../python2.7/site-packages/h5py/_hl/group.py", line 169, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "/..../python2.7/site-packages/h5py/_hl/base.py", line 133, in _e
name = name.encode('ascii')
AttributeError: 'slice' object has no attribute 'encode'
I am not sure why I am getting this error, and I am not even sure whether I can slice images that are saved as individual datasets.

I know this is an old question, but it is the first hit when searching for 'slice' object has no attribute 'encode' and it has no solution.
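The error happens because image_grp is a group, not a dataset. A group can only be indexed by name, so h5py tries to encode the slice as if it were a name string, and a slice has no encode method. What you want to slice is a dataset inside the group.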
You need to find/know the key for the item that contains the dataset.
One suggestion is to list all keys in the group, and then guess which one it is:
print(list(image_grp.keys()))
This will give you the keys in the group.
A common case is that the first element is the image, so you can do this:
image_grp = h5f['/data/']
image = image_grp[list(image_grp.keys())[0]]
print(image[0:400])

Yesterday I had a similar error and wrote this little piece of code to take the desired slice of an h5py file.
import h5py
def h5py_slice(h5py_file, begin_index, end_index):
    slice_list = []
    with h5py.File(h5py_file, 'r') as f:
        for i in range(begin_index, end_index):
            slice_list.append(f[str(i)][...])
    return slice_list
and it can be used like
the_desired_slice_list = h5py_slice('images.h5f', 0, 400)
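A variant of the same idea, sketched under assumptions: it reads from the /data group named in the question and does not require the datasets to be called '0', '1', and so on. The file name demo_images.h5, the dataset names img_0 ... img_4 and the helper first_n_datasets are made up for illustration; the only thing taken from the question is that the images are stored as separate datasets under /data.

```python
import os
import tempfile
from itertools import islice

import h5py
import numpy as np


def first_n_datasets(path, group_name, n):
    """Read the first n datasets (in the group's key order) into memory."""
    with h5py.File(path, "r") as f:
        group = f[group_name]
        # A group is indexed by name; iterating over it yields those names.
        return [group[name][...] for name in islice(group, n)]


# Build a small throwaway file that mimics the assumed layout.
tmp_path = os.path.join(tempfile.mkdtemp(), "demo_images.h5")
with h5py.File(tmp_path, "w") as f:
    grp = f.create_group("data")
    for i in range(5):
        grp.create_dataset("img_%d" % i, data=np.full((2, 2), i))

images = first_n_datasets(tmp_path, "/data", 3)
```

Each element of images is a NumPy array, so it stays usable after the file has been closed. If the images are instead stored in one dataset, slicing that dataset directly (as in the previous answer) is simpler.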

Related

tensorflow object detection API: generate TF record of custom data set

I am trying to retrain the TensorFlow Object Detection API with my own data.
I have labelled my images with labelImg, but when I use the script create_pascal_tf_record.py, which is included in tensorflow/models/research, I get some errors and I don't really know why.
python object_detection/dataset_tools/create_pascal_tf_record.py --data_dir=/home/jim/Documents/tfAPI/workspace/training_cabbage/images/train/ --label_map_path=/home/jim/Documents/tfAPI/workspace/training_cabbage/annotations/label_map.pbtxt --output_path=/home/jim/Desktop/cabbage_pascal.record --set=train --annotations_dir=/home/jim/Documents/tfAPI/workspace/training_cabbage/images/train/ --year=merged
Traceback (most recent call last):
File "object_detection/dataset_tools/create_pascal_tf_record.py", line 185, in <module>
tf.app.run()
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/dataset_tools/create_pascal_tf_record.py", line 167, in main
examples_list = dataset_util.read_examples_list(examples_path)
File "/home/jim/Documents/tfAPI/models/research/object_detection/utils/dataset_util.py", line 59, in read_examples_list
lines = fid.readlines()
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 188, in readlines
self._preread_check()
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 85, in _preread_check
compat.as_bytes(self.__name), 1024 * 512, status)
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/jim/Documents/tfAPI/workspace/training_cabbage/images/train/VOC2007/ImageSets/Main/aeroplane_train.txt; No such file or directory
The train folder contains the XML and JPG files.
The annotations folder contains my label_map.pbtxt for my custom class,
and I want to write the TFRecord file to the desktop.
It seems that it can't find a file in my images and annotations folders, but I don't know why.
If someone has an idea, thank you in advance.
This error happens because you are using the code for PASCAL VOC, which requires a certain folder structure. Basically, you need to download and unpack VOCdevkit to make the script work. As user phd pointed out, you need the file VOC2007/ImageSets/Main/aeroplane_train.txt.
I recommend writing your own script for TFRecord creation; it's not difficult. You need just two key components:
Loop over your data that reads the images and annotations
A function that encodes the data into tf.train.Example. For that you can pretty much re-use the dict_to_tf_example
Inside the loop, having created the tf_example, pass it to TFRecordWriter:
writer.write(tf_example.SerializeToString())
OK, for future reference, this is how I add background images to the dataset so that the model can train on them.
Functions used from: datitran/raccoon_dataset
Generate CSV file -> xml_to_csv.py
Generate TFRecord from CSV file -> generate_tfrecord.py
First Step - Creating XML file for it
Example of background image XML file
<annotation>
    <folder>test/</folder>
    <filename>XXXXXX.png</filename>
    <path>your_path/test/XXXXXX.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>640</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
</annotation>
Basically, you remove the entire <object> element (i.e. no annotation).
Second Step - Generate CSV file
In xml_to_csv.py I make one small change to handle XML files that have no annotation (the background images):
From the original:
https://github.com/datitran/raccoon_dataset/blob/93938849301895fb73909842ba04af9b602f677a/xml_to_csv.py#L12-L22
I add:
value = None
for member in root.findall('object'):
    value = (root.find('filename').text,
             int(root.find('size')[0].text),
             int(root.find('size')[1].text),
             member[0].text,
             int(member[4][0].text),
             int(member[4][1].text),
             int(member[4][2].text),
             int(member[4][3].text)
             )
    xml_list.append(value)
if value is None:
    value = (root.find('filename').text,
             int(root.find('size')[0].text),
             int(root.find('size')[1].text),
             '-1',
             '-1',
             '-1',
             '-1',
             '-1'
             )
    xml_list.append(value)
I'm just adding negative values for the bounding-box coordinates if there is no <object> in the XML file, which is the case for the background images; this will be useful when generating the TFRecords.
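The same step can be tried in isolation with the standard library. This is only a sketch: the helper name rows_from_annotation is made up, and it looks elements up by name (name, bndbox, xmin, ...) on the assumption that the files follow the usual labelImg / Pascal VOC layout, instead of using the positional indexes from the snippet above.

```python
import xml.etree.ElementTree as ET

# A background-image annotation: no <object> element at all.
BACKGROUND_XML = """<annotation>
  <filename>XXXXXX.png</filename>
  <size><width>640</width><height>640</height><depth>3</depth></size>
  <segmented>0</segmented>
</annotation>"""


def rows_from_annotation(xml_text):
    """Return one CSV row per <object>, or a single placeholder row if there is none."""
    root = ET.fromstring(xml_text)
    filename = root.find('filename').text
    width = int(root.find('size/width').text)
    height = int(root.find('size/height').text)

    rows = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        rows.append((filename, width, height, obj.find('name').text,
                     int(box.find('xmin').text), int(box.find('ymin').text),
                     int(box.find('xmax').text), int(box.find('ymax').text)))

    if not rows:
        # Background image: mark class and box with -1 so the TFRecord step
        # can recognise it. Once written to CSV, the integer -1 reads the
        # same as the '-1' strings used in the original snippet.
        rows.append((filename, width, height, '-1', -1, -1, -1, -1))
    return rows


rows = rows_from_annotation(BACKGROUND_XML)
```

Looking elements up by name is a design choice, not part of the original script; it keeps working if a tool writes the child elements in a different order.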
Third and Final Step - Generating the TFRecords
Now, when creating the TFRecords, if the corresponding row/image has negative coordinates, I just add zero values to the record (before, this would not even have been possible).
So from the original:
https://github.com/datitran/raccoon_dataset/blob/93938849301895fb73909842ba04af9b602f677a/generate_tfrecord.py#L60-L66
I add:
for index, row in group.object.iterrows():
    if int(row['xmin']) > -1:
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
    else:
        xmins.append(0)
        xmaxs.append(0)
        ymins.append(0)
        ymaxs.append(0)
        classes_text.append('something'.encode('utf8'))  # this does not matter for the background
        classes.append(5000)
Note that for class_text in the else branch, since background images have no bounding boxes, you can replace the string with whatever you like; for the background cases it will not appear anywhere.
And lastly, for classes in the else branch, you just need to add a numeric label that does not belong to any of your own classes.
For those who are wondering, I've used this procedure many times, and it currently works for my use cases.
Hope it helped in some way.

Turi Create - Please use dropna() to drop rows

I am having issues with Apple Turi Create and its image classifier. I have successfully created a model with 22 categories. I recently added 5 more categories, and the console is now giving me this error:
Please use dropna() to drop rows with missing target values.
The full console log looks like this:
[16:30:30] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[16:30:30] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
Resizing images...
Performing feature extraction on resized images...
Premature end of JPEG file
Completed 512/1883
Completed 1024/1883
Completed 1536/1883
Completed 1883/1883
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
You can set ``validation_set=None`` to disable validation tracking.
[ERROR] turicreate.toolkits._main: Toolkit error: Target column has missing value.
Please use dropna() to drop rows with missing target values.
Traceback (most recent call last):
File "train.py", line 8, in <module>
model = tc.image_classifier.create(train_data, target='label', max_iterations=1000)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/image_classifier/image_classifier.py", line 132, in create
verbose=verbose)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/classifier/logistic_classifier.py", line 312, in create
seed=seed)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/_supervised_learning.py", line 397, in create
options, verbose)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/_main.py", line 75, in run
raise ToolkitError(str(message))
turicreate.toolkits._main.ToolkitError: Target column has missing value.
Please use dropna() to drop rows with missing target values.
I have upgraded turicreate and coremltools to the latest versions, but I don't know where I should call dropna() in the code. I only found this reference and followed the code.
It looks like this:
data.py
import turicreate as tc
image_data = tc.image_analysis.load_images('images', with_path=True)
labels = ['A', 'B', 'C', 'D']
def get_label(path, labels=labels):
    for label in labels:
        if label in path:
            return label
image_data['label'] = image_data['path'].apply(get_label)
#import os
#image_data['label'] = image_data['path'].apply(lambda path: os.path.dirname(path).split('/')[-1])
image_data.save('boxes.sframe')
image_data.explore()
train.py
import turicreate as tc
data = tc.SFrame('boxes.sframe')
data.dropna()
train_data, test_data = data.random_split(0.8)
model = tc.image_classifier.create(train_data, target='label', max_iterations=1000)
predictions = model.classify(test_data)
results = model.evaluate(test_data)
print "Accuracy : %s" % results['accuracy']
print "Confusion Matrix : \n%s" % results['confusion_matrix']
model.save('boxes.model')
How do I drop all the empty columns and rows, please? Does max_iterations=1000 also have an effect on the error?
Thank you for suggestions
data.dropna() isn't done in place, you need to write it: data = data.dropna()
See documentation here https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.dropna.html
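As for where the missing values might come from: judging only from data.py in the question, get_label returns None for any path that matches none of the entries in labels, which would happen if new category folders were added without extending the list. This is an assumption about the asker's data, not something the log confirms. The plain-Python sketch below uses made-up paths to show the effect.

```python
labels = ['A', 'B', 'C', 'D']


def get_label(path, labels=labels):
    for label in labels:
        if label in path:
            return label
    # No entry matched: the function falls through and returns None,
    # which ends up as a missing value in the 'label' column.


# Hypothetical paths; 'E' stands for a category that is not in labels.
paths = ['images/A/1.jpg', 'images/E/2.jpg']
found = [get_label(p) for p in paths]
missing = [p for p, label in zip(paths, found) if label is None]
```

Listing the paths in missing before training shows whether rows are being dropped because of an incomplete labels list or for some other reason.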

scheduler produces empty files

I'm using pythonanywhere for a simple scheduled task.
I want to download data from a link once a day and save CSV files. Later, once I have a decent time series, I'll figure out how I actually want to manage the data. It's not much data, so I don't need anything fancy like a database.
My script takes the data from the google sheets link, adds a log column and a time column, then writes a csv with the date in the filename.
It works exactly as I want it to when I run it manually in pythonanywhere, but the scheduler is just creating empty csv files albeit with the correct name.
Any ideas what's up? I don't understand the log file. Surely the error should happen when it is run manually?
script:
import pandas as pd
import time
import datetime
def write_today(df):
    date = time.strftime("%Y-%m-%d")
    df.to_csv('Properties_'+date+'.csv')
url = 'https://docs.google.com/spreadsheets/d/19h2GmLN-2CLgk79gVxcazxtKqS6rwW36YA-qvuzEpG4/export?format=xlsx'
df = pd.read_excel(url, header=1).rename(columns={'Unnamed: 1':'code'})
source = pd.read_excel(url).columns[0]
df['source'] = source
df['time'] = datetime.datetime.now()
write_today(df)
the scheduler is set up as so:
log file:
Traceback (most recent call last):
File "/home/abmoore/load_data.py", line 24, in <module>
write_today(df)
File "/home/abmoore/load_data.py", line 16, in write_today
df.to_csv('Properties_'+date+'.csv')
File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 1344, in to_csv
formatter.save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1551, in save
self._save()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1638, in _save
self._save_header()
File "/usr/local/lib/python2.7/dist-packages/pandas/formats/format.py", line 1634, in _save_header
writer.writerow(encoded_labels)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: ordinal not in range(128)
Your problem is the UnicodeEncodeError: you have some non-ASCII data in your spreadsheet (the u'\xa3' in the traceback is the pound sign), and on Python 2 the pandas to_csv function defaults to ASCII encoding. Try specifying utf8 instead:
def write_today(df):
    filename = 'Properties_{date}.csv'.format(date=time.strftime("%Y-%m-%d"))
    df.to_csv(filename, encoding='utf8')
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
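The failure can be reproduced without pandas. The sketch below takes the one character named in the traceback and encodes it both ways; the variable names are illustrative only.

```python
# The character from the traceback: U+00A3, the pound sign.
text = u'\xa3'

try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    # ASCII only covers code points 0-127, so this branch is taken.
    ascii_ok = False

# UTF-8 can represent the character; it becomes a two-byte sequence.
utf8_bytes = text.encode('utf8')
```

Here ascii_ok ends up False and utf8_bytes holds the two bytes C2 A3, which is what encoding='utf8' lets to_csv write for the column header.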

how to use format read from xlrd and write thru xlsxwriter in python

I am reading an Excel file using xlrd, doing some macro replacement, and then writing the result through xlsxwriter. Without reading and copying the formatting info the code works, but when I add the formatting info I get an error (shown at the bottom).
The code snippet is below. I read an xls file, and for each data row I replace token macros with their values and write the row back. When I try to close the output_workbook I get an error.
filePath = os.path.realpath(os.path.join(inputPath,filename))
input_workbook = open_workbook(filePath, formatting_info=True)
input_DataSheet = input_workbook.sheet_by_index(0)
data = [[input_DataSheet.cell_value(r,c) for c in range(input_DataSheet.ncols)] for r in range(input_DataSheet.nrows)]
output_workbook = xlsxwriter.Workbook('C:\Users\Manish\Downloads\Sunny\Drexel_Funding\MacroReplacer\demo.xlsx')
output_worksheet = output_workbook.add_worksheet()
for rowIndex, value in enumerate(data):
    copyItem = []
    for individualItem in value:
        tempItem = individualItem
        if (isinstance(individualItem, basestring)):
            tempItem = tempItem.replace("[{0}]".format(investorNameMacro), investorName)
            tempItem = tempItem.replace("[{0}]".format(investorPhoneMacro), investorPhone)
            tempItem = tempItem.replace("[{0}]".format(investorEmailMacro), investorEmail)
            tempItem = tempItem.replace("[{0}]".format(loanNumberMacro), loanNumber)
        copyItem.append(tempItem)
    for columnIndex, val in enumerate(copyItem):
        fmt = input_workbook.xf_list[input_DataSheet.cell(rowIndex, columnIndex).xf_index]
        output_worksheet.write(rowIndex, columnIndex, val, fmt)
output_workbook.close()
The error that I get is
Traceback (most recent call last):
File "C:/Users/Manish/Downloads/Sunny/Drexel_Funding/MacroReplacer/drexelfundingmacroreplacer.py", line 87, in <module>
output_workbook.close()
File "build\bdist.win-amd64\egg\xlsxwriter\workbook.py", line 297, in close
File "build\bdist.win-amd64\egg\xlsxwriter\workbook.py", line 605, in _store_workbook
File "build\bdist.win-amd64\egg\xlsxwriter\packager.py", line 131, in _create_package
File "build\bdist.win-amd64\egg\xlsxwriter\packager.py", line 189, in _write_worksheet_files
File "build\bdist.win-amd64\egg\xlsxwriter\worksheet.py", line 3426, in _assemble_xml_file
File "build\bdist.win-amd64\egg\xlsxwriter\worksheet.py", line 4829, in _write_sheet_data
File "build\bdist.win-amd64\egg\xlsxwriter\worksheet.py", line 5015, in _write_rows
File "build\bdist.win-amd64\egg\xlsxwriter\worksheet.py", line 5183, in _write_cell
AttributeError: 'XF' object has no attribute '_get_xf_index'
any help is appreciated
Thanks
The Xlrd and XlsxWriter formats are different object types and are not interchangeable.
If you wish to preserve formatting you will have to write some code that translates the properties from one to the other.
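A minimal sketch of such a translation, covering only a handful of font and number-format properties. It assumes the xlrd attributes font_list, format_map, font_index, format_key, bold, italic, name, height and format_str, and the XlsxWriter property names bold, italic, font_name, font_size and num_format. The helper name xf_to_props is made up, the stand-in objects exist only so the snippet runs without a real workbook, and the code is written for Python 3.

```python
from types import SimpleNamespace


def xf_to_props(book, xf):
    """Translate a few properties of an xlrd XF record into an XlsxWriter-style dict."""
    font = book.font_list[xf.font_index]
    number_format = book.format_map[xf.format_key]
    return {
        'bold': bool(font.bold),
        'italic': bool(font.italic),
        'font_name': font.name,
        # xlrd reports the font height in twips (1/20 of a point).
        'font_size': font.height / 20.0,
        'num_format': number_format.format_str,
    }


# Stand-ins that mimic the attributes read above (not real xlrd objects).
fake_book = SimpleNamespace(
    font_list=[SimpleNamespace(bold=1, italic=0, name='Arial', height=200)],
    format_map={0: SimpleNamespace(format_str='General')},
)
fake_xf = SimpleNamespace(font_index=0, format_key=0)

props = xf_to_props(fake_book, fake_xf)
```

In the loop from the question, the resulting dict would be passed to output_workbook.add_format(...) and the returned Format object given to write() in place of the raw XF record. Creating one Format per distinct xf_index and reusing it is likely cheaper than creating one per cell. Borders, fills, colours and alignment would need their own mappings.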

How to solve AttributeError in python active_directory?

Running the script below works for 60% of the entries from the MasterGroupList, but then it suddenly fails with the error below. Although my questions seem to be poor, you guys have been able to help me before. Any idea how I can avoid this error, or what is throwing off the script? The MasterGroupList looks like:
Groups Pulled from AD
SET00 POWERUSER
SET00 USERS
SEF00 CREATORS
SEF00 USERS
...another 300 entries...
Error:
Traceback (most recent call last):
File "C:\Users\ks185278\OneDrive - NCR Corporation\Active Directory Access Script\test.py", line 44, in <module>
print group.member
File "C:\Python27\lib\site-packages\active_directory.py", line 805, in __getattr__
raise AttributeError
AttributeError
Code:
from active_directory import *
import os
file = open("C:\Users\NAME\Active Directory Access Script\MasterGroupList.txt", "r")
fileAsList = file.readlines()
indexOfTitle = fileAsList.index("Groups Pulled from AD\n")
i = indexOfTitle + 1
while i <= len(fileAsList):
    fileLocation = 'C:\\AD Access\\%s\\%s.txt' % (fileAsList[i][:5], fileAsList[i][:fileAsList[i].find("\n")])
    #Creates the dir if it does not exist already
    if not os.path.isdir(os.path.dirname(fileLocation)):
        os.makedirs(os.path.dirname(fileLocation))
    fileGroup = open(fileLocation, "w+")
    #writes group members to the open file
    group = find_group(fileAsList[i][:fileAsList[i].find("\n")])
    print group.member
    for group_member in group.member: #this is line 44
        fileGroup.write(group_member.cn + "\n")
    fileGroup.close()
    i+=1
Disclaimer: I don't know python, but I know Active Directory fairly well.
If it's failing on this:
for group_member in group.member:
It could possibly mean that the group has no members.
Depending on how Python handles this, it could also mean that the group has only one member and group.member is a plain string rather than an array.
What does print group.member show?
The source code of active_directory.py is here: https://github.com/tjguk/active_directory/blob/master/active_directory.py
These are the relevant lines:
if name not in self._delegate_map:
    try:
        attr = getattr(self.com_object, name)
    except AttributeError:
        try:
            attr = self.com_object.Get(name)
        except:
            raise AttributeError
So it looks like it just can't find the attribute you're looking up, which in this case looks like the 'member' attribute.
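If the cause is indeed a group without a member attribute, one way to keep the loop going is to treat a missing attribute as an empty member list. The sketch below is an assumption-laden illustration: FakeGroup is a stand-in that only imitates the AttributeError behaviour quoted above, and members_of is a made-up helper, not part of active_directory.

```python
class FakeGroup(object):
    """Stand-in for a group object; raises AttributeError for unset attributes."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__['_attrs'][name]
        except KeyError:
            raise AttributeError(name)


def members_of(group):
    """Return the group's members as a list, tolerating a missing or single-valued attribute."""
    members = getattr(group, 'member', None)  # None if AttributeError is raised
    if members is None:
        return []
    if isinstance(members, (list, tuple)):
        return list(members)
    return [members]  # a single member that is not wrapped in a sequence


empty = members_of(FakeGroup())
single = members_of(FakeGroup(member='CN=one'))
several = members_of(FakeGroup(member=('CN=one', 'CN=two')))
```

In the script from the question, the loop would then iterate over members_of(group) instead of group.member, so an empty group produces an empty file rather than a traceback.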