tensorflow object detection API: generate TF record of custom data set - python-2.7

I am trying to retrain the TensorFlow object detection API with my own data.
I have labelled my images with labelImg, but when I run the script create_pascal_tf_record.py, which is included in tensorflow/models/research, I get the following errors and I don't really know why:
python object_detection/dataset_tools/create_pascal_tf_record.py --data_dir=/home/jim/Documents/tfAPI/workspace/training_cabbage/images/train/ --label_map_path=/home/jim/Documents/tfAPI/workspace/training_cabbage/annotations/label_map.pbtxt --output_path=/home/jim/Desktop/cabbage_pascal.record --set=train --annotations_dir=/home/jim/Documents/tfAPI/workspace/training_cabbage/images/train/ --year=merged
Traceback (most recent call last):
File "object_detection/dataset_tools/create_pascal_tf_record.py", line 185, in <module>
tf.app.run()
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/dataset_tools/create_pascal_tf_record.py", line 167, in main
examples_list = dataset_util.read_examples_list(examples_path)
File "/home/jim/Documents/tfAPI/models/research/object_detection/utils/dataset_util.py", line 59, in read_examples_list
lines = fid.readlines()
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 188, in readlines
self._preread_check()
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 85, in _preread_check
compat.as_bytes(self.__name), 1024 * 512, status)
File "/home/jim/.virtualenvs/enrouteDeepDroneTF/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /home/jim/Documents/tfAPI/workspace/training_cabbage/images/train/VOC2007/ImageSets/Main/aeroplane_train.txt; No such file or directory
The train folder contains the XML and JPG files, the annotations folder contains my label_map.pbtxt for my custom class, and I want to write the TFRecord file to the desktop.
It seems the script can't find a file under my images and annotations folders, but I don't know why.
If someone has an idea, thank you in advance.

This error happens because you are using the script written for PASCAL VOC, which requires a specific data folder structure. Basically, you would need to download and unpack VOCdevkit to make the script work; as user phd pointed out, it is looking for the file VOC2007/ImageSets/Main/aeroplane_train.txt.
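For reference, the script expects roughly this layout under --data_dir (the exact subfolders come from the PASCAL VOC devkit; treat this sketch as approximate):
VOC2007/
    Annotations/                  <- the XML files
    JPEGImages/                   <- the JPG files
    ImageSets/
        Main/
            aeroplane_train.txt   <- one example ID per line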
I recommend writing your own script for TFRecord creation instead; it's not difficult. You just need two key components:
A loop over your data that reads the images and annotations
A function that encodes the data into a tf.train.Example. For that you can pretty much re-use dict_to_tf_example
Inside the loop, having created the tf_example, pass it to a TFRecordWriter:
writer.write(tf_example.SerializeToString())
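Putting it together, here is a minimal sketch of such a script. The paths are taken from the question, and dict_to_tf_example is assumed to be copied and adapted from create_pascal_tf_record.py; treat this as a starting point, not as drop-in code.
# Minimal sketch of a custom TFRecord conversion script (paths are examples).
import os
import tensorflow as tf
from lxml import etree
from object_detection.utils import dataset_util, label_map_util

train_dir = '/home/jim/Documents/tfAPI/workspace/training_cabbage/images/train'
label_map_dict = label_map_util.get_label_map_dict(
    '/home/jim/Documents/tfAPI/workspace/training_cabbage/annotations/label_map.pbtxt')

writer = tf.python_io.TFRecordWriter('/home/jim/Desktop/cabbage.record')
for name in os.listdir(train_dir):
    if not name.endswith('.xml'):
        continue
    with tf.gfile.GFile(os.path.join(train_dir, name), 'r') as fid:
        xml = etree.fromstring(fid.read())
    data = dataset_util.recursive_parse_xml_to_dict(xml)['annotation']
    # dict_to_tf_example is copied and adapted from create_pascal_tf_record.py
    # so that it reads the image from train_dir instead of the VOC layout.
    tf_example = dict_to_tf_example(data, train_dir, label_map_dict)
    writer.write(tf_example.SerializeToString())
writer.close()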

OK, for future reference, this is how I add background images to the dataset, allowing the model to train on them.
Functions used from: datitran/raccoon_dataset
Generate CSV file -> xml_to_csv.py
Generate TFRecord from CSV file -> generate_tfrecord.py
First Step - Creating an XML file for it
Example of background image XML file
<annotation>
    <folder>test/</folder>
    <filename>XXXXXX.png</filename>
    <path>your_path/test/XXXXXX.png</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>640</width>
        <height>640</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
</annotation>
Basically you remove the entire <object> element (i.e. no annotation).
Second Step - Generate CSV file
Using xml_to_csv.py, I just add a little change to also handle XML files that have no annotations (the background images), like so:
From the original:
https://github.com/datitran/raccoon_dataset/blob/93938849301895fb73909842ba04af9b602f677a/xml_to_csv.py#L12-L22
I add:
value = None
for member in root.findall('object'):
    value = (root.find('filename').text,
             int(root.find('size')[0].text),
             int(root.find('size')[1].text),
             member[0].text,
             int(member[4][0].text),
             int(member[4][1].text),
             int(member[4][2].text),
             int(member[4][3].text)
             )
    xml_list.append(value)
if value is None:
    value = (root.find('filename').text,
             int(root.find('size')[0].text),
             int(root.find('size')[1].text),
             '-1',
             '-1',
             '-1',
             '-1',
             '-1'
             )
    xml_list.append(value)
I'm just adding negative values for the bounding-box coordinates when there is no <object> in the XML file, which is the case for the background images; this will be useful when generating the TFRecords.
Third and Final Step - Generating the TFRecords
Now, when creating the TFRecords, if the corresponding row/image has negative coordinates, I just add zero values to the record (before, this would not even be possible).
So from the original:
https://github.com/datitran/raccoon_dataset/blob/93938849301895fb73909842ba04af9b602f677a/generate_tfrecord.py#L60-L66
I add:
for index, row in group.object.iterrows():
    if int(row['xmin']) > -1:
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
    else:
        xmins.append(0)
        xmaxs.append(0)
        ymins.append(0)
        ymaxs.append(0)
        classes_text.append('something'.encode('utf8'))  # this does not matter for the background
        classes.append(5000)
Note that for classes_text (in the else branch), since background images have no bounding boxes, you can replace the string with whatever you like; for the background cases it will not appear anywhere.
Lastly, for classes (also in the else branch), you just need a numeric label that does not belong to any of your own classes.
For those who are wondering: I've used this procedure many times, and it currently works for my use cases.
Hope it helps in some way.

Django matplotlib tkinter.TclError: NULL main window

I've been working with matplotlib.pyplot, pdflatex and Django to create an app that retrieves data from a database, creates PNG charts based on that data, and compiles a TeX file that includes those images. I create four kinds of charts, each one associated with a function:
# A bar/line chart with a resized figure
plt.figure(figsize=(16.0, 10.0))
plt.bar(y_pos, values)
plt.plot(areas, standardValues)
plt.clf()
# A bar chart with a resized figure
plt.figure(figsize=(16.0, 10.0))
plt.bar(y_pos, diferrenceValues)
plt.clf()
# 10 histograms with a fitted distribution
shape, loc, scale = stats.lognorm.fit(hours_per_program, floc=0)
x_fit = np.linspace(data.min(), data.max(), 100)
hours_per_program_fit = stats.lognorm.pdf(x_fit, shape, loc=loc, scale=scale)
ax.hist(data, N_bins)
plt.clf()
# And 10 horizontal bar charts
plt.barh(r1, data1, barHeight)
plt.barh(r2, data2, barHeight)
plt.barh(r3, data3, barHeight)
plt.clf()
The functions for each kind of chart are called in that order and that number of times. The first PDF generates fine, but when I generate it again I get:
...
File "/home/me/docs/project/t1/client/views.py", line 1359, in generatePdf
lineBarChart(areas_names, areas_program_hours, areas_standard_value, "cliente/static/img")
File "/home/me/docs/project/t1/client/chartGenerator.py", line 45, in lineBarChart
plt.figure(figsize=(16.0, 10.0))
File "/home/me/docs/project/t1/venv/lib/python3.5/site-packages/matplotlib/pyplot.py", line 548, in figure
**kwargs)
File "/home/me/docs/project/t1/venv/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 161, in new_figure_manager
return cls.new_figure_manager_given_figure(num, fig)
File "/home/me/docs/project/t1/venv/lib/python3.5/site-packages/matplotlib/backends/_backend_tk.py", line 1053, in new_figure_manager_given_figure
icon_img = Tk.PhotoImage(file=icon_fname)
File "/usr/lib/python3.5/tkinter/__init__.py", line 3397, in __init__
Image.__init__(self, 'photo', name, cnf, master, **kw)
File "/usr/lib/python3.5/tkinter/__init__.py", line 3353, in __init__
self.tk.call(('image', 'create', imgtype, name,) + options)
_tkinter.TclError: NULL main window
So tkinter, which I presume is what matplotlib uses to generate the images, throws this exception saying there is no main window. I don't really understand why this happens: I generate 22 charts with no problem, but when I need to call the functions again, I can't resize or generate the charts because there is no tkinter window.
I would really appreciate some help; I have been fighting with this problem for days now and could not find the solution.
Thanks a lot in advance
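For reference, the exception comes from matplotlib's interactive TkAgg backend, which needs a Tk main window; in a server process such as Django, the commonly used workaround is to select a non-interactive backend before pyplot is imported. A minimal sketch (not from the original thread):
# Sketch: use the non-interactive Agg backend so no Tk window is needed.
import matplotlib
matplotlib.use('Agg')            # must run before importing matplotlib.pyplot
import matplotlib.pyplot as plt

plt.figure(figsize=(16.0, 10.0))
plt.bar([0, 1, 2], [3, 1, 2])
plt.savefig('chart.png')         # render straight to a PNG file
plt.close()                      # release the figure when done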

Turi Create - Please use dropna() to drop rows

I am having issues with Apple's Turi Create and an image classifier. I had successfully created a model with 22 categories. I recently added 5 more categories and the console is giving me this error:
Please use dropna() to drop rows with missing target values.
The full console log looks like this:
[16:30:30] src/nnvm/legacy_json_util.cc:190: Loading symbol saved by previous version v0.8.0. Attempting to upgrade...
[16:30:30] src/nnvm/legacy_json_util.cc:198: Symbol successfully upgraded!
Resizing images...
Performing feature extraction on resized images...
Premature end of JPEG file
Completed 512/1883
Completed 1024/1883
Completed 1536/1883
Completed 1883/1883
PROGRESS: Creating a validation set from 5 percent of training data. This may take a while.
You can set ``validation_set=None`` to disable validation tracking.
[ERROR] turicreate.toolkits._main: Toolkit error: Target column has missing value.
Please use dropna() to drop rows with missing target values.
Traceback (most recent call last):
File "train.py", line 8, in <module>
model = tc.image_classifier.create(train_data, target='label', max_iterations=1000)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/image_classifier/image_classifier.py", line 132, in create
verbose=verbose)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/classifier/logistic_classifier.py", line 312, in create
seed=seed)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/_supervised_learning.py", line 397, in create
options, verbose)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/turicreate/toolkits/_main.py", line 75, in run
raise ToolkitError(str(message))
turicreate.toolkits._main.ToolkitError: Target column has missing value.
Please use dropna() to drop rows with missing target values.
I have upgraded turicreate and coremltools to the latest versions, but I don't know where I should call dropna() in my code. I only found this reference and followed the code.
It looks like this:
data.py
import turicreate as tc
image_data = tc.image_analysis.load_images('images', with_path=True)
labels = ['A', 'B', 'C', 'D']
def get_label(path, labels=labels):
    for label in labels:
        if label in path:
            return label
image_data['label'] = image_data['path'].apply(get_label)
#import os
#image_data['label'] = image_data['path'].apply(lambda path: os.path.dirname(path).split('/')[-1])
image_data.save('boxes.sframe')
image_data.explore()
train.py
import turicreate as tc
data = tc.SFrame('boxes.sframe')
data.dropna()
train_data, test_data = data.random_split(0.8)
model = tc.image_classifier.create(train_data, target='label', max_iterations=1000)
predictions = model.classify(test_data)
results = model.evaluate(test_data)
print "Accuracy : %s" % results['accuracy']
print "Confusion Matrix : \n%s" % results['confusion_matrix']
model.save('boxes.model')
How do I drop all the empty columns and rows, please? Does max_iterations=1000 also have an effect on the error?
Thank you for any suggestions
data.dropna() isn't done in place; you need to assign the result: data = data.dropna()
See the documentation here: https://apple.github.io/turicreate/docs/api/generated/turicreate.SFrame.dropna.html
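In other words, the fix to the train.py above is a one-line change:
import turicreate as tc

data = tc.SFrame('boxes.sframe')
data = data.dropna()  # dropna() returns a new SFrame; the result must be kept
train_data, test_data = data.random_split(0.8)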

Retrieving data saved under an HDF5 group as CArray

I am new to the HDF5 file format, and I have data (images) saved in HDF5 format. The images are saved under a group called 'data', which is under the root group, as CArrays. What I want to do is retrieve a slice of the saved images, for example the first 400 or something like that. The following is what I did:
h5f = h5py.File('images.h5f', 'r')
image_grp= h5f['/data/'] #the image group (data) is opened
print(image_grp[0:400])
but I am getting the following error
Traceback (most recent call last):
File "fgf.py", line 32, in <module>
print(image_grp[0:40])
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
(/feedstock_root/build_artefacts/h5py_1496410723014/work/h5py-2.7.0/h5py/_objects.c:2846)
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
(/feedstock_root/build_artefacts/h5py_1496410723014/work/h5py
2.7.0/h5py/_objects.c:2804)
File "/..../python2.7/site-packages/h5py/_hl/group.py", line 169, in
__getitem__oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "/..../python2.7/site-packages/h5py/_hl/base.py", line 133, in _e name = name.encode('ascii')
AttributeError: 'slice' object has no attribute 'encode'
I am not sure why I am getting this error, and I am not even sure whether I can slice the images, since they are saved as individual datasets.
I know this is an old question, but it is the first hit when searching for 'slice' object has no attribute 'encode', and it has no solution.
The error happens because image_grp is a Group, not a Dataset: indexing a group makes h5py treat whatever you pass as a key name (hence the encode call on the slice). You need to index into the dataset element instead.
You need to find/know the key for the item that contains the dataset.
One suggestion is to list all keys in the group, and then guess which one it is:
print(list(image_grp.keys()))
This will give you the keys in the group.
A common case is that the first key refers to the image dataset, so you can do this:
image_grp = h5f['/data/']
image = image_grp[list(image_grp.keys())[0]]
print(image[0:400])
Yesterday I had a similar error and wrote this little piece of code to take my desired slice of an h5py file:
import h5py
def h5py_slice(h5py_file, begin_index, end_index):
    # assumes each image is its own dataset keyed '0', '1', ... at the file root
    slice_list = []
    with h5py.File(h5py_file, 'r') as f:
        for i in range(begin_index, end_index):
            slice_list.append(f[str(i)][...])
    return slice_list
and it can be used like this:
the_desired_slice_list = h5py_slice('images.h5f', 0, 400)

Datanitro's remove_sheet throws exception

I'm trying to copy a sheet from one workbook to another, but because I might already have a previous version of this sheet in my workbook, I want to check first if it exists and delete it before copying. However, I keep getting an error when executing remove_sheet (the sheet does get removed, though). Any ideas? (BTW, it's not the only sheet in the file, so that's not the issue.)
def import_sheet(book_from, book_to, sheet_to_cpy):
    active_wkbk(book_to)
    if sheet_to_cpy in all_sheets(True):
        remove_sheet(sheet_to_cpy)
    active_wkbk(book_from)
    copy_sheet(book_to, sheet_to_cpy)
File "blah.py", line 22, in import_sheet
remove_sheet(sheet_to_cpy)
File "27/basic_io.py", line 1348, in remove_sheet
File "27/basic_io.py", line 1215, in active_sheet
NameError: MyTab is not an existing worksheet
This is a bug on our end - we'll fix it as soon as we can!
In the meantime, here's a workaround:
def import_sheet(book_from, book_to, sheet_to_cpy):
    active_wkbk(book_to)
    if sheet_to_cpy in all_sheets(True):
        if sheet_to_cpy == active_sheet():
            ## avoid the bug by manually setting the last sheet active
            active_sheet(all_sheets()[-1])
        remove_sheet(sheet_to_cpy)
    active_wkbk(book_from)
    copy_sheet(book_to, sheet_to_cpy)
Source: I'm one of the DataNitro developers.

error: unpack requires a string argument of length 8

I was running my script and I stumbled upon this error:
WARNING *** file size (24627) not 512 + multiple of sector size (512)
WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
Traceback (most recent call last):
File "C:\Email Attachments\whatever.py", line 20, in <module>
main()
File "C:\Email Attachments\whatever.py", line 17, in main
csv_from_excel()
File "C:\Email Attachments\whatever.py", line 7, in csv_from_excel
sh = wb.sheet_by_name('B2B_REP_YLD_100_D_SQ.rpt')
File "C:\Python27\lib\site-packages\xlrd\book.py", line 442, in sheet_by_name
return self.sheet_by_index(sheetx)
File "C:\Python27\lib\site-packages\xlrd\book.py", line 432, in sheet_by_index
return self._sheet_list[sheetx] or self.get_sheet(sheetx)
File "C:\Python27\lib\site-packages\xlrd\book.py", line 696, in get_sheet
sh.read(self)
File "C:\Python27\lib\site-packages\xlrd\sheet.py", line 1055, in read
dim_tuple = local_unpack('<ixxH', data[4:12])
error: unpack requires a string argument of length 8
I was trying to process this excel file.
https://drive.google.com/file/d/0B12NevhOGQGRMkRVdExuYjFveDQ/edit?usp=sharing
One solution I found is to manually open the spreadsheet, save it, and close it before running my script that converts .xls to .csv. I find this solution a bit cumbersome and clunky.
This kind of spreadsheet is saved to my drive daily via an Outlook macro. The unprocessed data keeps piling up, which is why I turned to scripting to ease the job.
Who made the Outlook macro that's dumping this file? xlrd uses byte-level unpacking to read the Excel file, and it is failing to read a field in this file. There are ways to trace where it's failing, but none to automatically recover from this type of error.
The erroneous data is at data[4:12] of a specific record (we'll see which one later), which should be a bytestring parsed as follows (a short illustration follows the list):
a 4-byte integer (i)
2 pad bytes (xx)
an unsigned 2-byte short (H)
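As a quick illustration (not from the thread), the '<ixxH' format string consumes exactly 8 bytes, so a shorter record raises exactly this error:
# Illustration only: '<ixxH' consumes exactly 8 bytes (4 + 2 pad + 2).
import struct

good = b'\x01\x00\x00\x00\xaa\xbb\x02\x00'
print(struct.unpack('<ixxH', good))   # -> (1, 2)

short = good[:6]                      # a truncated record, like the one in the bad file
struct.unpack('<ixxH', short)         # raises: unpack requires a string argument of length 8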
You can set xlrd to DEBUG mode, which will show you which bytes it is parsing and exactly where in the file the error occurs:
import xlrd
xlrd.DEBUG = 2
workbook = xlrd.open_workbook(u'/home/sparker/Downloads/20131117_040934_B2B_REP_YLD_100_D_LT.xls')
Here are the results, slightly trimmed down for the sake of SO:
parse_globals: record code is 0x0293
parse_globals: record code is 0x0293
parse_globals: record code is 0x0085
CODEPAGE: codepage 1200 -> encoding 'utf_16_le'
BOUNDSHEET: bv=80 data '\xfd\x04\x00\x00\x00\x00\x18\x00B2B_REP_YLD_100_D_SQ.rpt'
BOUNDSHEET: inx=0 vis=0 sheet_name=u'B2B_REP_YLD_100_D_SQ.rpt' abs_posn=1277 sheet_type=0x00
parse_globals: record code is 0x000a
GET_SHEETS: [u'B2B_REP_YLD_100_D_SQ.rpt'] [1277]
GET_SHEETS: sheetno = 0 [u'B2B_REP_YLD_100_D_SQ.rpt'] [1277]
reqd: 0x0010
getbof(): data='\x00\x06\x10\x00\xbb\r\xcc\x07\x00\x00\x00\x00\x06\x00\x00\x00'
getbof(): op=0x0809 version2=0x0600 streamtype=0x0010
getbof(): BOF found at offset 1277; savpos=1277
BOF: op=0x0809 vers=0x0600 stream=0x0010 buildid=3515 buildyr=1996 -> BIFF80
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "build/bdist.macosx-10.4-x86_64/egg/xlrd/__init__.py", line 457, in open_workbook
File "build/bdist.macosx-10.4-x86_64/egg/xlrd/__init__.py", line 1007, in get_sheets
File "build/bdist.macosx-10.4-x86_64/egg/xlrd/__init__.py", line 998, in get_sheet
File "build/bdist.macosx-10.4-x86_64/egg/xlrd/sheet.py", line 864, in read
struct.error: unpack requires a string argument of length 8
Specifically, you can see that it parses the name of the worksheet, u'B2B_REP_YLD_100_D_SQ.rpt'.
Let's check the source code. The traceback throws the error at the point where, as we can see from the parent loop, xlrd is trying to parse the XL_DIMENSION and XL_DIMENSION2 records. These directly correspond to the shape of the Excel sheet.
And that's where the problem in your workbook lies: it's not being generated correctly. So, back to my original question: who made the Excel macro? It needs to be fixed. But that's for another SO question, some other time.