I am trying to build a generative network on top of the pre-trained Inception_v3:
1) I freeze all the weights in the model,
2) create a Variable of size (2, 3, 299, 299),
3) create targets of size (2, 1000) and optimize the Variable so that the final-layer activations get as close to the targets as possible.
(I do not use a batch size of 1 because, unlike VGG16, Inception_v3 does not accept batchsize=1, but that is beside the point.)
The following code should work, but it raises the error: «RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation».
# minimalist code with Inception_v3 that throws the error:
import torch
from torch.autograd import Variable
import torch.optim as optim
import torch.nn as nn
import torchvision
torch.set_default_tensor_type('torch.FloatTensor')
Iv3 = torchvision.models.inception_v3(pretrained=True)
for i in Iv3.parameters():
    i.requires_grad = False
criterion = nn.CrossEntropyLoss()
x = Variable(torch.randn(2, 3, 299, 299), requires_grad=True)
target = torch.empty(2, dtype=torch.long).random_(1000)
output = Iv3(x)
loss = criterion(output[0], target)
loss.backward()
print(x.grad)
This is very strange, because if I do the same thing with VGG16, everything works fine:
# minimalist working code with VGG16:
import torch
from torch.autograd import Variable
import torch.optim as optim
import torch.nn as nn
import torchvision
# torch.cuda.empty_cache()
# vgg16 = torchvision.models.vgg16(pretrained=True).cuda()
# torch.set_default_tensor_type('torch.cuda.FloatTensor')
torch.set_default_tensor_type('torch.FloatTensor')
vgg16 = torchvision.models.vgg16(pretrained=True)
for i in vgg16.parameters():
    i.requires_grad = False
criterion = nn.CrossEntropyLoss()
x = Variable(torch.randn(2, 3, 229, 229), requires_grad=True)
target = torch.empty(2, dtype=torch.long).random_(1000)
output = vgg16(x)
loss = criterion(output, target)
loss.backward()
print(x.grad)
Please help.
Thanks to #iacolippo, the issue is solved. It turns out the problem was specific to PyTorch 1.0.0; the same code runs without error on PyTorch 0.4.1.
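For readers who want to see the idea from the question in isolation (weights frozen, gradient taken with respect to the input only), here is a minimal sketch in plain Python. It is not the Inception_v3 setup: the model is a made-up 3-class linear layer, and the weights, learning rate and step count are assumptions chosen only for illustration.

```python
import math

# Frozen "model": a fixed 3-class linear layer over 2 input features.
# These weights are invented for the example and are never updated.
W = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]

def logits(x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(x, target):
    return -math.log(softmax(logits(x))[target])

def grad_wrt_input(x, target):
    # d(loss)/dx = W^T (softmax(W x) - onehot(target))
    p = softmax(logits(x))
    p[target] -= 1.0
    return [sum(W[i][j] * p[i] for i in range(len(W))) for j in range(len(x))]

x = [0.1, -0.2]  # the only quantity being optimized
target = 1
initial_loss = cross_entropy(x, target)
for _ in range(200):
    g = grad_wrt_input(x, target)
    x = [xj - 0.5 * gj for xj, gj in zip(x, g)]  # plain gradient descent on x
final_loss = cross_entropy(x, target)
print(initial_loss, final_loss)
```

The loop only ever changes x, which is what setting requires_grad = False on the parameters and requires_grad = True on the input achieves in the PyTorch code.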
I'm trying to run the optimization example with non-linear constraints shown here
https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html
>>> def cons_f(x):
...     return [x[0]**2 + x[1], x[0]**2 - x[1]]
>>> def cons_J(x):
...     return [[2*x[0], 1], [2*x[0], -1]]
>>> def cons_H(x, v):
...     return v[0]*np.array([[2, 0], [0, 0]]) + v[1]*np.array([[2, 0], [0, 0]])
>>> from scipy.optimize import NonlinearConstraint
>>> nonlinear_constraint = NonlinearConstraint(cons_f, -np.inf, 1, jac=cons_J, hess=cons_H)
But when I try to import NonlinearConstraint, this is what I get:
ImportError: cannot import name NonlinearConstraint
I'm running scipy v1.0.0:
>>> import scipy
>>> print scipy.__version__
1.0.0
Any suggestions? Thanks in advance for your help
You will need scipy >= 1.1 or an install built from the master branch!
As 1.1 was released recently (05.05.18), binary builds may already be available (this depends a bit on how you install scipy).
Compare 1.1's optimize/__init__.py:
...
from ._lsq import least_squares, lsq_linear
from ._constraints import (NonlinearConstraint,
                           LinearConstraint,
                           Bounds)
from ._hessian_update_strategy import HessianUpdateStrategy, BFGS, SR1
__all__ = [s for s in dir() if not s.startswith('_')]
...
with 1.0.1's optimize/__init__.py:
...
from ._lsq import least_squares, lsq_linear
__all__ = [s for s in dir() if not s.startswith('_')]
...
More indications are available in the 1.1 release notes:
scipy.optimize improvements
The method trust-constr has been added to scipy.optimize.minimize. The
method switches between two implementations depending on the problem
definition. For equality constrained problems it is an implementation of
a trust-region sequential quadratic programming solver and, when
inequality constraints are imposed, it switches to a trust-region
interior point method. Both methods are appropriate for large scale
problems. Quasi-Newton options BFGS and SR1 were implemented and can be
used to approximate second order derivatives for this new method. Also,
finite-differences can be used to approximate either first-order or
second-order derivatives.
which is the solver that introduced those abstractions.
Additionally, optimize/_constraints.py does not exist in 1.0.1.
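If the same script has to run on machines with different scipy versions, a small guard can make the requirement explicit before the import fails. The sketch below is only an illustration: the helper names are made up, and the 1.1 threshold is simply the one stated in this answer.

```python
def version_tuple(version):
    """Turn '1.0.0' or '1.1.0rc1' into a tuple of leading integers, e.g. (1, 1, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)

def has_nonlinear_constraint(scipy_version):
    # Threshold taken from the answer above: the class ships with scipy 1.1.
    return version_tuple(scipy_version) >= (1, 1)

print(has_nonlinear_constraint("1.0.0"))
print(has_nonlinear_constraint("1.1.0"))
```

In a real script you would pass scipy.__version__ to has_nonlinear_constraint and only import NonlinearConstraint when it returns True.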
I've been working with the Tensorflow Object Detection API. In my case, I'm attempting to detect vehicles in still images using the KITTI-trained model (faster_rcnn_resnet101_kitti_2018_01_28) from the model zoo, with code modified from the object_detection_tutorial Jupyter notebook included in the GitHub repository.
I have included my modified code below, but I get the same results with the original notebook from GitHub.
When running on a Jupyter notebook server on an Amazon AWS g3.4xlarge (GPU) instance with the Deep Learning AMI, it takes just shy of 4 seconds to process a single image. The inference function alone takes 1.3-1.5 seconds (see code below), which seems abnormally high compared with the reported inference time for the model (20 ms). While I don't expect to hit the reported mark, my times seem out of line and are impractical for my needs. I'm looking at processing 1 million+ images at a time and can't afford 46 days of processing time. Given that the model is used on video frame captures, I would think it should be possible to cut the time per image to under 1 second, at least.
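For reference, the 46-day figure follows directly from the per-image time. A quick check, assuming strictly sequential processing with no overlap between images:

```python
SECONDS_PER_DAY = 24 * 60 * 60

def processing_days(num_images, seconds_per_image):
    """Total wall-clock days if images are processed one after another."""
    return num_images * seconds_per_image / SECONDS_PER_DAY

print(round(processing_days(1_000_000, 4.0), 1))  # 46.3 days at 4 s per image
print(round(processing_days(1_000_000, 1.0), 1))  # 11.6 days at 1 s per image
```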
My questions are:
1) What explanations/solutions exist to reduce inference time?
2) Is 1.5 seconds to convert an image to a numpy array (prior to processing) out of line?
3) If this is the best performance I can expect, how much of a speedup could I hope to gain from reworking the model to process images in batches?
Thanks for any help!
Code from python notebook:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import json
import collections
import os.path
import datetime
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
# This is needed to display the images.
get_ipython().magic('matplotlib inline')
#Setup variables
PATH_TO_TEST_IMAGES_DIR = 'test_images'
MODEL_NAME = 'faster_rcnn_resnet101_kitti_2018_01_28'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'kitti_label_map.pbtxt')
NUM_CLASSES = 2
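from utils import label_map_util
from utils import visualization_utils as vis_util
# utils_ops is referenced in run_inference_for_single_image (mask branch)
from utils import ops as utils_ops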
def get_scores(
        boxes,
        classes,
        scores,
        category_index,
        min_score_thresh=.5
):
    import collections
    # Create a display string (and color) for every box location, group any boxes
    # that correspond to the same location.
    box_to_display_str_map = collections.defaultdict(list)
    for i in range(boxes.shape[0]):
        if scores is None or scores[i] > min_score_thresh:
            box = tuple(boxes[i].tolist())
            if scores is None:
                box_to_color_map[box] = groundtruth_box_visualization_color
            else:
                display_str = ''
                if classes[i] in category_index.keys():
                    class_name = category_index[classes[i]]['name']
                else:
                    class_name = 'N/A'
                display_str = str(class_name)
                if not display_str:
                    display_str = '{}%'.format(int(100*scores[i]))
                else:
                    display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
                box_to_display_str_map[i].append(display_str)
    return box_to_display_str_map
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
                        tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
            # Run inference
            output_dict = sess.run(tensor_dict,
                                   feed_dict={image_tensor: np.expand_dims(image, 0)})
            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[
                'detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict
#get list of paths
exten='.jpg'
TEST_IMAGE_PATHS=[]
for dirpath, dirnames, files in os.walk(PATH_TO_TEST_IMAGES_DIR):
    for name in files:
        if name.lower().endswith(exten):
            #print(os.path.join(dirpath,name))
            TEST_IMAGE_PATHS.append(os.path.join(dirpath,name))
print((len(TEST_IMAGE_PATHS), 'Images To Process'))
#load model graph for inference
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
#setup class labeling parameters
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
#placeholder for timings
myTimings=[]
myX = 1
myResults = collections.defaultdict(list)
for image_path in TEST_IMAGE_PATHS:
    if os.path.exists(image_path):
        print(myX,"--------------------------------------",datetime.datetime.time(datetime.datetime.now()))
        print(myX,"Image:", image_path)
        myTimings.append((myX,"Image", image_path))
        print(myX,"Open:",datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX,"Open",datetime.datetime.time(datetime.datetime.now()).__str__()))
        image = Image.open(image_path)
        # the array based representation of the image will be used later in order to prepare the
        # result image with boxes and labels on it.
        print(myX,"Numpy:",datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX,"Numpy",datetime.datetime.time(datetime.datetime.now()).__str__()))
        image_np = load_image_into_numpy_array(image)
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        print(myX,"Expand:",datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX,"Expand",datetime.datetime.time(datetime.datetime.now()).__str__()))
        image_np_expanded = np.expand_dims(image_np, axis=0)
        # Actual detection.
        print(myX,"Detect:",datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX,"Detect",datetime.datetime.time(datetime.datetime.now()).__str__()))
        output_dict = run_inference_for_single_image(image_np, detection_graph)
        # Visualization of the results of a detection.
        print(myX,"Export:",datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX,"Export",datetime.datetime.time(datetime.datetime.now()).__str__()))
        op=get_scores(
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            min_score_thresh=.2)
        myResults[image_path].append(op)
        print(myX,"Done:", datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX,"Done", datetime.datetime.time(datetime.datetime.now()).__str__()))
        myX= myX + 1
#save results
with open((OUTPUTS_BASENAME+'_Results.json'), 'w') as fout:
    json.dump(myResults, fout)
with open((OUTPUTS_BASENAME+'_Timings.json'), 'w') as fout:
    json.dump(myTimings, fout)
Example Of Timings:
[1, "Image", "test_images/DE4T_11Jan2018/MFDC4612.JPG"]
[1, "Open", "19:20:08.029423"]
[1, "Numpy", "19:20:08.052679"]
[1, "Expand", "19:20:09.977166"]
[1, "Detect", "19:20:09.977250"]
[1, "Export", "19:23:13.902443"]
[1, "Done", "19:23:13.903012"]
[2, "Image", "test_images/DE4T_11Jan2018/MFDC4616.JPG"]
[2, "Open", "19:23:13.903885"]
[2, "Numpy", "19:23:13.906320"]
[2, "Expand", "19:23:15.756308"]
[2, "Detect", "19:23:15.756597"]
[2, "Export", "19:23:17.153233"]
[2, "Done", "19:23:17.153699"]
[3, "Image", "test_images/DE4T_11Jan2018/MFDC4681.JPG"]
[3, "Open", "19:23:17.154510"]
[3, "Numpy", "19:23:17.156576"]
[3, "Expand", "19:23:19.012935"]
[3, "Detect", "19:23:19.013013"]
[3, "Export", "19:23:20.323839"]
[3, "Done", "19:23:20.324307"]
[4, "Image", "test_images/DE4T_11Jan2018/MFDC4697.JPG"]
[4, "Open", "19:23:20.324791"]
[4, "Numpy", "19:23:20.327136"]
[4, "Expand", "19:23:22.175578"]
[4, "Detect", "19:23:22.175658"]
[4, "Export", "19:23:23.472040"]
[4, "Done", "19:23:23.472297"]
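To make the log easier to read, the per-stage durations can be computed from the timestamps. The sketch below does this for image 2, using the values copied from the log above. It assumes, as in the loop, that each timestamp is recorded just before its stage starts, so a stage's duration is the gap to the next entry; the function name is made up.

```python
from datetime import datetime

# Entries for image 2, copied from the timing log above.
entries = [
    ("Open", "19:23:13.903885"),
    ("Numpy", "19:23:13.906320"),
    ("Expand", "19:23:15.756308"),
    ("Detect", "19:23:15.756597"),
    ("Export", "19:23:17.153233"),
    ("Done", "19:23:17.153699"),
]

def stage_durations(entries):
    """Seconds spent in each stage: the gap between its timestamp and the next one."""
    stamps = [(name, datetime.strptime(t, "%H:%M:%S.%f")) for name, t in entries]
    return {
        name: (nxt - start).total_seconds()
        for (name, start), (_, nxt) in zip(stamps, stamps[1:])
    }

durations = stage_durations(entries)
for name, secs in durations.items():
    print(f"{name:7s}{secs:.3f} s")
```

For this image the numpy conversion comes out at about 1.85 s and the detection call at about 1.40 s, while opening, expanding and exporting are each well under 10 ms.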
1) Load the video directly instead of individual images, then change "run_inference_for_single_image()" so that the session is created once and all images/frames are fed through it (re-creating the graph is very slow). Furthermore, you can edit the pipeline config file to reduce the number of proposals, which directly speeds up inference. Note that you have to re-export the graph afterwards (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md). Batching also helps (though I am sorry, I forgot by how much). Finally, you can use multiprocessing to offload CPU-specific operations (drawing bounding boxes, loading data) and keep the GPU better utilized.
2) Yes, 1.5 seconds to convert an image to a numpy array (prior to processing) is insanely slow, and there is plenty of room for improvement.
3) While I don't know the exact GPU on AWS (K80?), you should be able to get over 10 fps on a GeForce 1080 Ti with all fixes, which is in line with the 79 ms time they reported (where did you get 20 ms for faster_rcnn_resnet101?).
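The batching suggestion in point 1 starts with grouping the image paths. A minimal sketch of that grouping step follows; the helper name and file names are hypothetical, and it deliberately leaves out the TensorFlow part. Stacking a group into one input array additionally requires every image in the group to have the same dimensions.

```python
def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # last, possibly shorter, batch
        yield batch

paths = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]  # hypothetical file names
batches = list(batched(paths, 2))
print(batches)
```

Each batch would then be loaded, stacked and passed to a single sess.run call on a session that was created once, outside the loop.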
You could also try OpenVINO for better inference performance. It reduces inference time by, for example, pruning the graph and fusing some operations. OpenVINO is optimized for Intel hardware, but it should work with any CPU (even in the cloud).
Here are some performance benchmarks for the Faster RCNN Resnet model and various CPUs.
It's rather straightforward to convert the Tensorflow model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the Tensorflow model to IR, which is the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:
mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
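If you want to try both precisions from a script, the command can be assembled as an argument list. This is only a sketch: the helper is made up, it uses exactly the flags shown in the command above (other OpenVINO releases may name them differently), and it builds the command without running it.

```python
import shlex

def mo_command(saved_model_dir, output_dir, data_type="FP32",
               input_shape="[1, 3, 224, 224]"):
    """Build the Model Optimizer invocation as an argument list (nothing is executed)."""
    return [
        "mo",
        "--saved_model_dir", saved_model_dir,
        "--input_shape", input_shape,
        "--data_type", data_type,
        "--output_dir", output_dir,
    ]

fp16_cmd = mo_command("model", "model_ir", data_type="FP16")
print(shlex.join(fp16_cmd))  # shell-quoted form, ready to paste into a terminal
```

The list form can be passed to subprocess.run directly, which avoids quoting problems with the bracketed input shape.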
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device e.g. CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what is the best choice for you, just use AUTO.
from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
# (input_image is your preprocessed image array, shaped as the model expects)
result = compiled_model_ir([input_image])[output_layer_ir]
There is even OpenVINO Model Server, which is very similar to Tensorflow Serving.
Disclaimer: I work on OpenVINO.
I have built the C API by building the libtensorflow.so target. I want to load a pre-trained model and run inference on it to make predictions. I was told I can do this by including the 'c_api.h' header file (and copying that file plus 'libtensorflow.so' to the appropriate place); however, I had no luck finding any examples of that on the web. All I could find are examples that use the Bazel build system, whereas I want to use another build system and use TensorFlow as a library. Can somebody help me with an example of how to import either a) a meta graph file or b) a protobuf graph file plus a checkpoint file, to make predictions? Ideally a C++ equivalent of the Python file below, built with g++.
#!/usr/bin/env python
import tensorflow as tf
import numpy as np
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('./metagraph.meta')
    saver.restore(sess, './checkpoint.ckpt')
    x = tf.get_collection("x")[0]
    yhat = tf.get_collection("yhat")[0]
    print sess.run(yhat, feed_dict={x : np.array([[2, 3], [4, 5]])})
Thanks in advance!
P.S.: For the sake of completeness, I did the following to build the files:
#!/usr/bin/env python
import tensorflow as tf
import numpy as np
x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
tf.add_to_collection("x", x)
y = tf.placeholder(tf.float32, shape=[None, 1], name='y')
w = tf.Variable(np.array([[10.0], [100.0]]), dtype=tf.float32, name='w')
b = tf.Variable(0.0, dtype=tf.float32, name='b')
yhat = tf.add(tf.matmul(x, w), b)
tf.add_to_collection("yhat", yhat)
mse_loss = tf.sqrt(tf.reduce_mean(tf.square(tf.sub(y, yhat))))
step_size = tf.constant(0.01)
optimizer = tf.train.GradientDescentOptimizer(step_size)
init_op = tf.initialize_all_variables()
train_op = optimizer.minimize(mse_loss)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init_op)
    for i in xrange(10000):
        train_x = np.random.random([100, 2]) * 10
        train_y = np.dot(train_x, np.array([[100.0], [10.0]])) + 1.0
        sess.run(train_op, feed_dict={x : train_x, y : train_y})
    print sess.run(w)
    print sess.run(b)
    saver.save(sess, './checkpoint.ckpt')
    saver.export_meta_graph('./metagraph.meta')
    tf.train.write_graph(sess.graph_def, './', 'graph')
I used Eclipse: I added c_api.h to my project and libtensorflow.so to /usr/local/bin, then added a reference to the libtensorflow shared object under libraries in the GCC C++ Linker settings, and finally created a simple program:
#include <iostream>
#include "c_api.h"
using namespace std;
int main() {
    cout << TF_Version();
    return 0;
}
This then allowed me to compile and use Tensorflow functions, including those that you want.
My goal is to visualize the results of a gradient boosting classifier.
from sklearn import tree
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
# selector, cv, features_train and labels_train are defined earlier (not shown)
clf_gbc = GradientBoostingClassifier(random_state=42)
pipe = Pipeline(steps=[('SELECT', selector), ('GBC', clf_gbc)])
parameters_gbc = {'GBC__min_samples_split': [1, 5, 10, 15, 20, 25, 30],
                  'GBC__max_depth': [3,4,5,6,7,8],
                  'GBC__min_samples_leaf': [1, 5, 10, 15, 25, 30]}
grid = GridSearchCV(estimator=pipe, param_grid=parameters_gbc, cv=cv, scoring = 'f1')
grid.fit(features_train, labels_train)
clf=grid.best_estimator_
import pydotplus
dot_data = tree.export_graphviz(clf, out_file=None)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("gbc.pdf")
I get the error ImportError: No module named pydotplus. I installed pydotplus using pip install pydotplus.
Any help on this or on how to visualize the results of a gradient boosting classifier would be appreciated!
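One thing that may be worth checking, although nothing in this post confirms it is the cause, is whether pip installed the package for a different Python interpreter than the one running the script. A minimal diagnostic sketch using only the standard library (the helper name is made up):

```python
import importlib.util
import sys

def module_available(name):
    """True if the top-level module `name` can be imported by this interpreter."""
    return importlib.util.find_spec(name) is not None

print("interpreter:", sys.executable)
print("pydotplus importable:", module_available("pydotplus"))
```

If this prints False, installing with that same interpreter (for example by running python -m pip install pydotplus with the executable printed above) targets the right environment.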