Making predictions with a TensorFlow-trained model and the C API - c++

I have built the C API by building the libtensorflow.so target. I want to load a pre-trained model and run inference on it to make predictions. I was told I can do this by including the 'c_api.h' header file (along with copying that file plus 'libtensorflow.so' to the appropriate place); however, I had no luck finding any examples of this on the web. All I could find were examples that use the Bazel build system, whereas I want to use another build system and use TensorFlow as a library. Can somebody help me with an example of how to import either a) a meta graph file, or b) a protobuf graph file plus a checkpoint file, to make predictions? A C++ equivalent of the Python file below, built with g++?
#!/usr/bin/env python
import tensorflow as tf
import numpy as np
with tf.Session() as sess:
    saver = tf.train.import_meta_graph('./metagraph.meta')
    saver.restore(sess, './checkpoint.ckpt')
    x = tf.get_collection("x")[0]
    yhat = tf.get_collection("yhat")[0]
    print sess.run(yhat, feed_dict={x: np.array([[2, 3], [4, 5]])})
Thanks in Advance!
P.S.: For the sake of completeness, I did the following to build the files:
#!/usr/bin/env python
import tensorflow as tf
import numpy as np
x = tf.placeholder(tf.float32, shape=[None, 2], name='x')
tf.add_to_collection("x", x)
y = tf.placeholder(tf.float32, shape=[None, 1], name='y')
w = tf.Variable(np.array([[10.0], [100.0]]), dtype=tf.float32, name='w')
b = tf.Variable(0.0, dtype=tf.float32, name='b')
yhat = tf.add(tf.matmul(x, w), b)
tf.add_to_collection("yhat", yhat)
mse_loss = tf.sqrt(tf.reduce_mean(tf.square(tf.sub(y, yhat))))
step_size = tf.constant(0.01)
optimizer = tf.train.GradientDescentOptimizer(step_size)
init_op = tf.initialize_all_variables()
train_op = optimizer.minimize(mse_loss)
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init_op)
    for i in xrange(10000):
        train_x = np.random.random([100, 2]) * 10
        train_y = np.dot(train_x, np.array([[100.0], [10.0]])) + 1.0
        sess.run(train_op, feed_dict={x: train_x, y: train_y})
    print sess.run(w)
    print sess.run(b)
    saver.save(sess, './checkpoint.ckpt')
    saver.export_meta_graph('./metagraph.meta')
    tf.train.write_graph(sess.graph_def, './', 'graph')

I used Eclipse, added c_api.h to my project, and copied libtensorflow.so to /usr/local/bin. I then added a reference to the libtensorflow shared object to the libraries in my GCC C++ linker settings, and finally created a simple program.
#include <iostream>
#include "c_api.h"
using namespace std;
int main() {
    cout << TF_Version();
    return 0;
}
This then allowed me to compile and use TensorFlow functions, including the ones you need.
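Building on that, below is a minimal sketch of a full load-and-predict round trip with the C API. It assumes the trained graph has first been frozen into a single protobuf (for example with TensorFlow's freeze_graph tool, so that the checkpoint variables become constants); the file name frozen_graph.pb and the output operation name Add (the likely default name of the tf.add op that produces yhat in the training script above) are assumptions rather than something verified here.
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include "c_api.h"

// Read a serialized GraphDef from disk into a TF_Buffer.
static TF_Buffer* ReadGraphFile(const char* path) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return nullptr;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    void* data = std::malloc(size);
    std::fread(data, 1, size, f);
    std::fclose(f);
    TF_Buffer* buf = TF_NewBuffer();
    buf->data = data;
    buf->length = static_cast<size_t>(size);
    buf->data_deallocator = [](void* d, size_t) { std::free(d); };
    return buf;
}

int main() {
    TF_Status* status = TF_NewStatus();

    // Import the frozen graph.
    TF_Graph* graph = TF_NewGraph();
    TF_Buffer* graph_def = ReadGraphFile("frozen_graph.pb");  // assumed file name
    TF_ImportGraphDefOptions* import_opts = TF_NewImportGraphDefOptions();
    TF_GraphImportGraphDef(graph, graph_def, import_opts, status);
    if (TF_GetCode(status) != TF_OK) {
        std::cerr << "Graph import failed: " << TF_Message(status) << std::endl;
        return 1;
    }

    // Create a session on that graph.
    TF_SessionOptions* sess_opts = TF_NewSessionOptions();
    TF_Session* session = TF_NewSession(graph, sess_opts, status);

    // Look up the input and output ops by name ('x' from the placeholder,
    // 'Add' assumed for the op producing yhat).
    TF_Output input  = {TF_GraphOperationByName(graph, "x"),   0};
    TF_Output output = {TF_GraphOperationByName(graph, "Add"), 0};

    // Build a [2, 2] float32 input, the equivalent of np.array([[2, 3], [4, 5]]).
    const int64_t dims[2] = {2, 2};
    TF_Tensor* in_tensor = TF_AllocateTensor(TF_FLOAT, dims, 2, 4 * sizeof(float));
    float* in_data = static_cast<float*>(TF_TensorData(in_tensor));
    in_data[0] = 2; in_data[1] = 3; in_data[2] = 4; in_data[3] = 5;

    // Run the session: one feed, one fetch.
    TF_Tensor* out_tensor = nullptr;
    TF_SessionRun(session, nullptr,
                  &input, &in_tensor, 1,
                  &output, &out_tensor, 1,
                  nullptr, 0, nullptr, status);
    if (TF_GetCode(status) == TF_OK) {
        const float* out = static_cast<const float*>(TF_TensorData(out_tensor));
        std::cout << out[0] << " " << out[1] << std::endl;  // one prediction per input row
    } else {
        std::cerr << "Run failed: " << TF_Message(status) << std::endl;
    }

    // Clean up.
    TF_DeleteTensor(in_tensor);
    if (out_tensor) TF_DeleteTensor(out_tensor);
    TF_CloseSession(session, status);
    TF_DeleteSession(session, status);
    TF_DeleteSessionOptions(sess_opts);
    TF_DeleteImportGraphDefOptions(import_opts);
    TF_DeleteBuffer(graph_def);
    TF_DeleteGraph(graph);
    TF_DeleteStatus(status);
    return 0;
}
A build line along the lines of g++ predict.cpp -I/path/to/include -L/path/to/lib -ltensorflow should work, with the include and library paths pointing at wherever c_api.h and libtensorflow.so were copied.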

Related

Strange behavior of Inception_v3

I am trying to create a generative network based on the pre-trained Inception_v3.
1) I fix all the weights in the model
2) create a Variable whose size is (2, 3, 299, 299)
3) create targets of size (2, 1000) that I want my final layer activations to become as close as possible to by optimizing the Variable.
(I do not set the batch size to 1 because, unlike VGG16, Inception_v3 doesn't accept batch_size=1, but that's not the point.)
The following code should work, but gives me the error: "RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation".
# minimalist code with Inception_v3 that throws the error:
import torch
from torch.autograd import Variable
import torch.optim as optim
import torch.nn as nn
import torchvision
torch.set_default_tensor_type('torch.FloatTensor')
Iv3 = torchvision.models.inception_v3(pretrained=True)
for i in Iv3.parameters():
    i.requires_grad = False
criterion = nn.CrossEntropyLoss()
x = Variable(torch.randn(2, 3, 299, 299), requires_grad=True)
target = torch.empty(2, dtype=torch.long).random_(1000)
output = Iv3(x)
loss = criterion(output[0], target)
loss.backward()
print(x.grad)
This is very strange, because if I do the same thing with VGG16, everything works fine:
# minimalist working code with VGG16:
import torch
from torch.autograd import Variable
import torch.optim as optim
import torch.nn as nn
import torchvision
# torch.cuda.empty_cache()
# vgg16 = torchvision.models.vgg16(pretrained=True).cuda()
# torch.set_default_tensor_type('torch.cuda.FloatTensor')
torch.set_default_tensor_type('torch.FloatTensor')
vgg16 = torchvision.models.vgg16(pretrained=True)
for i in vgg16.parameters():
    i.requires_grad = False
criterion = nn.CrossEntropyLoss()
x = Variable(torch.randn(2, 3, 229, 229), requires_grad=True)
target = torch.empty(2, dtype=torch.long).random_(1000)
output = vgg16(x)
loss = criterion(output, target)
loss.backward()
print(x.grad)
Please help.
Thanks to @iacolippo, the issue is solved. It turns out the problem was due to PyTorch 1.0.0; there is no problem with PyTorch 0.4.1, though.

Saving data from traceplot in PyMC3

Below is the code for a simple Bayesian linear regression. After I obtain the trace and the plots for the parameters, is there any way I can save the data that created the plots to a file, so that if I need to plot it again I can simply plot it from the data in that file rather than running the whole simulation again?
import pymc3 as pm
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0,9,5)
y = 2*x + 5
yerr=np.random.rand(len(x))
def soln(x, p1, p2):
    return p1 + p2*x

with pm.Model() as model:
    # Define priors
    intercept = pm.Normal('Intercept', 15, sd=5)
    slope = pm.Normal('Slope', 20, sd=5)
    # Model solution
    sol = soln(x, intercept, slope)
    # Define likelihood
    likelihood = pm.Normal('Y', mu=sol,
                           sd=yerr, observed=y)
    # Sampling
    trace = pm.sample(1000, nchains=1)
    pm.traceplot(trace)
    print pm.summary(trace, ['Slope'])
    print pm.summary(trace, ['Intercept'])
    plt.show()
There are two easy ways of doing this:
Use a version after 3.4.1 (currently this means installing from master, with pip install git+https://github.com/pymc-devs/pymc3). There is a new feature that allows saving and loading traces efficiently. Note that you need access to the model that created the trace:
...
pm.save_trace(trace, 'linreg.trace')
# later
with model:
    trace = pm.load_trace('linreg.trace')
Use cPickle (or pickle in Python 3). Note that pickle is at least a little insecure; don't unpickle data from untrusted sources:
import cPickle as pickle # just `import pickle` on python 3
...
with open('trace.pkl', 'wb') as buff:
    pickle.dump(trace, buff)

# later
with open('trace.pkl', 'rb') as buff:
    trace = pickle.load(buff)
Update for someone like me who is still coming across this question:
The load_trace and save_trace functions were removed, and since version 4.0 even the deprecation warning for them is gone.
The way to do it now is to use arviz:
with model:
    trace = pymc.sample(return_inferencedata=True)
trace.to_netcdf("filename.nc")
And it can be loaded with:
trace = arviz.from_netcdf("filename.nc")
This works for me:
# saving trace
pm.save_trace(trace=trace_nb, directory=r"c:\Users\xxx\Documents\xxx\traces\trace_nb")
# loading saved traces
with model_nb:
    t_nb = pm.load_trace(directory=r"c:\Users\xxx\Documents\xxx\traces\trace_nb")

Slow inference times for Tensorflow Object Detection API

I've been working with the Tensorflow Object Detection API. In my case, I'm attempting to detect vehicles in still images using the kitti-trained model (faster_rcnn_resnet101_kitti_2018_01_28) from the model zoo, and I am using code modified from the object_detection_tutorial Jupyter notebook included in the GitHub repository.
I have included my modified code below but am finding the same results with the original notebook from github.
When running on a Jupyter notebook server on an Amazon AWS g3x4large (GPU) instance with the Deep Learning AMI, it takes just shy of 4 seconds to process a single image. The time for the inference function alone is 1.3-1.5 seconds (see code below), which seems abnormally high compared to the reported inference time for the model (20 ms). While I don't expect to hit the reported mark, my times seem out of line and are impractical for my needs. I'm looking at processing 1 million+ images at a time and can't afford 46 days of processing time. Given that the model is used on video frame captures, I would think it should be possible to cut the time per image to under 1 second, at least.
My questions are:
1) What explanations/solutions exist to reduce inference time?
2) Is 1.5 seconds to convert an image to a numpy (prior to processing) out-of-line?
3) If this is the best performance I can expect, how much increase in time could I hope to gain from reworking the model to batch process images?
Thanks for any help!
Code from python notebook:
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import json
import collections
import os.path
import datetime
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
# This is needed to display the images.
get_ipython().magic('matplotlib inline')
#Setup variables
PATH_TO_TEST_IMAGES_DIR = 'test_images'
MODEL_NAME = 'faster_rcnn_resnet101_kitti_2018_01_28'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'kitti_label_map.pbtxt')
NUM_CLASSES = 2
from utils import label_map_util
from utils import visualization_utils as vis_util
def get_scores(
        boxes,
        classes,
        scores,
        category_index,
        min_score_thresh=.5
):
    import collections
    # Create a display string (and color) for every box location, group any boxes
    # that correspond to the same location.
    box_to_display_str_map = collections.defaultdict(list)
    for i in range(boxes.shape[0]):
        if scores is None or scores[i] > min_score_thresh:
            box = tuple(boxes[i].tolist())
            if scores is None:
                box_to_color_map[box] = groundtruth_box_visualization_color
            else:
                display_str = ''
                if classes[i] in category_index.keys():
                    class_name = category_index[classes[i]]['name']
                else:
                    class_name = 'N/A'
                display_str = str(class_name)
                if not display_str:
                    display_str = '{}%'.format(int(100*scores[i]))
                else:
                    display_str = '{}: {}%'.format(display_str, int(100*scores[i]))
                box_to_display_str_map[i].append(display_str)
    return box_to_display_str_map
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                    'num_detections', 'detection_boxes', 'detection_scores',
                    'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
                        tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')
            # Run inference
            output_dict = sess.run(tensor_dict,
                                   feed_dict={image_tensor: np.expand_dims(image, 0)})
            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[
                'detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict
# get list of paths
exten = '.jpg'
TEST_IMAGE_PATHS = []
for dirpath, dirnames, files in os.walk(PATH_TO_TEST_IMAGES_DIR):
    for name in files:
        if name.lower().endswith(exten):
            # print(os.path.join(dirpath, name))
            TEST_IMAGE_PATHS.append(os.path.join(dirpath, name))
print((len(TEST_IMAGE_PATHS), 'Images To Process'))
# load model graph for inference
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
#setup class labeling parameters
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# placeholder for timings
myTimings = []
myX = 1
myResults = collections.defaultdict(list)
for image_path in TEST_IMAGE_PATHS:
    if os.path.exists(image_path):
        print(myX, "--------------------------------------", datetime.datetime.time(datetime.datetime.now()))
        print(myX, "Image:", image_path)
        myTimings.append((myX, "Image", image_path))
        print(myX, "Open:", datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX, "Open", datetime.datetime.time(datetime.datetime.now()).__str__()))
        image = Image.open(image_path)
        # the array based representation of the image will be used later in order to prepare the
        # result image with boxes and labels on it.
        print(myX, "Numpy:", datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX, "Numpy", datetime.datetime.time(datetime.datetime.now()).__str__()))
        image_np = load_image_into_numpy_array(image)
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        print(myX, "Expand:", datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX, "Expand", datetime.datetime.time(datetime.datetime.now()).__str__()))
        image_np_expanded = np.expand_dims(image_np, axis=0)
        # Actual detection.
        print(myX, "Detect:", datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX, "Detect", datetime.datetime.time(datetime.datetime.now()).__str__()))
        output_dict = run_inference_for_single_image(image_np, detection_graph)
        # Visualization of the results of a detection.
        print(myX, "Export:", datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX, "Export", datetime.datetime.time(datetime.datetime.now()).__str__()))
        op = get_scores(
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            min_score_thresh=.2)
        myResults[image_path].append(op)
        print(myX, "Done:", datetime.datetime.time(datetime.datetime.now()))
        myTimings.append((myX, "Done", datetime.datetime.time(datetime.datetime.now()).__str__()))
        myX = myX + 1

# save results
with open((OUTPUTS_BASENAME + '_Results.json'), 'w') as fout:
    json.dump(myResults, fout)
with open((OUTPUTS_BASENAME + '_Timings.json'), 'w') as fout:
    json.dump(myTimings, fout)
Example Of Timings:
[1, "Image", "test_images/DE4T_11Jan2018/MFDC4612.JPG"]
[1, "Open", "19:20:08.029423"]
[1, "Numpy", "19:20:08.052679"]
[1, "Expand", "19:20:09.977166"]
[1, "Detect", "19:20:09.977250"]
[1, "Export", "19:23:13.902443"]
[1, "Done", "19:23:13.903012"]
[2, "Image", "test_images/DE4T_11Jan2018/MFDC4616.JPG"]
[2, "Open", "19:23:13.903885"]
[2, "Numpy", "19:23:13.906320"]
[2, "Expand", "19:23:15.756308"]
[2, "Detect", "19:23:15.756597"]
[2, "Export", "19:23:17.153233"]
[2, "Done", "19:23:17.153699"]
[3, "Image", "test_images/DE4T_11Jan2018/MFDC4681.JPG"]
[3, "Open", "19:23:17.154510"]
[3, "Numpy", "19:23:17.156576"]
[3, "Expand", "19:23:19.012935"]
[3, "Detect", "19:23:19.013013"]
[3, "Export", "19:23:20.323839"]
[3, "Done", "19:23:20.324307"]
[4, "Image", "test_images/DE4T_11Jan2018/MFDC4697.JPG"]
[4, "Open", "19:23:20.324791"]
[4, "Numpy", "19:23:20.327136"]
[4, "Expand", "19:23:22.175578"]
[4, "Detect", "19:23:22.175658"]
[4, "Export", "19:23:23.472040"]
[4, "Done", "19:23:23.472297"]
1) What you can do is load the video directly instead of individual images, and change run_inference_for_single_image() so that the session is created once and all images/frames are run through it (re-creating the graph is very slow). Furthermore, you can edit the pipeline config file to reduce the number of proposals, which will directly speed up inference. Note that you have to re-export the graph afterwards (https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/exporting_models.md). Batching also helps (though I am sorry, I forgot by how much), and finally you can employ multiprocessing to offload CPU-specific operations (drawing bounding boxes, loading data) and utilize the GPU better.
2) Is 1.5 seconds to convert an image to a numpy array (prior to processing) out of line? Yes, that is insanely slow and there is plenty of room for improvement.
3) While I don't know the exact GPU at AWS (K80?), you should be able to get over 10 fps on a GeForce 1080 Ti with all of these fixes, which is in line with the 79 ms time they reported (where did you get 20 ms for faster_rcnn_resnet_101?).
You could also try OpenVINO for better inference performance. It optimizes inference time by, for example, graph pruning and fusing some operations. OpenVINO is optimized for Intel hardware, but it should work with any CPU (even in the cloud).
Here are some performance benchmarks for the Faster RCNN Resnet model and various CPUs.
It's rather straightforward to convert the Tensorflow model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes from OpenVINO Development Package. It converts the Tensorflow model to IR, which is a default format for OpenVINO. You can also try the precision of FP16, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:
mo --saved_model_dir "model" --input_shape "[1, 3, 224, 224]" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device e.g. CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what is the best choice for you, just use AUTO.
from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
There is even OpenVINO Model Server, which is very similar to TensorFlow Serving.
Disclaimer: I work on OpenVINO.

Tensorflow C API segmentation fault

I used Keras to train a simple RNN with 2 layers of LSTM with dropout. I want to load the .pb graph with the TensorFlow C API and use it for later prediction, but I get a segmentation fault. Later I found that if I keep the network the same and only remove the dropout option and re-train it, then everything runs OK. However, I want to use the one with dropout, because its accuracy in predicting test data is better. Does anyone have suggestions? There are so few examples of using the TensorFlow C API.
Here is where I got segmentation fault:
TF_SessionRun(session, NULL,
&inputs[0], &input_values[0], static_cast<int>(inputs.size()),
&outputs[0], &output_values[0], static_cast<int>(outputs.size()),
NULL, 0, NULL, status);
// Assign the values from the output tensor to a variable and iterate over them
ASSERT(!output_values.empty());
float* out_vals = static_cast<float*>(TF_TensorData(output_values[0]));
BTW, I used the following code from a website to convert the .mdl file from Keras to a .pb file for TensorFlow.
import tensorflow as tf
import sys
import numpy as np
# Create function to convert saved keras model to tensorflow graph
def convert_to_pb(weight_file, input_fld='', output_fld=''):
    import os
    import os.path as osp
    from tensorflow.python.framework import graph_util
    from tensorflow.python.framework import graph_io
    from keras.models import load_model
    from keras import backend as K

    # weight_file is a .h5 keras model file
    output_node_names_of_input_network = ["pred0"]
    output_node_names_of_final_network = 'output_node'

    # change filename to a .pb tensorflow file
    output_graph_name = weight_file[:-3] + 'pb'
    weight_file_path = osp.join(input_fld, weight_file)
    net_model = load_model(weight_file_path)

    num_output = len(output_node_names_of_input_network)
    pred = [None]*num_output
    pred_node_names = [None]*num_output
    for i in range(num_output):
        pred_node_names[i] = output_node_names_of_final_network + str(i)
        pred[i] = tf.identity(net_model.output[i], name=pred_node_names[i])
    print('output nodes names are: ', pred_node_names)

    sess = K.get_session()
    constant_graph = graph_util.convert_variables_to_constants(sess, sess.graph.as_graph_def(), pred_node_names)
    graph_io.write_graph(constant_graph, output_fld, output_graph_name, as_text=False)
    print('saved the constant graph (ready for inference) at: ', osp.join(output_fld, output_graph_name))
    return output_fld + output_graph_name

tfpath = convert_to_pb(sys.argv[1], './', './')
print 'tfpath: ', tfpath
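One pattern that can help narrow down a crash like this is to check the TF_Status after every call that precedes TF_SessionRun (graph import, session creation) and to verify that every operation looked up by name actually exists: a null TF_Operation* returned by TF_GraphOperationByName is not reported through the status object, but handing it to TF_SessionRun will crash. Below is a minimal sketch of such checks; the helper names are made up, and the op names must match whatever the converted .pb actually contains (which may differ between the dropout and no-dropout exports).
#include <cstdio>
#include <cstdlib>
#include "c_api.h"

// Abort with a readable message if the previous C API call failed.
static void CheckStatus(TF_Status* status, const char* step) {
    if (TF_GetCode(status) != TF_OK) {
        std::fprintf(stderr, "%s failed: %s\n", step, TF_Message(status));
        std::exit(1);
    }
}

// Look up an operation by name and fail loudly if it is missing,
// instead of handing a null pointer to TF_SessionRun.
static TF_Output MustGetOutput(TF_Graph* graph, const char* op_name, int index) {
    TF_Operation* op = TF_GraphOperationByName(graph, op_name);
    if (op == nullptr) {
        std::fprintf(stderr, "Operation '%s' not found in graph\n", op_name);
        std::exit(1);
    }
    TF_Output out = {op, index};
    return out;
}
Used right after TF_GraphImportGraphDef (e.g. CheckStatus(status, "TF_GraphImportGraphDef")) and when building the inputs/outputs vectors (e.g. inputs.push_back(MustGetOutput(graph, "input_node", 0)); where "input_node" is just a placeholder for your real op name), this usually turns a silent segmentation fault into an explicit error such as a missing node name.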

name 'plot' is not defined

I successfully installed scitools_no_easyviz from conda (I work in Spyder), but I cannot import plot. To be more specific, here's my code:
from scitools.std import *
def f(t):
    return t**2*exp(-t**2)
t = linspace(0, 3, 51)
y = f(t)
plot(t, y)
savefig('tmp1.pdf') # produce PDF
savefig('tmp1.png') # produce PNG
figure()
def f(t):
    return t**2*exp(-t**2)
t = linspace(0, 3, 51)
y = f(t)
plot(t, y)
xlabel('t')
ylabel('y')
legend('t^2*exp(-t^2)')
axis([0, 3, -0.05, 0.6]) # [tmin, tmax, ymin, ymax]
title('My First Easyviz Demo')
figure()
plot(t, y)
xlabel('sss')
When I run the code, I get the following error
NameError: name 'plot' is not defined
What could be the problem?
Using import * is not considered best practice, although it is very practical. Try importing only the functions you need, such as:
from scitools.std import plot
Additionally, this way you will know whether plot is valid when you import it alongside any other function.
Ensure you have the dependencies installed in order to use the package, as noted at https://code.google.com/archive/p/scitools/wikis/Installation.wiki.
Additionally, I installed the latest package following those instructions, and your code runs perfectly well with it.