tensorflow and tflearn c++ API - c++

At first I am new on both tensorflow and python to start with.
I have a python code that contains a TFlearn DNN network. I need to convert that code to C++ to later on convert it into a library to be used in mobile application development.
I read about the C++ API for tensorflow (of which documentations are real vague and not clear). so I took the code line by line to try converting it.
The first step was loading the saved model that was was previously trained and saved in python (I don't need training to be done in c++ so just loading the tflearn model is enough)
The python code to save the file was as follows:
network = input_data(shape=[None, 100, 100, 1], name='input')
network = conv_2d(network, 32, 5, activation='relu')
network = avg_pool_2d(network, 2)
network = conv_2d(network, 64, 5, activation='relu')
network = avg_pool_2d(network, 2)
network = fully_connected(network, 128, activation='relu')
network = fully_connected(network, 64, activation='relu')
network = fully_connected(network, 2, activation='softmax',restore=False)
network = regression(network, optimizer='adam', learning_rate=0.0001,
loss='categorical_crossentropy', name='target')
model = tflearn.DNN(network, tensorboard_verbose=0)
model.fit(X, y.toarray(), n_epoch=3, validation_set=0.1, shuffle=True,
show_metric=True, batch_size=32, snapshot_step=100,
snapshot_epoch=False, run_id='model_finetuning')
model.save('model/my_model.tflearn')
To load the model python code was:
network = input_data(shape=[None, 100, 100, 1], name='input')
network = conv_2d(network, 32, 5, activation='relu')
network = avg_pool_2d(network, 2)
network = conv_2d(network, 64, 5, activation='relu')
network = avg_pool_2d(network, 2)
network = fully_connected(network, 128, activation='relu')
network = fully_connected(network, 64, activation='relu')
network = fully_connected(network, 2, activation='softmax')
network = regression(network, optimizer='adam', learning_rate=0.001,
loss='categorical_crossentropy', name='target')
model = tflearn.DNN(network, tensorboard_verbose=0)
model.load('model/my_model.tflearn')
and this code worked like a charm in python, yet the model save file was actually 4 files inside the model folder as follows:
model
|------------checkpoint
|------------my_model.tflearn.data-00000-of-00001
|------------my_model.tflearn.index
|------------my_model.tflearn.meta
now I come to the c++ part of it. After a lot of research I came up with the following code:
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/platform/env.h"
#include <iostream>
using namespace tensorflow;
using namespace std;
int main()
{
Session* session;
Status status = NewSession(SessionOptions(), &session);
if (!status.ok())
{
cerr << status.ToString() << "\n";
return 1;
}
else
{
cout << "Session created successfully" << endl;
}
tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT, tensorflow::TensorShape({1,100,100,1}));
GraphDef graph_def;
status = ReadBinaryProto(Env::Default(), "/home/user/PycharmProjects/untitled/model/my_model.tflearn", &graph_def);
if (!status.ok())
{
cerr << status.ToString() << "\n";
return 1;
}
else
{
cout << "Read Model File" << endl;
}
return 0;
}
And now for my questions, the code compile correctly (with no faults) using the bazel build (as described in the "Short" explanation of tensorflow C++ API. but when I tried to run it the model file is not found.
Is what I did in c++ correct? Is this the correct way to load the saved model (which I don't know why 4 files are generated during save)? or is there another approach to do it?
Is there any "Full and descent" manual for the tensorflow c++ API?

If you just want to load an already trained model, a c++ loader already exists. Directly on tensorflow look here and here
Patwie also got a really good example for loading a saved model Code from Patwie.
tensorflow::Status LoadModel(tensorflow::Session *sess, std::string graph_fn, std::string checkpoint_fn = "") {
tensorflow::Status status;
// Read in the protobuf graph we exported
tensorflow::MetaGraphDef graph_def;
status = ReadBinaryProto(tensorflow::Env::Default(), graph_fn, &graph_def);
if (status != tensorflow::Status::OK())
return status;
// create the graph in the current session
status = sess->Create(graph_def.graph_def());
if (status != tensorflow::Status::OK())
return status;
// restore model from checkpoint, iff checkpoint is given
if (checkpoint_fn != "") {
const std::string restore_op_name = graph_def.saver_def().restore_op_name();
const std::string filename_tensor_name = graph_def.saver_def().filename_tensor_name();
tensorflow::Tensor filename_tensor(tensorflow::DT_STRING, tensorflow::TensorShape());
filename_tensor.scalar<std::string>()() = checkpoint_fn;
tensor_dict feed_dict = {{filename_tensor_name, filename_tensor}};
status = sess->Run(feed_dict,
{},
{restore_op_name},
nullptr);
if (status != tensorflow::Status::OK())
return status;
} else {
// virtual Status Run(const std::vector<std::pair<string, Tensor> >& inputs,
// const std::vector<string>& output_tensor_names,
// const std::vector<string>& target_node_names,
// std::vector<Tensor>* outputs) = 0;
status = sess->Run({}, {}, {"init"}, nullptr);
if (status != tensorflow::Status::OK())
return status;
}
Unfortunatly there isn't a "full and descent" manual for tensorflow c++ API yet (AFAIK)

I wrote the steps how to save a TFLearn checkpoint correctly:
...
model = tflearn.DNN(network)
class MonitorCallback(tflearn.callbacks.Callback):
# Create an other session to clone the model and avoid effecting the training process
with tf.Session() as second_sess:
# Clone the current model
model2 = model
# Delete the training ops
del tf.get_collection_ref(tf.GraphKeys.TRAIN_OPS)[:]
# Save the checkpoint
model2.save('checkpoint_'+str(training_state.step)+".ckpt")
# Write a text protobuf to have a human-readable form of the model
tf.train.write_graph(second_sess.graph_def, '.', 'checkpoint_'+str(training_state.step)+".pbtxt", as_text = True)
return
mycb = MonitorCallback()
model.fit({'input': X}, {'target': Y}, n_epoch=500, run_id="mymodel", callbacks=mycb)
...
After you have the checkpoint, you can load in C++:
https://github.com/kecsap/tensorflow_cpp_packaging#load-a-checkpoint-in-c
...and you it for inference:
https://github.com/kecsap/tensorflow_cpp_packaging#inference-in-c
You can also find example code for C and how to freeze a model then load in C++.

Related

OpenVINO - Image classification

I tried to use OpenVINO Inference Engine to accelerate my DL inference. It works with one image. But I want to create a batch of two images and then do a inference.
This is my code:
InferenceEngine::Core core;
InferenceEngine::CNNNetwork network = core.ReadNetwork("path/to/model.xml");
InferenceEngine::InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
std::string input_name = network.getInputsInfo().begin()->first;
InferenceEngine::DataPtr output_info = network.getOutputsInfo().begin()->second;
std::string output_name = network.getOutputsInfo().begin()->first;
InferenceEngine::ExecutableNetwork executableNetwork = core.LoadNetwork(network, "CPU");
InferenceEngine::InferRequest inferRequest = executableNetwork.CreateInferRequest();
std::string input_image_01 = "path/to/image_01.png";
cv::Mat image_01 = cv::imread(input_image_01 );
InferenceEngine::Blob::Ptr imgBlob_01 = wrapMat2Blob(image_01);
std::string input_image_02 = "path/to/image_02.png";
cv::Mat image_02 = cv::imread(input_image_02 );
InferenceEngine::Blob::Ptr imgBlob_02 = wrapMat2Blob(image_02);
InferenceEngine::BlobMap imgBlobMap;
std::pair<std::string, InferenceEngine::Blob::Ptr> pair01(input_image_01, imgBlob_01);
imgBlobMap.insert(pair01);
std::pair<std::string, InferenceEngine::Blob::Ptr> pair02(input_image_02, imgBlob_02);
imgBlobMap.insert(pair02);
inferRequest.SetInput(imgBlobMap);
inferRequest.StartAsync();
inferRequest.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
InferenceEngine::Blob::Ptr output = inferRequest.GetBlob(output_name);
std::vector<unsigned> class_results;
ClassificationResult cls(output, {"x", "y"}, 2, 3);
class_results = cls.getResults();
Unfortunately, I received the following error message from the command
inferRequest.SetInput(imgBlobMap);
[NOT_FOUND] Failed to find input or output with name: 'path/to/image_02.png'
C:\j\workspace\private-ci\ie\build-windows-vs2019#2\b\repos\openvino\inference-engine\src\plugin_api\cpp_interfaces/impl/ie_infer_request_internal.hpp:303
C:\Program Files (x86)\Intel\openvino_2021.3.394\inference_engine\include\details/ie_exception_conversion.hpp:66
How can I create a batch of more than image, do a inference and get the information for classification class and confidence? Is the confidence and class located in the received variable of GetBlob()? Should I need the call of ClassificationResult cls(output, {"x", "y"}, 2, 3);?
I'd recommend you to review Using Shape Inference article from OpenVINO online documentation to be aware of the limitations of using batches. It also refers to Open Model Zoo smart_classroom_demo, where dynamic batching is used in processing multiple previously detected faces. Basically, when you have batch enabled in the model, the memory buffer of your input blob will be allocated to have a room for all batch of images, and your responsibility is to fill data in input blob for each image in batch from your data. You may take a look at function CnnDLSDKBase::InferBatch, of smart_classroom_demo, which is located at file smart_classroom_demo/cpp/src/cnn.cpp, line 51. As you can see, in the loop over num_imgs an auxiliary function matU8ToBlob fills the input blob with data for current_batch_size of images, then set batch size for infer request and run inference.
for (size_t batch_i = 0; batch_i < num_imgs; batch_i += batch_size) {
const size_t current_batch_size = std::min(batch_size, num_imgs - batch_i);
for (size_t b = 0; b < current_batch_size; b++) {
matU8ToBlob<uint8_t>(frames[batch_i + b], input, b);
}
if (config_.max_batch_size != 1)
infer_request_.SetBatch(current_batch_size);
infer_request_.Infer();
there is a similar sample using the batch inputs as input into model within the OpenVINO. You can refer to below link.
https://github.com/openvinotoolkit/openvino/blob/ae2913d3b5970ce0d3112cc880d03be1708f13eb/inference-engine/samples/hello_nv12_input_classification/main.cpp#L236

Tensorflow lite model output always gives same output no matter the input

My goal is to run a Keras model I have made in my ESP32 microcontroller. I have the libraries all working correctly.
I have created a Keras model using google Collab that looks to be working fine when I give it random test data within google Collab. The model has two input features and 4 different outputs.(a multiple-output regression model)
However, when I export and load the model into my c++ application in the ESP32 it does not matter what the inputs are, it always predicts the same output.
I have based myself in this code in order to load and run the model in c++ : https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/magic_wand/main_functions.cc
And this is my version of the code
namespace {
tflite::ErrorReporter* error_reporter = nullptr;
const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
TfLiteTensor* output = nullptr;
int inference_count = 0;
// Create an area of memory to use for input, output, and intermediate arrays.
// Finding the minimum value for your model may require some trial and error.
constexpr int kTensorArenaSize = 2 * 2048;
uint8_t tensor_arena[kTensorArenaSize];
} // namespace
static void setup(){
static tflite::MicroErrorReporter micro_error_reporter;
error_reporter = &micro_error_reporter;
model = tflite::GetModel(venti_model);
if (model->version() != TFLITE_SCHEMA_VERSION) {
error_reporter->Report(
"Model provided is schema version %d not equal "
"to supported version %d.",
model->version(), TFLITE_SCHEMA_VERSION);
return;
}
// This pulls in all the operation implementations we need.
// NOLINTNEXTLINE(runtime-global-variables)
static tflite::ops::micro::AllOpsResolver resolver;
// Build an interpreter to run the model with.
static tflite::MicroInterpreter static_interpreter(
model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
interpreter = &static_interpreter;
// Allocate memory from the tensor_arena for the model's tensors.
TfLiteStatus allocate_status = interpreter->AllocateTensors();
if (allocate_status != kTfLiteOk) {
error_reporter->Report("AllocateTensors() failed");
return;
}
// Obtain pointers to the model's input and output tensors.
input = interpreter->input(0);
ESP_LOGI("TENSOR SETUP", "input size = %d", input->dims->size);
ESP_LOGI("TENSOR SETUP", "input size in bytes = %d", input->bytes);
ESP_LOGI("TENSOR SETUP", "Is input float32? = %s", (input->type == kTfLiteFloat32) ? "true" : "false");
ESP_LOGI("TENSOR SETUP", "Input data dimentions = %d",input->dims->data[1]);
output = interpreter->output(0);
ESP_LOGI("TENSOR SETUP", "output size = %d", output->dims->size);
ESP_LOGI("TENSOR SETUP", "output size in bytes = %d", output->bytes);
ESP_LOGI("TENSOR SETUP", "Is input float32? = %s", (output->type == kTfLiteFloat32) ? "true" : "false");
ESP_LOGI("TENSOR SETUP", "Output data dimentions = %d",output->dims->data[1]);
}
static bool setupDone = true;
static void the_ai_algorithm_task(){
/* First time task is init setup the ai model */
if(setupDone == false){
setup();
setupDone = true;
}
/* Load the input data i.e deltaT1 and deltaT2 */
//int i = 0;
input->data.f[0] = 2.0; /* Different values dont change the output */
input->data.f[1] = 3.2;
// Run inference, and report any error
TfLiteStatus invoke_status = interpreter->Invoke();
if (invoke_status != kTfLiteOk) {
error_reporter->Report("Invoke failed");
// return;
}
/* Retrieve outputs Fan , AC , Vent 1 , Vent 2 */
double fan = output->data.f[0];
double ac = output->data.f[1];
double vent1 = output->data.f[2];
double vent2 = output->data.f[3];
ESP_LOGI("TENSOR SETUP", "fan = %lf", fan);
ESP_LOGI("TENSOR SETUP", "ac = %lf", ac);
ESP_LOGI("TENSOR SETUP", "vent1 = %lf", vent1);
ESP_LOGI("TENSOR SETUP", "vent2 = %lf", vent2);
}
The model seems to load ok as the dimensions and sizes are correct. But the output is always the same 4 values
fan = 0.0087
ac = 0.54
vent1 = 0.73
vent2 = 0.32
Any idea on what can be going wrong? Is it something about my model or am I just not using the model correctly in my c++ application?
Could you refer to the "Test the model" section here - https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/hello_world/train/train_hello_world_model.ipynb#scrollTo=f86dWOyZKmN9 and verify if the TFLite model is producing the correct results?
You can find the issue by testing the 1) TFModel (which you have done already) 2) TFLite model and then 3) TFLite Micro Model (C Source File)
You also need to verify that the inputs passed to the model are of the same type and distribution. eg: If your TFModel was trained on Images in the range 0-255, then you need to pass this to the TFLite and TFLite Micro Model. Instead, if you trained the model using preprocessed data (0-255 get normalized to 0-1 during training), then you need to do the same and preprocess the data for the TFLite and TFLite Micro model.
I have found the issue and the answer.
It was not the C++ code, it was the model. Originally, I made my model with 3 hidden layers of 64, 20, and 8 (I am new to ML so I was only playing with random values) and it was giving me the issue.
To solve it I just changed the hidden layers to 32, 16, and 8 and the C++ code was outputting right values.

gRPC-only Tensorflow Serving client in C++

There seems to be a bit of information out there for creating a gRPC-only client in Python (and even a few other languages) and I was able to successfully get a working client that uses only gRPC in Python that works for our implementation.
What I can't seem to find is a case where someone has successfully written the client in C++.
The constraints of the task are as follows:
The build system cannot be bazel, because the final application already has its own build system.
The client cannot include Tensorflow (which requires bazel to build against in C++).
The application should use gRPC and not HTTP calls for speed.
The application ideally won't call Python or otherwise execute shell commands.
Given the above constraints, and assuming that I extracted and generated the gRPC stubs, is this even possible? If so, can an example be provided?
Turns out, this isn't anything new if you have already done it in Python. Assuming the model has been named "predict" and the input to the model is called "inputs," the following is the Python code:
import logging
import grpc
from grpc import RpcError
from types_pb2 import DT_FLOAT
from tensor_pb2 import TensorProto
from tensor_shape_pb2 import TensorShapeProto
from predict_pb2 import PredictRequest
from prediction_service_pb2_grpc import PredictionServiceStub
class ModelClient:
"""Client Facade to work with a Tensorflow Serving gRPC API"""
host = None
port = None
chan = None
stub = None
logger = logging.getLogger(__name__)
def __init__(self, name, dims, dtype=DT_FLOAT, version=1):
self.model = name
self.dims = [TensorShapeProto.Dim(size=dim) for dim in dims]
self.dtype = dtype
self.version = version
#property
def hostport(self):
"""A host:port string representation"""
return f"{self.host}:{self.port}"
def connect(self, host='localhost', port=8500):
"""Connect to the gRPC server and initialize prediction stub"""
self.host = host
self.port = int(port)
self.logger.info(f"Connecting to {self.hostport}...")
self.chan = grpc.insecure_channel(self.hostport)
self.logger.info("Initializing prediction gRPC stub.")
self.stub = PredictionServiceStub(self.chan)
def tensor_proto_from_measurement(self, measurement):
"""Pass in a measurement and return a tensor_proto protobuf object"""
self.logger.info("Assembling measurement tensor.")
return TensorProto(
dtype=self.dtype,
tensor_shape=TensorShapeProto(dim=self.dims),
string_val=[bytes(measurement)]
)
def predict(self, measurement, timeout=10):
"""Execute prediction against TF Serving service"""
if self.host is None or self.port is None \
or self.chan is None or self.stub is None:
self.connect()
self.logger.info("Creating request.")
request = PredictRequest()
request.model_spec.name = self.model
if self.version > 0:
request.model_spec.version.value = self.version
request.inputs['inputs'].CopyFrom(
self.tensor_proto_from_measurement(measurement))
self.logger.info("Attempting to predict against TF Serving API.")
try:
return self.stub.Predict(request, timeout=timeout)
except RpcError as err:
self.logger.error(err)
self.logger.error('Predict failed.')
return None
The following is a working (rough) C++ translation:
#include <iostream>
#include <memory>
#include <string>
#include <grpcpp/grpcpp.h>
#include "grpcpp/create_channel.h"
#include "grpcpp/security/credentials.h"
#include "google/protobuf/map.h"
#include "types.grpc.pb.h"
#include "tensor.grpc.pb.h"
#include "tensor_shape.grpc.pb.h"
#include "predict.grpc.pb.h"
#include "prediction_service.grpc.pb.h"
using grpc::Channel;
using grpc::ClientContext;
using grpc::Status;
using tensorflow::TensorProto;
using tensorflow::TensorShapeProto;
using tensorflow::serving::PredictRequest;
using tensorflow::serving::PredictResponse;
using tensorflow::serving::PredictionService;
typedef google::protobuf::Map<std::string, tensorflow::TensorProto> OutMap;
class ServingClient {
public:
ServingClient(std::shared_ptr<Channel> channel)
: stub_(PredictionService::NewStub(channel)) {}
// Assembles the client's payload, sends it and presents the response back
// from the server.
std::string callPredict(const std::string& model_name,
const float& measurement) {
// Data we are sending to the server.
PredictRequest request;
request.mutable_model_spec()->set_name(model_name);
// Container for the data we expect from the server.
PredictResponse response;
// Context for the client. It could be used to convey extra information to
// the server and/or tweak certain RPC behaviors.
ClientContext context;
google::protobuf::Map<std::string, tensorflow::TensorProto>& inputs =
*request.mutable_inputs();
tensorflow::TensorProto proto;
proto.set_dtype(tensorflow::DataType::DT_FLOAT);
proto.add_float_val(measurement);
proto.mutable_tensor_shape()->add_dim()->set_size(5);
proto.mutable_tensor_shape()->add_dim()->set_size(8);
proto.mutable_tensor_shape()->add_dim()->set_size(105);
inputs["inputs"] = proto;
// The actual RPC.
Status status = stub_->Predict(&context, request, &response);
// Act upon its status.
if (status.ok()) {
std::cout << "call predict ok" << std::endl;
std::cout << "outputs size is " << response.outputs_size() << std::endl;
OutMap& map_outputs = *response.mutable_outputs();
OutMap::iterator iter;
int output_index = 0;
for (iter = map_outputs.begin(); iter != map_outputs.end(); ++iter) {
tensorflow::TensorProto& result_tensor_proto = iter->second;
std::string section = iter->first;
std::cout << std::endl << section << ":" << std::endl;
if ("classes" == section) {
int titer;
for (titer = 0; titer != result_tensor_proto.int64_val_size(); ++titer) {
std::cout << result_tensor_proto.int64_val(titer) << ", ";
}
} else if ("scores" == section) {
int titer;
for (titer = 0; titer != result_tensor_proto.float_val_size(); ++titer) {
std::cout << result_tensor_proto.float_val(titer) << ", ";
}
}
std::cout << std::endl;
++output_index;
}
return "Done.";
} else {
std::cout << "gRPC call return code: " << status.error_code() << ": "
<< status.error_message() << std::endl;
return "RPC failed";
}
}
private:
std::unique_ptr<PredictionService::Stub> stub_;
};
Note that the dimensions here have been specified within the code instead of passed in.
Given the above class, execution can then be as follows:
int main(int argc, char** argv) {
float measurement[5*8*105] = { ... data ... };
ServingClient sclient(grpc::CreateChannel(
"localhost:8500", grpc::InsecureChannelCredentials()));
std::string model("predict");
std::string reply = sclient.callPredict(model, *measurement);
std::cout << "Predict received: " << reply << std::endl;
return 0;
}
The Makefile used was borrowed from the gRPC C++ examples, with the PROTOS_PATH variable set relative to the Makefile and the following build target (assuming the C++ application is named predict.cc):
predict: types.pb.o types.grpc.pb.o tensor_shape.pb.o tensor_shape.grpc.pb.o resource_handle.pb.o resource_handle.grpc.pb.o model.pb.o model.grpc.pb.o tensor.pb.o tensor.grpc.pb.o predict.pb.o predict.grpc.pb.o prediction_service.pb.o prediction_service.grpc.pb.o predict.o
$(CXX) $^ $(LDFLAGS) -o $#

Select specific gpu for the session of tensorflow c++ api

How could I ask tensorflow use specific gpu to do the inference?
Part of the source codes
std::unique_ptr<tensorflow::Session> session;
Status const load_graph_status = LoadGraph(graph_path, &session);
if (!load_graph_status.ok()) {
LOG(ERROR) << "LoadGraph ERROR!!!!"<< load_graph_status;
return -1;
}
std::vector<Tensor> resized_tensors;
Status const read_tensor_status = ReadTensorFromImageFile(image_path, &resized_tensors);
if (!read_tensor_status.ok()) {
LOG(ERROR) << read_tensor_status;
return -1;
}
std::vector<Tensor> outputs;
Status run_status = session->Run({{input_layer, resized_tensor}},
output_layer, {}, &outputs);
So far so good, but tensorflow always select the same gpu when I execute Run, do I have a way to specify which gpu to execute?
In case you need complete source codes, I placed them at pastebin
Edit : Looks like options.config.mutable_gpu_options()->set_visible_device_list("0") work, but I am not sure.
Turns out in the C++ API there are a series of (nested) structs: tensorflow::SessionOptions, tensorflow::ConfigProto, and tensorflow::GPUOptions. The latter contains a method called set_visible_device_list(::std::string&& value) which you can select the GPU you would like:
auto options = tensorflow::SessionOptions();
options.config.mutable_gpu_options()->set_visible_device_list("0");
// session_ is a unique_ptr to a tensorflow::Session
session_->reset(tensorflow::NewSession(options));
Similar to this (for memory usage restriction):
how to limit GPU usage in tensorflow (r1.1) with C++ API

TensorFlow 0.12 Model Files

I train a model and save it using:
saver = tf.train.Saver()
saver.save(session, './my_model_name')
Besides the checkpoint file, which simply contains pointers to the most recent checkpoints of the model, this creates the following 3 files in the current path:
my_model_name.meta
my_model_name.index
my_model_name.data-00000-of-00001
I wonder what each of these files contains.
I'd like to load this model in C++ and run the inference. The label_image example loads the model from a single .bp file using ReadBinaryProto(). I wonder how I can load it from these 3 files. What is the C++ equivalent of the following?
new_saver = tf.train.import_meta_graph('./my_model_name.meta')
new_saver.restore(session, './my_model_name')
What your saver creates is called "Checkpoint V2" and was introduced in TF 0.12.
I got it working quite nicely (though the docs on the C++ part are horrible, so it took me a day to solve). Some people suggest converting all variables to constants or freezing the graph, but none of these is actually needed.
Python part (saving)
with tf.Session() as sess:
tf.train.Saver(tf.trainable_variables()).save(sess, 'models/my-model')
If you create the Saver with tf.trainable_variables(), you can save yourself some headache and storage space. But maybe some more complicated models need all data to be saved, then remove this argument to Saver, just make sure you're creating the Saver after your graph is created. It is also very wise to give all variables/layers unique names, otherwise you can run in different problems.
C++ part (inference)
Note that checkpointPath isn't a path to any of the existing files, just their common prefix. If you mistakenly put there path to the .index file, TF won't tell you that was wrong, but it will die during inference due to uninitialized variables.
#include <tensorflow/core/public/session.h>
#include <tensorflow/core/protobuf/meta_graph.pb.h>
using namespace std;
using namespace tensorflow;
...
// set up your input paths
const string pathToGraph = "models/my-model.meta"
const string checkpointPath = "models/my-model";
...
auto session = NewSession(SessionOptions());
if (session == nullptr) {
throw runtime_error("Could not create Tensorflow session.");
}
Status status;
// Read in the protobuf graph we exported
MetaGraphDef graph_def;
status = ReadBinaryProto(Env::Default(), pathToGraph, &graph_def);
if (!status.ok()) {
throw runtime_error("Error reading graph definition from " + pathToGraph + ": " + status.ToString());
}
// Add the graph to the session
status = session->Create(graph_def.graph_def());
if (!status.ok()) {
throw runtime_error("Error creating graph: " + status.ToString());
}
// Read weights from the saved checkpoint
Tensor checkpointPathTensor(DT_STRING, TensorShape());
checkpointPathTensor.scalar<std::string>()() = checkpointPath;
status = session->Run(
{{ graph_def.saver_def().filename_tensor_name(), checkpointPathTensor },},
{},
{graph_def.saver_def().restore_op_name()},
nullptr);
if (!status.ok()) {
throw runtime_error("Error loading checkpoint from " + checkpointPath + ": " + status.ToString());
}
// and run the inference to your liking
auto feedDict = ...
auto outputOps = ...
std::vector<tensorflow::Tensor> outputTensors;
status = session->Run(feedDict, outputOps, {}, &outputTensors);
For completeness, here's the Python equivalent:
Inference in Python
with tf.Session() as sess:
saver = tf.train.import_meta_graph('models/my-model.meta')
saver.restore(sess, tf.train.latest_checkpoint('models/'))
outputTensors = sess.run(outputOps, feed_dict=feedDict)
I'm currently struggling with this myself, I've found it's not very straightforward to do currently. The two most commonly cited tutorials on the subject are:
https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.goxwm1e5j
and
https://medium.com/#hamedmp/exporting-trained-tensorflow-models-to-c-the-right-way-cf24b609d183#.g1gak956i
The equivalent of
new_saver = tf.train.import_meta_graph('./my_model_name.meta')
new_saver.restore(session, './my_model_name')
Is just
Status load_graph_status = LoadGraph(graph_path, &session);
Assuming you've "frozen the graph" (Used a script with combines the graph file with the checkpoint values).
Also, see the discussion here: Tensorflow Different ways to Export and Run graph in C++