Debugging C++ TensorFlow programs

I am looking for tips on how to debug TensorFlow graph execution written in C++. In particular, calls to Session::Run produce the error "specified in either feed_devices or fetch_devices was not found in the Graph", and I am looking for ways to debug my graph.
// data is a std::vector<float>
std::vector<tf::string> names = {"feature1", "feature2", "feature3", "feature4"};
std::vector<std::pair<tf::string, tf::Tensor>> input_tensor;
std::vector<tf::Tensor> output_tensors;
// create the Run input tensors
for (uint i = 0; i < data.size(); i++)
{
    tf::Tensor tensorValue(tf::DT_FLOAT, tf::TensorShape({1, 1}));
    tensorValue.flat<float>().data()[0] = data[i];
    std::pair<tf::string, tf::Tensor> pair = std::make_pair(names[i], tensorValue);
    input_tensor.push_back(pair);
}
//execute the graph
check_status(ReadBinaryProto(tf::Env::Default(),protobuf_model_filepath, &graph));
check_status(tf::NewSession(tf::SessionOptions(), &session));
check_status(session->Create(graph));
check_status(session->Run(input_tensor, {"results"}, {}, &output_tensors));
session->Run returns a failed status, and status.ToString() gives: "Invalid argument: Tensor xyz:0, specified in either feed_devices or fetch_devices was not found in the Graph"
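One way to start debugging this, sketched below under the assumption that the GraphDef has already been loaded with ReadBinaryProto (the DumpNodeNames helper is made up), is to print every node name in the graph and check that each feed name ("feature1", ...) and the fetch name ("results") actually matches a node, since this error typically means a name passed to Run does not exist in the graph.
#include <iostream>
#include "tensorflow/core/framework/graph.pb.h"
// List all node names and ops in the loaded GraphDef so the feed/fetch names
// passed to Session::Run can be checked against what the graph contains.
void DumpNodeNames(const tensorflow::GraphDef& graph_def) {
    for (int i = 0; i < graph_def.node_size(); ++i) {
        std::cout << graph_def.node(i).name() << " (" << graph_def.node(i).op() << ")\n";
    }
}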

Related

LibTorch: There's no way to put a model on the CUDA device that will process a tensor already on the CUDA device

I cannot put, and keep, the model on the CUDA device. I cannot send a tensor that is already on CUDA through a model without getting the "found at least two devices, cpu and cuda" error.
Did I miss some simple way of putting the model on the CUDA device in LibTorch? I cannot find it or figure it out.
The full, reproducible example is below, but the lines in question are quite simple, as shown below.
I have a tensor that is on CUDA and I want to send it through a model. This causes an error:
auto the_tensor = torch::rand({42, 427}).to(device);
std::cout << net.forward(the_tensor).to(device);
terminate called after throwing an instance of 'c10::Error'
what(): Expected all tensors to be on the same device, but found at least two devices, cpu and
cuda:0! (when checking argument for argument mat1 in method wrapper_addmm)
If I do NOT put the tensor on CUDA, I can run the tensor through the model and move the result to CUDA like so:
auto the_tensor = torch::rand({42, 427});
std::cout << net.forward(the_tensor).to(device);
I can also send the tensor back to the CPU first, and this also does NOT create an error. But I have a large script with a lot of tensors that will already be on the CUDA device, and I do NOT want to keep sending tensors from CUDA back to the CPU and then back to the CUDA device. This is why I call it a bug. How do I put the model on the CUDA device and keep it there, other than appending .to(device) to the model output every time it is called with net.forward(tensor)?
auto the_tensor = torch::rand({42, 427}).to(device);
std::cout << net.forward(the_tensor.to(torch::kCPU)).to(device);
I have tried permanently putting the model on the device but nothing I try works.
net.to(device);
net->to(device);
Critic_Net().to(device);
I've tried many variations like those above to put the model on the CUDA device and keep it there, but nothing works except calling net.forward(the_tensor).to(device); as shown above.
The full, reproducible example.
#include <torch/torch.h>

using namespace torch::indexing;
torch::Device device(torch::kCUDA);

struct Critic_Net : torch::nn::Module {
    torch::Tensor next_state_batch__sampled_action;
public:
    Critic_Net() {
        lin1 = torch::nn::Linear(427, 42);
        lin2 = torch::nn::Linear(42, 286);
        lin3 = torch::nn::Linear(286, 1);
    }
    torch::Tensor forward(torch::Tensor next_state_batch__sampled_action) {
        auto h = next_state_batch__sampled_action;
        h = torch::relu(lin1->forward(h));
        h = torch::tanh(lin2->forward(h));
        h = lin3->forward(h);
        return torch::nan_to_num(h);
    }
    torch::nn::Linear lin1{nullptr}, lin2{nullptr}, lin3{nullptr};
};

auto net = Critic_Net();

int main() {
    net.to(device);
    auto the_tensor = torch::rand({42, 427}).to(device);
    std::cout << net.forward(the_tensor).to(device);
}
When you move your model to the GPU with the to function, libtorch does not actually move anything, because you have not registered anything as parameters/buffers/modules. Hence, when you call the forward method, incompatible devices are found and an error is raised. Here is how to register your submodules (see the docs):
struct Critic_Net : torch::nn::Module {
public:
    Critic_Net() {
        lin1 = register_module("lin1", torch::nn::Linear(427, 42));
        lin2 = register_module("lin2", torch::nn::Linear(42, 286));
        lin3 = register_module("lin3", torch::nn::Linear(286, 1));
    }
    torch::Tensor forward(torch::Tensor next_state_batch__sampled_action) {
        // unchanged
    }
    torch::nn::Linear lin1{nullptr}, lin2{nullptr}, lin3{nullptr};
};
Note: in addition to register_module, you also have access to register_parameter and register_buffer, which take tensors instead of modules. The difference is that "parameters" are trainable tensors while "buffers" are not trainable (buffers are useful, for example, if you want to keep a moving average of your inputs).
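To make that concrete, here is a minimal sketch (not taken from the question's model; the ScaledNet name and shapes are made up) of a module that registers a submodule, a trainable parameter, and a non-trainable buffer:
#include <torch/torch.h>
// Hypothetical module: a learnable scale plus a fixed input-mean buffer.
struct ScaledNet : torch::nn::Module {
    ScaledNet() {
        lin = register_module("lin", torch::nn::Linear(427, 1));
        // Trainable: appears in parameters() and receives gradients.
        scale = register_parameter("scale", torch::ones({1}));
        // Not trainable: moved by to(device) and serialized, but ignored by optimizers.
        input_mean = register_buffer("input_mean", torch::zeros({427}));
    }
    torch::Tensor forward(torch::Tensor x) {
        return lin->forward((x - input_mean) * scale);
    }
    torch::nn::Linear lin{nullptr};
    torch::Tensor scale, input_mean;
};
With everything registered this way, ScaledNet().to(device) moves the linear layer's weights, scale, and input_mean to CUDA together, so a CUDA input no longer triggers the mixed-device error.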

OpenVINO - Image classification

I tried to use the OpenVINO Inference Engine to accelerate my DL inference. It works with one image, but I want to create a batch of two images and then do inference.
This is my code:
InferenceEngine::Core core;
InferenceEngine::CNNNetwork network = core.ReadNetwork("path/to/model.xml");
InferenceEngine::InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
std::string input_name = network.getInputsInfo().begin()->first;
InferenceEngine::DataPtr output_info = network.getOutputsInfo().begin()->second;
std::string output_name = network.getOutputsInfo().begin()->first;
InferenceEngine::ExecutableNetwork executableNetwork = core.LoadNetwork(network, "CPU");
InferenceEngine::InferRequest inferRequest = executableNetwork.CreateInferRequest();
std::string input_image_01 = "path/to/image_01.png";
cv::Mat image_01 = cv::imread(input_image_01 );
InferenceEngine::Blob::Ptr imgBlob_01 = wrapMat2Blob(image_01);
std::string input_image_02 = "path/to/image_02.png";
cv::Mat image_02 = cv::imread(input_image_02 );
InferenceEngine::Blob::Ptr imgBlob_02 = wrapMat2Blob(image_02);
InferenceEngine::BlobMap imgBlobMap;
std::pair<std::string, InferenceEngine::Blob::Ptr> pair01(input_image_01, imgBlob_01);
imgBlobMap.insert(pair01);
std::pair<std::string, InferenceEngine::Blob::Ptr> pair02(input_image_02, imgBlob_02);
imgBlobMap.insert(pair02);
inferRequest.SetInput(imgBlobMap);
inferRequest.StartAsync();
inferRequest.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
InferenceEngine::Blob::Ptr output = inferRequest.GetBlob(output_name);
std::vector<unsigned> class_results;
ClassificationResult cls(output, {"x", "y"}, 2, 3);
class_results = cls.getResults();
Unfortunately, I received the following error message from the call
inferRequest.SetInput(imgBlobMap);
[NOT_FOUND] Failed to find input or output with name: 'path/to/image_02.png'
C:\j\workspace\private-ci\ie\build-windows-vs2019#2\b\repos\openvino\inference-engine\src\plugin_api\cpp_interfaces/impl/ie_infer_request_internal.hpp:303
C:\Program Files (x86)\Intel\openvino_2021.3.394\inference_engine\include\details/ie_exception_conversion.hpp:66
How can I create a batch of more than one image, run inference, and get the classification class and confidence? Are the confidence and class contained in the blob returned by GetBlob()? Do I need the call ClassificationResult cls(output, {"x", "y"}, 2, 3);?
I'd recommend reviewing the Using Shape Inference article from the OpenVINO online documentation to be aware of the limitations of using batches. It also refers to the Open Model Zoo smart_classroom_demo, where dynamic batching is used to process multiple previously detected faces. Basically, when batching is enabled in the model, the memory buffer of your input blob is allocated with room for the whole batch of images, and it is your responsibility to fill the input blob with data for each image in the batch. You may take a look at the function CnnDLSDKBase::InferBatch of smart_classroom_demo, located in smart_classroom_demo/cpp/src/cnn.cpp at line 51. As you can see, in the loop over num_imgs an auxiliary function matU8ToBlob fills the input blob with data for current_batch_size images, then the batch size is set on the infer request and inference is run:
for (size_t batch_i = 0; batch_i < num_imgs; batch_i += batch_size) {
    const size_t current_batch_size = std::min(batch_size, num_imgs - batch_i);
    for (size_t b = 0; b < current_batch_size; b++) {
        matU8ToBlob<uint8_t>(frames[batch_i + b], input, b);
    }
    if (config_.max_batch_size != 1)
        infer_request_.SetBatch(current_batch_size);
    infer_request_.Infer();
There is a similar sample within OpenVINO that uses batched inputs as model input. You can refer to the link below:
https://github.com/openvinotoolkit/openvino/blob/ae2913d3b5970ce0d3112cc880d03be1708f13eb/inference-engine/samples/hello_nv12_input_classification/main.cpp#L236
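As a note on the NOT_FOUND error itself: the keys of the BlobMap passed to SetInput must be the network's input names (for example input_name from getInputsInfo()), not image file paths, and a single-input network has only one such key, so a batch has to go in as one blob for that input. A rough sketch under that assumption, where imgBlob_batch is a hypothetical blob already filled with both images (for example in a loop like the one above):
// Set the single network input by its real name; the whole batch is one blob.
inferRequest.SetBlob(input_name, imgBlob_batch);
inferRequest.StartAsync();
inferRequest.Wait(InferenceEngine::IInferRequest::WaitMode::RESULT_READY);
InferenceEngine::Blob::Ptr output = inferRequest.GetBlob(output_name);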

WebRTC std::deque iterator exception when RTC_DCHECK_IS_ON

Recently, since the M83 and M84 releases of libWebRTC, I have been facing a strange error when I run my host program in the Windows x64 Debug configuration (RTC_DCHECK_IS_ON) in Visual Studio:
When a video channel is being created in the WebRTC library, I get an exception in
_Deque_const_iterator& operator++() {
#if _ITERATOR_DEBUG_LEVEL != 0
    const auto _Mycont = static_cast<const _Mydeque*>(this->_Getcont());
    _STL_VERIFY(_Mycont, "cannot increment value-initialized deque iterator");
here----> _STL_VERIFY(this->_Myoff < _Mycont->_Myoff + _Mycont->_Mysize, "cannot increment deque iterator past end");
#endif // _ITERATOR_DEBUG_LEVEL != 0
    ++_Myoff;
    return *this;
}
This fires because _Myoff is NULL...
This operator++ is called from rtc_base/thread.cc in the WebRTC library, here:
void ThreadManager::RegisterSendAndCheckForCycles(Thread* source,
                                                  Thread* target) {
  CritScope cs(&crit_);
  std::deque<Thread*> all_targets({target});
  // We check the pre-existing who-sends-to-who graph for any path from target
  // to source. This loop is guaranteed to terminate because per the send graph
  // invariant, there are no cycles in the graph.
  for (auto it = all_targets.begin(); it != all_targets.end(); ++it) {
    const auto& targets = send_graph_[*it];
    all_targets.insert(all_targets.end(), targets.begin(), targets.end());
  }
...
It comes from the ++it on a std::deque<rtc::Thread*> iterator.
I don't really see what the problem might be, but it seems that the iterator has an issue.
Perhaps I have some kind of configuration mismatch between the compiled webrtc.lib and my project, but there wasn't any problem with WebRTC M79 or M81, for example.
And, as WebRTC is really a huge project, I don't know where to start my investigation.
Any idea?
Please note that I also reported this bug to WebRTC team : https://bugs.chromium.org/p/webrtc/issues/detail?id=11746
The problem comes from the function RegisterSendAndCheckForCycles in the rtc_base/thread.cc file:
for (auto it = all_targets.begin(); it != all_targets.end(); ++it) {
  const auto& targets = send_graph_[*it];
  all_targets.insert(all_targets.end(), targets.begin(), targets.end());
}
When all_targets.insert is called, the "it" iterator becomes invalid because the memory allocation in all_targets changed, so the next ++it generates an assertion failure. Working with indexes solves the problem.
Here's the fixed version:
void ThreadManager::RegisterSendAndCheckForCycles(Thread* source,
                                                  Thread* target) {
  CritScope cs(&crit_);
  std::deque<Thread*> all_targets({target});
  // We check the pre-existing who-sends-to-who graph for any path from target
  // to source. This loop is guaranteed to terminate because per the send graph
  // invariant, there are no cycles in the graph.
  for (size_t i = 0; i < all_targets.size(); i++) {
    const auto& targets = send_graph_[all_targets[i]];
    all_targets.insert(all_targets.end(), targets.begin(), targets.end());
  }
  RTC_CHECK_EQ(absl::c_count(all_targets, source), 0)
      << " send loop between " << source->name() << " and " << target->name();
  // We may now insert source -> target without creating a cycle, since there
  // was no path from target to source per the prior CHECK.
  send_graph_[source].insert(target);
}
I will propose a patch directly to the WebRTC team in a few days.
This is basically a continuation of the accepted answer...
If you follow Microsoft's tutorial (https://learn.microsoft.com/en-us/winrtc/getting-started), they have you use the M84 release (still as of posting this in May '22). Then, they tell you to apply a pile of git patches they put together. To run those, they have you first define an environment variable called WEBRTCM84_ROOT, which is the absolute path to the webrtc\src directory. If you didn't do all that, simply execute this in a Command Prompt window (filling in YOUR actual path):
set WEBRTCM84_ROOT=C:\abs\path\to\webrtc\src
Now, create a git patch file somewhere, containing the following content. I'll just assume you put it on a path directly adjacent to the webrtc repo.
ThreadManager.patch
diff --git a/rtc_base/thread.cc b/rtc_base/thread.cc
index 0fb2e813e0..a8cb022fa9 100644
--- a/rtc_base/thread.cc
+++ b/rtc_base/thread.cc
@@ -168,8 +168,8 @@ void ThreadManager::RegisterSendAndCheckForCycles(Thread* source,
// We check the pre-existing who-sends-to-who graph for any path from target
// to source. This loop is guaranteed to terminate because per the send graph
// invariant, there are no cycles in the graph.
- for (auto it = all_targets.begin(); it != all_targets.end(); ++it) {
- const auto& targets = send_graph_[*it];
+ for (size_t i = 0; i < all_targets.size(); i++) {
+ const auto& targets = send_graph_[all_targets[i]];
all_targets.insert(all_targets.end(), targets.begin(), targets.end());
}
RTC_CHECK_EQ(absl::c_count(all_targets, source), 0)
Then, apply it, like so:
pushd "%WEBRTCM84_ROOT%"
git apply "..\..\ThreadManager.patch"
git commit -a -m "Applied ThreadManager patch."
popd
Note: I'm still experiencing some other bad behaviors in debug mode, that aren't present in release, but SamT's solution got me beyond this particular issue.

More than one input is Const Op

I am trying to serve the following git repo in OpenCV: https://github.com/una-dinosauria/3d-pose-baseline and the checkpoint data can be found at the following link: https://drive.google.com/file/d/0BxWzojlLp259MF9qSFpiVjl0cU0/view
I have already constructed a frozen graph, which I can serve in Python and which was generated using the following script:
meta_path = 'checkpoint-4874200.meta'  # Your .meta file
output_node_names = ['linear_model/add_1']  # Output nodes
export_dir = os.path.join('export_dir')
graph = tf.Graph()
with tf.Session(graph=graph) as sess:
    # Restore the graph
    loader = tf.train.import_meta_graph(meta_path)
    loader.restore(sess, 'checkpoint-4874200')
    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(sess,
                                         [tf.saved_model.SERVING],
                                         strip_default_attrs=True)
    # Freeze the graph
    frozen_graph_def = tf.graph_util.convert_variables_to_constants(
        sess,
        sess.graph_def,
        output_node_names)
    # Save the frozen graph
    with open('C:\\Users\\FrozenGraph.pb', 'wb') as f:
        f.write(frozen_graph_def.SerializeToString())
Then I optimized the graph by running:
optimized_graph_def = optimize_for_inference_lib.optimize_for_inference(
    frozen_graph_def,
    ['inputs/enc_in'],
    ['linear_model/add_1'],
    tf.float32.as_datatype_enum)
g = tf.gfile.FastGFile('optimized_inference_graph.pb', 'wb')
g.write(optimized_graph_def.SerializeToString())
and the optimized frozen graph can be found at: https://github.com/alecda573/frozen_graph/blob/master/optimized_inference_graph.pb
When I try to run the following in OpenCV, I get this runtime error:
OpenCV(4.3.0) Error: Unspecified error (More than one input is Const op) in cv::dnn::dnn4_v20200310::`anonymous-namespace'::TFImporter::getConstBlob, file C:\build\master_winpack-build-win64-vc15\opencv\modules\dnn\src\tensorflow\tf_importer.cpp, line 570
Steps to reproduce
To reproduce the problem, you just need to download the frozen graph from the above link (or create it yourself from the checkpoint data) and then call the following in OpenCV with the headers below:
#include <iostream>
#include <vector>
#include <cmath>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include "opencv2/dnn.hpp"
std::string pbFilePath = "C:/Users/optimized_inferene_graph.pb";
//Create 3d-pose-baseline model
cv::dnn::Net inputNet;
inputNet = cv::dnn::readNetFromTensorflow(pbFilePath);
Would love to know if anyone has any thoughts on how to address this error.
You can see the frozen graph and the optimized graph I generated with TensorBoard in the attached photos.
I have a feeling the error arises from the training flag inputs, but I am not certain, and I do not want to go editing the graph if that is not the problem.
I am attaching the function in OpenCV that is causing the issue:
const tensorflow::TensorProto& TFImporter::getConstBlob(const tensorflow::NodeDef &layer, std::map<String, int> const_layers,
                                                        int input_blob_index, int* actual_inp_blob_idx) {
    if (input_blob_index == -1) {
        for (int i = 0; i < layer.input_size(); i++) {
            Pin input = parsePin(layer.input(i));
            if (const_layers.find(input.name) != const_layers.end()) {
                if (input_blob_index != -1)
                    CV_Error(Error::StsError, "More than one input is Const op");
                input_blob_index = i;
            }
        }
    }
    if (input_blob_index == -1)
        CV_Error(Error::StsError, "Const input blob for weights not found");
    Pin kernel_inp = parsePin(layer.input(input_blob_index));
    if (const_layers.find(kernel_inp.name) == const_layers.end())
        CV_Error(Error::StsError, "Input [" + layer.input(input_blob_index) +
                 "] for node [" + layer.name() + "] not found");
    if (kernel_inp.blobIndex != 0)
        CV_Error(Error::StsError, "Unsupported kernel input");
    if (actual_inp_blob_idx) {
        *actual_inp_blob_idx = input_blob_index;
    }
    int nodeIdx = const_layers.at(kernel_inp.name);
    if (nodeIdx < netBin.node_size() && netBin.node(nodeIdx).name() == kernel_inp.name)
    {
        return netBin.node(nodeIdx).attr().at("value").tensor();
    }
    else
    {
        CV_Assert_N(nodeIdx < netTxt.node_size(),
                    netTxt.node(nodeIdx).name() == kernel_inp.name);
        return netTxt.node(nodeIdx).attr().at("value").tensor();
    }
}
As you pointed out, the error originates in getConstBlob (https://github.com/opencv/opencv/blob/master/modules/dnn/src/tensorflow/tf_importer.cpp#L570). getConstBlob is called several times in populateNet (https://github.com/opencv/opencv/blob/master/modules/dnn/src/tensorflow/tf_importer.cpp#L706), which is called in all overloaded definitions of readNetFromTensorflow (https://github.com/opencv/opencv/blob/master/modules/dnn/src/tensorflow/tf_importer.cpp#L2278). Those may be starting points for where to place breakpoints if you want to step through with a debugger.
The other thing I noticed is that the definition of readNetFromTensorflow which I believe you're using (supplying a std::string: https://docs.opencv.org/master/d6/d0f/group__dnn.html#gad820b280978d06773234ba6841e77e8d) takes two arguments - the model path (model) and a configuration (config), which is optional and defaults to an empty string. In the unit tests, it looks like there are both cases - with and without a configuration provided (https://github.com/opencv/opencv/blob/master/modules/dnn/test/test_tf_importer.cpp). I'm not sure if that would have an impact.
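For reference, a rough sketch of that two-argument call (the paths here are placeholders; the .pbtxt config is optional):
// Model weights (.pb) plus an optional text graph (.pbtxt); config defaults to "".
cv::dnn::Net net = cv::dnn::readNetFromTensorflow("path/to/model.pb",
                                                  "path/to/model.pbtxt");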
Lastly, in the script you provided to replicate the results, I believe the model file name is misspelled - it says optimized_inferene_graph.pb, but the file you point to in the github repo is spelled optimized_inference_graph.pb.
Just a few suggestions, I hope this may help!

Not found: FeedInputs: unable to find feed output TensorFlow

I was trying this example of using a TensorFlow saved model in C++ from this website:
https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.ji310n4zo
It works well, but it does not save the values of the variables a and b, as it only saves the graph, not the variables. I tried to replace the following line:
tf.train.write_graph(sess.graph_def, 'models/', 'graph.pb', as_text=False)
with
saver.save(sess, 'models/graph', global_step=0)
of course after creating the saver object. It does not work and it outputs:
Not found: FeedInputs: unable to find feed output a
I checked the nodes that are loaded, and they are only:
_SOURCE
_SINK
while when I use the write_graph function and then load the model in C++, I get the following nodes loaded:
_SOURCE
_SINK
save/restore_slice_1/shape_and_slice
save/restore_slice_1/tensor_name
save/restore_slice/shape_and_slice
save/restore_slice/tensor_name
save/save/shapes_and_slices
save/save/tensor_names
save/Const
save/restore_slice_1
save/restore_slice
b
save/Assign_1
b/read
b/initial_value
b/Assign
a
save/Assign
save/restore_all
save/save
save/control_dependency
a/read
c
a/initial_value
a/Assign
init
Tensor
and even the graph file that is created by saver.save() is much smaller, 165B, compared to the one created by write_graph, 1.9KB.
I'm not sure if that is the best way of solving the problem but at least it solves it.
As write_graph can also store the values of constants, I added the following code to the Python script just before writing the graph with the write_graph function:
for v in tf.trainable_variables():
    vc = tf.constant(v.eval())
    tf.assign(v, vc, name="assign_variables")
This creates constants that store the variables' values after training, and then creates "assign_variables" ops to assign them back to the variables. Now, when you call write_graph, it will store the variables' values in the file.
The only remaining part is to call these "assign_variables" ops in the C++ code to make sure that your variables are assigned the constant values stored in the file. Here is one way to do it:
Status status = NewSession(SessionOptions(), &session);
std::vector<tensorflow::Tensor> outputs;
for (int i = 0; status.ok(); i++) {
    char name[100];
    if (i == 0)
        sprintf(name, "assign_variables");
    else
        sprintf(name, "assign_variables_%d", i);
    status = session->Run({}, {name}, {}, &outputs);
}
There is another way of restoring the variables: calling the save/restore_all operation, which should be present in the graph:
std::vector<tensorflow::Tensor> outputs;
Tensor checkpoint_filepath(DT_STRING, TensorShape());
checkpoint_filepath.scalar<std::string>()() = "path to the checkpoint file";
status = session->Run({{"save/Const", checkpoint_filepath}},
                      {}, {"save/restore_all"}, &outputs);