Understanding tflite examples in documentation

Understanding tflite examples in documentation - c++

I've been trying to figure out how to use tflite with c++ for a raspberry pi project, and I've had trouble understanding documentation and how to perform inference. I've been confused about the code shown on this page:
https://www.tensorflow.org/lite/api_docs/cc/class/tflite/interpreter
For context, I am inexperienced in c++, and usually work with python and matlab.
Here are the lines of code I'm trying to understand:
auto input = interpreter->typed_tensor(0);
for (int i = 0; i < input_size; i++) {
input[i] = ...; interpreter->Invoke();
My understanding of these lines is - They set input equal to the typed_tensor function of interpreter, with (0) as the input.
Then, they loop for values of i between 0 and input_size. For each i value, they make the inpt[i] equal to some value. They then call interpreter->Invoke(), which has the neural net perform inference, with input[i] as the input?
This is then repeated for each i value, each time calling interpreter->Invoke() with input[i] as the input to the neural net.
Is my understanding of this process correct?
What shape should the input take? For example, if I had converted a tensorflow model that takes a 1x100 input and then converted this to a tflite model, how would I create the input to feed into the tflite model in c++?
What data type should the input to the tflite model be in?
Thank you,
Simon
I've tried looking at example code of other people's uses of tflite in c++, but I haven't been able to examine the inputs to their models. I would also appreciate help with viewing the input tensors while debugging. I'm using code::blocks as an IDE for now.

Related

How to correctly format input and resize output data whille using TensorRT engine?

I'm trying implementing deep learning model into TensorRT runtime. The model conversion step is done quite OK and i'm pretty sure about it.
Now there's 2 parts i'm currently struggle with is memCpy data from host To Device (like openCV to Trt) and get the right output shape in order to get the right data. So my questions is:
How actually a shape of input dims relate with memory buffer. What is the difference when the model input dims is NCHW and NHWC, so when i read a openCV image, it's NHWC and also the model input is NHWC, do i have to re-arange the buffer data, if Yes then what's the actual consecutive memory format i have to do ?. Or simply what does the format or sequence of data that the engine are expecting ?
About the output (assume the input are correctly buffered), how do i get the right result shape for each task (Detection, Classification, etc..)..
Eg. an array or something look similar like when working with python .
I read Nvidia docs and it's not beginner-friendly at all.
//Let's say i have a model thats have a dynamic shape input dim in the NHWC format.
auto input_dims = nvinfer1::Dims4{1, 386, 342, 3}; //Using fixed H, W for testing
context->setBindingDimensions(input_idx, input_dims);
auto input_size = getMemorySize(input_dims, sizeof(float));
// How do i format openCV Mat to this kind of dims and if i encounter new input dim format, how do i adapt to that ???
And the expected output dims is something like (1,32,53,8) for example, the output buffer result in a pointer and i don't know what's the sequence of the data to reconstruct to expected array shape.
// Run TensorRT inference
void* bindings[] = {input_mem, output_mem};
bool status = context->enqueueV2(bindings, stream, nullptr);
if (!status)
{
std::cout << "[ERROR] TensorRT inference failed" << std::endl;
return false;
}
auto output_buffer = std::unique_ptr<int>{new int[output_size]};
if (cudaMemcpyAsync(output_buffer.get(), output_mem, output_size, cudaMemcpyDeviceToHost, stream) != cudaSuccess)
{
std::cout << "ERROR: CUDA memory copy of output failed, size = " << output_size << " bytes" << std::endl;
return false;
}
cudaStreamSynchronize(stream);
//How do i use this output_buffer to form right shape of output, (1,32,53,8) in this case ?

Could you please edit your question and tell us which model you're using if it's a commonly known NN, prehaps one we can download to test locally?
Then, the answer since it doesn't depend on the model (even though it would help to answer)
How actually a shape of input dims relate with memory buffer
If the input is NxCxHxW, you need to allocate N*C*H*W*sizeof(float) memory for that on your CPU and GPU. To be more precise, you need to allocate space on GPU for all the bindings and on CPU for only input and output bindings.
when i read a openCV image, it's NHWC and also the model input is NHWC, do i have to re-arange the buffer data
No, you do not have to re-arrange the buffer data. If you would have to change between NHWC and NCHW you can check this or google 'opencv NHWC to NHCW'.
Full working code example here, especially this function.
Or simply what does the format or sequence of data that the engine are expecting ?
This depends on how the neural network was trained. You should in general know exactly which kind of preprocessing and image data formats have been used to train the NN. You should even use the same libraries to load images and process them if possible. It's an open problem in ML: if you try to replicate results of some papers and use their models but they haven't open sourced the preprocessing you might get worse results. In the "worst" case you can implement both NHCW and NCHW and test which of them works.
About the output (assume the input are correctly buffered), how do i get the right result shape for each task (Detection, Classification, etc..).. Eg. an array or something look similar like when working with python .
This question clearly requires me to understand which NNs you are referring to. But I myself do the following:
Load the TensorRT .engine file in my code like this and deserialize like this
Print the bindings like this
Then I know the size of the input binding or bindings if there are many inputs, and the size of the output binding or bindings if there are many outputs.
This way you know the right result shape for each task. I hope this answered your question. If not, please add detailed comments and edit your post to be more precise. Thank you.
I read Nvidia docs and it's not beginner-friendly at all.
Yes I agree. You're better of searching TensorRT c++ (or Python) repositories from Github and studying their code. Have you seen TensorRT samples? It doesn't really take many lines of code to implement TensorRT inference.

extracting output data from typed_output_tensor in TFlite

Thanks in advance for your support.
I'm trying to get the output of a tensor after the inference on a .tflite U-net neural network. I'm using Tensorflow lite image classification code as a baseline.
I need to adapt the code for a segmentation task. My question is how I can access the output of the inferenced model (which 128x128x1) and write the result into an image?
I already debugged the code and explored many different approaches. Unfortunately, I'm not confident with the C++ language. What I found is that the command: interpreter->typed_output_tensor<float>(0) should be what I need, as also referenced here: https://www.tensorflow.org/lite/guide/inference#loading_a_model. However, I cannot access the 128x128 tensor generated by the network.
You can find the code at the address: https://github.com/tensorflow/tensorflow/blob/770481fb3e9126f9a29db5667f528e450d54d719/tensorflow/lite/examples/label_image/label_image.cc
The interesting part is here (lines 217 -224):
const float threshold = 0.001f;
std::vector<std::pair<float, int>> top_results;
int output = interpreter->outputs()[0];
TfLiteIntArray* output_dims = interpreter->tensor(output)->dims;
// assume output dims to be something like (1, 1, ... ,size)
auto output_size = output_dims->data[output_dims->size - 1];
I expect the values saved in an image or an alternative way of saving the output tensor

What is the TensorFlow C++ equivalent of argmax (axis=-1)

I am predicting pb graph output in TensorFlow C++.
The session->Run works fine and it gives a list of float values as output
load_graph_status = session->Run(inputs, { output_layer_name }, {}, &outputs);
I have done similar prediction in Python where I used
output = outputs.argmax(axis=-1)
I couldn't find the equivalent of this in C++? There is a tensorflow::ops::argmax in TensorFlow C++ documentation. But I couldn't figure out how to use it.

To answer my own question, there is no direct method in C++ that does this job.
The work around is to fetch and store each output value iteratively and get the maximum of the list.

Moving matrix from c++ to Matlab

I'm trying to take a matrix from c++ and import it to Matlab to run bintprog on this matrix, call it m. My c++ code generates these matrices of a certain type, and I need to run bintprog on them quickly, and with ideally millions of matrices.
So any of the following would be great:
A way to import a bunch of matrices at once so I can run a lot of iterations thru my Matlab code.
Or
If I could implement Matlab code right in c++ nicely.
If this is not clear leave me comments and I'll update what I can.

You can call Matlab commands from C++ code (and vice versa):
Compile your C++ code into a mex function and call bintprog using mexCallMatlab.
As proposed by Mark, you may call Matlab engine from native C++ code using matlab engine.
You may compile your C++ code as a shared library and call it from Matlab using calllib.

I suggest the simple solution, assuming that your matrices are kept in 3-dimmensional array:
Build a loop in C++, to save your matrices... Something like this:
ofstream arquivoOut0("myMatrices.dat");
for(int m=0;m<numberMatrices;m++){
for (int i=0; i< numberlines;i++){
for(int j=0;j<numberColumns;j++)
if(j!=numberColumns-1) arquivoOut0<< matrices[m][i][j] << "\t";
else arquivoOut0<< matrices[m][i][j] << "\n";
}
}
}
arquivoOut0.close();
Ok. You have saved your matrices in an ascii file! Now you have to read it in Matlab!
load myMatrices.dat
for m=1:numberMatrices
for i=1:numberLines
for j=1:numberColumns
myMatricesInMatlab(m,i,j)=myMatrices((m-1)*numberLines+i,j);
end
end
end
Now, you can use the toolbox that you need:
for i=1:numberMatrices
Apply the toolbox for myMatricesInMatlab(i,:,:);
end
I think it works, it the processing time is not an issue!

Embed a stateful Python script into C++ program using boost::python

I have a C++ program that keeps generating data. I have a python class that process these data. I want to use this python class to process the data: when each time a data point is generated, I can use this python script to process the data. But this python script must be "stateful", i.e. it should be able to remember what it did before this data point.
One super basic example is, my C++ program just generates numbers, and my python class calculates the cumulative sums of the numbers generated:
Python:
class CumSum:
def addone(x):
self._cumsum += x;
print self._cumsum;
C++
[Somehow construct a CumSum instance, say c]
for (int i=0; i<100000; i++) {
int x = rand() % 1000;
[Call c.addone(x)]
}
I heard boost::python is a good way to handle this. Can anyone sketch out how to do it? I tried to read boost documents but they were too huge for me to digest.
I appreciate your help.

For basic information about how to execute your python script:
http://www.boost.org/doc/libs/1_47_0/libs/python/doc/tutorial/doc/html/python/embedding.html
For details on manipulating python objects in C++
http://www.boost.org/doc/libs/1_47_0/libs/python/doc/tutorial/doc/html/python/object.html
Much of boost-python is concerned with exporting your C++ classes to python but you aren't doing that so you can ignore it.
You may be better off using a simpler wrapper like SCXX
http://davidf.sjsoft.com/mirrors/mcmillan-inc/scxx.html

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js