How to construct input for TensorFlow Lite in C++?

I am trying to figure out how the input buffer for a TensorFlow Lite model should look.
My input should be a (1, 224, 224, 3) image buffer.
Whether I fill the input buffer for an all-black (0) or an all-white (255) image, I get the same answer.
uchar* in_data = new uchar[224 * 224 * 3];
for (int i = 0; i < 224 * 224 * 3; i++) {
    // in_data[i] = 0;
    in_data[i] = 255;
}
uchar* input_1 = interpreter_stage1->typed_input_tensor<uchar>(0);
input_1 = in_data;
This code gives me the same answer no matter what data I put in as input.
How should the input be constructed for the case when the model's input dimensions are (1, 224, 224, 3)?
For the easy case where I only have a (1, 128) single-dimension vector, everything works fine. But with this multidimensional case I don't know how to proceed.
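Not an authoritative answer, but a minimal sketch of what is likely going wrong: typed_input_tensor<uchar>(0) returns a pointer into the interpreter's own input buffer, and the line input_1 = in_data; only reassigns the local pointer instead of copying the pixels into that buffer. Assuming interpreter_stage1 is a tflite::Interpreter with AllocateTensors() already called and a uint8 NHWC input of shape (1, 224, 224, 3), the copy could look like this:
#include <cstring>  // std::memcpy
const int kInputBytes = 1 * 224 * 224 * 3;
uchar* in_data = new uchar[kInputBytes];
for (int i = 0; i < kInputBytes; i++) {
    in_data[i] = 255;  // all-white test image
}
// Copy into the tensor's own buffer; reassigning the pointer leaves the tensor untouched.
uchar* input_1 = interpreter_stage1->typed_input_tensor<uchar>(0);
std::memcpy(input_1, in_data, kInputBytes);
interpreter_stage1->Invoke();
delete[] in_data;
If the model was converted with float inputs, typed_input_tensor<float>(0) would be the right accessor and the pixel values would typically need scaling (e.g. to [0, 1] or [-1, 1]); that part is an assumption about the model, not something stated in the question.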

Related

Conversion from IplImage to Mat with cvarrToMat missing/skipping image data bytes

I'm trying to convert an image from an IplImage container to a Mat object using cvarrToMat.
I realized that the converted Mat image displays a number of random data bytes at the end (i.e. just some uninitialized bytes from memory), but I don't understand why this is happening or how to fix it. See the code and results below.
I'm using OpenCV 2.4.13.7 and working in Visual Studio 2017 (Visual C++ 2017).
I produced a data array with pixelwise recognizable values for a 3*4 resolution image with a depth of 8 bits and 3 color channels. When the data from the converted image is printed, it shows that a pixel (3 bytes) is skipped at the end of each row.
#include "pch.h"
#include <iostream>
#include "cv.hpp"
#include "highgui.hpp"
using namespace std;
using namespace cv;
int main()
{
IplImage* ipl = NULL;
const char* windowName = "Mat image";
int i = 0;
ipl = cvCreateImage(cvSize(3, 4), IPL_DEPTH_8U, 3);
char array[3* 4 * 3] = { 11,12,13, 21,22,23, 31,32,33, 41,42,43, 51, 52, 53, 61, 62, 63, 71, 72, 73, 81, 82, 83, 91, 92, 93, 101, 102, 103, 111, 112, 113, 121, 122, 123 };
ipl->imageData = array;
printf("ipl->imageData = [ ");
for (i = 0; i < (ipl->width*ipl->height*ipl->nChannels); i++) {
printf("%u, ", ipl->imageData[i]);
}
printf("]\n\n");
Mat ipl2 = cvarrToMat(ipl);
cout << "ipl2 = " << endl << " " << ipl2 << endl << endl;
//display dummy image in window to use waitKey function
Mat M(3, 3, CV_8UC3, Scalar(0, 0, 255));
namedWindow(windowName, CV_WINDOW_AUTOSIZE);
imshow(windowName, M);
waitKey(0);
cvReleaseImage(&ipl);
}
Result:
Console window output for 3*4 resolution image
If the same is done for a 2*2 pixel resolution image, then only two bytes are skipped at the end of each row. I cannot explain this either.
Console window output for same code only with 2*2 resolution image
The reason I would like to do this conversion is that I have a working C routine for importing image data from a file (long story about old image file formats with raw image data) into an IplImage for further processing, which I would like to keep for now. But I would like to start processing the images as Mat, since that seems more widely supported and simpler to use in general, at least until I saw this.
Disclaimer: That is not an answer to the question itself, but should help the author to further investigate his problem. Also, see the comments beneath the question.
As a small test, I use this 3x3 image (you can hardly see it - have a look at the "raw" input of my question for the link):
In Image Watch (Visual Studio extension), it'll look like this:
Let's try the following code:
// Read input image.
cv::Mat img = cv::imread("test.png", cv::IMREAD_COLOR);

// Output pixel values.
for (int x = 0; x < img.cols; x++)
{
    for (int y = 0; y < img.rows; y++)
    {
        printf("%d ", img.at<cv::Vec3b>(y, x)[0]);
        printf("%d ", img.at<cv::Vec3b>(y, x)[1]);
        printf("%d \n", img.at<cv::Vec3b>(y, x)[2]);
    }
}
We'll get this output:
0 255 255
0 255 255
255 255 255
255 0 255
255 255 255
255 0 255
0 0 255
0 255 255
255 0 255
Now, you could use nested loops to check whether the image data (or better: the pixel values) are identical in your IplImage ipl and in your Mat ipl2.
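A sketch of such a check (my addition, not part of the original answer), which indexes the IplImage through its widthStep: cvCreateImage pads each row to a 4-byte boundary, so a 3-pixel-wide, 3-channel 8-bit image has a widthStep of 12 bytes rather than 9, and a 2-pixel-wide one has 8 rather than 6, which would account for the 3 and 2 skipped bytes per row seen in the question if the source array is tightly packed.
// Compare the pixels of the IplImage and the converted Mat.
// Assumes ipl is 8-bit, 3-channel and ipl2 = cvarrToMat(ipl).
bool identical = true;
for (int y = 0; y < ipl->height; y++) {
    // Rows of an IplImage start every widthStep bytes, not every width * nChannels bytes.
    const uchar* iplRow = (const uchar*)(ipl->imageData + y * ipl->widthStep);
    for (int x = 0; x < ipl->width; x++) {
        cv::Vec3b matPx = ipl2.at<cv::Vec3b>(y, x);
        for (int c = 0; c < 3; c++) {
            if (iplRow[x * ipl->nChannels + c] != matPx[c]) {
                identical = false;
                printf("mismatch at (%d, %d) channel %d\n", x, y, c);
            }
        }
    }
}
printf("%s\n", identical ? "pixel values identical" : "pixel values differ");
If the mismatches disappear once the source array is laid out with the padded row stride (or widthStep is taken into account when filling imageData), the skipped bytes are simply that row padding.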

Save integer CV_32S image with OpenCV

I am working with TIF images containing signed integer data. After successfully reading one in and processing it, I need to output the image in the same format (input and output are both *.tif files).
For the input, I know that OpenCV does not know if the data is signed or unsigned, so it assumes unsigned. Using this trick solves that problem (switching the type of cv::Mat by hand).
However, when I output the image and load it again, I do not get the expected result. The file contains multiple segments (groups of pixels), and the format is as follows (I must use this format):
all pixels not belonging to any segment have the value -9999
all the pixels belonging to a single segment have the same positive integer value
(e.g. all pixels of the 1st segment have value 1, of the 2nd segment value 2, etc.)
And here is the example code:
void ImageProcessor::saveSegments(const std::string &filename) {
    cv::Mat segmentation = cv::Mat(workingImage.size().height,
                                   workingImage.size().width,
                                   CV_32S, cv::Scalar(-9999));
    for (int i = 0, szi = segmentsInput.size(); i < szi; ++i) {
        for (int j = 0, szj = segmentsInput[i].size(); j < szj; ++j) {
            segmentation.at<int>(segmentsInput[i][j].Y,
                                 segmentsInput[i][j].X) = i + 1;
        }
    }
    cv::imwrite(filename, segmentation);
}
You can assume that all the variables (e.g. workingImage, segmentsInput) exist as global variables.
Using this code, when I load the saved image back in and examine the values, most of them are set to 0, while the ones that are set span the full range of integer values (in my example I had 20 segments).
You can't save integer matrices directly with imwrite. As the documentation states: "Only 8-bit (or 16-bit unsigned (CV_16U) in case of PNG, JPEG 2000, and TIFF) single-channel or 3-channel (with ‘BGR’ channel order) images can be saved using this function."
However, what you could do is convert your CV_32S matrix to a CV_8UC4 one and save it as a PNG with no compression. Of course, this is a bit unsafe since endianness comes into play and may change your values between different systems or compilers (especially since we're talking about signed integers here). If you always use the same system and compiler, you can use this:
cv::Mat segmentation = cv::Mat(workingImage.size().height,
workingImage.size().width,
CV_32S, cv::Scalar(-9999));
cv::Mat pngSegmentation(segmentation.rows, segmentation.cols, CV_8UC4, (cv::Vec4b*)segmentation.data);
std::vector<int> params;
params.push_back(CV_IMWRITE_PNG_COMPRESSION);
params.push_back(0);
cv::imwrite("segmentation.png", pngSegmentation, params);
I also save OpenCV Mats as TIFFs, but I don't use the OpenCV TIFF solution. I include the libtiff library on my own (I think libtiff is also used in OpenCV), and then you can use the following code to save as TIFF:
TIFF* tif = TIFFOpen("file.tif", "w");

if (tif != NULL) {
    for (int i = 0; i < pages; i++)
    {
        TIFFSetField(tif, TIFFTAG_IMAGEWIDTH, TIFF_UINT64_T(x));      // set the width of the image
        TIFFSetField(tif, TIFFTAG_IMAGELENGTH, TIFF_UINT64_T(y));     // set the height of the image
        TIFFSetField(tif, TIFFTAG_SAMPLESPERPIXEL, 1);                // set number of channels per pixel
        TIFFSetField(tif, TIFFTAG_BITSPERSAMPLE, 32);                 // set the size of the channels, 32 for CV_32F
        TIFFSetField(tif, TIFFTAG_PAGENUMBER, i, pages);
        TIFFSetField(tif, TIFFTAG_SAMPLEFORMAT, SAMPLEFORMAT_IEEEFP); // for CV_32F

        for (uint32 row = 0; row < y; row++)
        {
            TIFFWriteScanline(tif, &imageDataStack[i].data[row * x * 32 / 8], row, 0);
        }
        TIFFWriteDirectory(tif);
    }
}
imageDataStack is a vector of cv::Mat objects. This code works for me to save tiff stacks.
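Since the question is about signed CV_32S data rather than CV_32F, the sample-format tag would presumably need adjusting; libtiff defines SAMPLEFORMAT_INT for signed integer samples. A sketch of the tags that would change (my assumption, not something the answerer tested):
// For signed 32-bit integer data (CV_32S) instead of CV_32F:
TIFFSetField(tif, TIFFTAG_BITSPERSAMPLE, 32);              // still 32-bit samples
TIFFSetField(tif, TIFFTAG_SAMPLEFORMAT, SAMPLEFORMAT_INT); // signed integer samples
// The scanline writes stay the same: each row is x samples of 4 bytes.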

opencv neural network, incorrect predict

I'm trying to create a neural network in C++ with OpenCV. The aim is recognition of road signs. I have created the network as shown below, but it predicts badly and returns strange results:
Sample images from the training selection look like this:
Can someone help?
void trainNN() {
    char* templates_directory[] = {
        "speed50ver1\\",
        "speed60ver1\\",
        "speed70ver1\\",
        "speed80ver1\\"
    };
    int const numFilesChars[] = { 213, 100, 385, 163 };
    char const strCharacters[] = { '5', '6', '7', '8' };
    Mat trainingData;
    Mat trainingLabels(0, 0, CV_32S);
    int const numCharacters = 4;

    // load images from directory
    for (int i = 0; i != numCharacters; ++i) {
        int numFiles = numFilesChars[i];
        DIR *dir;
        struct dirent *ent;
        char* s1 = templates_directory[i];
        if ((dir = opendir(s1)) != NULL) {
            Size size(80, 80);
            while ((ent = readdir(dir)) != NULL) {
                string s = s1;
                s.append(ent->d_name);
                if (s.substr(s.find_last_of(".") + 1) == "jpg") {
                    Mat img = imread(s, 0);
                    Mat img_mat;
                    resize(img, img_mat, size);
                    Mat new_img = img_mat.reshape(1, 1);
                    trainingData.push_back(new_img);
                    trainingLabels.push_back(i);
                }
            }
            int b = 0;
            closedir(dir);
        } else {
            /* could not open directory */
            perror("");
        }
    }

    trainingData.convertTo(trainingData, CV_32FC1);

    Mat trainClasses(trainingData.rows, numCharacters, CV_32FC1);
    for (int i = 0; i != trainClasses.rows; ++i) {
        int const labels = *trainingLabels.ptr<int>(i);
        auto train_ptr = trainClasses.ptr<float>(i);
        for (int k = 0; k != trainClasses.cols; ++k) {
            *train_ptr = k != labels ? 0 : 1;
            ++train_ptr;
        }
    }

    int layers_d[] = { trainingData.cols, 10, numCharacters };
    Mat layers(1, 3, CV_32SC1, layers_d);
    ann.create(layers, CvANN_MLP::SIGMOID_SYM, 1, 1);

    CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(
        // terminate the training after either 1000
        // iterations or a very small change in the
        // network weights below the specified value
        cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 1000, 0.000001),
        // use backpropagation for training
        CvANN_MLP_TrainParams::BACKPROP,
        // coefficients for backpropagation training
        // (refer to manual)
        0.1,
        0.1);

    int iterations = ann.train(trainingData, trainClasses, cv::Mat(), cv::Mat(), params);

    CvFileStorage* storage = cvOpenFileStorage("neural_network_2.xml", 0, CV_STORAGE_WRITE);
    ann.write(storage, "digit_recognition");
    cvReleaseFileStorage(&storage);
}
void analysis(char* file, bool a) {
    //trainNN(a);
    read_nn();

    // load image
    Mat img = imread(file, 0);
    Size my_size(80, 80);
    resize(img, img, my_size);
    Mat r_img = img.reshape(1, 1);
    r_img.convertTo(r_img, CV_32FC1);

    Mat classOut(1, 4, CV_32FC1);
    ann.predict(r_img, classOut);

    double min1, max1;
    cv::Point min_loc, max_loc;
    minMaxLoc(classOut, &min1, &max1, &min_loc, &max_loc);
    int x = max_loc.x;

    // create windows
    namedWindow("Original Image", CV_WINDOW_AUTOSIZE);
    imshow("Original Image", img);
    waitKey(0); // wait for key press

    img.release();
    r_img.release();
    destroyAllWindows(); // destroy all open windows
}
Strange results: for this input the answer is 3 (I have only 4 classes - speed limit 50, 60, 70, 80), which is correct for the speed limit 80 sign.
But for the remaining inputs the results are incorrect. They are the same for the 50, 60, and 70 signs: max1 = min1 = 1.02631... (as in the first picture). It's strange.
I have adapted your code to train a classifier on 4 hand positions (since that's the image data I have). I kept your logic as similar as possible, only changing what was absolutely necessary to make it run on my Windows machine on my images. Long story short, there is nothing fundamentally wrong with your code - I don't see the failure mode you described.
One thing you left out was the code for read_nn(). I assume that just does something like the following:
ann.load("neural_network_2.xml");
Anyway, my suspicion is that either your neural network is not converging at all or it's badly overfitting. Perhaps there's not enough variation in the training data. Are you running analysis() on separate test data that the ANN wasn't trained on? If so, is the ANN able to predict training data properly at least?
EDIT: OK, I just downloaded your image data and tried it out and saw the same behavior. After some analysis, it looks like your ANN is not converging. The training operation exits after only about 250 iterations, even if you specify only CV_TERMCRIT_ITER for the cvTermCriteria. After increasing your hidden layer size from 10 to 20, I saw a marked improvement, with successful classification on the training data for 212, 72, 94, and 143 of the images respectively to the classes (50, 60, 70, and 80). That's not very good, but it demonstrates that you're on the right track.
Basically, the network architecture is not expressive enough to adequately model the problem you're trying to solve, so the network weights never converge and it abandons the backprop early. For one class, you may see some success, but I believe that's largely a function of the lack of shuffling of training data. If it stops after having just trained on a couple hundred very similar images, it may be able to manage to classify those correctly.
In short, I would recommend doing the following:
Build a way to test the results - e.g.: create a function to run prediction on all training data, and ideally set aside some images as a validation set in order to also confirm that the model is not overfitting the training data.
Shuffle the training data prior to training. Otherwise, backprop will not converge as easily (a sketch of one way to do this follows this list).
Experiment with different architectures such as more than one hidden layer with varying sizes.
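A rough sketch of the shuffling step mentioned above (my addition; it assumes trainingData is CV_32FC1 with one sample per row and trainClasses holds the matching one-hot rows, as in the question's code):
#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

std::vector<int> order(trainingData.rows);
std::iota(order.begin(), order.end(), 0);  // 0, 1, 2, ...
std::shuffle(order.begin(), order.end(), std::mt19937{ std::random_device{}() });

Mat shuffledData(trainingData.rows, trainingData.cols, trainingData.type());
Mat shuffledClasses(trainClasses.rows, trainClasses.cols, trainClasses.type());
for (int row = 0; row < trainingData.rows; ++row) {
    trainingData.row(order[row]).copyTo(shuffledData.row(row));    // keep each sample
    trainClasses.row(order[row]).copyTo(shuffledClasses.row(row)); // paired with its label
}
// Train on shuffledData / shuffledClasses instead of the originals.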
Really, this is a problem that would benefit dramatically from using a Convolutional Neural Net, but OpenCV's machine learning facilities are pretty limited. Ultimately, if you're serious about creating ANNs, you might want to investigate some more robust tools. I personally use Tensorflow, but I've heard good things about Theano as well.
I've only implemented NN with OpenCV for boolean classification, but I think that for a task where you need to classify more than two distinct classes this might also apply:
"If you are using the default cvANN_MLP::SIGMOID_SYM activation function then the output should be in the range [-1,1], instead of [0,1], for optimal results."
So, where you do:
*train_ptr = k != labels ? 0 : 1;
You might want to try:
*train_ptr = k != labels ? -1 : 1;
Disregard if I'm way off track here.

decoder output - YUV file to RGB

I have made a DirectShow decoder filter using libde265. There is a built-in function named write_image that writes the decoded data to a YUV file.
I need to render the decoded data.
For that purpose I need to do two steps:
Output data on the output pin
Conversion of the data into rgb format
The media subtype used is IMC3. In IMC3, the Y components are followed by U and V in memory.
I have tried the following code to output the data on the pin.
static FILE* fh = NULL;
if (fh == NULL) { fh = fopen(output_filename, "wb"); }

for (int y = 0; y < de265_get_image_height(img, 0); y++)
    fread(out, de265_get_image_width(img, 0), 1, fh);
for (int y = 0; y < de265_get_image_height(img, 1); y++)
    fread(out, de265_get_image_width(img, 1), 1, fh);
for (int y = 0; y < de265_get_image_height(img, 2); y++)
    fread(out, de265_get_image_width(img, 2), 1, fh);
But the render screen is blank. Secondly, I need to convert the data to RGB as well. In the above code, img is the image structure storing the decoded data.
Please help me in this regard, because the renderer is not showing anything, and please also suggest how to convert the data to RGB format. I may be wrong in packing the data into the output buffer, although I am following the exact IMC3 format.
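For the RGB conversion step, one possible approach (a sketch only; it assumes OpenCV is available in the filter and that the decoded image is 8-bit 4:2:0) is to repack the three libde265 planes into a contiguous, tightly packed I420 buffer and let cv::cvtColor do the conversion:
int w = de265_get_image_width(img, 0);
int h = de265_get_image_height(img, 0);
cv::Mat i420(h * 3 / 2, w, CV_8UC1);
uchar* dst = i420.data;

for (int c = 0; c < 3; c++) {  // 0 = Y, 1 = U (Cb), 2 = V (Cr)
    int stride = 0;
    const uint8_t* plane = de265_get_image_plane(img, c, &stride);
    int pw = de265_get_image_width(img, c);
    int ph = de265_get_image_height(img, c);
    for (int row = 0; row < ph; row++) {
        memcpy(dst, plane + row * stride, pw);  // drop any per-row padding
        dst += pw;
    }
}

cv::Mat bgr;
cv::cvtColor(i420, bgr, cv::COLOR_YUV2BGR_I420);  // BGR pixels, ready for display
This only covers the conversion; filling the IMC3 output buffer for the pin is a separate problem, since, as far as I recall, IMC3 expects the chroma rows padded out to the same stride as the luma plane, which a tightly packed copy like the one in the question's loops would not produce.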

Normalize pixel values between 0 and 1

I am looking to normalize the pixel values of an image to the range [0..1] using C++/OpenCV. However, when I do the normalization using either image *= 1./255 or the normalize function, the pixel values are rounded down to zero. I have tried setting the image to type CV_32FC3.
Below is the code I have:
Mat image;
image = imread(imageLoc, CV_LOAD_IMAGE_COLOR | CV_LOAD_IMAGE_ANYDEPTH);
Mat tempImage;
// (didn't work) tempImage *= 1./255;
image.convertTo(tempImage, CV_32F, 3);
normalize(image, tempImage, 0, 1, CV_MINMAX);
int r = 100;
int c = 150;
uchar* ptr = (uchar*)(tempImage.data + r * tempImage.step);
Vec3f tempVals;
tempVals.val[0] = ptr[3*c+1];
tempVals.val[1] = ptr[3*c+2];
tempVals.val[2] = ptr[3*c+3];
cout<<" temp image - "<< tempVals << endl;
uchar* ptr2 = (uchar*)(image.data + r * image.step);
Vec3f imVals;
imVals.val[0] = ptr2[3*c+1];
imVals.val[1] = ptr2[3*c+2];
imVals.val[2] = ptr2[3*c+3];
cout<<" image - "<< imVals << endl;
This produces the following output in the console:
temp image - [0, 0, 0]
image - [90, 78, 60]
You can make convertTo() do the normalization for you:
image.convertTo(tempImage, CV_32FC3, 1.f/255);
You are passing 3 to convertTo(), presumably as a channel count, but that parameter is actually the scale factor alpha, not a channel count.
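A follow-up worth noting (my addition, not part of the original answer): once the destination really is CV_32FC3, the pixel check in the question also has to read floats rather than uchar bytes, and the channel offsets should be 3*c + 0..2 rather than 3*c + 1..3. For example:
image.convertTo(tempImage, CV_32FC3, 1.f / 255);
int r = 100, c = 150;
cv::Vec3f tempVals = tempImage.at<cv::Vec3f>(r, c);  // B, G, R values in [0, 1]
std::cout << " temp image - " << tempVals << std::endl;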
I used the normalize function and it worked (Java):
Core.normalize(src,dst,0.0,1.0,Core.NORM_MINMAX,CvType.CV_32FC1);
You should use a 32F depth for your destination image. I believe the reason for this is that, since you need decimal values, you should use a non-integer OpenCV data type. According to this table, the float types correspond to the 32F depth. I chose the number of channels to be 1 and it worked: CV_32FC1.
Remember also that you're unlikely to spot any visual difference in the image.
Finally, since you probably have thousands of pixels in your image, your console might seem to be printing only zeros. Due to the large amount of data, try using CTRL+F to see what's going on. Hope this helps.