opencv, accessing element for downsampling but white window appear - c++

I'm learning the C++ API of OpenCV, and as a simple exercise I've started by trying to downsample an image (OK, I know there is pyrDown with Gaussian resampling, but this is about learning how to access elements of the Mat class).
This is my code:
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <iostream>

#define original_window "original"
#define manual_window "manual"

using namespace cv;
using namespace std;

Mat img, manual;

void downsample(Mat src, Mat &dst, const Size& s) {
    float factor = src.rows/(float)s.width;
    Mat_<Vec3f> _dst = Mat(s, src.type());
    Mat_<Vec3f> _src = src;
    for(int i = 0; i < src.cols; i += factor) {
        int _i = i/factor;
        for(int j = 0; j < src.rows; j += factor) {
            int _j = j/factor;
            _dst(_j, _i) = _src(j, i);
        }
    }
    cout << "downsample image size: " << _dst.rows << " " << _dst.cols << endl;
    dst = Mat(_dst);
}

int main(int /*argc*/, char** /*argv*/) {
    img = imread("lena.jpg");
    cout << "original image size: " << img.rows << " " << img.cols << endl;
    downsample(img, manual, Size(img.cols/2, img.rows/2));
    namedWindow(original_window, CV_WINDOW_AUTOSIZE);
    namedWindow(manual_window, CV_WINDOW_AUTOSIZE);
    while( true )
    {
        char c = (char)waitKey(10);
        if( c == 27 )
            break;
        imshow( original_window, img );
        imshow( manual_window, manual );
    }
    return 0;
}
Now, I'm downsampling in a naive way: I'm just dropping elements. And I'm trying to use the C++ API with Mat_.
In the manual window I get a white window, and I don't understand why. Even if I cout manual, I see different values. What's wrong with this piece of code?
EDIT 1
I've found a solution: adding
dst.convertTo(dst, src.type()); // in this particular case: src.type() == CV_8UC3
at the end of downsample().
Now my question is: why? I declared Mat(s, src.type()); why was the type changed?
EDIT 2
If I use @go4sri's answer with this line
_dst(_j, _i) = src.at<Vec3f>(j, i);
I get this output:
I really don't understand why.

The way to access an element in OpenCV's Mat is as follows:
For a single-channel matrix:
Matrix_Name.at<dataType>(row, col)
For a three-channel matrix (as is the case for a color image), you will need to use the Vec3b/Vec3f type, depending on whether yours is an unsigned char or float matrix.
As yours is an unsigned char three-channel matrix, you will have to access it as src.at<Vec3b>(i, j).
Your downsample should have been:
void downsample(const Mat& src, Mat &dst, const Size& s) {
    float factor = src.rows/(float)s.height;
    Mat _dst = Mat(s, src.type());
    for(int i = 0; i < src.cols; i += factor) {
        int _i = i/factor;
        for(int j = 0; j < src.rows; j += factor) {
            int _j = j/factor;
            _dst.at<Vec3b>(_j, _i) = src.at<Vec3b>(j, i);
        }
    }
    cout << "downsample image size: " << _dst.rows << " " << _dst.cols << endl;
    dst = Mat(_dst);
}
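To answer the question in EDIT 1: the line Mat_<Vec3f> _dst = Mat(s, src.type()); does not keep src.type(). Assigning a plain Mat to a Mat_<Vec3f> converts the data to the template type, so _dst (and therefore dst) ends up as CV_32FC3 holding values in 0..255. For floating-point images, imshow multiplies the pixel values by 255 for display, so everything saturates to white; that is why convertTo(dst, src.type()) fixes it. A minimal sketch of the effect and of two possible fixes (assuming lena.jpg is an ordinary 8-bit color image):
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
using namespace cv;

int main() {
    Mat img = imread("lena.jpg");       // CV_8UC3, values 0..255
    Mat_<Vec3f> f = img;                // silently converted to CV_32FC3, values still 0..255
    imshow("white", f);                 // imshow scales float pixels by 255 -> all white

    Mat scaled;
    f.convertTo(scaled, CV_32FC3, 1.0 / 255.0);
    imshow("float in [0,1]", scaled);   // displays correctly

    Mat back8u;
    f.convertTo(back8u, CV_8UC3);       // convert back to 8-bit, as in EDIT 1
    imshow("8-bit", back8u);
    waitKey(0);
    return 0;
}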

Related

How to apply custom filters on image?

I'm using OpenCV 4 on Ubuntu 20.04 LTS on WSL + XServer for the GUI.
I want to create custom convolutional filter kernels and apply them to my image. This is the code I've written for it:
cv::Mat filter2D(cv::Mat input, cv::Mat filter)
{
    using namespace cv;
    Mat dst = input.clone();
    //cout << " filter data successfully found. Rows:" << filter.rows << " cols:" << filter.cols << " channels:" << filter.channels() << "\n";
    //cout << " input data successfully found. Rows:" << input.rows << " cols:" << input.cols << " channels:" << input.channels() << "\n";
    for (int i = 0 - (filter.rows / 2); i < input.rows - (filter.rows / 2); i++)
    {
        for (int j = 0 - (filter.cols / 2); j < input.cols - (filter.cols / 2); j++)
        {   // adding k and l to i and j will make up the difference and allow us to process the whole image
            float filtertotal = 0;
            for (int k = 0; k < filter.rows; k++)
            {
                for (int l = 0; l < filter.rows; l++)
                {
                    if (i + k >= 0 && i + k < input.rows && j + l >= 0 && j + l < input.cols)
                    {   // don't try to process pixels off the edge of the map
                        float a = input.at<uchar>(i + k, j + l);
                        float b = filter.at<float>(k, l);
                        float product = a * b;
                        filtertotal += product;
                    }
                }
            }
            // filter all processed for this pixel, write it to dst
            dst.at<uchar>(i + (filter.rows / 2), j + (filter.cols / 2)) = filtertotal;
        }
    }
    return dst;
}

int main(int argc, char** argv)
{
    // Declare variables
    cv::Mat_<float> src;
    const char* window_name = "filter2D Demo";
    // Loads an image
    src = cv::imread("fapan.png", cv::IMREAD_GRAYSCALE); // Load an image
    if (src.empty())
    {
        printf(" Error opening image\n");
        return EXIT_FAILURE;
    }
    static float x[3][3] = {
        {-1, -1, -1},
        {-1,  8, -1},
        {-1, -1, -1}
    };
    cv::Mat kernel(3, 3, CV_16FC1, x);
    // Apply filter
    filter2D(src, kernel);
    cv::imshow(window_name, src);
    cv::waitKey(0);
    return EXIT_SUCCESS;
}
The problem is that the output image looks like this.
As you can see, not only are the edges white, but the inside is white too.
The input image:
The output you have posted for the input code is correct, as you are applying a normal filter to an image.
It may cause a little blurring or sharpening, but it will never cause it to detect only edges.
In order to detect only the edges in the image, you must apply a Laplacian along a certain direction.
https://www.l3harrisgeospatial.com/docs/LaplacianFilters.html#:~:text=A%20Laplacian%20filter%20is%20an,an%20edge%20or%20continuous%20progression. (a link with some info)
Since the Laplacian is a derivative of the image, it will only detect changes.
I recommend you try this in the MATLAB image processing toolbox.
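For reference, OpenCV ships a ready-made Laplacian operator, so a quick way to sanity-check what an edge map should look like is a sketch along these lines (the file name is a placeholder; a signed 16-bit destination keeps negative responses from being clipped):
#include <opencv2/opencv.hpp>
#include <cstdlib>

int main()
{
    // Grayscale input; "fapan.png" stands in for your own image.
    cv::Mat src = cv::imread("fapan.png", cv::IMREAD_GRAYSCALE);
    if (src.empty()) return EXIT_FAILURE;

    cv::Mat lap, lap8u;
    cv::Laplacian(src, lap, CV_16S, 3); // 3x3 Laplacian, signed output
    cv::convertScaleAbs(lap, lap8u);    // back to 8-bit for display

    cv::imshow("Laplacian", lap8u);
    cv::waitKey(0);
    return EXIT_SUCCESS;
}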

How do I clear a white background in OpenCV with c++?

This program shows sequential frames with images.
However, as you can see, the worm image has a white background.
But I already cut out the worm image's background, so the current worm image's background is transparent.
I want to render the worm image's background transparently and show the worm in color, not gray.
I tried changing the conversion to cvtColor(image, srcBGR, CV_BGR2BGRA), but that produced an error.
Here is the code.
#include <opencv2/core.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>
#include <vector>

using namespace std;
using namespace cv;

int main(){
    VideoCapture cap;
    cap.open(0);
    if(!cap.isOpened()){
        cerr << "Error opening the webcam!" << endl;
        return -1;
    }
    Mat image = imread("images/worm.png", 0);
    cv::resize(image, image, Size(70, 120));
    Mat frame;
    while(1){
        cap >> frame;
        Mat newFrame = frame.clone();
        int cx = (newFrame.cols - 70) / 2;
        if (!image.empty()) {
            // Get a BGR version of the face, since the output is BGR color
            Mat srcBGR = Mat(image.size(), CV_8UC3);
            cvtColor(image, srcBGR, CV_GRAY2BGR);
            // Get the destination ROI (and make sure it is within the image)
            Rect dstRC = Rect(cx, newFrame.rows/2, 70, 120);
            Mat dstROI = newFrame(dstRC);
            // Copy the pixels from src to dst.
            srcBGR.copyTo(dstROI);
        }
        imshow("frame", newFrame);
        char key = (char) waitKey(30);
        // Exit this loop on escape:
        if(key == 27)
            break;
    }
    return 0;
}
Reading the image using
Mat image = imread("images/worm.png", 0);
will discard the transparency information and load it as a grayscale image (flag 0 means IMREAD_GRAYSCALE). Instead, you can use
Mat image = imread("images/worm.png", IMREAD_UNCHANGED);
which keeps the alpha channel. You will then have to adapt the cvtColor and copy steps as well, since image now has four channels (BGRA) instead of one, as sketched below.
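A minimal sketch of one way to do that, using the alpha plane as a mask for copyTo (it assumes worm.png really carries an alpha channel; the helper function and its names are only illustrative):
#include <opencv2/opencv.hpp>
#include <vector>

// Paste a BGRA sprite onto a BGR frame, copying only the opaque pixels.
void pasteSprite(cv::Mat& frame, const cv::Mat& spriteBGRA, cv::Point topLeft)
{
    std::vector<cv::Mat> ch;
    cv::split(spriteBGRA, ch);          // ch[0..2] = B,G,R; ch[3] = alpha
    cv::Mat srcBGR;
    cv::merge(std::vector<cv::Mat>{ch[0], ch[1], ch[2]}, srcBGR);
    cv::Rect roi(topLeft, spriteBGRA.size());
    srcBGR.copyTo(frame(roi), ch[3]);   // non-zero alpha selects the copied pixels
}

// usage inside the question's loop:
//   Mat image = imread("images/worm.png", IMREAD_UNCHANGED); // CV_8UC4
//   pasteSprite(newFrame, image, Point(cx, newFrame.rows / 2));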
I'll try to demonstrate it in Python.
As the previous answer said, cv2.imread(fname, 0) will discard the transparency information, that is, the alpha channel.
To preserve the alpha channel, use cv2.imread(fname, -1), which is equivalent to cv2.imread(fname, cv2.IMREAD_UNCHANGED), then split the channels.
We can clearly find the alpha channel.
Then do a mask operation to blend, and we will get this:
import cv2
import numpy as np

## read the images (0: BGR, -1: unchanged)
wali = cv2.imread("wali.png")
worm = cv2.imread("worm.png", -1)

## split and merge channels
h, w = worm.shape[:2]   # shape is (rows, cols) = (height, width)
b, g, r, a = cv2.split(worm)
mask = np.dstack((a, a, a))
worm = np.dstack((b, g, r))

## mask operation
canvas = wali[100:100+h, 200:200+w]
imask = mask > 0
canvas[imask] = worm[imask]

## display
cv2.imshow("wali", wali)
cv2.waitKey()
Try this:
#include <iostream>
#include <vector>
#include "opencv2/opencv.hpp"

using namespace std;
using namespace cv;

//-----------------------------------------------------------------------------------------------------
//
//-----------------------------------------------------------------------------------------------------
int main(int argc, char** argv)
{
    Mat img = imread("background.jpg", 1);
    if (img.empty())
    {
        cout << "Can't read image." << endl;
        return 0;
    }
    Mat overlay = imread("overlay.png", -1);
    if (overlay.empty())
    {
        cout << "Can't read overlay image." << endl;
        return 0;
    }
    Rect target_roi(0, 0, img.cols, img.rows); // Set here, where to place overlay.
    cv::resize(overlay, overlay, Size(target_roi.width, target_roi.height));
    Mat mask;
    if (overlay.channels() == 4)
    {
        vector<Mat> ch;
        split(overlay, ch);
        mask = 255 - ch[3].clone();
        mask.convertTo(mask, CV_32FC1, 1.0 / 255.0);
        ch.erase(ch.begin() + 3);
        merge(ch, overlay);
    }
    else
    {
        if (overlay.channels() == 3)
        {
            cvtColor(overlay, overlay, COLOR_BGR2GRAY);
        }
        overlay.convertTo(mask, CV_32FC1, 1.0 / 255.0);
    }
    for (int i = 0; i < overlay.rows; ++i)
    {
        for (int j = 0; j < overlay.cols; ++j)
        {
            float blending_coeff = mask.at<float>(i, j);
            Vec3b v1 = img.at<Vec3b>(i + target_roi.y, j + target_roi.x);
            Vec3b v2;
            if (overlay.channels() == 1)
            {
                int v = overlay.at<uchar>(i, j);
                v2 = Vec3b(v, v, v);
            }
            else
            {
                v2 = overlay.at<Vec3b>(i, j);
            }
            Vec3f v1f(v1[0], v1[1], v1[2]);
            Vec3f v2f(v2[0], v2[1], v2[2]);
            Vec3f r = v1f * blending_coeff + (1.0 - blending_coeff) * v2f;
            img.at<Vec3b>(i + target_roi.y, j + target_roi.x) = r;
        }
    }
    imshow("mask", img);
    imwrite("result.png", img);
    waitKey();
    return 0;
}

Using saturate_cast or not

This is a simple program to change the contrast and brightness of an image. I have noticed that there is another version of the program with one simple difference: saturate_cast is added to the code.
I don't understand the reason for doing this; there seems to be no need to convert to unsigned char (uchar), since both versions (with saturate_cast<uchar> and without) output the same result. I'd appreciate any help.
Here is the code:
#include "opencv2/imgcodecs.hpp"
#include "opencv2/highgui/highgui.hpp"
#include <iostream>
#include "Source.h"
using namespace cv;
double alpha;
int beta;
int main(int, char** argv)
{
/// Read image given by user
Mat image = imread(argv[1]);
Mat image2 = Mat::zeros(image.size(), image.type());
/// Initialize values
std::cout << " Basic Linear Transforms " << std::endl;
std::cout << "-------------------------" << std::endl;
std::cout << "* Enter the alpha value [1.0-3.0]: ";std::cin >> alpha;
std::cout << "* Enter the beta value [0-100]: "; std::cin >> beta;
for (int x = 0; x < image.rows; x++)
{
for (int y = 0; y < image.cols; y++)
{
for (int c = 0; c < 3; c++)
{
image2.at<Vec3b>(x, y)[c] =
saturate_cast<uchar>(alpha*(image.at<Vec3b>(x, y)[c]) + beta);
}
}
/// Create Windows
namedWindow("Original Image", 1);
namedWindow("New Image", 1);
/// Show stuff
imshow("Original Image", image);
imshow("New Image", image2);
/// Wait until user press some key
waitKey();
return 0;
}
Since the result of your expression may go outside the valid range for uchar, i.e. [0,255], you'd better always use saturate_cast.
In your case, the result of the expression: alpha*(image.at<Vec3b>(x, y)[c]) + beta is a double, so it's safer to use saturate_cast<uchar> to clamp values correctly.
Also, this improves readability, since it's easy to see that you want a uchar out of an expression.
Without using saturate_cast you may have unexpected values:
uchar u1 = 257; // u1 = 1, why a very bright value is set to almost black?
uchar u2 = saturate_cast<uchar>(257); // u2 = 255, a very bright value is set to white
OpenCV's saturate_cast<uchar> behaves roughly like this hand-written version:
inline unsigned char saturate_cast_uchar(double val) {
    val += 0.5; // round to the nearest integer
    return static_cast<unsigned char>(val < 0 ? 0 : (val > 0xff ? 0xff : val));
}
If val lies between 0 and 255, this function returns the rounded value;
if val lies outside the range [0, 255], it returns the lower or upper boundary value.
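To see the clamping and rounding in action, here is a tiny sketch using OpenCV's actual cv::saturate_cast:
#include <opencv2/core/core.hpp>
#include <iostream>

int main() {
    std::cout << (int)cv::saturate_cast<uchar>(-12.3) << std::endl; // 0   (clamped low)
    std::cout << (int)cv::saturate_cast<uchar>(127.6) << std::endl; // 128 (rounded to nearest)
    std::cout << (int)cv::saturate_cast<uchar>(300.0) << std::endl; // 255 (clamped high)
    return 0;
}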

Number and character recognition using ANN OpenCV 3.1

I have implemented a neural network using the OpenCV ANN library. I am a newbie in this field and I have learned everything about it online (mostly StackOverflow).
I am using this ANN for number plate detection. I did the segmentation part using the OpenCV image processing library and it is working well. It performs character segmentation and hands the segments to the NN part of the project. The NN is supposed to recognize the number plate.
I have sample images of 20x30, therefore I have 600 neurons in the input layer. As there are 36 possibilities (0-9, A-Z), I have 36 output neurons. I kept 100 neurons in the hidden layer. The predict function of OpenCV is giving me the same output for every segmented image. That output also contains some large negative values (< -1). I have used cv::ml::ANN_MLP::SIGMOID_SYM as the activation function.
Please don't mind the large amount of wrongly commented-out code (I am doing trial and error).
I need to find out what the output of the predict function means. Thank you for your help.
#include <opencv2/opencv.hpp>
int inputLayerSize = 1;
int outputLayerSize = 1;
int numSamples = 2;
Mat layers = Mat(3, 1, CV_32S);
layers.row(0) =Scalar(600) ;
layers.row(1) = Scalar(20);
layers.row(2) = Scalar(36);
vector<int> layerSizes = { 600,100,36 };
Ptr<ml::ANN_MLP> nnPtr = ml::ANN_MLP::create();
vector <int> n;
//nnPtr->setLayerSizes(3);
nnPtr->setLayerSizes(layers);
nnPtr->setTrainMethod(ml::ANN_MLP::BACKPROP);
nnPtr->setTermCriteria(TermCriteria(cv::TermCriteria::COUNT | cv::TermCriteria::EPS, 1000, 0.00001f));
nnPtr->setActivationFunction(cv::ml::ANN_MLP::SIGMOID_SYM, 1, 1);
nnPtr->setBackpropWeightScale(0.5f);
nnPtr->setBackpropMomentumScale(0.5f);
/*CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams(
// terminate the training after either 1000
// iterations or a very small change in the
// network wieghts below the specified value
cvTermCriteria(CV_TERMCRIT_ITER + CV_TERMCRIT_EPS, 1000, 0.000001),
// use backpropogation for training
CvANN_MLP_TrainParams::BACKPROP,
// co-efficents for backpropogation training
// (refer to manual)
0.1,
0.1);*/
/* Mat samples(Size(inputLayerSize, numSamples), CV_32F);
samples.at<float>(Point(0, 0)) = 0.1f;
samples.at<float>(Point(0, 1)) = 0.2f;
Mat responses(Size(outputLayerSize, numSamples), CV_32F);
responses.at<float>(Point(0, 0)) = 0.2f;
responses.at<float>(Point(0, 1)) = 0.4f;
*/
//reading chaos image
// we will read the classification numbers into this variable as though it is a vector
// close the traning images file
/*vector<int> layerInfo;
layerInfo=nnPtr->get;
for (int i = 0; i < layerInfo.size(); i++) {
cout << "size of 0" <<layerInfo[i] << endl;
}*/
cv::imshow("chaos", matTrainingImagesAsFlattenedFloats);
// cout <<abc << endl;
matTrainingImagesAsFlattenedFloats.convertTo(matTrainingImagesAsFlattenedFloats, CV_32F);
//matClassificationInts.reshape(1, 496);
matClassificationInts.convertTo(matClassificationInts, CV_32F);
matSamples.convertTo(matSamples, CV_32F);
std::cout << matClassificationInts.rows << " " << matClassificationInts.cols << " ";
std::cout << matTrainingImagesAsFlattenedFloats.rows << " " << matTrainingImagesAsFlattenedFloats.cols << " ";
std::cout << matSamples.rows << " " << matSamples.cols;
imshow("Samples", matSamples);
imshow("chaos", matTrainingImagesAsFlattenedFloats);
Ptr<ml::TrainData> trainData = ml::TrainData::create(matTrainingImagesAsFlattenedFloats, ml::SampleTypes::ROW_SAMPLE, matSamples);
nnPtr->train(trainData);
bool m = nnPtr->isTrained();
if (m)
std::cout << "training complete\n\n";
// cv::Mat matCurrentChar = Mat(cv::Size(matTrainingImagesAsFlattenedFloats.cols, matTrainingImagesAsFlattenedFloats.rows), CV_32F);
// cout << "samples:\n" << samples << endl;
//cout << "\nresponses:\n" << responses << endl;
/* if (!nnPtr->train(trainData))
return 1;*/
/* cout << "\nweights[0]:\n" << nnPtr->getWeights(0) << endl;
cout << "\nweights[1]:\n" << nnPtr->getWeights(1) << endl;
cout << "\nweights[2]:\n" << nnPtr->getWeights(2) << endl;
cout << "\nweights[3]:\n" << nnPtr->getWeights(3) << endl;*/
//predicting
std::vector <cv::String> filename;
cv::String folder = "./plate/";
cv::glob(folder, filename);
if (filename.empty()) { // if unable to open image
std::cout << "error: image not read from file\n\n"; // show error message on command line
return(0); // and exit program
}
String strFinalString;
for (int i = 0; i < filename.size(); i++) {
cv::Mat matTestingNumbers = cv::imread(filename[i]);
cv::Mat matGrayscale; //
cv::Mat matBlurred; // declare more image variables
cv::Mat matThresh; //
cv::Mat matThreshCopy;
cv::Mat matCanny;
//
cv::cvtColor(matTestingNumbers, matGrayscale, CV_BGR2GRAY); // convert to grayscale
matThresh = cv::Mat(cv::Size(matGrayscale.cols, matGrayscale.rows), CV_8UC1);
for (int i = 0; i < matGrayscale.cols; i++) {
    for (int j = 0; j < matGrayscale.rows; j++) {
        if (matGrayscale.at<uchar>(j, i) <= 130) {
            matThresh.at<uchar>(j, i) = 255;
        }
        else {
            matThresh.at<uchar>(j, i) = 0;
        }
    }
}
// blur
cv::GaussianBlur(matThresh, // input image
matBlurred, // output image
cv::Size(5, 5), // smoothing window width and height in pixels
0); // sigma value, determines how much the image will be blurred, zero makes function choose the sigma value
// filter image from grayscale to black and white
/* cv::adaptiveThreshold(matBlurred, // input image
matThresh, // output image
255, // make pixels that pass the threshold full white
cv::ADAPTIVE_THRESH_GAUSSIAN_C, // use gaussian rather than mean, seems to give better results
cv::THRESH_BINARY_INV, // invert so foreground will be white, background will be black
11, // size of a pixel neighborhood used to calculate threshold value
2); */ // constant subtracted from the mean or weighted mean
// cv::imshow("thresh" + std::to_string(i), matThresh);
matThreshCopy = matThresh.clone();
std::vector<std::vector<cv::Point> > ptContours; // declare a vector for the contours
std::vector<cv::Vec4i> v4iHierarchy; // declare a vector for the hierarchy; the copy of the thresh image above is necessary because findContours modifies its input
cv::Canny(matBlurred, matCanny, 20, 40, 3);
/*std::vector<std::vector<cv::Point> > ptContours; // declare a vector for the contours
std::vector<cv::Vec4i> v4iHierarchy; // declare a vector for the hierarchy (we won't use this in this program but this may be helpful for reference)
cv::findContours(matThreshCopy, // input image, make sure to use a copy since the function will modify this image in the course of finding contours
ptContours, // output contours
v4iHierarchy, // output hierarchy
cv::RETR_EXTERNAL, // retrieve the outermost contours only
cv::CHAIN_APPROX_SIMPLE); // compress horizontal, vertical, and diagonal segments and leave only their end points
/*std::vector<std::vector<cv::Point> > contours_poly(ptContours.size());
std::vector<cv::Rect> boundRect(ptContours.size());
for (int i = 0; i < ptContours.size(); i++)
{
approxPolyDP(cv::Mat(ptContours[i]), contours_poly[i], 3, true);
boundRect[i] = cv::boundingRect(cv::Mat(contours_poly[i]));
}*/
/*for (int i = 0; i < ptContours.size(); i++) { // for each contour
ContourWithData contourWithData; // instantiate a contour with data object
contourWithData.ptContour = ptContours[i]; // assign contour to contour with data
contourWithData.boundingRect = cv::boundingRect(contourWithData.ptContour); // get the bounding rect
contourWithData.fltArea = cv::contourArea(contourWithData.ptContour); // calculate the contour area
allContoursWithData.push_back(contourWithData); // add contour with data object to list of all contours with data
}
for (int i = 0; i < allContoursWithData.size(); i++) { // for all contours
if (allContoursWithData[i].checkIfContourIsValid()) { // check if valid
validContoursWithData.push_back(allContoursWithData[i]); // if so, append to valid contour list
}
}
//sort contours from left to right
std::sort(validContoursWithData.begin(), validContoursWithData.end(), ContourWithData::sortByBoundingRectXPosition);
// std::string strFinalString; // declare final string, this will have the final number sequence by the end of the program
*/
/*for (int i = 0; i < validContoursWithData.size(); i++) { // for each contour
// draw a green rect around the current char
cv::rectangle(matTestingNumbers, // draw rectangle on original image
validContoursWithData[i].boundingRect, // rect to draw
cv::Scalar(0, 255, 0), // green
2); // thickness
cv::Mat matROI = matThresh(validContoursWithData[i].boundingRect); // get ROI image of bounding rect
cv::Mat matROIResized;
cv::resize(matROI, matROIResized, cv::Size(RESIZED_IMAGE_WIDTH, RESIZED_IMAGE_HEIGHT)); // resize image, this will be more consistent for recognition and storage
*/
cv::Mat matROIFloat;
cv::resize(matThresh, matThresh, cv::Size(RESIZED_IMAGE_WIDTH, RESIZED_IMAGE_HEIGHT));
matThresh.convertTo(matROIFloat, CV_32FC1, 1.0 / 255.0); // convert Mat to float, necessary for call to find_nearest
cv::Mat matROIFlattenedFloat = matROIFloat.reshape(1, 1);
cv::Point maxLoc = { 0,0 };
cv::Point minLoc;
cv::Mat output = cv::Mat(cv::Size(36, 1), CV_32F);
vector<float>output2;
// cv::Mat output2 = cv::Mat(cv::Size(36, 1), CV_32F);
nnPtr->predict(matROIFlattenedFloat, output2);
// float max = output.at<float>(0, 0);
int fo = 0;
float m = output2[0];
imshow("predicted input", matROIFlattenedFloat);
// float b = output.at<float>(0, 0);
// cout <<"\n output0,0:"<<b<<endl;
// minMaxLoc(output, 0, 0, &minLoc, &maxLoc, Mat());
// cout << "\noutput:\n" << maxLoc.x << endl;
for (int j = 1; j < 36; j++) {
    float value = output2[j];
    if (value > m) {
        m = value;
        fo = j;
    }
}
cout << "j value in output " << fo << " Max value " << m << endl;
//imshow("output image" + to_string(i), output);
// cout << "\noutput:\n" << minLoc.x << endl;
//float fltCurrentChar = (float)maxLoc.x;
output.release();
m = 0;
fo = 0;
}
// strFinalString = strFinalString + char(int(fltCurrentChar)); // append current char to full string
// cv::imshow("Predict output", output);
/*cv::Point maxLoc = {0,0};
Mat output=Mat (cv::Size(matSamples.cols,matSamples.rows),CV_32F);
nnPtr->predict(matTrainingImagesAsFlattenedFloats, output);
minMaxLoc(output, 0, 0, 0, &maxLoc, 0);
cout << "\noutput:\n" << maxLoc.x << endl;*/
// getchar();
/*for (int i = 0; i < 10;i++) {
for (int j = 0; j < 36; j++) {
if (matCurrentChar.at<float>(i, j) >= 0.6) {
cout << " "<<j<<" ";
}
}
}*/
waitKey(0);
return(0);
}
void gen() {
std::string dir, filepath;
int num, imgArea, minArea;
int pos = 0;
bool f = true;
struct stat filestat;
cv::Mat imgTrainingNumbers;
cv::Mat imgGrayscale;
cv::Mat imgBlurred;
cv::Mat imgThresh;
cv::Mat imgThreshCopy;
cv::Mat matROIResized=cv::Mat (cv::Size(RESIZED_IMAGE_WIDTH,RESIZED_IMAGE_HEIGHT),CV_8UC1);
cv::Mat matROI;
std::vector <cv::String> filename;
std::vector<std::vector<cv::Point> > ptContours;
std::vector<cv::Vec4i> v4iHierarchy;
int count = 0, contoursCount = 0;
matSamples = cv::Mat(cv::Size(36, 496), CV_32FC1);
matTrainingImagesAsFlattenedFloats = cv::Mat(cv::Size(600, 496), CV_32FC1);
for (int j = 0; j <= 35; j++) {
int tmp = j;
cv::String folder = "./Training Data/" + std::to_string(tmp);
cv::glob(folder, filename);
for (int k = 0; k < filename.size(); k++) {
count++;
// If the file is a directory (or is in some way invalid) we'll skip it
// if (stat(filepath.c_str(), &filestat)) continue;
//if (S_ISDIR(filestat.st_mode)) continue;
imgTrainingNumbers = cv::imread(filename[k]);
imgArea = imgTrainingNumbers.cols*imgTrainingNumbers.rows;
// read in training numbers image
minArea = imgArea * 50 / 100;
if (imgTrainingNumbers.empty()) {
std::cout << "error: image not read from file\n\n";
//return(0);
}
cv::cvtColor(imgTrainingNumbers, imgGrayscale, CV_BGR2GRAY);
//cv::equalizeHist(imgGrayscale, imgGrayscale);
imgThresh = cv::Mat(cv::Size(imgGrayscale.cols, imgGrayscale.rows), CV_8UC1);
/*cv::adaptiveThreshold(imgGrayscale,
imgThresh,
255,
cv::ADAPTIVE_THRESH_GAUSSIAN_C,
cv::THRESH_BINARY_INV,
3,
0);
*/
for (int i = 0; i < imgGrayscale.cols; i++) {
    for (int j = 0; j < imgGrayscale.rows; j++) {
        if (imgGrayscale.at<uchar>(j, i) <= 130) {
            imgThresh.at<uchar>(j, i) = 255;
        }
        else {
            imgThresh.at<uchar>(j, i) = 0;
        }
    }
}
// cv::imshow("imgThresh"+std::to_string(count), imgThresh);
imgThreshCopy = imgThresh.clone();
cv::GaussianBlur(imgThreshCopy,
imgBlurred,
cv::Size(5, 5),
0);
cv::Mat imgCanny;
// cv::Canny(imgBlurred,imgCanny,20,40,3);
cv::findContours(imgBlurred,
ptContours,
v4iHierarchy,
cv::RETR_EXTERNAL,
cv::CHAIN_APPROX_SIMPLE);
for (int i = 0; i < ptContours.size(); i++) {
if (cv::contourArea(ptContours[i]) > MIN_CONTOUR_AREA) {
contoursCount++;
cv::Rect boundingRect = cv::boundingRect(ptContours[i]);
cv::rectangle(imgTrainingNumbers, boundingRect, cv::Scalar(0, 0, 255), 2); // draw red rectangle around each contour as we ask user for input
matROI = imgThreshCopy(boundingRect); // get ROI image of bounding rect
std::string path = "./" + std::to_string(contoursCount) + ".JPG";
cv::imwrite(path, matROI);
// cv::imshow("matROI" + std::to_string(count), matROI);
cv::resize(matROI, matROIResized, cv::Size(RESIZED_IMAGE_WIDTH, RESIZED_IMAGE_HEIGHT)); // resize image, this will be more consistent for recognition and storage
std::cout << filename[k] << " " << contoursCount << "\n";
//cv::imshow("matROI", matROI);
//cv::imshow("matROIResized"+std::to_string(count), matROIResized);
// cv::imshow("imgTrainingNumbers" + std::to_string(contoursCount), imgTrainingNumbers);
int intChar;
if (j < 10)
    intChar = j + 48; // classes 0-9 -> ASCII '0'-'9'
else
    intChar = j + 55; // classes 10-35 -> ASCII 'A'-'Z'
/*if (intChar == 27) { // if esc key was pressed
return(0); // exit program
}*/
// if (std::find(intValidChars.begin(), intValidChars.end(), intChar) != intValidChars.end()) { // else if the char is in the list of chars we are looking for . . .
// append classification char to integer list of chars
cv::Mat matImageFloat;
matROIResized.convertTo(matImageFloat,CV_32FC1);// now add the training image (some conversion is necessary first) . . .
//matROIResized.convertTo(matImageFloat, CV_32FC1); // convert Mat to float
cv::Mat matImageFlattenedFloat = matImageFloat.reshape(1, 1);
//matTrainingImagesAsFlattenedFloats.push_back(matImageFlattenedFloat);// flatten
try {
//matTrainingImagesAsFlattenedFloats.push_back(matImageFlattenedFloat);
std::cout << matTrainingImagesAsFlattenedFloats.rows << " " << matTrainingImagesAsFlattenedFloats.cols;
//unsigned char* re;
int ii = 0; // Current column in training_mat
for (int i = 0; i < matImageFloat.rows; i++) {
    for (int j = 0; j < matImageFloat.cols; j++) {
        matTrainingImagesAsFlattenedFloats.at<float>(contoursCount - 1, ii++) = matImageFloat.at<float>(i, j);
    }
}
}
catch (std::exception &exc) {
f = false;
exc.what();
}
if (f) {
matClassificationInts.push_back((float)intChar);
matSamples.at<float>(contoursCount-1, j) = 1.0;
}
f = true;
// add to Mat as though it was a vector, this is necessary due to the
// data types that KNearest.train accepts
} // end if
//} // end if
} // end for
}//end i
}//end j
}
Output of predict function
Unfortunately, I don't have the necessary time to really review the code, but I can say off the top that to train a model that performs well for prediction with 36 classes, you will need several things:
A large number of good quality images. Ideally, you'd want thousands of images for each class. Of course, you can see somewhat decent results with less than that, but if you only have a few images per class, it's never going to be able to generalize adequately.
You need a model that is large and sophisticated enough to provide the necessary expressiveness to solve the problem. For a problem like this, a plain old multi-layer perceptron with one hidden layer with 100 units may not be enough. This is actually a problem that would benefit from using a Convolutional Neural Net (CNN) with a couple layers just to extract useful features first. But assuming you don't want to go down that path, you may at least want to tweak the size of your hidden layer.
To even get to a point where the training process converges, you will probably need to experiment and crucially, you need an effective way to test the accuracy of the ANN after each experiment. Ideally, you want to observe the loss as the training is proceeding, but I'm not sure whether that's possible using OpenCV's ML functionality. At a minimum, you should fully expect to have to play around with the various so-called "hyper-parameters" and run many experiments before you have a reasonable model.
Anyway, the most important thing is to make sure you have a solid mechanism for validating the accuracy of the model after training. If you aren't already doing so, set aside some images as a separate test set, and after each experiment, use the trained ANN to predict each test image to see the accuracy.
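A rough sketch of that last point (testData and testLabels are placeholder names; it assumes the test images are flattened to 1x600 CV_32F rows exactly like the training data):
#include <opencv2/opencv.hpp>
#include <vector>

// Fraction of held-out samples the trained MLP classifies correctly.
double testAccuracy(const cv::Ptr<cv::ml::ANN_MLP>& net,
                    const cv::Mat& testData,            // one sample per row
                    const std::vector<int>& testLabels) // class index 0..35 per row
{
    int correct = 0;
    for (int r = 0; r < testData.rows; ++r) {
        cv::Mat scores;                                 // 1x36 output activations
        net->predict(testData.row(r), scores);
        cv::Point maxLoc;
        cv::minMaxLoc(scores, nullptr, nullptr, nullptr, &maxLoc);
        if (maxLoc.x == testLabels[r]) ++correct;       // column index = predicted class
    }
    return static_cast<double>(correct) / testData.rows;
}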
One final general note: what you're trying to do is complex. You will save yourself a huge number of headaches if you take the time early and often to refactor your code. No matter how many experiments you run, if there's some defect causing (for example) your training data to be fundamentally different in some way than your test data, you will never see good results.
Good luck!
EDIT: I should also point out that seeing the same result for every input image is a classic sign that training failed. Unfortunately, there are many reasons why that might happen and it will be very difficult for anyone to isolate that for you without some cleaner code and access to your image data.
I have solved the issue of not getting sensible output from predict. The problem was that the input training Mat (i.e. matTrainingImagesAsFlattenedFloats) had the value 255.0 for white pixels. This happened because I hadn't used convertTo() properly. You need to call convertTo(outputImage, CV_32FC1, 1.0 / 255.0), which scales the 255.0 pixel values down to 1.0; after that I get the correct output. A minimal sketch is below.
Thank you for all the help.
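For completeness, a minimal sketch of that scaling step (matROIResized stands in for any 8-bit character image):
// Scale 8-bit pixels (0..255) to floats in [0, 1] before train()/predict().
cv::Mat matImageFloat;
matROIResized.convertTo(matImageFloat, CV_32FC1, 1.0 / 255.0);
cv::Mat row = matImageFloat.reshape(1, 1); // one 1x600 sample row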
This is too broad to be a single question. Sorry for the bad news. I tried this over and over and couldn't find a solution. I recommend that you implement a simple AND, OR, or XOR first, just to make sure that the learning part is working and that you are getting better results the more passes you do. I also suggest trying the hyperbolic tangent as a transfer function instead of the sigmoid. And good luck!
Here are some of my own posts that might help you:
Exact results as yours: HERE
Some code: HERE
I don't want to say it, but several professors I met said backpropagation just doesn't work, and they (and I) had to implement their own methods of teaching the network.

OpenCv see each pixel value in an image

I'm working on a Connected Component Labeling (CCL) operation in OpenCV (in C++). To check whether CCL works reliably, I have to inspect each pixel value in the image while debugging. I have tried saving the result of CCL as an image, but I could not get at the numeric pixel values. Is there any way of doing this while debugging?
As already mentioned by @Gombat and e.g. here, in Visual Studio you can install Image Watch.
If you want to save the values of a Mat to a text file, you don't need to reinvent anything (see OpenCV Mat: the basic image container).
You can for example save a csv file simply like:
Mat img;
// ... fill matrix somehow
ofstream fs("test.csv");
fs << format(img, "csv");
Full example:
#include <opencv2/opencv.hpp>
#include <iostream>
#include <fstream>

using namespace std;
using namespace cv;

int main()
{
    // Just a green image
    Mat3b img(10, 5, Vec3b(0, 255, 0));
    ofstream fs("test.csv");
    fs << format(img, "csv");
    return 0;
}
Convert the CCL matrix into values in the range [0, 255] and save it as an image. For example:
cv::Mat ccl = ...; // ccl operation returning CV_8U
double min, max;
cv::minMaxLoc(ccl, &min, &max);
cv::Mat image = ccl * (255. / max);
cv::imwrite("ccl.png", image);
Or store all the values in a file:
std::ofstream f("ccl.txt");
f << "row col value" << std::endl;
for (int r = 0; r < ccl.rows; ++r) {
    unsigned char* row = ccl.ptr<unsigned char>(r);
    for (int c = 0; c < ccl.cols; ++c) {
        f << r << " " << c << " " << static_cast<int>(row[c]) << std::endl;
    }
}
Of course there is, but it depends on the type of image you use.
http://docs.opencv.org/doc/user_guide/ug_mat.html#accessing-pixel-intensity-values
Which IDE do you use for debugging? There is a Visual Studio opencv plugin:
http://opencv.org/image-debugger-plug-in-for-visual-studio.html
https://visualstudiogallery.msdn.microsoft.com/e682d542-7ef3-402c-b857-bbfba714f78d
To simply print a cv::Mat of type CV_8UC1 to a text file, use the code below:
// create the image
int rows(4), cols(3);
cv::Mat img(rows, cols, CV_8UC1);
// fill image
for (int r = 0; r < rows; r++)
{
    for (int c = 0; c < cols; c++)
    {
        img.at<unsigned char>(r, c) = std::min(rows + cols - (r + c), 255);
    }
}
// write image to file
std::ofstream out("output.txt");
for (int r = -1; r < rows; r++)
{
    if (r == -1) { out << '\t'; }
    else if (r >= 0) { out << r << '\t'; }
    for (int c = -1; c < cols; c++)
    {
        if (r == -1 && c >= 0) { out << c << '\t'; }
        else if (r >= 0 && c >= 0)
        {
            out << static_cast<int>(img.at<unsigned char>(r, c)) << '\t';
        }
    }
    out << std::endl;
}
Simply replace img, rows, and cols with your variables, leave the "fill image" part aside, and it should work. The first row and column contain the indices of that row/column. "output.txt" will be written to the debugging working directory, which you can specify in the project's debugging settings in Visual Studio.