Object detection with every pixel information, OpenCV - C++

The input and output images of my code are here.
I want the output to be a complete detection of the object, covering every pixel. At the moment the result includes some shadows and other background pixels, and it misses some points of the object.
Does anybody have an idea how I can get a complete object detection (foreground detection) from these input images (an object image and a background image)?
Below is the code I have tried.
cv::Mat ImgObject, ImgBck;
ImgObject = imread("Object.jpg");
ImgBck = imread("Background.jpg");
imwrite("ImgObject.jpg", ImgObject);
imwrite("ImgBck.jpg", ImgBck);
cv::Mat diffImage;
ImgBck = ImgBck + Scalar(-20, -20, -20); /* decrease brightness of the background,
because the brightness changes after placing the object */
cv::absdiff(ImgObject, ImgBck, diffImage);
float threshold = 50.0f;
float dist = 0.0f;
for (int j = 0; j < diffImage.rows; ++j)
{
    for (int i = 0; i < diffImage.cols; ++i)
    {
        cv::Vec3b pix = diffImage.at<cv::Vec3b>(j, i);
        dist = (float)(pix[0] * pix[0] + pix[1] * pix[1] + pix[2] * pix[2]);
        dist = sqrt(dist);
        cv::Point3_<uchar>* pFinal = ImgObject.ptr<Point3_<uchar> >(j, i);
        if (dist <= threshold)
        {
            pFinal->x = 255; // fill blue as background
            pFinal->y = 0;
            pFinal->z = 0;
        }
    }
}
imwrite("Obj.jpg", ImgObject);
ImgObject.release();
ImgBck.release();

Do not use direct light on the object (to reduce shadow and reflection).

Firstly, I need to say that this is not an object detection task but a saliency detection or segmentation task.
Second, as @Kartik Maheshwari said, you are facing a lighting issue, which is not a solved problem in computer vision.
As an alternative answer, take a look at this.
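If you stay with the background-difference approach, the raw mask can usually be cleaned up with a threshold followed by morphological opening and closing, which removes isolated shadow pixels and fills small holes in the object. The following is only a minimal sketch of that idea (it is not from the original answer); the file names and the threshold value of 50 are assumptions you would adapt to your data.
#include <opencv2/opencv.hpp>

int main()
{
    // Assumed input files; adapt the paths to your own data.
    cv::Mat object = cv::imread("Object.jpg");
    cv::Mat background = cv::imread("Background.jpg");

    cv::Mat diff, gray, mask;
    cv::absdiff(object, background, diff);                  // per-pixel difference
    cv::cvtColor(diff, gray, CV_BGR2GRAY);                  // collapse to one channel
    cv::threshold(gray, mask, 50, 255, cv::THRESH_BINARY);  // assumed threshold of 50

    // Morphological opening removes small speckles, closing fills small holes.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::morphologyEx(mask, mask, cv::MORPH_OPEN, kernel);
    cv::morphologyEx(mask, mask, cv::MORPH_CLOSE, kernel);

    // Keep only the foreground pixels of the object image.
    cv::Mat foreground;
    object.copyTo(foreground, mask);
    cv::imwrite("Foreground.jpg", foreground);
    return 0;
}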

Related

Make a mosaic image (bitmap format)

I want to make a mosaic photo with a window size that is determined by the user. This is just a first draft of the code, but I have problems getting the pixels and calculating the averages; the idea is to put the average value in each pixel of the window and continue to the end. I also get errors when converting between different types. (The other part of the code produces a grey-scale image.)
P.S.: sorry, I am taking my very first steps in learning image processing.
''' void CImageProcessingDoc::OnProcessMosaic()
{
    if (m_pImage) {
        DlgMosaicOption dlg;
        if (dlg.DoModal() == IDOK) {
            DWORD dwWindowSize = dlg.m_dwWindowSize;
            DWORD width = m_pImage->GetWidth();
            DWORD height = m_pImage->GetHeight();
            RGBQUAD color;
            RGBQUAD newcolor;
            float X_step = width / dwWindowSize;
            float Y_step = height / dwWindowSize;
            int avg, pixel;
            for (DWORD y = 0; y < dwWindowSize; y++) {
                for (DWORD x = 0; x < dwWindowSize; x++) {
                    color = m_pImage->GetPixelColor(x, y);
                    (RGBQUAD) pixel = m_pImage->GetPixelColor(x, y);
                    avR += (int)(color.red(pixel);
                    avG += (int)(color.green(pixel);
                    avB += (int)(color.blue(pixel);
                    newcolor.rgbBlue = (BYTE)RGB2GRAY(color.rgbRed, color.rgbGreen, color.rgbBlue);
                    newcolor.rgbGreen = (BYTE)RGB2GRAY(color.rgbRed, color.rgbGreen, color.rgbBlue);
                    newcolor.rgbRed = (BYTE)RGB2GRAY(color.rgbRed, color.rgbGreen, color.rgbBlue);
                    m_pImage->SetPixelColor(x, y, newcolor);
                }
            }
        }
    }
} '''
Could anyone please help me to understand the problem?
I think you are mixing up spatial, spectral and temporal averaging here.
Spatial average
This is the operation of computing the average of the pixels over an area.
You have to compute eR = 1/N * (P0.R + P1.R + P2.R + P3.R + ...), eG = 1/N * (P0.G + P1.G + ...), eB = 1/N * (P0.B + P1.B + ...).
You'll get a pixel with as many colors as there were in the input picture, but with limited spatial frequency; a picture computed like this will appear blurred, with no details.
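For illustration (this sketch is not part of the original answer), a spatial average over one window can be computed like this with an OpenCV cv::Mat, since the exact CImage API used in the question is not shown here; the window origin and size are parameters you would supply:
#include <opencv2/opencv.hpp>

// Average the BGR values of one win x win block and write that average
// back into every pixel of the block (mosaic effect).
void mosaicBlock(cv::Mat& img, int x0, int y0, int win)
{
    cv::Rect block(x0, y0, win, win);
    block &= cv::Rect(0, 0, img.cols, img.rows);   // clip the window to the image
    cv::Scalar mean = cv::mean(img(block));        // spatial average of the block
    img(block).setTo(mean);                        // fill the block with the average
}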
Spectral average
This is the operation of computing the average of the components (spectrum) of each pixel.
You have to compute e = 1/3 * (P0.R + P0.G + P0.B)
You'll get a monochrome picture with exactly the same spatial frequency content as the initial picture.
Temporal average
While you haven't talked about it, this is included for reference. The idea is to compute the average of each pixel and each component over N pictures in a temporal sequence.
This gives a kind of motion-blurred picture.
Answer
If I understand your question correctly, you want a spectral average to convert an RGB pixel to its average grey value, taking grey = (R+G+B)/3.
Thus, your pixel loop should look like this:
for (DWORD y = 0; y < dwWindowSize; y++) {
    for (DWORD x = 0; x < dwWindowSize; x++) {
        color = m_pImage->GetPixelColor(x, y);
        BYTE avg = (BYTE)((color.rgbRed + color.rgbGreen + color.rgbBlue) / 3.f);
        RGBQUAD grey = { avg, avg, avg, 0 };   // rgbBlue, rgbGreen, rgbRed, rgbReserved
        m_pImage->SetPixelColor(x, y, grey);
    }
}
Please note that converting non-linear RGB (usually called sRGB) to luminance using the plain average is a poor formula for RGB-to-grayscale conversion. You should read about RGB to L*a*b* conversion (you are interested in the L* part only) or at least RGB to YUV (you are interested in the Y part only).
If your question is about resizing the input picture, then you are not using the appropriate algorithm; what you want is called resampling.
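As a side note that goes beyond the original answer: if the goal is the classic mosaic/pixelation effect, resampling can do the whole job in two calls; the window size below is an assumed parameter.
#include <opencv2/opencv.hpp>

// Pixelate an image by downsampling to one sample per window and
// upsampling back with nearest-neighbour interpolation.
cv::Mat mosaic(const cv::Mat& src, int windowSize)
{
    cv::Mat small, out;
    cv::resize(src, small,
               cv::Size(src.cols / windowSize, src.rows / windowSize),
               0, 0, cv::INTER_AREA);                           // block average per window
    cv::resize(small, out, src.size(), 0, 0, cv::INTER_NEAREST); // blow the blocks back up
    return out;
}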

Asus Zenfone AR correlate depth and colour image

Hello, I am using the depth and colour images from Google Tango, so that I can load the image into Meshlab. There is a related question, where the goal is to find the colour of each point in the Tango Point Cloud. However, I would like to go the other way. For each pixel of the colour image, how do I find the corresponding depth?
I have upsampled the depth image and saved the result in the TangoDepthBuffer. I have used the OpenGL readPixels() method to get the colour image and stored the RGB values in an array called pixels[]. I then correlate the x, y, z values with the RGB values using the following code:
index_rgb = 0;
index_pixels = 0;
for (int i = 0; i < color_camera_width; i++)
{
    for (int j = 0; j < color_camera_height; j++)
    {
        red[index_rgb]   = pixels[color_camera_width * color_camera_height * 3 - 3 - index_pixels];
        green[index_rgb] = pixels[color_camera_width * color_camera_height * 3 - 2 - index_pixels];
        blue[index_rgb]  = pixels[color_camera_width * color_camera_height * 3 - 1 - index_pixels];
        z[index_rgb] = render_point_cloud_buffer->depths[j * color_camera_width + i];
        x[index_rgb] = (double) (i - color_camera_width/2);
        y[index_rgb] = (double) (j - color_camera_height/2);
        x[index_rgb] = (x[index_rgb] / color_camera_width) * depth_camera_horizontal_fov;
        y[index_rgb] = (y[index_rgb] / color_camera_height) * depth_camera_vertical_fov;
        x[index_rgb] = z[index_rgb] * tan(x[index_rgb]);
        y[index_rgb] = z[index_rgb] * tan(y[index_rgb]);
        index_rgb++;
        index_pixels += 3;
    }
}
I would expect the result to align the depth and colour images. However, when I load the result into Meshlab, the depth pixels are shifted down and to the left of the corresponding colour pixels. The manner in which this shift occurs varies based on the depth. However, I cannot find a depth where there is no shift.
How do you find the transformation required to fix this? Will it work for any depth? Alternatively, how do you find the depth at each specific colour pixel?
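No answer is recorded here, but the usual approach is to back-project each depth pixel with the depth camera's intrinsics, transform the 3D point with the depth-to-colour extrinsics, and re-project it with the colour camera's intrinsics. The sketch below only illustrates that standard pinhole model; every numeric value is a placeholder that would have to be replaced by the actual Tango calibration data.
#include <opencv2/opencv.hpp>

// Map one depth pixel (u, v, depth in metres) into the colour image.
// fx/fy/cx/cy are camera intrinsics, R and t the depth-to-colour extrinsics;
// all numbers here are placeholders, not a real Tango calibration.
cv::Point2d depthPixelToColorPixel(double u, double v, double depth)
{
    const double fx_d = 520.0, fy_d = 520.0, cx_d = 320.0, cy_d = 240.0;     // depth intrinsics (assumed)
    const double fx_c = 1042.0, fy_c = 1042.0, cx_c = 640.0, cy_c = 360.0;   // colour intrinsics (assumed)
    const cv::Matx33d R = cv::Matx33d::eye();    // depth-to-colour rotation (assumed identity)
    const cv::Vec3d   t(0.0, 0.0, 0.0);          // depth-to-colour translation (assumed zero)

    // Back-project the depth pixel to a 3D point in the depth camera frame.
    cv::Vec3d P((u - cx_d) * depth / fx_d,
                (v - cy_d) * depth / fy_d,
                depth);

    // Transform into the colour camera frame and project with its intrinsics.
    cv::Vec3d Q = R * P + t;
    return cv::Point2d(fx_c * Q[0] / Q[2] + cx_c,
                       fy_c * Q[1] / Q[2] + cy_c);
}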

OpenCV VLFeat Slic function call

I am trying to use the vl_slic_segment function of the VLFeat library with an input image stored in an OpenCV Mat. My code compiles and runs, but the output superpixel values do not make sense. Here is my code so far:
Mat bgrUChar = imread("/pathtowherever/image.jpg");
Mat bgrFloat;
bgrUChar.convertTo(bgrFloat, CV_32FC3, 1.0/255);
cv::Mat labFloat;
cvtColor(bgrFloat, labFloat, CV_BGR2Lab);
Mat labels(labFloat.size(), CV_32SC1);
vl_slic_segment(labels.ptr<vl_uint32>(),labFloat.ptr<const float>(),labFloat.cols,labFloat.rows,labFloat.channels(),30,0.1,25);
I have tried not converting it to the Lab colorspace and setting different regionSize/regularization values, but the output is always very glitchy. I am able to retrieve the label values correctly; the problem is that every label is usually scattered over small, non-contiguous areas.
I think the format of my input data is wrong, but I can't figure out how to pass it properly to the vl_slic_segment function.
Thank you in advance!
EDIT
Thank you David; as you helped me understand, vl_slic_segment wants the data ordered channel by channel, as [LLLLLAAAAABBBBB], whereas OpenCV stores its data interleaved, as [LABLABLABLABLAB], for the Lab color space.
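A compact way to do that reordering (a sketch for illustration, not code from the original post) is to let cv::split separate the channels and then copy each plane into one contiguous planar buffer:
#include <cstring>
#include <vector>
#include <opencv2/opencv.hpp>

// Convert an interleaved CV_32FC3 Lab image into the planar
// [LLL...AAA...BBB] float buffer that vl_slic_segment expects.
std::vector<float> toPlanar(const cv::Mat& labFloat)
{
    std::vector<cv::Mat> planes;
    cv::split(labFloat, planes);                  // planes[0]=L, planes[1]=a, planes[2]=b
    std::vector<float> buffer(labFloat.total() * 3);
    for (int c = 0; c < 3; ++c) {
        cv::Mat plane = planes[c].isContinuous() ? planes[c] : planes[c].clone();
        std::memcpy(buffer.data() + c * labFloat.total(),
                    plane.ptr<float>(), labFloat.total() * sizeof(float));
    }
    return buffer;
}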
In the course of my bachelor thesis I have to use VLFeat's SLIC implementation as well. You can find a short example applying VLFeat's SLIC on Lenna.png on GitHub: https://github.com/davidstutz/vlfeat-slic-example.
Maybe a look at main.cpp will help you figure out how to convert the images obtained with OpenCV to the right format:
// OpenCV can be used to read images.
#include <opencv2/opencv.hpp>

// The VLFeat header files need to be declared external.
extern "C" {
    #include "vl/generic.h"
    #include "vl/slic.h"
}

int main() {
    // Read the Lenna image. The matrix 'mat' will have 3 8 bit channels
    // corresponding to BGR color space.
    cv::Mat mat = cv::imread("Lenna.png", CV_LOAD_IMAGE_COLOR);

    // Convert image to one-dimensional array.
    float* image = new float[mat.rows*mat.cols*mat.channels()];
    for (int i = 0; i < mat.rows; ++i) {
        for (int j = 0; j < mat.cols; ++j) {
            // Assuming three channels ...
            image[j + mat.cols*i + mat.cols*mat.rows*0] = mat.at<cv::Vec3b>(i, j)[0];
            image[j + mat.cols*i + mat.cols*mat.rows*1] = mat.at<cv::Vec3b>(i, j)[1];
            image[j + mat.cols*i + mat.cols*mat.rows*2] = mat.at<cv::Vec3b>(i, j)[2];
        }
    }

    // The algorithm will store the final segmentation in a one-dimensional array.
    vl_uint32* segmentation = new vl_uint32[mat.rows*mat.cols];
    vl_size height = mat.rows;
    vl_size width = mat.cols;
    vl_size channels = mat.channels();

    // The region size defines the number of superpixels obtained.
    // Regularization describes a trade-off between the color term and the
    // spatial term.
    vl_size region = 30;
    float regularization = 1000.;
    vl_size minRegion = 10;

    vl_slic_segment(segmentation, image, width, height, channels, region, regularization, minRegion);

    // Convert segmentation.
    int** labels = new int*[mat.rows];
    for (int i = 0; i < mat.rows; ++i) {
        labels[i] = new int[mat.cols];
        for (int j = 0; j < mat.cols; ++j) {
            labels[i][j] = (int) segmentation[j + mat.cols*i];
        }
    }

    // Compute a contour image: this actually colors every border pixel
    // red such that we get relatively thick contours.
    int label = 0;
    int labelTop = -1;
    int labelBottom = -1;
    int labelLeft = -1;
    int labelRight = -1;

    for (int i = 0; i < mat.rows; i++) {
        for (int j = 0; j < mat.cols; j++) {
            label = labels[i][j];

            labelTop = label;
            if (i > 0) {
                labelTop = labels[i - 1][j];
            }

            labelBottom = label;
            if (i < mat.rows - 1) {
                labelBottom = labels[i + 1][j];
            }

            labelLeft = label;
            if (j > 0) {
                labelLeft = labels[i][j - 1];
            }

            labelRight = label;
            if (j < mat.cols - 1) {
                labelRight = labels[i][j + 1];
            }

            if (label != labelTop || label != labelBottom || label != labelLeft || label != labelRight) {
                mat.at<cv::Vec3b>(i, j)[0] = 0;
                mat.at<cv::Vec3b>(i, j)[1] = 0;
                mat.at<cv::Vec3b>(i, j)[2] = 255;
            }
        }
    }

    // Save the contour image.
    cv::imwrite("Lenna_contours.png", mat);

    return 0;
}
In addition, have a look at the README.md within the GitHub repository. The following figures show some example outputs for regularization set to 1, 100 and 1000 and region size set to 20, 30 and 40.
Figure 1: Superpixel segmentation with region size set to 30 and regularization set to 1.
Figure 2: Superpixel segmentation with region size set to 30 and regularization set to 100.
Figure 3: Superpixel segmentation with region size set to 30 and regularization set to 1000.
Figure 4: Superpixel segmentation with region size set to 20 and regularization set to 1000.
Figure 5: Superpixel segmentation with region size set to 40 and regularization set to 1000.

Converting Kinect depth image to Real world coordinate

I'm working with the Kinect, using OpenNI 2.x, C++ and OpenCV.
I am able to get the Kinect depth stream and obtain a grey-scale cv::Mat. Just to show how it is defined:
cv::Mat m_depthImage;
m_depthImage= cvCreateImage(cvSize(640, 480), 8, 1);
I suppose that the closest value is represented by "0" and the farthest by "255".
After that, I convert from depth coordinates to world coordinates. I do it element by element on the grey-scale cv::Mat, and I collect the data in PointsWorld[640*480].
In order to display these data, I adjust the scale so that the values fit in a 2000x2000x2000 volume.
cv::Point3f depthPoint;
cv::Point3f PointsWorld[640*480];
for (int j = 0; j < m_depthImage.rows; j++)
{
    for (int i = 0; i < m_depthImage.cols; i++)
    {
        depthPoint.x = (float) i;
        depthPoint.y = (float) j;
        depthPoint.z = (float) m_depthImage.at<unsigned char>(j, i);
        if (depthPoint.z != 255)
        {
            openni::CoordinateConverter::convertDepthToWorld(*m_depth, depthPoint.x, depthPoint.y, depthPoint.z, &wx, &wy, &wz);
            wx = wx * 7.2464f; // 138 -> 1000
            if (wx < -999) wx = -999;
            if (wx > 999) wx = 999;
            wy = wy * 7.2464f; // 111 -> 1000 with 9.009
            if (wy < -999) wy = -999;
            if (wy > 999) wy = 999;
            wz = wz * 7.8431f; // 255 -> 2000
            if (wz > 1999) wz = 1999;
            Xsp = P - floor(wx);
            Ysp = P + floor(wy);
            Zsp = 2*P - floor(wz);
            PointsWorld[k].x = Xsp;
            PointsWorld[k].y = Ysp;
            PointsWorld[k].z = Zsp;
            k++;
        }
    }
}
But I'm sure that doing this does not let me understand the real distance between points. What does an x,y,z coordinate actually mean?
Is there a way to know the real distance between points, for example how far away a grey value of "255" in the matrix actually is? And what are wx, wy, wz for?
If you have OpenCV built with OpenNI support you should be able to do something like:
int ptcnt = 0;
cv::Mat real;
cv::Point3f PointsWorld[640*480];
if (capture.retrieve(real, CV_CAP_OPENNI_POINT_CLOUD_MAP)) {
    for (int j = 0; j < m_depthImage.rows; j++)
    {
        for (int i = 0; i < m_depthImage.cols; i++) {
            PointsWorld[ptcnt] = real.at<cv::Vec3f>(j, i); // at(row, col)
            ptcnt++;
        }
    }
}
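As a side note that goes beyond the original answer: the point cloud map retrieved this way already contains metric XYZ values, and if you only need the distance per pixel you can also retrieve the raw depth map, which the OpenNI capture exposes as a CV_16UC1 image in millimetres, instead of working from the 8-bit visualization. A minimal sketch, assuming the device was opened as cv::VideoCapture capture(CV_CAP_OPENNI):
cv::Mat depthMap;
int x = 320, y = 240;                                    // example pixel
if (capture.retrieve(depthMap, CV_CAP_OPENNI_DEPTH_MAP)) // CV_16UC1, depth in millimetres
{
    unsigned short d = depthMap.at<unsigned short>(y, x); // real distance of pixel (x, y) in mm
    // The 8-bit grey image used above only holds a scaled visualization of this value.
}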

Accessing certain pixel RGB value in openCV

I have searched the internet and Stack Overflow thoroughly, but I haven't found the answer to my question:
How can I get and set the RGB value of a certain pixel (given by its x,y coordinates) in OpenCV? What's important: I'm writing in C++ and the image is stored in a cv::Mat variable. I know there is an IplImage() operator, but IplImage is not very comfortable to use; as far as I know it comes from the C API.
Yes, I'm aware that there was already the "Pixel access in OpenCV 2.2" thread, but it was only about black-and-white bitmaps.
EDIT:
Thank you very much for all your answers. I see there are many ways to get/set the RGB value of a pixel. I got one more idea from a close friend of mine (thanks, Benny!). It's very simple and effective. I think it's a matter of taste which one you choose.
Mat image;
(...)
Point3_<uchar>* p = image.ptr<Point3_<uchar> >(y,x);
And then you can read/write RGB values with:
p->x //B
p->y //G
p->z //R
Try the following:
cv::Mat image = ...do some stuff...;
image.at<cv::Vec3b>(y,x) gives you the RGB vector (it might be ordered as BGR) of type cv::Vec3b:
image.at<cv::Vec3b>(y,x)[0] = newval[0];
image.at<cv::Vec3b>(y,x)[1] = newval[1];
image.at<cv::Vec3b>(y,x)[2] = newval[2];
The low-level way would be to access the matrix data directly. In an RGB image (which I believe OpenCV typically stores as BGR), and assuming your cv::Mat variable is called frame, you could get the blue value at location (x, y) (from the top left) this way:
frame.data[frame.channels()*(frame.cols*y + x)];
Likewise, to get B, G, and R:
uchar b = frame.data[frame.channels()*(frame.cols*y + x) + 0];
uchar g = frame.data[frame.channels()*(frame.cols*y + x) + 1];
uchar r = frame.data[frame.channels()*(frame.cols*y + x) + 2];
Note that this code assumes the stride is equal to the width of the image.
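If the rows can be padded (for example after taking a ROI or with some capture sources), a stride-aware variant using cv::Mat::step avoids that assumption; here is a small sketch of the same access pattern:
// Stride-aware access: frame.step is the number of bytes per row,
// which may be larger than frame.cols * frame.channels().
uchar b = frame.data[frame.step * y + frame.channels() * x + 0];
uchar g = frame.data[frame.step * y + frame.channels() * x + 1];
uchar r = frame.data[frame.step * y + frame.channels() * x + 2];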
A piece of code is easier for people who have this kind of problem, so I'm sharing my code; you can use it directly. Please note that OpenCV stores pixels as BGR.
cv::Mat vImage_;
if (src_)
{
    cv::Vec3f vec_;
    for (int i = 0; i < vHeight_; i++)
        for (int j = 0; j < vWidth_; j++)
        {
            vec_ = cv::Vec3f((*src_)[0]/255.0, (*src_)[1]/255.0, (*src_)[2]/255.0); // Please note that OpenCV stores pixels as BGR.
            vImage_.at<cv::Vec3f>(vHeight_-1-i, j) = vec_;
            ++src_;
        }
}
if (!vImage_.data) // Check for invalid input
    printf("failed to read image by OpenCV.");
else
{
    cv::namedWindow(windowName_, CV_WINDOW_AUTOSIZE);
    cv::imshow(windowName_, vImage_); // Show the image.
}
The current version allows the cv::Mat::at function to handle 3 dimensions. So for a Mat object m, m.at<uchar>(0,0,0) should work.
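For reference, here is a small sketch (not from the original answer) that creates a genuinely three-dimensional Mat and accesses it with three indices:
#include <cstdio>
#include <opencv2/opencv.hpp>

int main()
{
    // A 3-dimensional matrix: 480 x 640 x 3, one uchar per cell.
    int sizes[] = { 480, 640, 3 };
    cv::Mat volume(3, sizes, CV_8UC1, cv::Scalar(0));

    volume.at<uchar>(0, 0, 0) = 255;                       // write with three indices
    std::printf("%d\n", volume.at<uchar>(0, 0, 0));        // read it back
    return 0;
}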
uchar* value = img2.data; // pointer to the first pixel; the data is one flat array of all channel values
int r = 2;
for (size_t i = 0; i < img2.cols * (img2.rows * img2.channels()); i++)
{
    // Cycle through the B, G, R bytes of each pixel: set B and G to 0 and R to 255.
    if (r > 2) r = 0;
    if (r == 0) value[i] = 0;
    if (r == 1) value[i] = 0;
    if (r == 2) value[i] = 255;
    r++;
}
const double pi = boost::math::constants::pi<double>();

cv::Mat distance2ellipse(cv::Mat image, cv::RotatedRect ellipse)
{
    float distance = 2.0f;
    float angle = ellipse.angle;
    cv::Point ellipse_center = ellipse.center;
    float major_axis = ellipse.size.width/2;
    float minor_axis = ellipse.size.height/2;
    cv::Point pixel;
    float a,b,c,d;

    for (int x = 0; x < image.cols; x++)
    {
        for (int y = 0; y < image.rows; y++)
        {
            auto u = cos(angle*pi/180)*(x-ellipse_center.x) + sin(angle*pi/180)*(y-ellipse_center.y);
            auto v = -sin(angle*pi/180)*(x-ellipse_center.x) + cos(angle*pi/180)*(y-ellipse_center.y);
            distance = (u/major_axis)*(u/major_axis) + (v/minor_axis)*(v/minor_axis);
            if (distance <= 1)
            {
                image.at<cv::Vec3b>(y,x)[1] = 255; // set the green channel for pixels inside the ellipse
            }
        }
    }
    return image;
}
}