I am confused about the coordinates in OpenCV Mat structure. When I want to get a pixel I do something like this
image.at<Vec3b>(i,j)
The question is whether the (0,0) coordinate is the top-left corner. I'm not sure about that, because when I try to get (-100,-100) it still works and returns a pixel.
Yes it is the top-left.
From the official documentation (for all pixel-access methods) here:
the 0-based row index (or y-coordinate) goes first and the 0-based
column index (or x-coordinate) follows it
at(-100,-100) works because at() performs no bounds checking (for speed), so it will happily read from anywhere in memory, but the data you get back is not a pixel of your image.
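A minimal sketch of a guarded lookup (the helper pixelAt is my own, not OpenCV API; at() only range-checks in debug builds):

#include <opencv2/opencv.hpp>

// at() does no range checking in release builds,
// so guard the indices yourself before reading a pixel.
cv::Vec3b pixelAt(const cv::Mat& image, int row, int col) {
    CV_Assert(image.type() == CV_8UC3);
    if (row < 0 || row >= image.rows || col < 0 || col >= image.cols)
        return cv::Vec3b(0, 0, 0); // out of bounds: return black instead of garbage
    return image.at<cv::Vec3b>(row, col); // (row, col): row index first
}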
Related
I am writing a disparity matching algorithm using block matching, but I am not sure how to find the corresponding pixel values in the secondary image.
Given a square window of some size, what techniques exist to find the corresponding pixels? Do I need to use feature matching algorithms, or is there a simpler method, such as summing the pixel values and checking whether they are within some threshold, or converting the pixel values to binary strings where each value is either greater or less than the center pixel?
I'm going to assume you're talking about Stereo Disparity, in which case you will likely want to use a simple Sum of Absolute Differences (read that wiki article before you continue here). You should also read this tutorial by Chris McCormick before you read more here.
side note: SAD is not the only method, but it's really common and should solve your problem.
You already have the right idea. Make windows, move windows, sum pixels, find minimums. So I'll give you what I think might help:
To start:
If you have color images, first you will want to convert them to black and white. In Python you might use a simple function like this per pixel, where x is a pixel containing RGB values:
def rgb_to_bw(x):
    return int(x[0]*0.299 + x[1]*0.587 + x[2]*0.114)
You will want this to be black and white to make the SAD easier to compute. If you're wondering why you don't lose significant information from this, you might be interested in learning what a Bayer Filter is. The Bayer Filter, which is typically RGGB, also explains the multiplication ratios for the red, green, and blue portions of the pixel.
Calculating the SAD:
You already mentioned that you have a window of some size, which is exactly what you want to do. Let's say this window is n x n in size. You would also have some window in your left image WL and some window in your right image WR. The idea is to find the pair that has the smallest SAD.
So, for each left-window pixel pl at some location (x,y) in the window, you would take the absolute value of its difference with the right-window pixel pr, also located at (x,y). You would also keep a running value, which is the sum of these absolute differences. In pseudocode:
SAD = 0
for x = 0 to n:
    for y = 0 to n:
        SAD = SAD + |pl(x,y) - pr(x,y)|
After you calculate the SAD for this pair of windows, WL and WR you will want to "slide" WR to a new location and calculate another SAD. You want to find the pair of WL and WR with the smallest SAD - which you can think of as being the most similar windows. In other words, the WL and WR with the smallest SAD are "matched". When you have the minimum SAD for the current WL you will "slide" WL and repeat.
Disparity is calculated by the distance between the matched WL and WR. For visualization, you can scale this distance to be between 0-255 and output that to another image. I posted 3 images below to show you this.
Typical Results:
Left Image:
Right Image:
Calculated Disparity (from the left image):
You can get test images here: http://vision.middlebury.edu/stereo/data/scenes2003/
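To make the pseudocode above concrete, here is a minimal, unoptimized C++ sketch of the whole pipeline (the function name computeDisparity is my own, not a library call; it assumes rectified 8-bit grayscale inputs of equal size and searches only leftwards in the right image):

#include <opencv2/opencv.hpp>
#include <climits>
#include <cstdlib>

// Naive SAD block matching: for every block in the left image, slide a
// window along the same row of the right image and keep the offset
// (disparity) with the smallest sum of absolute differences.
cv::Mat computeDisparity(const cv::Mat& left, const cv::Mat& right,
                         int n /*window size*/, int maxDisp) {
    cv::Mat disp = cv::Mat::zeros(left.size(), CV_8UC1);
    for (int y = 0; y + n <= left.rows; ++y) {
        for (int x = 0; x + n <= left.cols; ++x) {
            int bestDisp = 0, bestSAD = INT_MAX;
            for (int d = 0; d <= maxDisp && x - d >= 0; ++d) {
                int sad = 0;
                for (int wy = 0; wy < n; ++wy)
                    for (int wx = 0; wx < n; ++wx)
                        sad += std::abs(left.at<uchar>(y + wy, x + wx) -
                                        right.at<uchar>(y + wy, x - d + wx));
                if (sad < bestSAD) { bestSAD = sad; bestDisp = d; }
            }
            // scale the winning disparity to 0-255 for visualization
            disp.at<uchar>(y, x) = cv::saturate_cast<uchar>(bestDisp * 255 / maxDisp);
        }
    }
    return disp;
}

Real implementations reuse overlapping window sums (or use cv::StereoBM) instead of recomputing each SAD from scratch, but this form maps one-to-one onto the pseudocode above.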
Is it possible to get the depth/disparity map from a moving camera? Let's say I capture an image at location x, then travel, say, 5 cm and capture another picture, and from there I calculate the depth map of the image.
I have tried using BlockMatching in OpenCV but the result is not good. The first and second images are as follows:
first image, second image,
disparity map (colour), disparity map
My code is as follows:
GpuMat leftGPU;
GpuMat rightGPU;
leftGPU.upload(left);
rightGPU.upload(right);
GpuMat disparityGPU;
GpuMat disparityGPU2;
Mat disparity;
Mat disparity1, disparity2;
Ptr<cuda::StereoBM> stereo = createStereoBM(256,3);
stereo->setMinDisparity(-39);
stereo->setPreFilterCap(61);
stereo->setPreFilterSize(3);
stereo->setSpeckleRange(1);
stereo->setUniquenessRatio(0);
stereo->compute(leftGPU,rightGPU,disparityGPU);
drawColorDisp(disparityGPU, disparityGPU2,256);
disparityGPU.download(disparity);
disparityGPU2.download(disparity2);
imshow("display img",disparityGPU);
How can I improve on this? In the colour disparity map there are quite a lot of errors (e.g. the tall circle is red, the same colour as parts of the table). Also, the disparity map contains small noise (all the black dots in the picture); how can I fill those black dots with nearby disparities?
It is possible if the object is static.
To properly do stereo matching, you first need to rectify your images! If you don't have calibrated cameras, you can do this from detected feature points. Also note that for cuda::StereoBM the default minimum disparity is 0. (I have never used cuda, but I don't think your setMinDisparity is doing anything; see this answer.)
Now, in your example images corresponding points are only about one row apart, so your disparity map actually doesn't look too bad. Maybe a larger blockSize would already be enough in this special case.
Finally, your objects have very low texture, therefore the block matching algorithm can't detect much.
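As a starting point for the rectification step, a hedged sketch using OpenCV's uncalibrated rectification (the function rectifyUncalibrated and the point vectors pts1/pts2 are my own; the matches are assumed to come from your feature detector of choice):

#include <opencv2/opencv.hpp>
#include <vector>

// Rectify an uncalibrated stereo pair from matched feature points.
// pts1/pts2 are assumed to come from your own feature matcher
// (e.g. ORB keypoints + a ratio-test-filtered BFMatcher).
void rectifyUncalibrated(const cv::Mat& left, const cv::Mat& right,
                         const std::vector<cv::Point2f>& pts1,
                         const std::vector<cv::Point2f>& pts2,
                         cv::Mat& leftRect, cv::Mat& rightRect) {
    // Fundamental matrix with RANSAC to reject mismatches.
    cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0);
    // Homographies that map epipolar lines to horizontal scanlines.
    cv::Mat H1, H2;
    cv::stereoRectifyUncalibrated(pts1, pts2, F, left.size(), H1, H2);
    cv::warpPerspective(left, leftRect, H1, left.size());
    cv::warpPerspective(right, rightRect, H2, right.size());
}

After the warps, corresponding points lie on the same scanline, which is what the block matcher assumes.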
I have a Qt app where I have to find the HSV range of a couple of pixels around click coordinates, to track later on. This is how I do it:
cv::Mat temp;
cv::cvtColor(frame, temp, CV_BGR2HSV); //frame is pulled from a video or jpeg
cv::Vec3b hsv=temp.at<cv::Vec3b>(frameX,frameY); //sometimes SIGSEGV?
qDebug() << hsv.val[0]; //look up H
qDebug() << hsv.val[1]; //look up S
qDebug() << hsv.val[2]; //look up V
//just base values so far, will work on range later
emit hsvDownloaded(hsv.val[0], hsv.val[0]+5, hsv.val[1], 255, hsv.val[2], 255); //send to GUI which automatically updates worker thread
Now, things are odd. Those are the results (red circle indicates the click location):
With red it's weird: the upper half of the shape is detected correctly, the lower half is not, despite it being a solid mass of the same colour.
And for an actual test
It detects HSV {95,196,248}, which is frankly absurd (base values way too high). None of the detected pixels is even the one that was clicked. The best values to detect that ball 100% of the time are H:35-141 S:0-238 V:65-255. I wanted to get an HSV range from a normalized histogram, but I can't even get the base values right. What's up? When OpenCV pulls a frame using kalibrowanyPlik.read(frame);, the default colour scheme is BGR, right?
Why would the colour detection work so randomly?
As berak has mentioned, your code looks like you've used the indices to access the pixel in the wrong order.
That means your pixel locations are wrong, except for pixels that lie on the diagonal, so clicked objects that are around the diagonal will be detected correctly, while all the others won't.
To not get confused again and again, I want you to understand why OpenCV uses (row,col) ordering for indices:
OpenCV uses matrices to represent images. In mathematics, 2D matrices use (row,col) indexing; have a look at http://en.wikipedia.org/wiki/Index_notation#Two-dimensional_arrays and look at the indices. So for matrices, it is typical to use the row index first, followed by the column index.
Unfortunately, images and pixels typically use (x,y) indexing, which corresponds to the x/y axes in mathematical graphs and coordinate systems. So here the x position comes first, followed by the y position.
Luckily, OpenCV provides two different versions of .at method, one to access pixel-positions and one to access matrix elements (which are exactly the same elements in the end).
matrix.at<type>(row,column) // matrix indexing to access elements
// which equals
matrix.at<type>(y,x)
and
matrix.at<type>(cv::Point(x,y)) // pixel/position indexing to access elements
Since the first version should be slightly more efficient, it should be preferred if the positions aren't already given as cv::Point objects. So the best way is often to remember that OpenCV uses matrices to represent images, and matrix index notation to access elements.
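A tiny self-contained example of the equivalence (the sizes and values are made up for illustration):

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat img(100, 200, CV_8UC3, cv::Scalar(0, 0, 0)); // 100 rows, 200 cols
    int x = 150, y = 40; // a pixel position: x = column, y = row
    img.at<cv::Vec3b>(y, x) = cv::Vec3b(255, 0, 0);        // matrix indexing
    cv::Vec3b p = img.at<cv::Vec3b>(cv::Point(x, y));      // point indexing
    std::cout << p << std::endl; // prints [255, 0, 0]: both address the same element
    return 0;
}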
btw, I've seen people wondering why matrix.at<type>(cv::Point(y,x)) doesn't work the way intended after they've learned that openCV images use the "wrong ordering". I hope this question doesn't come up after my explanation.
one more btw: in school I already wondered why matrices index rows first, while graphs of functions index the x axis first. I found it stupid not to use the "same" ordering for both, but I still had to live with it :D (and in the end, the two don't have much to do with each other)
I'm new to CV and I'm coming up with a question.
I want to create a fading grey bar (from black to white).
So I initialized a Mat:
Mat fadedgrey=Mat(20,256,CV_8UC1);
When I write the pixel values:
for(int x=0;x<20;x++){
    for(int y=0;y<256;y++){
        fadedgrey.at<int>(x,y)=y;
    }
}
the result is the following:
Only every second column is written, but I thought CV_8UC1 is a one-channel Mat, not a two-channel one.
For example, the value set at position (1,129) shows up as a pixel at the beginning of the second row.
Help me!
Greetings!
If your matrix is of type CV_8UC1, then each element is one byte in size and you should be using .at<uchar> or similar, rather than .at<int>.
Although this isn't your problem, you might also end up confused about rows and columns, as your Mat constructor takes nRows, nCols, which is the opposite order to x, y.
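Putting both points together, a corrected version of the loop might look like this (just a sketch of the fix described above):

#include <opencv2/opencv.hpp>

cv::Mat fadedgrey(20, 256, CV_8UC1); // 20 rows, 256 columns
for (int row = 0; row < fadedgrey.rows; row++) {
    for (int col = 0; col < fadedgrey.cols; col++) {
        // CV_8UC1 elements are single bytes, so access them as uchar
        fadedgrey.at<uchar>(row, col) = (uchar)col;
    }
}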
I need to know how to rotate an image in yuv420p format by 90 degrees. Converting it to RGB, rotating, and converting back to YUV is not feasible. Even an algorithm would help.
Regards,
Anirudh.
In case the image is yuv420 planar, this is how the image data is encoded.
Planar means the Y section comes first, followed by the U section and then the V section.
Considering the width of the image w, and height of the image h.
The total size of the image is w*h*3/2
The Y section, also called luminance, occupies w*h.
There is one U sample and one V sample for every 2x2 block in the Y section.
The U section comes next; it occupies (w/2)*(h/2) and is laid out at an offset of w*h from the beginning of the image.
The V section follows; it occupies (w/2)*(h/2) and is laid out at an offset of (w*h)+((w*h)/4).
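In code, the layout arithmetic above might look like this (a sketch with my own helper names; w and h are assumed even):

#include <cstddef>

// Byte offsets inside a YUV420 planar buffer of width w, height h.
// Each 2x2 block of Y samples shares one U and one V sample.
size_t frameSize(int w, int h) { return (size_t)w * h * 3 / 2; }
size_t yOffset(int w, int x, int y)        { return (size_t)y * w + x; }
size_t uOffset(int w, int h, int x, int y) { return (size_t)w * h
                                                    + (y / 2) * (w / 2) + (x / 2); }
size_t vOffset(int w, int h, int x, int y) { return (size_t)w * h + (size_t)(w / 2) * (h / 2)
                                                    + (y / 2) * (w / 2) + (x / 2); }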
In order to rotate the image by 90 degrees, you essentially copy the w*h array into an h*w array.
As mentioned in the above post, you simply need to copy each of the 3 sections (Y, U, V) separately.
Start with the Y section. The 1st pixel to be copied is at (h-1)*w in the source array; copy it to (0,0) of the destination array. The 2nd pixel is at (h-2)*w, and so on...
Remember that the U and V sections are only (w/2)*(h/2).
Next copy the U section. The first pixel to be copied is at (w*h)+(((h/2)-1)*(w/2)) in the source array; copy it to (h*w)+(0,0) in the destination array. The 2nd pixel is at (w*h)+(((h/2)-2)*(w/2)), and so on...
Finally copy the V section. The first pixel to be copied is at ((w*h)+(w*h/4))+(((h/2)-1)*(w/2)) in the source array; copy it to (h*w)+(w*h/4)+(0,0) in the destination array. The 2nd pixel is at ((w*h)+(w*h/4))+(((h/2)-2)*(w/2)), and so on...
The Destination Array obtained in this way contains the 90 degree rotated image.
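A minimal C++ sketch of the copy described above (my own helper, not library code; it assumes even w and h, a caller-allocated destination of w*h*3/2 bytes, and rotates 90 degrees clockwise):

#include <cstdint>

// Rotate a YUV420 planar image 90 degrees clockwise.
// The rotated image is h pixels wide and w pixels tall.
void rotate_yuv420p_90cw(const uint8_t* src, uint8_t* dst, int w, int h) {
    // Y plane: destination (0,0) comes from source offset (h-1)*w,
    // exactly as described above.
    const uint8_t* srcY = src;
    uint8_t* dstY = dst;
    for (int y = 0; y < w; ++y)              // rows of the rotated image
        for (int x = 0; x < h; ++x)          // columns of the rotated image
            dstY[y * h + x] = srcY[(h - 1 - x) * w + y];

    // U and V planes: the same rotation at half resolution.
    int cw = w / 2, ch = h / 2;
    const uint8_t* srcU = src + w * h;
    const uint8_t* srcV = srcU + cw * ch;
    uint8_t* dstU = dst + w * h;
    uint8_t* dstV = dstU + cw * ch;
    for (int y = 0; y < cw; ++y)
        for (int x = 0; x < ch; ++x) {
            dstU[y * ch + x] = srcU[(ch - 1 - x) * cw + y];
            dstV[y * ch + x] = srcV[(ch - 1 - x) * cw + y];
        }
}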
I suppose it is not planar YUV; if it already is, it's quite easy (skip the first and last steps). You mention YUV 4:2:0 planar, but then I do not understand why you have difficulties.
Convert it to planar first: allocate space for the planes and put the bytes in the right places according to the packed YUV format you have.
Rotate the Y, U, V planes separately. The "colour" (U, V) information for each block stays the same.
Recombine the planes to obtain the packed YUV format you had at the beginning.
This always works fine if your image dimensions are a multiple of 4. If not, take care...
I think YUV420p is indeed planar.
Try and take a look at AviSynth's source code. The turn (rotate) functions are in turn.cpp and turnfunc.cpp
http://www.avisynth.org/