I have a 3 channel Mat image, type is CV_8UC3.
I want to compare, in a loop, the intensity value of each pixel with its neighbours, and then set 0 or 1 depending on whether the neighbour is greater.
I can get the intensity by calling Img.at<Vec3b>(x, y).
But my question is: how can I compare two Vec3b?
Should I compare the pixel values for every channel (B, G and R, i.e. Vec3b[0], Vec3b[1] and Vec3b[2]), and then merge the three per-channel results into a single Mat object?
Me again :)
If you want to compare (greater or less) two RGB values, you need to project the 3-dimensional RGB space onto a single axis, i.e. onto a scalar.
Of course, there are many possibilities to do this, but an easy way would be to use the HSV color space. The hue (H), however, is not appropriate as a linear order function because it is circular (i.e. the value 1.0 is identical with 0.0, so you cannot decide if 0.5 > 0.0 or 0.5 < 0.0). However, the saturation (S) or the value (V) are appropriate projection functions for your purpose:
If you want to have colored pixels "larger" than monochrome pixels, you will prefer S.
If you want to have lighter pixels larger than darker pixels, you will probably prefer V.
Also any combination of S and V would be a valid projection function, e.g. S+V.
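For illustration, here is a minimal sketch (my own, not the answerer's code) that compares two neighbouring pixels by their HSV value channel V; swap in S, or S+V, depending on which projection you prefer. It assumes a CV_8UC3 BGR image named img and in-range coordinates:

#include <opencv2/opencv.hpp>

// Returns 1 if the neighbour at (nr, nc) is "greater" than the pixel at (r, c), else 0.
int compareByValue(const cv::Mat& img, int r, int c, int nr, int nc)
{
    cv::Mat3b pix(1, 2);
    pix(0, 0) = img.at<cv::Vec3b>(r, c);
    pix(0, 1) = img.at<cv::Vec3b>(nr, nc);

    cv::Mat3b hsv;
    cv::cvtColor(pix, hsv, cv::COLOR_BGR2HSV);   // H in [0,180), S and V in [0,255]

    // Use V (brightness) as the projection function; use S, or S + V, as needed.
    return hsv(0, 1)[2] > hsv(0, 0)[2] ? 1 : 0;
}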
As far as I understand, you want a measure of the distance/similarity between two Vec3b pixels. This maps to the general problem of finding the distance between two vectors in an n-dimensional space.
One of the best-known measures (and I think this is what you're asking for) is the Euclidean distance.
If you are using OpenCV, then you can simply use:
cv::Vec3b a(1, 1, 1);
cv::Vec3b b(5, 5, 5);
double dist = cv::norm(a, b, cv::NORM_L2);   // Euclidean (L2) distance between the two pixels
You can refer to this for reading about cv::norm and its options.
Edit: If you are doing this to measure color similarity, it is recommended to work in the Lab color space, since Euclidean distance in Lab is known to be a good approximation of the human perception of color differences.
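For example, a minimal sketch (mine, not part of the original answer) of a perceptual distance between two 8-bit BGR pixels via the Lab space:

#include <opencv2/opencv.hpp>

// Hypothetical helper: perceptual distance between two 8-bit BGR pixels.
double labDistance(const cv::Vec3b& p, const cv::Vec3b& q)
{
    cv::Mat3b bgr(1, 2);
    bgr(0, 0) = p;
    bgr(0, 1) = q;

    cv::Mat3b lab;
    cv::cvtColor(bgr, lab, cv::COLOR_BGR2Lab);   // 8-bit Lab: L scaled to [0,255], a/b offset by 128

    return cv::norm(lab(0, 0), lab(0, 1), cv::NORM_L2);
}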
Edit 2: I see what you mean. For that you can get the magnitude of each vector and then compare them, something like this:
double a_magnitude = cv::norm(a, cv::NORM_L2);
double b_magnitude = cv::norm(b, cv::NORM_L2);

if (a_magnitude > b_magnitude)
{
    // do something
}
else
{
    // do something else
}
I am looking for a way to easily compute pixel coordinates from the difference between two images.
Question: given the following code, how could I compute the pixel coordinates that changed from the "QVector difference"? Is it possible to get an (x, y) coordinate and find which pixel it represents in currentImage?
char *previousImage;
char *currentImage;
QVector<long> difference;

for (int i = 0; i < CurrentImageSize; i++)
{
    // Check if the pixels are the same (we could also compare RGB values, this is just for the example)
    if (*previousImage != *currentImage)
    {
        difference.push_back(i);   // store the byte offset of the change
    }
    previousImage++;
    currentImage++;
}
EDIT:
More information about this topic:
The image is in RGB format
The width, the height and the bpp of both images are known
I have a pointer to the bytes representing the image
The main objective is to know the new value of each pixel that changed between the two images, and which pixel it is (its coordinates).
There is not enough information to answer, but I will try to give you some idea.
You have declared char *previousImage;, which implies to me that you have a pointer to the bytes representing an image. You need more than that to interpret the image.
You need to know the pixel format. You mention RGB, so for the time being let's assume that the image uses 3 bytes per pixel and the order is R, G, B.
You need to know the width of the image.
Given the above 2, you can calculate the "Row Stride", which is the number of bytes that a row takes up. This is usually the "bytes per pixel" * "image width", but it is typically padded out to be divisible by 4. So 3 bpp and a width of 15, would be 45 bytes + 3 bytes of padding to make the row stride 48.
Given that, if you have a byte index into the image data, you first integer-divide it by the row stride to get the row (Y coordinate).
The X coordinate is (index mod row stride), integer-divided by the bytes per pixel.
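A minimal sketch of that calculation (my own, assuming 3 bytes per pixel and rows padded to a multiple of 4 bytes):

#include <utility>

// Convert a byte offset into the image buffer into (x, y) pixel coordinates.
std::pair<int, int> offsetToXY(long byteOffset, int width, int bytesPerPixel = 3)
{
    // Row stride: bytes per row, padded up to a multiple of 4.
    const int rowStride = ((width * bytesPerPixel + 3) / 4) * 4;

    const int y = static_cast<int>(byteOffset / rowStride);
    const int x = static_cast<int>((byteOffset % rowStride) / bytesPerPixel);
    return { x, y };
}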
From what I understand, you want to compute the displacement or motion that occurred between two images. E.g. for each pixel I(x, y, t=previous) in previousImage, you want to know where it went in currentImage and what its new coordinate I(x, y, t=current) is.
If that is the case, it is called motion estimation, or measuring the optical flow. There are many algorithms for that, relying on more or less complex hypotheses depending on the objects you observe in the image sequence.
The simplest hypothesis is that if you follow a moving pixel I(x, y, t) through the scene, its luminance remains constant over time. In other words, dI(x, y, t)/dt = 0.
Since I(x, y, t) is a function of three parameters (space and time), this gives only one equation for two unknowns (the two components of the motion), so the problem is ill-defined and has no easy solution. Many of the algorithms add an additional hypothesis so that the problem has a unique solution.
You can use existing libraries that will do this for you; one of the most popular is OpenCV.
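For instance, a minimal sketch (mine, not from the original answer) using OpenCV's dense Farnebäck optical flow, assuming two single-channel 8-bit frames prev and curr:

#include <opencv2/opencv.hpp>

// Dense optical flow: for every pixel of 'prev', flow(y, x) gives the (dx, dy)
// displacement towards 'curr'.
cv::Mat computeFlow(const cv::Mat& prev, const cv::Mat& curr)
{
    CV_Assert(prev.type() == CV_8UC1 && curr.type() == CV_8UC1);

    cv::Mat flow;   // will be CV_32FC2
    cv::calcOpticalFlowFarneback(prev, curr, flow,
                                 0.5,   // pyramid scale
                                 3,     // pyramid levels
                                 15,    // window size
                                 3,     // iterations
                                 5,     // poly_n
                                 1.2,   // poly_sigma
                                 0);    // flags
    return flow;
}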
I have a Color for the lowest speed and a Color for the highest speed.
I have another variable called currentSpeed which gives me the current speed. I'd like to generate a Color between the two extremes using the current speed. Any hints?
The easiest solution is probably to linearly interpolate each of the RGB components (because that is probably the format your colours are in). However, it can lead to some strange results. If lowest is bright blue (0x0000FF) and highest is bright yellow (0xFFFF00), then midway will be a dull grey (0x808080).
A better solution is probably:
Convert both colours to HSL (Hue, saturation, lightness)
Linearly interpolate those components
Convert the result back to RGB.
See this answer for how to do the conversion to and from HSL.
To do linear interpolation you will need something like:
double low_speed = 20.0, high_speed = 40.0; // The end points.
int low_sat = 50, high_sat = 200; // The value at the end points.
double current_speed = 35;
const auto scale_factor = (high_sat-low_sat)/(high_speed-low_speed);
int result_sat = low_sat + scale_factor * (current_speed - low_speed);
Two problems:
You will need to be careful about integer rounding if speeds are not actually double.
When you come to interpolate hue, you need to know that they are represented as angles on a circle - so you have a choice whether to interpolate clockwise or anti-clockwise (and one of them will go through 360 back to 0).
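A minimal sketch of the hue case (my own, assuming hue measured in degrees in [0, 360)) that always interpolates along the shorter arc of the circle:

#include <cmath>

// Interpolate between two hues (degrees), taking the shorter way around the circle.
// t is in [0, 1], e.g. (current_speed - low_speed) / (high_speed - low_speed).
double lerpHue(double h_low, double h_high, double t)
{
    double delta = std::fmod(h_high - h_low + 540.0, 360.0) - 180.0;   // signed shortest difference
    return std::fmod(h_low + t * delta + 360.0, 360.0);
}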
I'm trying to get 3D coordinates of several points in space, but I'm getting odd results from both undistortPoints() and triangulatePoints().
Since both cameras have different resolution, I've calibrated them separately, got RMS errors of 0.34 and 0.43, then used stereoCalibrate() to get more matrices, got an RMS of 0.708, and then used stereoRectify() to get remaining matrices. With that in hand I've started the work on gathered coordinates, but I get weird results.
For example, input is: (935, 262), and the undistortPoints() output is (1228.709125, 342.79841) for one point, while for another it's (934, 176) and (1227.9016, 292.4686) respectively. Which is weird, because both of these points are very close to the middle of the frame, where distortions are the smallest. I didn't expect it to move them by 300 pixels.
When passed to triangulatePoints(), the results get even stranger - I've measured the distance between three points in real life (with a ruler), and calculated the distance between pixels on each picture. Because this time the points were on a pretty flat plane, these two lengths (pixel and real) matched, as in |AB|/|BC| was around 4/9 in both cases. However, triangulatePoints() gives me results that are way off, with |AB|/|BC| being 3/2 or 4/2.
This is my code:
double pointsBok[2] = { bokList[j].toFloat() + xBok/2, bokList[j+1].toFloat() + yBok/2 };
cv::Mat imgPointsBokProper = cv::Mat(1, 1, CV_64FC2, pointsBok);
double pointsTyl[2] = { tylList[j].toFloat() + xTyl/2, tylList[j+1].toFloat() + yTyl/2 };
//cv::Mat imgPointsTyl = cv::Mat(2, 1, CV_64FC1, pointsTyl);
cv::Mat imgPointsTylProper = cv::Mat(1, 1, CV_64FC2, pointsTyl);

cv::undistortPoints(imgPointsBokProper, imgPointsBokProper, intrinsicOne, distCoeffsOne, R1, P1);
cv::undistortPoints(imgPointsTylProper, imgPointsTylProper, intrinsicTwo, distCoeffsTwo, R2, P2);

cv::triangulatePoints(P1, P2, imgWutBok, imgWutTyl, point4D);

double wResult = point4D.at<double>(3, 0);
double realX = point4D.at<double>(0, 0) / wResult;
double realY = point4D.at<double>(1, 0) / wResult;
double realZ = point4D.at<double>(2, 0) / wResult;
The angles between points are kinda sorta good but usually not:
`7.16816 168.389 4.44275` vs `5.85232 170.422 3.72561` (degrees)
`8.44743 166.835 4.71715` vs `12.4064 158.132 9.46158`
`9.34182 165.388 5.26994` vs `19.0785 150.883 10.0389`
I've tried to use undistort() on the entire frame, but got results just as odd. The distance between B and C points should be pretty much unchanged at all times, and yet this is what I get:
7502.42
4876.46
3230.13
2740.67
2239.95
Frame by frame.
Pixel distance (bottom) vs real distance (top) - should be very similar:
Angle:
Also, shouldn't both undistortPoints() and undistort() give the same results (another set of videos here)?
The function cv::undistortPoints() does undistortion and reprojection in one go. It performs the following list of operations:
undo camera projection (multiplication with the inverse of the camera matrix)
apply the distortion model to undo the distortion
rotate by the provided Rotation matrix R1/R2
project points to image using the provided Projection matrix P1/P2
If you pass the matrices R1, P1 resp. R2, P2 from cv::stereoRectify(), the input points will be undistorted and rectified. Rectification means that the images are transformed in such a way that corresponding points have the same y-coordinate. There is no unique solution for image rectification, as you can apply any translation or scaling to both images without changing the alignment of corresponding points.
That being said, cv::stereoRectify() can shift the center of projection quite a bit (e.g. by 300 pixels). If you want pure undistortion, you can pass an identity matrix (instead of R1) and the original camera matrix K (instead of P1). This should lead to pixel coordinates similar to the original ones.
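For example (my own sketch, assuming intrinsicOne and distCoeffsOne from the question are the camera matrix and distortion coefficients of the first camera):

// Pure undistortion: no rectification rotation, and reprojection with the original
// camera matrix, so the output stays in (roughly) the original pixel coordinates.
cv::Mat undistortedBok;
cv::undistortPoints(imgPointsBokProper, undistortedBok,
                    intrinsicOne, distCoeffsOne,
                    cv::noArray(),   // R = identity
                    intrinsicOne);   // P = original camera matrix K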
I want to add up all channels of a Mat image into a Mat image with only one sum channel. I've tried it this way:
// sum up the channels of the image:
// 1. store initial number of rows/columns
int initialRows = frameVid1.rows;
int initialCols = frameVid1.cols;
// 2. check if the matrix is continuous
if (!frameVid1.isContinuous())
{
frameVid1 = frameVid1.clone();
}
// 3. reshape to a (rows*cols) x 1 matrix, still with 3 channels
frameVid1 = frameVid1.reshape(3, initialRows*initialCols);
// 4. convert matrix to store bigger values than 255
frameVid1.convertTo(frameVid1, CV_32F);
// 5. sum up the three color vectors
reduce(frameVid1, frameVid1, 1, CV_REDUCE_SUM);
// 6. reshape to initial size
frameVid1 = frameVid1.reshape(1, initialRows);
// 7. convert back to CV_8UC1
frameVid1.convertTo(frameVid1, CV_8U);
But reduce() does not treat the color channels as a matrix dimension, so they are not summed. Is there another function that can sum them up?
Also, why does using CV_16U in step 4 not work? (I had to use CV_32F there.)
Thanks in advance!
You can sum the RGB channels with a single line:
cv::transform(frameVid1, frameVidSum, cv::Matx13f(1, 1, 1));
You may need one more line, since before applying the transform you should convert the image to an appropriate type to avoid saturation (I assumed CV_32FC3). The output array has the same size and depth as the source.
Some explanation:
cv::transform may operate on per-pixel channel values.
With cv::Matx13f(a, b, c) as the third argument, for each pixel [u, v] it computes:
frameVidSum[u,v] = frameVid1[u,v].B * a + frameVid1[u,v].G * b + frameVid1[u,v].R * c
By using cv::Matx13f(1, 0, 1) as the third argument, you will sum only the blue and red channels.
cv::transform is flexible enough that you can even use cv::Matx14f; the fourth value is then added as an offset to each pixel of frameVidSum.
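Putting it together, a minimal sketch (my own, assuming frameVid1 is CV_8UC3 and you want an 8-bit single-channel result):

#include <opencv2/opencv.hpp>

cv::Mat sumChannels(const cv::Mat& frameVid1)
{
    CV_Assert(frameVid1.type() == CV_8UC3);

    cv::Mat frameF, frameVidSum;
    frameVid1.convertTo(frameF, CV_32F);                        // avoid saturation while summing
    cv::transform(frameF, frameVidSum, cv::Matx13f(1, 1, 1));   // per-pixel B + G + R -> CV_32FC1

    // Convert back to 8 bits; scaling by 1/3 (i.e. taking the mean) keeps values in [0, 255].
    frameVidSum.convertTo(frameVidSum, CV_8U, 1.0 / 3.0);
    return frameVidSum;
}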
In an interleaved RGB buffer, every third element belongs to the same channel, so each group of three consecutive elements (R, G and B) is one pixel. You could grab each such group, sum the three values, and store the result in another single-channel matrix. Before storing you should use saturate_cast to avoid unexpected results. So I think the better way is to use saturate_cast instead of converting your matrix to a larger type.
Have a look at the cv::split() and cv::add() functions.
You can use the split function to split the image into separate channels and then the add function to add the images. But be careful when using add, because adding may lead to saturation of values; you may have to convert the type first and then add. Have a look here: http://answers.opencv.org/question/13769/adding-matrices-without-saturation/
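A minimal sketch of that approach (my own, assuming a CV_8UC3 input img):

#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat sumWithSplit(const cv::Mat& img)   // img: CV_8UC3
{
    std::vector<cv::Mat> ch;
    cv::split(img, ch);                    // ch[0], ch[1], ch[2] are CV_8UC1

    cv::Mat sum;
    ch[0].convertTo(sum, CV_32F);          // widen first so cv::add does not saturate
    cv::add(sum, ch[1], sum, cv::noArray(), CV_32F);
    cv::add(sum, ch[2], sum, cv::noArray(), CV_32F);
    return sum;                            // CV_32FC1 sum of the three channels
}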
I wanted to detect an ellipse in an image. Since I was learning Mathematica at the time, I asked a question here and got a satisfactory result from the answer below, which used the RANSAC algorithm to detect the ellipse.
However, recently I need to port it to OpenCV, and some of the functions only exist in Mathematica. One of the key functions is GradientOrientationFilter.
Since a general ellipse has five parameters, I need to sample five points to determine one. However, the more sample points are needed, the lower the chance of a good guess, which lowers the success rate of the ellipse detection. Therefore, the Mathematica answer adds another condition: the gradient of the image must be parallel to the gradient of the ellipse equation. With that constraint, only three points are needed to determine an ellipse via least squares, and the result is quite good.
However, when I try to find the image gradient using the Sobel or Scharr operator in OpenCV, it is not good enough and always leads to a bad result.
How can I calculate the gradient or the tangent of an image accurately? Thanks!
Result with gradient, three points
Result without gradient, five points
----------updated----------
I did some edge detection and median blurring beforehand and drew the result on the edge image. My original test image is like this:
In general, my final goal is to detect the ellipse in a scene or on an object. Something like this:
That's why I choose to use RANSAC to fit the ellipse from edge points.
As for your final goal, you may try findContours() and fitEllipse() in OpenCV.
The pseudo code will be
1) some image processing
2) find all contours
3) fit each contour with fitEllipse
Here is part of the code I used before:

// ... image processing ... you get a binary image bwimage

vector<vector<Point> > contours;
findContours(bwimage, contours, CV_RETR_LIST, CV_CHAIN_APPROX_NONE);

for (size_t i = 0; i < contours.size(); i++)
{
    size_t count = contours[i].size();
    if (count < 6)
        continue;   // fitEllipse needs at least 5 points

    Mat pointsf;
    Mat(contours[i]).convertTo(pointsf, CV_32F);
    RotatedRect box = fitEllipse(pointsf);

    /* You can put some limitation on size and aspect ratio here */
    if (box.size.width > 20 && box.size.height > 20 &&
        box.size.width < 80 && box.size.height < 80)
    {
        if (MAX(box.size.width, box.size.height) > MIN(box.size.width, box.size.height) * 30)
            continue;

        //drawContours(SrcImage, contours, (int)i, Scalar::all(255), 1, 8);
        ellipse(SrcImage, box, Scalar(0, 0, 255), 1, CV_AA);
        ellipse(SrcImage, box.center, box.size * 0.5f, box.angle, 0, 360, Scalar(200, 255, 255), 1, CV_AA);
    }
}

imshow("result", SrcImage);
If you focus on ellipses (no other shapes), you can treat the values of the ellipse pixels as the masses of point masses.
Then you can calculate the moments of inertia Ixx, Iyy, Ixy to find the angle theta that rotates a general ellipse back to the canonical form (X-Xc)^2/a^2 + (Y-Yc)^2/b^2 = 1.
Then you can find out Xc and Yc by the center of mass.
Then you can find out a and b by min X and min Y.
--------------- update -----------
This method can apply to filled ellipse too.
More than one ellipse on a single image will fail unless you segment them first.
Let me explain more,
I will use C to represent cos(theta) and S to represent sin(theta)
After rotation to canonical form, the new coordinates are [eq0] X = xC - yS and Y = xS + yC, where x and y are the original positions.
The rotation angle is chosen so that IYY is minimal.
[eq1]
IYY = Sum(m*Y*Y) = Sum{m*(xS + yC)*(xS + yC)} = Sum{m*(x*x*S*S + y*y*C*C + 2*x*y*S*C)} = Ixx*S^2 + Iyy*C^2 + 2*Ixy*S*C
where Ixx = Sum(m*x*x), Iyy = Sum(m*y*y) and Ixy = Sum(m*x*y).
For min IYY, d(IYY)/d(theta) = 0, that is
2*Ixx*S*C - 2*Iyy*S*C + 2*Ixy*(C*C - S*S) = 0
(Ixx - Iyy)/Ixy = (S*S - C*C)/(S*C) = S/C - C/S = Z - 1/Z, with Z = tan(theta)
While programming, the LHS is just a number; call it N. Then
Z^2 - N*Z - 1 = 0
So there are two roots of Z, hence two thetas; call them Z1 and Z2. One will minimize IYY and the other will maximize it.
----------- pseudo code --------
Compute Ixx, Iyy, Ixy for a hollow or filled ellipse.
Compute theta1=atan(Z1) and theta2=atan(Z2)
Put these two thetas into eq1 and see which gives the smaller IYY; that one is your theta.
Go back to the non-zero pixels and transform them to the new X and Y using the theta you found.
Find the center of mass (Xc, Yc), and the min X and min Y, e.g. by sorting.
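A minimal OpenCV sketch of the moment computation (my own; note that cv::moments returns the central second-order moments mu20, mu02 and mu11 directly, which play the role of Ixx, Iyy and Ixy here):

#include <opencv2/opencv.hpp>
#include <cmath>

// Orientation of a (hollow or filled) ellipse in a single-channel image,
// treating pixel intensities as masses. Assumes one ellipse per image.
double ellipseOrientation(const cv::Mat& gray)   // gray: CV_8UC1
{
    cv::Moments m = cv::moments(gray, /*binaryImage=*/false);

    // Central second-order moments about the center of mass (Xc, Yc).
    double Ixx = m.mu20, Iyy = m.mu02, Ixy = m.mu11;

    // Closed form equivalent to the derivation above:
    // theta = 0.5 * atan2(2*Ixy, Ixx - Iyy) is one of the two extremal angles.
    return 0.5 * std::atan2(2.0 * Ixy, Ixx - Iyy);
}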
-------------- by hand -----------
If you need the original equation of the ellipse, just substitute [eq0] into the canonical form.
You're using terms in an unusual way.
Normally for images, the term "gradient" is interpreted as if the image is a mathematical function f(x,y). This gives us a (df/dx, df/dy) vector in each point.
Yet you're looking at the image as if it were a function y = f(x), and the gradient would be df(x)/dx.
Now, if you look at your image, you'll see that the two interpretations are definitely related. Your ellipse is drawn as a set of contrasting pixels, and as a result there are two sharp gradients in the image - the inner and outer. These of course correspond to the two normal vectors, and therefore are in opposite directions.
Also note that your image has pixels, so the gradient is pixelated too. The way your ellipse is drawn, with a single-pixel width, means that your local gradient only takes values that are multiples of 45 degrees:
▄▄ ▄▀ ▌ ▀▄
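If it helps, here is a minimal sketch (mine, not from the answer) of computing the per-pixel gradient in the (df/dx, df/dy) sense with cv::Sobel; smoothing the input first, or using a larger kernel, reduces that 45-degree quantization:

#include <opencv2/opencv.hpp>

// Per-pixel gradient magnitude and orientation (in degrees) of a grayscale image.
void gradientOrientation(const cv::Mat& gray, cv::Mat& magnitude, cv::Mat& angle)
{
    cv::Mat dx, dy;
    cv::Sobel(gray, dx, CV_32F, 1, 0, 3);   // df/dx
    cv::Sobel(gray, dy, CV_32F, 0, 1, 3);   // df/dy
    cv::cartToPolar(dx, dy, magnitude, angle, /*angleInDegrees=*/true);
}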