I am looking for a solution to easily compute the pixel coordinate from two images.
Question: given the following code, how could I compute the coordinates of the pixels that changed from the QVector difference? Is it possible to take such an entry, turn it into an (x, y) coordinate, and find which pixel of currentImage it represents?
char *previousImage;
char *currentImage;
QVector<long> difference;

for (int i = 0; i < currentImageSize; i++)
{
    // Check if the bytes are the same (we could also compare full RGB values; this is just for the example)
    if (*previousImage != *currentImage)
    {
        difference.push_back(i); // store the offset of the byte that changed
    }
    currentImage++;
    previousImage++;
}
EDIT:
More information about this topic:
The image is in RGB format
The width, the height and the bpp of both images are known
I have a pointer to the bytes representing the image
The main objective here is to know the new value of each pixel that changed between the two images, and to know which pixel it is (its coordinates)
There is not enough information to answer, but I will try to give you some idea.
You have declared char *previousImage;, which implies to me that you have a pointer to the bytes representing an image. You need more than that to interpret the image.
You need to know the pixel format. You mention RGB, so for the time being let's assume that the image uses 3 bytes per pixel, in the order R, G, B.
You need to know the width of the image.
Given the above 2, you can calculate the "row stride", which is the number of bytes that a row takes up. This is usually "bytes per pixel" * "image width", but it is typically padded out to be divisible by 4. So 3 bpp and a width of 15 would be 45 bytes plus 3 bytes of padding, making the row stride 48.
Given that, if you have an index into the image data, you first integer-divide it against the row stride to get the row (Y coordinate).
The X coordinate is the (index mod the row stride) integer-divided by the bytes per pixel.
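To make the mapping concrete, here is a minimal sketch of that index-to-coordinate calculation, assuming 3 bytes per pixel and rows padded to a multiple of 4 as described above (the function and variable names are illustrative, not from the question):

#include <cstddef>

struct PixelCoord { int x; int y; };

PixelCoord coordFromByteIndex(std::size_t index, int width, int bytesPerPixel)
{
    // Pad the row length up to the next multiple of 4, as described above.
    int rowStride = ((width * bytesPerPixel + 3) / 4) * 4;

    PixelCoord c;
    c.y = static_cast<int>(index / rowStride);                   // integer division gives the row
    c.x = static_cast<int>((index % rowStride) / bytesPerPixel); // remainder gives the column
    return c;
}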
From what I understand, you want to compute the displacement or motion that occurred between the two images. E.g. for each pixel I(x, y, t=previous) in previousImage, you want to know where it went in currentImage and what its new coordinate I(x, y, t=current) is.
If that is the case, then this is called motion estimation, or measuring the optical flow. There are many algorithms for it, which rely on more or less complex hypotheses depending on the objects you observe in the image sequence.
The simplest hypothesis is that if you follow a moving pixel I(x, y, t) in the scene you observe, its luminance remains constant over time. In other words, dI(x,y,t) / dt = 0.
Since this gives only one equation per pixel while the displacement has two unknowns (its horizontal and vertical components), this is an ill-posed problem with no easy solution. Many of the algorithms therefore add an additional hypothesis so that the problem has a unique solution.
You can use existing libraries that do this for you; one of the most popular is OpenCV.
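As a hedged sketch (and only one of several ways to do it), OpenCV's dense Farneback optical flow gives you a per-pixel displacement between two grayscale frames; the function and variable names here are illustrative:

#include <opencv2/opencv.hpp>

cv::Mat computeFlow(const cv::Mat& previousGray, const cv::Mat& currentGray)
{
    // previousGray and currentGray are single-channel CV_8U images of the same size.
    cv::Mat flow; // CV_32FC2: per-pixel (dx, dy) displacement
    cv::calcOpticalFlowFarneback(previousGray, currentGray, flow,
                                 0.5, 3, 15, 3, 5, 1.2, 0);
    // flow.at<cv::Point2f>(y, x) tells you where pixel (x, y) moved between the two frames.
    return flow;
}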
I have a few questions on the usage of glTexImage2D and glTexSubImage2D and how they affect textures.
What exactly is the ordering of the pixels? The pixels are provided via a contiguous buffer. In what order does the texture read them so that they are laid out along the x and y axes?
Also, these methods accept a type parameter for the data; does each pixel get 3 values? When we use texture(uvMap, textureUV) in the shader, it returns a vec3 of floats. So how exactly is the data we provide to the texture via glTexImage2D and glTexSubImage2D read and organized in OpenGL?
Here is my assumption, correct me if I'm wrong:
The data buffer contains the pixel data. Each pixel is represented by 3 values of the type passed to glTexImage2D and glTexSubImage2D, so the buffer we provide needs to contain 3 * width * height values. OpenGL reads the buffer as follows (pseudo-code):
for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x++)
    {
        int index = (y * (width * 3)) + (x * 3);
        pixels[y][x].x = buffer[index + 0];
        pixels[y][x].y = buffer[index + 1];
        pixels[y][x].z = buffer[index + 2];
    }
}
The reference is very clear about that:
glTexImage2D
Description
The last three arguments (format, type, data) describe how the image is represented in memory.
If target is GL_TEXTURE_2D, ..., data is read from data as a sequence of signed or unsigned bytes, shorts, or longs, or single-precision floating-point values, depending on type. These values are grouped into sets of one, two, three, or four values, depending on format, to form elements.
The first element corresponds to the lower left corner of the texture image. Subsequent elements progress left-to-right through the remaining texels in the lowest row of the texture image, and then in successively higher rows of the texture image. The final element corresponds to the upper right corner of the texture image.
format determines the composition of each element in data. It can assume one of these symbolic values: ... (to be brief, I will not list the values here, but it is a must-read)
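To tie the pseudo-code above to the reference, here is a minimal, hedged sketch of uploading a tightly packed RGB buffer; row 0 of the buffer ends up at the lower-left of the texture. The fill pattern and names are illustrative:

#include <vector>
#include <GL/gl.h>

void uploadRgbTexture(int width, int height)
{
    std::vector<unsigned char> buffer(width * height * 3);

    for (int y = 0; y < height; y++)              // y = 0 is the lowest row of the texture
    {
        for (int x = 0; x < width; x++)
        {
            int index = (y * width + x) * 3;      // 3 bytes per pixel, row-major
            buffer[index + 0] = 255;              // R
            buffer[index + 1] = 0;                // G
            buffer[index + 2] = 0;                // B
        }
    }

    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);        // rows here are not padded to 4 bytes
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
                 GL_RGB, GL_UNSIGNED_BYTE, buffer.data());
}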
I have a 3 channel Mat image, type is CV_8UC3.
I want to compare, in a loop, the intensity value of a pixel with its neighbours and then set 0 or 1 if the neighbour is greater or not.
I can get the intensity calling Img.at<Vec3b>(x,y).
But my question is: how can I compare two Vec3b?
Should I compare the pixel values for every channel (B, G and R, i.e. Vec3b[0], Vec3b[1] and Vec3b[2]) and then merge the three channel results into a single Mat object?
Me again :)
If you want to compare (greater or less) two RGB values you need to project the 3-dimensional RGB space onto a plane or axis.
Of course, there are many possibilities to do this, but an easy way would be to use the HSV color space. The hue (H), however, is not appropriate as a linear order function because it is circular (i.e. the value 1.0 is identical with 0.0, so you cannot decide if 0.5 > 0.0 or 0.5 < 0.0). However, the saturation (S) or the value (V) are appropriate projection functions for your purpose:
If you want to have colored pixels "larger" than monochrome pixels, you will prefer S.
If you want to have lighter pixels larger than darker pixels, you will probably prefer V.
Also any combination of S and V would be a valid projection function, e.g. S+V.
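As a minimal sketch of this idea (assuming BGR input, as OpenCV stores it, and comparing by the V channel; swapping index 2 for 1 would compare by saturation instead), with illustrative names:

#include <opencv2/opencv.hpp>

bool isGreaterByValue(const cv::Vec3b& a, const cv::Vec3b& b)
{
    cv::Mat bgr(1, 2, CV_8UC3);
    bgr.at<cv::Vec3b>(0, 0) = a;
    bgr.at<cv::Vec3b>(0, 1) = b;

    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

    // Compare the V (value/brightness) channel of the two pixels.
    return hsv.at<cv::Vec3b>(0, 0)[2] > hsv.at<cv::Vec3b>(0, 1)[2];
}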
As far as I understand, you want a measure of the distance/similarity between two Vec3b pixels. This reduces to the general problem of finding the distance between two vectors in an n-dimensional space.
One of the famous measures (and I think this is what you're asking for), is the Euclidean distance.
If you are using OpenCV, then you can simply use:
cv::Vec3b a(1, 1, 1);
cv::Vec3b b(5, 5, 5);
double dist = cv::norm(a, b, CV_L2);
You can refer to this for reading about cv::norm and its options.
Edit: If you are doing this to measure color similarity, it's recommended to use the LAB color space, as Euclidean distance in LAB space is a good approximation of how humans perceive color differences.
Edit 2: I see what you mean, for this you can get the magnitude of each vector and then compare them, something like this:
double a_magnitude = cv::norm(a, CV_L2);
double b_magnitude = cv::norm(b, CV_L2);
if(a_magnitude > b_magnitude)
// do something
else
// do something else.
I want to add up all the channels of a Mat image into a single-channel Mat containing the per-pixel sum. I've tried it this way:
// sum up the channels of the image:
// 1. store initial number of rows/columns
int initialRows = frameVid1.rows;
int initialCols = frameVid1.cols;
// 2. check if matrix is continuous
if (!frameVid1.isContinuous())
{
frameVid1 = frameVid1.clone();
}
// 3. reshape matrix to 3 color vectors
frameVid1 = frameVid1.reshape(3, initialRows*initialCols);
// 4. convert matrix to store bigger values than 255
frameVid1.convertTo(frameVid1, CV_32F);
// 5. sum up the three color vectors
reduce(frameVid1, frameVid1, 1, CV_REDUCE_SUM);
// 6. reshape to initial size
frameVid1 = frameVid1.reshape(1, initialRows);
// 7. convert back to CV_8UC1
frameVid1.convertTo(frameVid1, CV_8U);
But somehow reduce does not touch the color channels as a Matrix Dimension. Is there another function that can sum them up?
Also, why does using CV_16U in step 4 not work? (I had to put CV_32F there.)
Thanks in advance!
You can sum the RGB channels with a single line:
cv::transform(frameVid1, frameVidSum, cv::Matx13f(1, 1, 1));
You may need one more line: before applying the transform you should convert the image to an appropriate type to avoid saturation (I assumed CV_32FC3). The output array has the same size and depth as the source.
Some explanation:
cv::transform may operate on per-pixel channel values.
With cv::Matx13f(a, b, c) as the third argument, for each pixel [u,v] it does the following:
frameVidSum[u,v] = frameVid1[u,v].B * a + frameVid1[u,v].G * b + frameVid1[u,v].R * c
By using cv::Matx13f(1, 0, 1) as the third argument you would sum only the blue and red channels.
cv::transform is flexible enough that you can even use a cv::Matx14f, in which case the fourth value is added as an offset to each pixel of frameVidSum.
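Putting the two lines together, a minimal sketch of this answer (names follow the question; the exact conversion types are an assumption):

#include <opencv2/opencv.hpp>

cv::Mat sumChannelsWithTransform(const cv::Mat& frameVid1)     // frameVid1 is CV_8UC3
{
    cv::Mat asFloat;
    frameVid1.convertTo(asFloat, CV_32FC3);                    // avoid saturating at 255

    cv::Mat frameVidSum;
    cv::transform(asFloat, frameVidSum, cv::Matx13f(1, 1, 1)); // per-pixel B + G + R

    cv::Mat result;
    frameVidSum.convertTo(result, CV_8U);                      // sums above 255 saturate here
    return result;
}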
In an interleaved RGB buffer, every group of 3 consecutive elements (R, G and B) belongs to one pixel. It will probably work if you grab each group of 3 elements, sum them up and store the result in a separate 1-channel matrix. Before storing you should use saturate_cast to avoid unexpected results. So I think the better way is to use saturate_cast instead of converting your matrix to a bigger type.
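A minimal sketch of that per-pixel approach, using cv::saturate_cast so sums above 255 clamp instead of overflowing (the function name is illustrative):

#include <opencv2/opencv.hpp>

cv::Mat sumChannelsManually(const cv::Mat& src)   // src is CV_8UC3
{
    cv::Mat dst(src.rows, src.cols, CV_8UC1);
    for (int y = 0; y < src.rows; y++)
    {
        for (int x = 0; x < src.cols; x++)
        {
            const cv::Vec3b& p = src.at<cv::Vec3b>(y, x);
            // uchar operands promote to int, so the sum does not wrap before the cast.
            dst.at<uchar>(y, x) = cv::saturate_cast<uchar>(p[0] + p[1] + p[2]);
        }
    }
    return dst;
}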
Have a look at cv::split() and cv::add() functions.
You can use the split function to split the image into separate channels and then the add function to add the images. But be careful when using add because adding may lead to saturation of values. You may have to first convert types and then add. Have a look here: http://answers.opencv.org/question/13769/adding-matrices-without-saturation/
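A minimal sketch of this split-then-add route, converting to 32-bit first as the linked answer suggests (names are illustrative):

#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat sumChannelsBySplit(const cv::Mat& src)   // src is CV_8UC3
{
    cv::Mat asFloat;
    src.convertTo(asFloat, CV_32F);              // avoid saturation while adding

    std::vector<cv::Mat> channels;
    cv::split(asFloat, channels);                // channels[0..2] are B, G, R

    cv::Mat sum;
    cv::add(channels[0], channels[1], sum);
    cv::add(sum, channels[2], sum);

    cv::Mat result;
    sum.convertTo(result, CV_8U);                // anything above 255 saturates here
    return result;
}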
I'm using the OpenCV C++ API and I'm trying to convert a camera buffer (YUV_NV12) to an RGB format. However, the dimensions of my image changed (the width shrank from 720 to 480) and the colors are wrong (kind of purple/green-ish).
unsigned char* myYUVBufferPointer = (something passed as argument);
int myYUVBufferHeight = 1280;
int myYUVBufferWidth = 720;
cv::Mat yuvMat(myYUVBufferHeight, myYUVBufferWidth, CV_8U, myYUVBufferPointer);
cv::Mat rgbMat;
cv::cvtColor(yuvMat, rgbMat, CV_YUV2RGB_NV12);
cv::imwrite("path/to/my/image.jpg",rgbMat);
Any ideas? (I'm more interested in the size change than the colors, since I will eventually convert with CV_YUV2GRAY_NV12 and that's working, but the size isn't.)
Your code constructs a single-channel (grayscale) image called yuvMat out of a series of unsigned chars. When you then force a conversion of this single-channel image from YUV 4:2:0 to a multi-channel RGB, OpenCV assumes the input already contains the full 4:2:0 layout (1 x height x width bytes for Y plus 1/2 x height x width bytes for U and V together, rather than 3 x height x width bytes as in a full 4:4:4 YUV image), so the destination image ends up 2/3 of the size you expected. Part of the data may even be read from memory beyond the buffer you allocated, because the buffer only holds width x height uchars while the conversion expects more than that.
If your uchar buffer were already formatted as a full-resolution YUV image with width x height bytes per channel (i.e. the chroma already upsampled and interleaved per pixel), all you would need to do is construct yuvMat as a CV_8UC3 and you would be good to go. Most probably this is not the case, though: YUV_NV12 data comes as width x height uchars of Y, followed by (1/2 width) x (1/2 height) pairs of uchars representing the combined UV samples. You probably need to write your own reader that reads the Y, U and V data separately and constructs three single-channel Mats of size width x height -- filling in the horizontal and vertical gaps in the U and V channels -- then use cv::merge() to combine those single-channel images into a 3-channel YUV image before calling cv::cvtColor() with the CV_YUV2BGR option to convert it. Notice the use of BGR instead of RGB.
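For reference, a commonly used alternative (a hedged sketch, assuming the buffer really is a standard contiguous NV12 layout) is to hand OpenCV a single-channel Mat with height * 3/2 rows and let cvtColor do the de-interleaving itself:

#include <opencv2/opencv.hpp>

cv::Mat nv12ToBgr(unsigned char* buffer, int width, int height)
{
    // Wraps the buffer without copying: Y plane (height rows) followed by the
    // interleaved half-resolution UV plane (height/2 rows).
    cv::Mat yuv(height * 3 / 2, width, CV_8UC1, buffer);

    cv::Mat bgr;
    cv::cvtColor(yuv, bgr, cv::COLOR_YUV2BGR_NV12);  // result is height x width, CV_8UC3
    return bgr;
}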
It could be that "something passed as argument" does not have enough data to fill 720 lines. With some video cameras, not all three channels are represented using the same number of bits. For example, when capturing video on an iPhone, the three channels use 8-4-4 Bytes instead of 8-8-8. I haven't used this type of a camera with OpenCV, but most likely the problem is here.
// Obtain the stride (the number of bytes between pixels on different rows)
screen_get_buffer_property_iv(mScreenPixelBuffer, SCREEN_PROPERTY_STRIDE, &mStride);
I don't understand what the first line means by the number of bytes between pixels on different rows. The function on the second line is how the stride is obtained.
If we have a rectangular bunch of pixels (a screen, bitmap, or some such), there must be a way for a program to calculate the position of a pixel. Let's call this sort of bunch of pixels a "surface".
The surface can be split into individual pixels, and we could just put them in a very long row and number them from 0 to some large number (e.g. a 1280 x 1024 screen would have 1310720 pixels). But if you show this long row of pixels on a screen, it makes more sense to talk about lines of pixels that are 1280 pixels long, with 1024 rows of them.
Now, let's say we want to draw a line from pixels 100,100 to 100,200. We can easily write that as:
int i;
for (i = 0; i < 100; i++)
{
    setpixel(surface, 100, 100 + i, colour);
}
Now, if we want to implement setpixel, what do we need to do? One thing would be to translate our x, y coordinates (100, 100+i) into a location of our "long row of pixels".
The general formula tends to be (x + y * width) * bytes_per_pixel. So if we have a 32bpp image (four bytes per pixel), that would make (100 + (100+i) * 1280) * 4
However, to make the graphics chip easier to design, there are often limits such as "the width of a surface must be an even multiple of X", where X is usually 16, 32, 64 or some other power of 2. Sometimes it has to be a power of two directly (for example, textures in early OpenGL can only be 2^n x 2^n pixels in size -- you don't have to USE the entire texture). And this is where stride comes in.
Say we want a bitmap of 100 x 100 pixels, but the graphics chip that we use to draw the bitmap to the screen has a rule that surfaces MUST be an even multiple of 32 pixels wide. So we make something like this:
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
XXXXXXXXXX...
The X's here represent the actual pixels (10 per X) in our bitmap, and the ...s represent the 28 pixels of "waste" we have to add to keep the graphics chip happy.
Now the formula using width doesn't work, because as far as the software creating the bitmap is concerned, the width is 100 pixels. We need to change the math to make up for the "extra space at the end of each row of pixels":
(x + y * stride) * bytes_per_pixel
Now, the stride is 128, but the width is 100 pixels.
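A minimal sketch of setpixel built on that formula, assuming a 32bpp surface whose stride is expressed in pixels as in the example above (the struct and field names are illustrative):

#include <cstdint>

struct Surface
{
    uint8_t* data;
    int width;           // visible pixels per row (100 in the example)
    int stride;          // allocated pixels per row (128 in the example)
    int bytes_per_pixel; // 4 for 32bpp
};

void setpixel(Surface& surface, int x, int y, uint32_t colour)
{
    int offset = (x + y * surface.stride) * surface.bytes_per_pixel;
    *reinterpret_cast<uint32_t*>(surface.data + offset) = colour;
}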
Stride here refers to array stride, the number of bytes between memory locations that correspond to the beginning of adjacent rows of an array, in this case of pixels.
In a fully packed array, the stride equals the size of an individual pixel multiplied by the number of pixels in the row. For performance reasons, arrays are frequently aligned so that each row occupies a "round" number of bytes, typically a multiple of a power of two. The byte size of a row, a.k.a. the stride, cannot be computed from the other array parameters and must be known in order to correctly calculate the memory position of an arbitrary pixel.