YUV420 to grayscale in C++

I'm trying to write a function to convert a camera stream with a YUV420 pixel format to grayscale. To my understanding, I only need to extract the Y values since these are the luminance of the image frames.
I am using this page as a reference https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/pixfmt-yuv420.html, but I am having trouble understanding what a planar format is and how to skip over the UV values. I think I would need to get the 2x2 block of Y values (e.g., Y00, Y01, Y10, Y11 in the above link) for every pair of UV values, but I'm not sure whether I should just write them consecutively into my image frame's destination. That is, if I have a pointer to my destination and these four Y values, should I write them in the order Y00, Y01, Y10, Y11, Y02, etc.?
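For what it's worth, a minimal sketch under the assumption that the buffer really is planar YUV420 (e.g. V4L2's YU12/I420 layout): the Y samples are not interleaved with U and V at all, so the grayscale frame is simply the first width*height bytes, already in row-major order (Y00, Y01, ..., Y10, Y11, ...), and no 2x2 regrouping is needed:

#include <cstdint>
#include <cstring>

// Minimal sketch, assuming planar YUV420 (Y plane first, then U, then V):
// the Y plane is already a contiguous width*height grayscale image, so we
// copy it as-is and ignore the U and V planes that follow it.
void yuv420_to_gray(const uint8_t* yuv, uint8_t* gray, int width, int height)
{
    std::memcpy(gray, yuv, static_cast<size_t>(width) * height);
}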

Related

How can I create a heightmap for QHeightMapSurfaceDataProxy from a 2D array to show 2D Fourier transform results

I have data from a 2D discrete Fourier transform. I want to build a heightmap from it, but I don't know how to form one. I need to plot this data as a surface in Q3DSurface through a heightmap (not just a 2D array).
QHeightMapSurfaceDataProxy's constructor takes an image or an image file as an argument. All you need to do is create this image and load it.
Images can easily be generated from a 2D array, since the indices used to point at a specific value stored in it can be interpreted as X,Y, while the value at that pair of indices can be interpreted as the Z coordinate.
Example:
If you have the following assignment
myarr[2][10] = 200;
you can read it as X=2, Y=10 and Z=200, which would mean that the pixel at location [2;10] has value 200.
The size of the image comes directly from the dimensions of your array: if you have 10x15 elements, your image will be 10x15 pixels. Check how to populate a QImage for more accurate code than my pseudo-code above.
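A hedged sketch of that last step (the helper name and the assumption that the values are already scaled to [0, 255] are mine, not part of the answer):

#include <QImage>

// Hypothetical helper: turn a 2D array into a grayscale heightmap image.
// Assumes the values have already been normalized to [0, 255]; raw FFT
// magnitudes would need scaling first.
QImage heightmapFromArray(const unsigned char* data, int width, int height)
{
    QImage img(width, height, QImage::Format_Grayscale8);
    for (int y = 0; y < height; ++y) {
        uchar* row = img.scanLine(y);       // raw pointer to one image row
        for (int x = 0; x < width; ++x)
            row[x] = data[y * width + x];   // Z value becomes pixel brightness
    }
    return img;
}

The resulting image can then be passed straight to the proxy, e.g. new QHeightMapSurfaceDataProxy(heightmapFromArray(buf, w, h)).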

OpenCV convertTo()

I came across this code:
image.convertTo(temp_image,CV_16SC3);
I saw the description of the convertTo() function from here, but what confuses me is image. How can we read the above code? What would be the relation between image and temp_image?
Thanks.
The other answers here are correct, but lack some details. Let me try.
image.convertTo(temp_image,CV_16SC3);
You have a source image image, and a destination image temp_image. You didn't specify the type of image, but probably is CV_8UC3 or CV_32FC3, i.e. a 3 channel image (since convertTo doesn't change the number of channels), where each channel has depth 8 bit (unsigned char, CV_8UC3) or 32 bit (float, CV_32FC3).
This line of code will change the depth of each channel, so that temp_image has each channel of depth 16 bit (short). Specifically it's a signed short, since the type specifier has the S: CV_16SC3.
Note that if you are narrowing down the depth, as in the case from float to signed short, then saturate_cast will make sure that all the values in temp_image will be in the correct range, i.e. in [-32768, 32767] for signed short.
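For concreteness, a small usage sketch (the matrix sizes and values are made up for illustration):

#include <opencv2/core.hpp>

int main()
{
    // A made-up 3-channel 8-bit image standing in for `image`.
    cv::Mat image(480, 640, CV_8UC3, cv::Scalar(10, 20, 30));

    cv::Mat temp_image;
    image.convertTo(temp_image, CV_16SC3);  // still 3 channels, now signed 16-bit

    // convertTo can also scale and shift while converting: dst = src*alpha + beta.
    cv::Mat as_float;
    image.convertTo(as_float, CV_32FC3, 1.0 / 255.0);  // floats in [0, 1]
    return 0;
}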
Why would you need to change the depth of an image?
Some OpenCV functions require input images with a specific depth.
You need a matrix to contain a different range of values. E.g. if you need to sum (or subtract) some CV_8UC3 images (typically BGR images), you'd better store the result in a CV_16SC3, or you'll probably get wrong results due to saturation, since the range for CV_8U images is [0,255].
You read images with imread, or want to store them with imwrite, at 16-bit depth. These are usually used (AFAIK) in medical or graphics applications to allow a wider range of colors. However, most monitors do not support 16-bit image visualization.
There may be other cases, let me know if I miss the one important to you.
An image is a matrix of pixel information (i.e. a 1080p image will be a 1,920 × 1,080 matrix where each entry contains the RGB values for that pixel). All you are doing is reformatting that matrix (each pixel entry, iteratively) into a new type (CV_16SC3) so it can be read by different programs.
The temp_image is a new matrix of pixel information based off of image formatted into CV_16SC3.
The first one is the source, the second one the destination. So, it takes image, converts it to type CV_16SC3, and stores the result in temp_image.

How to get RGB values of ROI selected in depth stream

I wrote a simple Kinect application where I'm accessing the depth values to detect some objects. I use the following code to get the depth value:
depth = NuiDepthPixelToDepth(pBufferRun);
This gives me the depth value for each pixel. Now I want to select a region of the image and get the RGB camera values of the corresponding region.
What I'm not sure about:
do I need to open a color image stream?
or is it enough to just convert the depth into color?
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
I'm fine with the simplest solution where I have a depth frame and a color frame, so that I can select a ROI with opencv and then crop the color frame accordingly.
do I need to open a color image stream?
Yes. You can get the coordinates in the colour frame without opening the stream, but you won't be able to do anything useful with them because you'll have no colour data to index into!
or is it enough to just convert the depth into color?
There's no meaningful conversion of distance into colour. You need two image streams, and a co-ordinate conversion function.
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
That's a terribly documented function. Go take a look at NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution instead, because the function arguments and documentation actually make sense! Depth value and depth (x,y) coordinate in, RGB (x,y) coordinate out. Simple.
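A hedged sketch of that call, with the parameter order as I remember it from the Kinect SDK 1.x headers (verify against NuiApi.h; the depth value is expected in its raw packed form):

#include <windows.h>
#include <NuiApi.h>

// Sketch only: map one depth pixel to colour-image coordinates.
// Signature reproduced from memory -- check NuiApi.h before relying on it.
void DepthToColourCoords(LONG depthX, LONG depthY, USHORT rawDepthValue,
                         LONG* colourX, LONG* colourY)
{
    NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution(
        NUI_IMAGE_RESOLUTION_640x480,   // colour stream resolution
        NUI_IMAGE_RESOLUTION_320x240,   // depth stream resolution
        NULL,                           // optional view area (NULL = full frame)
        depthX, depthY,                 // depth pixel coordinate
        rawDepthValue,                  // packed depth value at that pixel
        colourX, colourY);              // out: matching colour pixel coordinate
}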
To get the RGB data at some given coordinates, you must first grab an RGB frame using NuiImageStreamGetNextFrame to get an INuiFrameTexture instance. Call LockRect on this to get a NUI_LOCKED_RECT. The pBits property of this object is a pointer to the first pixel of the raw XRGB image. This image is stored row-wise, in top-to-bottom, left-to-right order, with each pixel represented by 4 sequential bytes: a padding byte followed by R, G and B.
The pixel at position (100, 200) is therefore at
lockedRect->pBits[(200 * width * 4) + (100 * 4)];
and the byte representing the red channel should be at
lockedRect->pBits[(200 * width * 4) + (100 * 4) + 1];
This is a standard 32bit RGB image format, and the buffer can be freely passed to your image manipulation library of choice... GDI, WIC, OpenCV, IPL, whatever.
(caveat... I'm not totally certain I have the pixel byte ordering correct. I think it is XRGB, but it could be XBGR or BGRX, for example. Testing for which one is actually being returned should be trivial)
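Putting those pieces together, a rough sketch (untested, error handling mostly omitted; the helper name is mine, and the byte order carries the same caveat as above):

#include <windows.h>
#include <NuiApi.h>

// Rough sketch: read the RGB value at (x, y) from an open colour stream.
// `width` is the colour frame width; byte order assumed X,R,G,B (see caveat).
bool GetColourAt(INuiSensor* sensor, HANDLE colourStream, int x, int y,
                 int width, BYTE& r, BYTE& g, BYTE& b)
{
    NUI_IMAGE_FRAME frame;
    if (FAILED(sensor->NuiImageStreamGetNextFrame(colourStream, 0, &frame)))
        return false;

    NUI_LOCKED_RECT lockedRect;
    frame.pFrameTexture->LockRect(0, &lockedRect, NULL, 0);

    bool ok = (lockedRect.Pitch != 0);
    if (ok) {
        const BYTE* pixel = lockedRect.pBits + (y * width + x) * 4;
        r = pixel[1];
        g = pixel[2];
        b = pixel[3];
    }

    frame.pFrameTexture->UnlockRect(0);
    sensor->NuiImageStreamReleaseFrame(colourStream, &frame);
    return ok;
}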

DirectShow YUY2 Pixel Output from videoInput

I'm using videoInput to interface with DirectShow and get pixel data from my webcam.
From another question I've asked, people have suggested that the pixel format is just appended arrays in the order of the Y, U, and V channels.
FourCC's website suggests that the pixel format does not actually follow this pattern, and is instead |Y0|U0|Y1|V0|Y2|U1|Y3|V1|
I'm working on a few functions that convert the YUY2 input image into RGB and YV12, and after having little to no success, thought that it might be an issue with how I'm interpreting the initial YUY2 image data.
Am I correct in assuming that the pixel data should be in the format from the FourCC website, or are the Y, U and V channels separate arrays that have been concatenated (so the data is in the order of channels, for example YYYYUUVV)?
In YUY2 each row is a sequence of 4-byte packets: YUYV describing two adjacent pixels.
In YV12 there are 3 separate planes: first Y of size width*height, then V and then U, each of size (width/2) * (height/2).
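For concreteness, a rough sketch of repacking YUY2 into YV12 (my own illustration, not from the answer; it assumes even dimensions and approximates the 4:2:0 vertical subsampling by dropping the chroma of odd rows instead of averaging):

#include <cstdint>

// Repack a packed YUY2 buffer (Y0 U Y1 V per two pixels) into YV12 planes
// (Y, then V, then U). Width and height are assumed to be even.
void Yuy2ToYv12(const uint8_t* src, uint8_t* dst, int width, int height)
{
    uint8_t* yPlane = dst;
    uint8_t* vPlane = dst + width * height;            // V plane comes first in YV12
    uint8_t* uPlane = vPlane + (width / 2) * (height / 2);

    for (int row = 0; row < height; ++row) {
        const uint8_t* line = src + row * width * 2;   // YUY2 uses 2 bytes per pixel
        for (int col = 0; col < width; col += 2) {
            const uint8_t* p = line + col * 2;         // one Y0 U Y1 V packet
            *yPlane++ = p[0];
            *yPlane++ = p[2];
            if ((row & 1) == 0) {                      // keep chroma from even rows only
                *uPlane++ = p[1];
                *vPlane++ = p[3];
            }
        }
    }
}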

How to rotate yuv420 data?

I need to know how to rotate an image in yuv420p format by 90 degrees. Converting it to RGB, rotating, and converting back to YUV is not feasible for me. Even an algorithm would help.
Regards,
Anirudh.
In case the image is yuv420 planar, this is how the image data is encoded.
Planar means the Y section comes first, followed by the U section and then the V section.
Let w be the width of the image and h its height.
The total size of the image is w*h*3/2 bytes.
The Y section, also called luma, occupies w*h bytes.
There is one U sample and one V sample for every 2x2 block of Y samples.
The U section comes next; it occupies (w/2)*(h/2) bytes and starts at offset w*h from the beginning of the image.
The V section follows; it occupies (w/2)*(h/2) bytes and starts at offset (w*h) + (w*h)/4.
In order to rotate the image by 90 degrees, you essentially copy this w*h array to an array of h*w.
As mentioned in the post above, you simply need to copy each of the three Y, U, V blocks separately.
Start with the Y section. The 1st pixel to be copied is at offset (h-1)*w in the source array; copy it to (0,0) of the destination array. The 2nd pixel is at (h-2)*w, and so on...
Remember that the U and V sections are only (w/2)*(h/2) each.
Next copy the U section. The first pixel to be copied is at (w*h)+(((h/2)-1)*(w/2)) in the source array; copy it to (0,0) of the destination's U plane, i.e. offset h*w. The 2nd pixel is at (w*h)+(((h/2)-2)*(w/2)), and so on...
Finally copy the V section. The first pixel to be copied is at ((w*h)+(w*h/4))+(((h/2)-1)*(w/2)) in the source array; copy it to (0,0) of the destination's V plane, i.e. offset (h*w)+(w*h/4). The 2nd pixel is at ((w*h)+(w*h/4))+(((h/2)-2)*(w/2)), and so on...
The Destination Array obtained in this way contains the 90 degree rotated image.
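A minimal sketch of that per-plane copy (my own illustration; the bottom-row-first traversal above corresponds to a 90-degree clockwise rotation):

#include <cstdint>

// Rotate one plane 90 degrees clockwise: source is w x h, destination h x w.
// dst(x', y') takes src(y', h-1-x'), so dst(0,0) reads src offset (h-1)*w,
// matching the traversal described above.
void RotatePlane90CW(const uint8_t* src, uint8_t* dst, int w, int h)
{
    for (int y = 0; y < w; ++y)          // destination row
        for (int x = 0; x < h; ++x)      // destination column
            dst[y * h + x] = src[(h - 1 - x) * w + y];
}

// Apply it to all three planes of a planar YUV420 image.
void RotateYuv420_90CW(const uint8_t* src, uint8_t* dst, int w, int h)
{
    const int ySize = w * h;
    const int cSize = (w / 2) * (h / 2);
    RotatePlane90CW(src, dst, w, h);                              // Y plane
    RotatePlane90CW(src + ySize, dst + ySize, w / 2, h / 2);      // U plane
    RotatePlane90CW(src + ySize + cSize, dst + ySize + cSize,
                    w / 2, h / 2);                                // V plane
}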
I suppose your data is not planar YUV; if it already is, this is quite easy (skip the first and last steps below). You say you have YUV 4:2:0 planar, but then I don't understand why you're having difficulties.
Convert it to planar first: allocate space for the planes and put the bytes in the right places according to the packed YUV format you have.
Rotate the Y, U, V planes separately. The "color" (U, V) information for each 2x2 block stays with that block.
Recombine the planes to obtain the same packed YUV you had at the beginning.
This always works fine if your image dimensions are a multiple of 4. If not, take care...
YUV420p is indeed planar (that's what the "p" stands for).
Try and take a look at AviSynth's source code. The turn (rotate) functions are in turn.cpp and turnfunc.cpp
http://www.avisynth.org/