How to rotate yuv420 data? - c++

I need to know how to rotate an image that is in yuv420p format by 90 degrees. Converting it to RGB, rotating, and converting back to YUV is not feasible. Even an algorithm would help.
Regards,
Anirudh.

In case the image is yuv420 planar, this is how the image data is encoded.
Planar means the Y section comes first, followed by the U section and then the V section.
Consider an image of width w and height h.
The total size of the image is w*h*3/2.
The Y section, also called luma (luminance), occupies w*h bytes.
There is one U sample and one V sample for every 2x2 block of Y samples.
The U section comes next; it occupies (w/2)*(h/2) bytes and starts at offset w*h from the beginning of the image.
The V section follows; it occupies (w/2)*(h/2) bytes and starts at offset (w*h)+((w*h)/4).
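In code, the three plane pointers for such a frame would look roughly like this (a sketch assuming a single contiguous buffer with no row padding):
#include <cstdint>
#include <cstddef>

// Sketch: plane offsets within one contiguous, unpadded YUV420 planar buffer.
void yuv420p_planes(std::uint8_t* frame, std::size_t w, std::size_t h,
                    std::uint8_t** y, std::uint8_t** u, std::uint8_t** v)
{
    *y = frame;                          // w*h luma bytes
    *u = frame + w * h;                  // (w/2)*(h/2) chroma bytes
    *v = frame + w * h + (w * h) / 4;    // (w/2)*(h/2) chroma bytes
}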
To rotate the image by 90 degrees, you essentially copy this w*h array into an array of h*w.
As mentioned in the post above, you simply need to copy each of the three Y, U, V blocks separately.
Start with the Y section. The first pixel to be copied is at offset (h-1)*w in the source array; copy it to (0,0) of the destination array. The second pixel is at (h-2)*w, and so on...
Remember that the U and V sections are only (w/2)*(h/2).
Next copy the U section. The first pixel to be copied is at (w*h)+(((h/2)-1)*(w/2)) in the source array; copy it to offset h*w (the (0,0) of the destination U plane). The second pixel is at (w*h)+(((h/2)-2)*(w/2)), and so on...
Finally copy the V section. The first pixel to be copied is at ((w*h)+(w*h/4))+(((h/2)-1)*(w/2)) in the source array; copy it to offset (h*w)+(w*h/4) (the (0,0) of the destination V plane). The second pixel is at ((w*h)+(w*h/4))+(((h/2)-2)*(w/2)), and so on...
The destination array obtained this way contains the image rotated by 90 degrees.
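As a concrete illustration, a minimal sketch of the above (assuming a 90-degree clockwise rotation and even w and h; pixel (r, c) of each rotated plane is read from source pixel (planeHeight-1-c, r)):
#include <cstdint>
#include <cstddef>

// Rotate one plane 90 degrees clockwise: a (srcH x srcW) plane becomes
// a (srcW x srcH) plane; dst(r, c) = src(srcH - 1 - c, r).
static void rotate_plane_90cw(const std::uint8_t* src, std::uint8_t* dst,
                              std::size_t srcW, std::size_t srcH)
{
    for (std::size_t r = 0; r < srcW; ++r)        // rows of the rotated plane
        for (std::size_t c = 0; c < srcH; ++c)    // columns of the rotated plane
            dst[r * srcH + c] = src[(srcH - 1 - c) * srcW + r];
}

// Rotate a whole YUV420 planar frame by rotating Y, U and V separately.
// The rotated frame is h pixels wide and w pixels tall.
void rotate_yuv420p_90cw(const std::uint8_t* src, std::uint8_t* dst,
                         std::size_t w, std::size_t h)
{
    const std::size_t ySize = w * h;
    const std::size_t cSize = (w / 2) * (h / 2);

    rotate_plane_90cw(src,                 dst,                 w,     h);     // Y plane
    rotate_plane_90cw(src + ySize,         dst + ySize,         w / 2, h / 2); // U plane
    rotate_plane_90cw(src + ySize + cSize, dst + ySize + cSize, w / 2, h / 2); // V plane
}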

I suppose it is not planar YUV; if it already is planar, it's quite easy (skip the first and last steps below). You say you have YUV 4:2:0 planar, but then I do not understand why you have difficulties.
Convert it to planar first: allocate space for the planes and put the bytes in the right places according to the packed YUV format you have.
Rotate the Y, U and V planes separately. The "colour" (U, V) information for each block stays the same.
Recombine the planes to obtain the packed YUV format you started with.
This always works fine if your image dimensions are multiples of 4. If not, take care...

I think YUV420p is indeed planar.
Try and take a look at AviSynth's source code. The turn (rotate) functions are in turn.cpp and turnfunc.cpp
http://www.avisynth.org/

Related

C++ OpenCV boundRect[].tl() unit of output

I was wondering what the unit is of my boundRect[].tl() output.
topleft = boundRect[largest_contour_index].tl();
My assumption is that it is in pixels.
If so, do I need to look at the pixels of my camera and the format it outputs to calculate the position of my object?
Or do the pixels that the function outputs change because OpenCV converts the image to an 8-bit image? I can imagine that the number of pixels the image consists of becomes smaller when the image is converted to 8 bit.
Please correct me if I'm wrong.
Thank you!
First of all, boundingRect returns x, y coordinates plus a width and a height. You can refer to its documentation: docs.opencv.org/2.4/modules/core/doc/basic_structures.html#rect
Second, the 8-bit conversion changes the pixel values (the colour depth) and has no direct relation to the pixel count, so converting a 100x100 image to an 8-bit image still gives a 100x100 px image.
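For reference, a minimal sketch (with a hypothetical input image) showing that the returned coordinates are plain pixel positions in the image you passed to findContours:
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main()
{
    // Hypothetical binary input image; any 8-bit single-channel Mat works here.
    cv::Mat mask = cv::imread("object_mask.png", cv::IMREAD_GRAYSCALE);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    if (!contours.empty())
    {
        cv::Rect box = cv::boundingRect(contours[0]);
        cv::Point topleft = box.tl();   // pixel coordinates: x = column, y = row
        std::printf("top-left at (%d, %d), size %d x %d px\n",
                    topleft.x, topleft.y, box.width, box.height);
    }
    return 0;
}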

YUV420 to grayscale in C++

I'm trying to write a function to convert a camera stream with a YUV420 pixel format to grayscale. To my understanding, I only need to extract the Y values since these are the luminance of the image frames.
I am using this page as a reference https://linuxtv.org/downloads/v4l-dvb-apis/uapi/v4l/pixfmt-yuv420.html, but I am having trouble understanding what a planar format is and how to essentially skip over the UV values. I think I would need to get the 2x2 block of Y values (e.g., Y00, Y01, Y10, Y11 in the above link) for every pair of U and V values, but I'm also not sure if I should just write them consecutively into my image frame's destination. That is, if I have a pointer to my destination and I have these Y values, should I just write them in the order Y00, Y01, Y10, Y11, Y02, etc.?
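For what it's worth, a minimal sketch under the assumption that the stream really is planar YUV 4:2:0, where the Y samples are already stored contiguously, row by row, before the U and V planes, so a grayscale frame is just a copy of the first width*height bytes:
#include <cstdint>
#include <cstring>
#include <cstddef>

// Sketch: in planar YUV 4:2:0 the first width*height bytes are the Y (luma)
// plane, already in row-major order, so grayscale extraction is a plain copy;
// the U and V planes that follow are simply ignored.
void yuv420p_to_gray(const std::uint8_t* src, std::uint8_t* dst,
                     std::size_t width, std::size_t height)
{
    std::memcpy(dst, src, width * height);
}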

Open CV Mat structure coordinates

I am confused about the coordinates in the OpenCV Mat structure. When I want to get a pixel I do something like this:
image.at<Vec3b>(i,j)
The question is whether (0,0) coordinate is the top-left corner coordinate. I'm not sure about that, because when I try to get (-100,-100) it still works and gets a pixel.
Yes it is the top-left.
From the official documentation (for all pixel-access methods) here:
the 0-based row index (or y-coordinate) goes first and the 0-based
column index (or x-coordinate) follows it
The at(-100,-100) call works because at() does no bounds checking in release builds (for speed), so it will happily read from anywhere in memory, but the data you get back is not a valid pixel.
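A small sketch (assuming a hypothetical 8-bit BGR image) illustrating the row-first ordering, and the equivalent cv::Point access where x comes first:
#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat image = cv::imread("photo.png"); // hypothetical 8-bit BGR image

    int row = 10, col = 20; // row = y, col = x; (0,0) is the top-left corner

    // at(row, col): 0-based row index (y) first, column index (x) second
    cv::Vec3b a = image.at<cv::Vec3b>(row, col);

    // at(Point(x, y)): same pixel, but Point takes x first
    cv::Vec3b b = image.at<cv::Vec3b>(cv::Point(col, row));

    CV_Assert(a[0] == b[0] && a[1] == b[1] && a[2] == b[2]);
    return 0;
}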

how go get RGB values of ROI selected in depth stream

I wrote a simple Kinect application where I'm accessing the depth values to detect some objects. I use the following code to get the depth value:
depth = NuiDepthPixelToDepth(pBufferRun);
this will give me the depth value for each pixel. Now I want to subselect a region of the image, and get the RGB camera values of this corresponding region.
What I'm not sure about:
do I need to open a color image stream?
or is it enough to just convert the depth into color?
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
I'm fine with the simplest solution where I have a depth frame and a color frame, so that I can select a ROI with opencv and then crop the color frame accordingly.
do I need to open a color image stream?
Yes. You can get the coordinates in the colour frame without opening the stream, but you won't be able to do anything useful with them because you'll have no colour data to index into!
or is it enough to just convert the depth into color?
There's no meaningful conversion of distance into colour. You need two image streams, and a co-ordinate conversion function.
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
That's a terribly documented function. Go take a look at NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution instead, because the function arguments and documentation actually make sense! Depth value and depth (x,y) coordinate in, RGB (x,y) coordinate out. Simple.
To get the RGB data at some given coordinates, you must first grab an RGB frame using NuiImageStreamGetNextFrame to get an INuiFrameTexture instance. Call LockRect on this to get a NUI_LOCKED_RECT. The pBits property of this object is a pointer to the first pixel of the raw XRGB image. This image is stored row-wise, in top-to-bottom, left-to-right order, with each pixel represented by 4 sequential bytes: a padding byte followed by R, G and B.
The pixel at position (100, 200) is therefore at
lockedRect->pBits[(200 * width * 4) + (100 * 4)];
and the byte representing the red channel should be at
lockedRect->pBits[(200 * width * 4) + (100 * 4) + 1];
This is a standard 32-bit RGB image format, and the buffer can be freely passed to your image manipulation library of choice... GDI, WIC, OpenCV, IPL, whatever.
(caveat... I'm not totally certain I have the pixel byte ordering correct. I think it is XRGB, but it could be XBGR or BGRX, for example. Testing for which one is actually being returned should be trivial)
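Wrapped up as a tiny helper, under the same caveat about byte order (the layout assumed here is the padding, R, G, B ordering described above, with rows of width * 4 bytes):
#include <cstdint>

// Hypothetical helper: read one channel of the pixel at (x, y) from the raw
// buffer returned by LockRect. Assumes 4 bytes per pixel, rows stored top to
// bottom, byte 0 = padding, byte 1 = R, byte 2 = G, byte 3 = B
// (verify the actual ordering on your own device).
inline std::uint8_t pixel_channel(const std::uint8_t* pBits, int width,
                                  int x, int y, int channel /* 1=R, 2=G, 3=B */)
{
    return pBits[(y * width * 4) + (x * 4) + channel];
}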

Direct Show YUY2 Pixel Output from videoInput

I'm using videoInput to interface with DirectShow and get pixel data from my webcam.
From another question I've asked, people have suggested that the pixel format is just appended arrays in the order of the Y, U, and V channels.
FourCC's website suggests that the pixel format does not actually follow this pattern, and is instead a repeating |Y0|U0|Y1|V0| sequence, where each 4-byte group carries two Y samples that share one U and one V sample.
I'm working on a few functions that convert the YUY2 input image into RGB and YV12, and after having little to no success, thought that it might be an issue with how I'm interpreting the initial YUY2 image data.
Am I correct in assuming that the pixel data is in the format from the FourCC website, or are the Y, U and V channels separate arrays that have been concatenated (so the data is in the order of the channels, for example YYYYUUVV)?
In YUY2 each row is a sequence of 4-byte packets, Y U Y V, each describing two adjacent pixels.
In YV12 there are 3 separate planes: first Y, of size width*height, then V and then U, each of size (width/2) * (height/2).
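A minimal sketch of that deinterleaving (assuming even width and height, no row padding, and taking the chroma for each 2x2 block from the even source rows rather than averaging the two rows):
#include <cstdint>
#include <cstddef>

// Sketch: convert one YUY2 frame (rows of Y0 U0 Y1 V0 packets, 2 bytes/pixel)
// into YV12 (planar: full-size Y plane, then quarter-size V plane, then U).
void yuy2_to_yv12(const std::uint8_t* src, std::uint8_t* dst,
                  std::size_t width, std::size_t height)
{
    std::uint8_t* dstY = dst;
    std::uint8_t* dstV = dst + width * height;
    std::uint8_t* dstU = dstV + (width / 2) * (height / 2);

    for (std::size_t y = 0; y < height; ++y)
    {
        const std::uint8_t* row = src + y * width * 2; // 2 bytes per pixel in YUY2
        for (std::size_t x = 0; x < width; x += 2)
        {
            const std::uint8_t* px = row + x * 2;      // one 4-byte Y U Y V packet
            *dstY++ = px[0];                           // Y of pixel x
            *dstY++ = px[2];                           // Y of pixel x+1
            if ((y & 1) == 0)                          // one chroma sample per 2x2 block
            {
                *dstU++ = px[1];                       // U
                *dstV++ = px[3];                       // V
            }
        }
    }
}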