I have a small problem with the size of my buffers in a C++ program.
I grab YUYV images from a camera using V4L2 (an example is available here).
I want to take one image and put it into my own image structure.
Here is the buffer given by the V4L2 structure and its size
(uchar*)buffers_[buf.index].start, buf.bytesused
In my structure, I create a new buffer (mybuffer) with a size of width*height*bitSize (where bitSize is 4 bytes, since I grab YUYV, i.e. YUV422, images).
The problem is that I was expecting buf to be the same size as the buffer I created, but this is not the case: for example, when I grab a 640*480 image, buf.bytesused = 614400 while mybuffer = 1228800 (twice as big).
Does anyone have any idea why this is the case ?
YUV422 uses 4 bytes per 2 pixels, so a 640*480 frame occupies 640*480*2 = 614400 bytes, not 640*480*4.
In YUV422 mode the U and V values are shared between two pixels. For YUYV the bytes in the image are ordered like Y0 U0 Y1 V0 Y2 U2 Y3 V2 etc.
Giving pixels like:
pixel 0 Y0U0V0
pixel 1 Y1U0V0
pixel 2 Y2U2V2
pixel 3 Y3U2V2
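To make the packing concrete, here is a minimal sketch (helper names are mine) that pulls the (Y, U, V) triple for pixel x out of one YUYV row; a whole frame is width*height*2 bytes, which is exactly the 614400 reported by buf.bytesused above.

#include <cstdint>

struct Yuv { uint8_t y, u, v; };

// A YUYV row is width*2 bytes: each 4-byte packet Y0 U Y1 V covers 2 pixels.
Yuv pixelFromYuyvRow(const uint8_t* row, int x)
{
    const uint8_t* p = row + (x / 2) * 4;  // packet containing pixel x
    return Yuv{ p[(x % 2) * 2],            // Y0 for even x, Y1 for odd x
                p[1],                      // U, shared by both pixels
                p[3] };                    // V, shared by both pixels
}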
Is there a way to calculate the maximum size that an image compressed as PNG could take?
I need to know that (for example) a PNG with a resolution of 350x350 px can't be larger than "X" KB (at a constant compression quality, like 90).
The "X" value is the one I'm looking for. Or, as a math expression:
350px * 350px * (90q) < X KB
I'm not very familiar with the PNG compression algorithm, but is there a maximum size for a specific resolution?
P.S.: the PNG has no alpha in this case.
From the PNG format here:
The maximum case happens when the data is incompressible
(for example, if the image resolution is 1x1,
or if the image is larger but contains random incompressible data).
That would make the maximum size:
8 // PNG signature bytes
+ 25 // IHDR chunk (Image Header)
+ 12 // IDAT chunk (assuming only one IDAT chunk)
+ height // in pixels
* (
1 // filter byte for each row
+ (
width // in pixels
* 3 // Red, blue, green color samples
* 2 // 16 bits per color sample
)
)
+ 6 // zlib compression overhead
+ 2 // deflate overhead
+ 12 // IEND chunk
Compression "quality" doesn't enter into this.
Most applications will split the image data across multiple IDAT chunks, typically 8 kbytes each; each additional IDAT chunk adds another 12 bytes of overhead, so for a 350x350 image add 12 bytes per extra chunk.
As a check, a 1x1 16-bit RGB image can be written as a 72-byte PNG, and a 1x1 8-bit grayscale image is 67 bytes.
If the image is interlaced, or has any ancillary chunks, or has an alpha channel, it will naturally be bigger.
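As a sketch, the worst-case arithmetic above (single IDAT chunk, 16-bit RGB, no alpha or ancillary chunks) can be written as a little function; maxPngSize(1, 1) gives the 72 bytes of the 1x1 check above.

#include <cstdint>

// Worst-case size of a 16-bit RGB PNG stored in a single IDAT chunk,
// following the breakdown above.
uint64_t maxPngSize(uint64_t width, uint64_t height)
{
    const uint64_t idatData = height * (1 + width * 3 * 2); // filter byte + 3 samples * 2 bytes
    return 8 + 25 + 12 + idatData + 6 + 2 + 12;             // signature + IHDR + IDAT
                                                            // + zlib + deflate + IEND
}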
I need to save all frames from an MPEG4 or H.264 video as YUV frames using a C++ library, for example in .yuv, .y4m or .y format. Then I need to read these frames as plain digital files and change some samples (Y values). How can I do this without converting to RGB?
And how are the values in AVFrame->data stored? Where are the Y, U and V values kept?
Thanks and sorry for my English=)
If you use libav* to decode, you will receive the frames in their native colorspace (usually YUV 420), though it is whatever was chosen at encode time. Assuming you are in YUV420, or convert to it: Y is in AVFrame->data[0], U in AVFrame->data[1], V in AVFrame->data[2].
For Y there is 1 byte per pixel; with x the column and y the row, the sample is at AVFrame->data[0][(y * AVFrame->linesize[0]) + x].
For U and V there is one byte per 2x2 block of pixels (quarter the resolution of the Y plane), so:
AVFrame->data[1][((y/2) * AVFrame->linesize[1]) + (x/2)], AVFrame->data[2][((y/2) * AVFrame->linesize[2]) + (x/2)]
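For example, here is a hedged sketch (assuming a YUV420p frame and a reasonably recent FFmpeg/libav where AVFrame is declared in libavutil/frame.h) that inverts every luma sample in place without touching U or V:

#include <cstdint>
extern "C" {
#include <libavutil/frame.h>
}

void invertLuma(AVFrame* frame)   // frame->width/height are set by the decoder
{
    for (int y = 0; y < frame->height; ++y) {
        uint8_t* row = frame->data[0] + y * frame->linesize[0];
        for (int x = 0; x < frame->width; ++x)
            row[x] = 255 - row[x];   // modify Y only; data[1]/data[2] stay untouched
    }
}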
I wrote a simple Kinect application where I access the depth values to detect some objects. I use the following code to get the depth value:
depth = NuiDepthPixelToDepth(pBufferRun);
This gives me the depth value for each pixel. Now I want to select a region of the image and get the RGB camera values of that corresponding region.
What I'm not sure about:
do I need to open a color image stream?
or is it enough to just convert the depth into color?
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
I'm fine with the simplest solution where I have a depth frame and a color frame, so that I can select an ROI with OpenCV and then crop the color frame accordingly.
do I need to open a color image stream?
Yes. You can get the coordinates in the colour frame without opening the stream, but you won't be able to do anything useful with them because you'll have no colour data to index into!
or is it enough to just convert the depth into color?
There's no meaningful conversion of distance into colour. You need two image streams, and a co-ordinate conversion function.
how do I use NuiImageGetColorPixelCoordinateFrameFromDepthPixelFrameAtResolution?
That's a terribly documented function. Go take a look at NuiImageGetColorPixelCoordinatesFromDepthPixelAtResolution instead, because the function arguments and documentation actually make sense! Depth value and depth (x,y) coordinate in, RGB (x,y) coordinate out. Simple.
To get the RGB data at some given coordinates, you must first grab an RGB frame using NuiImageStreamGetNextFrame to get an INuiFrameTexture instance. Call LockRect on this to get a NUI_LOCKED_RECT. The pBits property of this object is a pointer to the first pixel of the raw XRGB image. The image is stored row-wise, in top-to-bottom, left-to-right order, with each pixel represented by 4 sequential bytes: a padding byte, then R, G and B following it.
The pixel at position (100, 200) is therefore at
lockedRect->pBits[(200 * width * 4) + (100 * 4)];
and the byte representing the red channel should be at
lockedRect->pBits[(200 * width * 4) + (100 * 4) + 1];
This is a standard 32-bit RGB image format, and the buffer can be freely passed to your image manipulation library of choice... GDI, WIC, OpenCV, IPL, whatever.
(Caveat: I'm not totally certain I have the pixel byte ordering correct. I think it is XRGB, but it could be XBGR or BGRX, for example. Testing which one is actually returned should be trivial.)
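Putting that together, a small sketch (helper names are mine; the channel offsets assume the XRGB guess above, so verify them) for reading one pixel out of the locked buffer:

#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// pBits = lockedRect->pBits, pitch = lockedRect->Pitch (bytes per row).
Rgb colorAt(const uint8_t* pBits, int pitch, int x, int y)
{
    const uint8_t* px = pBits + (y * pitch) + (x * 4);  // 4 bytes per pixel
    return Rgb{ px[1], px[2], px[3] };                  // px[0]: padding byte (if XRGB)
}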
I'm using videoInput to interface with DirectShow and get pixel data from my webcam.
From another question I've asked, people have suggested that the pixel format is just appended arrays in the order of the Y, U, and V channels.
FourCC's website suggests that the pixel format does not actually follow this pattern, and is instead |Y0|U0|Y1|V0|Y2|U2|Y3|V2|
I'm working on a few functions that convert the YUY2 input image into RGB and YV12, and after having little to no success, thought that it might be an issue with how I'm interpreting the initial YUY2 image data.
Am I correct in assuming that the pixel data is in the format from the FourCC website, or are the Y, U and V channels separate arrays that have been concatenated (so the data is in the order of channels, for example YYYYUUVV)?
In YUY2 each row is a sequence of 4-byte packets, YUYV, each describing two adjacent pixels.
In YV12 there are 3 separate planes: first Y, of size width*height, then V and then U, both of size (width/2) * (height/2).
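Given those two layouts, a minimal repacking sketch (assuming even width and height, and simply taking the chroma from every second row rather than averaging the two):

#include <cstdint>

// dst must hold width*height*3/2 bytes, laid out as YV12: Y, then V, then U.
void yuy2ToYv12(const uint8_t* src, int width, int height, uint8_t* dst)
{
    uint8_t* yPlane = dst;
    uint8_t* vPlane = dst + width * height;                // V precedes U in YV12
    uint8_t* uPlane = vPlane + (width / 2) * (height / 2);

    for (int y = 0; y < height; ++y) {
        const uint8_t* row = src + y * width * 2;          // 2 bytes per pixel
        for (int x = 0; x < width; x += 2) {
            const uint8_t* p = row + x * 2;                // Y0 U Y1 V packet
            yPlane[y * width + x]     = p[0];
            yPlane[y * width + x + 1] = p[2];
            if (y % 2 == 0) {                              // one chroma sample per 2x2 block
                uPlane[(y / 2) * (width / 2) + x / 2] = p[1];
                vPlane[(y / 2) * (width / 2) + x / 2] = p[3];
            }
        }
    }
}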
I need to know how to rotate an image in yuv420p format by 90 degrees. Converting it to RGB, rotating, and reconverting to YUV is not feasible; even an algorithm would help.
Regards,
Anirudh.
In case the image is yuv420 planar, this is how the image data is encoded.
Planar means the Y section comes first, followed by the U section and then the V section.
Considering the width of the image w and the height of the image h:
The total size of the image is w*h*3/2.
The Y section, also called luma (luminance), occupies w*h bytes.
There is one U sample and one V sample for every 2x2 block in the Y section.
The U section comes next; it occupies (w/2)*(h/2) bytes and sits at an offset of w*h from the beginning of the image.
The V section follows; it occupies (w/2)*(h/2) bytes and sits at an offset of (w*h) + ((w*h)/4).
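In pointer terms, that layout is simply (w and h assumed even):

#include <cstdint>

struct Yuv420Planes { uint8_t *y, *u, *v; };

Yuv420Planes planesOf(uint8_t* image, int w, int h)
{
    return { image,                           // Y: w*h bytes
             image + (w * h),                 // U: (w/2)*(h/2) bytes at offset w*h
             image + (w * h) + (w * h) / 4 }; // V: (w/2)*(h/2) bytes at offset w*h + w*h/4
}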
To rotate the image by 90 degrees, you essentially copy this w*h array into an array of h*w.
As mentioned above, you simply copy each of the three Y, U, V blocks separately.
Start with the Y section. The first pixel to be copied is at index (h-1)*w in the source array; copy it to (0,0) of the destination array. The second pixel is at (h-2)*w, and so on...
Remember that the U and V sections are only (w/2)*(h/2).
Next copy the U section. The first pixel to be copied is at (w*h) + (((h/2)-1)*(w/2)) in the source array; copy it to offset h*w (the (0,0) of the destination U plane). The second pixel is at (w*h) + (((h/2)-2)*(w/2)), and so on...
Finally copy the V section. The first pixel to be copied is at ((w*h)+(w*h/4)) + (((h/2)-1)*(w/2)) in the source array; copy it to offset (h*w) + (w*h/4) (the (0,0) of the destination V plane). The second pixel is at ((w*h)+(w*h/4)) + (((h/2)-2)*(w/2)), and so on...
The Destination Array obtained in this way contains the 90 degree rotated image.
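The whole copy can be sketched like this (clockwise rotation, even dimensions, plane offsets as described above):

#include <cstdint>

static void rotatePlane90(const uint8_t* src, uint8_t* dst, int w, int h)
{
    // Row y of the source becomes column (h-1-y) of the h-wide destination.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[x * h + (h - 1 - y)] = src[y * w + x];
}

void rotateYuv420p90(const uint8_t* src, uint8_t* dst, int w, int h)
{
    const int luma   = w * h;
    const int chroma = (w / 2) * (h / 2);
    rotatePlane90(src,                 dst,                 w,     h);     // Y
    rotatePlane90(src + luma,          dst + luma,          w / 2, h / 2); // U
    rotatePlane90(src + luma + chroma, dst + luma + chroma, w / 2, h / 2); // V
}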
I assumed the image was not planar YUV; if it already is, the job is quite easy (skip the first and last steps). You said you have YUV 4:2:0 planar, in which case I do not understand why you are having difficulties.
Convert it to planar first: allocate space for the planes and put the bytes in the right places according to the packed YUV format you have.
Rotate the Y, U, V planes separately. The "colour" (U, V) information for each block stays the same.
Recombine the planes to obtain the packed YUV format you had at the beginning.
This always works fine if your image dimensions are a multiple of 4. If not, take care...
I think YUV420p is indeed planar.
Take a look at AviSynth's source code. The turn (rotate) functions are in turn.cpp and turnfunc.cpp:
http://www.avisynth.org/