I need to save all frames from an MPEG-4 or H.264 video as raw YUV frames using a C++ library, for example as .yuv, .y4m or .y files. Then I need to read those frames back as binary files and change some samples (Y values). How can I do this without converting to RGB?
And how are the values in AVFrame->data laid out? Where are the Y, U and V values stored?
Thanks and sorry for my English=)
If you use libav* to decode, you will receive the frames in their native colorspace (usually YUV 4:2:0), but it is whatever was chosen at encode time. Assuming you are in YUV420 (or convert to it), the planes are: Y in AVFrame->data[0], U in AVFrame->data[1], V in AVFrame->data[2].
For Y there is 1 byte per pixel: AVFrame->data[0][row * AVFrame->linesize[0] + col]
For U and V there is one byte per 2x2 block of pixels (quarter resolution of the Y plane), so:
AVFrame->data[1][(row/2) * AVFrame->linesize[1] + col/2], AVFrame->data[2][(row/2) * AVFrame->linesize[2] + col/2]
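Putting that together, here is a minimal sketch (assuming the decoded frame is AV_PIX_FMT_YUV420P and comes from a standard libav* decode loop; error handling omitted) of dumping a frame to a raw .yuv file and editing a luma sample in place:

// Minimal sketch: dump a decoded AV_PIX_FMT_YUV420P frame to a raw .yuv
// file and modify one luma sample, all without converting to RGB.
#include <cstdio>
extern "C" {
#include <libavutil/frame.h>
}

void write_yuv420p(const AVFrame* frame, FILE* out) {
    // Y plane: full resolution, one byte per pixel; rows may be padded,
    // so copy row by row using linesize.
    for (int row = 0; row < frame->height; ++row)
        fwrite(frame->data[0] + row * frame->linesize[0], 1, frame->width, out);
    // U and V planes: half resolution in both dimensions.
    for (int row = 0; row < frame->height / 2; ++row)
        fwrite(frame->data[1] + row * frame->linesize[1], 1, frame->width / 2, out);
    for (int row = 0; row < frame->height / 2; ++row)
        fwrite(frame->data[2] + row * frame->linesize[2], 1, frame->width / 2, out);
}

void brighten_sample(AVFrame* frame, int row, int col) {
    // Change a Y (luma) sample directly; 235 is near-white in video range.
    frame->data[0][row * frame->linesize[0] + col] = 235;
}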
I am trying to display a video from a video decoder library.
The video is delivered as a byte array with RGB32 pixel format.
Meaning every pixel is represented by 32 bits.
RRBBGGFF - 8-bit R, 8-bit G, 8-bit B, 8-bit 0xFF.
Similar to Qt's QImage::Format_RGB32.
I think I need to convert the pixel array to ofPixels, then load the pixels into an ofTexture.
Then I can draw the texture.
I don't know how to convert/set the ofPixels from this pixel format.
Any tips/ideas are very welcome.
Thanks!
Try using an ofThreadChannel as described in this example in order to avoid writing to and reading from your ofTexture / ofPixels at the same time.
Then you can load a uint8_t* buffer using ofTexture's loadFromPixels() method:
// assuming data is populating externalBuffer
uint8_t* externalBuffer;
tex.loadFromPixels(externalBuffer, width, height, GL_RGBA);
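If loadFromPixels() isn't available in your openFrameworks version, a common alternative is to copy the buffer into an ofPixels first and upload that. A sketch, assuming frame points at width*height*4 bytes of RGB32 data (frame, width and height are placeholder names):

#include "ofMain.h"

// Sketch: copy a decoder's RGB32 byte array into an ofPixels, then upload.
void uploadFrame(const unsigned char* frame, int width, int height, ofTexture& tex) {
    ofPixels pix;
    pix.setFromPixels(frame, width, height, OF_PIXELS_RGBA); // copies width*height*4 bytes
    if (!tex.isAllocated()) tex.allocate(pix);               // match texture to pixel format
    tex.loadData(pix);                                       // upload; draw with tex.draw(0, 0)
}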
Hope this helps,
Best,
P
I am trying to open a .yuv image. I can open it with rawpixels.net and display it after setting the following:
width:1920
height:1080
predefined format: yuv420 (nv12)
pixel format: yuv
But if I try to open it with OpenCV using the following code, it fails.
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/opencv.hpp>

int main() {
    std::cout << "OpenCV version: " << CV_VERSION << std::endl;
    cv::Mat image = cv::imread("camera_capture_256_2020_10_07_11_11_02.yuv");
    if (image.empty()) {
        std::cout << "image empty" << std::endl;
        return 0;
    }
    cv::imshow("opencv_logo", image);
    cv::waitKey(0);
    return 0;
}
The program prints "image empty".
I am puzzled why I can't open the file with OpenCV.
The sample image is found here.
The yuv image opened with rawpixels.net would look like this.
Thanks,
The very first thing to do when dealing with raw images (RGB, BGR, YUV, NV12 and others) is to know the dimensions of the image in pixels - you are really quite lost without those - though you can do certain tricks to find the row width by looking for correlation, since each row is normally very similar to the one above.
The next thing is to check that the file size is correct. So if it is 8-bit RGB and 1920x1080, your file must be 1920x1080x3 bytes in size - if not, there is a problem. Your image is 1920x1080 and NV12, which is 12 bits or 1.5 bytes per pixel, so I expect your file to be 1920x1080x1.5 bytes. It is not, so there is immediately a problem: there is either a header, or multiple frames, or trailing data, or some other issue.
So, where is the image data in the file? At the start? At the end? One way to solve this is to look at the file as though it was purely a greyscale image and see if there are large blocks of black which are zero bytes or padding. As there is no known image size, I generally take the file size in bytes and go to Wolfram Alpha website and type in "factors of XXX" where XXX is the file size and then choose 2 numbers near the square-root of the file size so I get a square-ish image. So for yours, I chose 2720x3072 and treated your file as a single greyscale image of that size. Using ImageMagick in Terminal:
magick -depth 8 -size 2720x3072 gray:camera_preview_250_2020_10_07_11_11_02.yuv image.jpg
I can see, at a glance that the data are at the start of the file and the end of the file is zero-padding, i.e. black. If the black had been at the start of the image, I would have taken the final H x W x 1.5 bytes.
Another alternative for this step is to take the file size in bytes and divide it by the image width to get a number of lines, and see how that looks. Your file is 8355840 bytes, so that would be 8355840/1920 = 4352 lines. Let's try that:
magick -depth 8 -size 1920x4352 gray:camera_preview_250_2020_10_07_11_11_02.yuv image.jpg
That is very encouraging because we can see the Y (greyscale) image at the start of the file and some lower-resolution UV data following. The fact that there are not 2 separate channels following probably means the chroma is interleaved (alternating U and V samples) rather than planar (all the U samples followed by all the V samples).
OK, if your data is YUV or NV12, the best tool for that is ffmpeg. We already know that the data is at the start of the file, and we know the dimensions and the format. We also know that there is padding after the image, so we just need to take the first frame, like this:
ffmpeg -s 1920x1080 -pix_fmt nv12 -i cam*yuv -frames:v 1 image.png
Now that we are confident about the dimensions and the format, we need OpenCV to read it. The normal cv::imread() cannot, because the file is just raw data: unlike JPEG or PNG or TIFF, there is no image height and width in a header - it is pure sensor data.
So you need to read the first 1920x1080x1.5 bytes of the file yourself with a regular C/C++ read, then call cv::cvtColor() on that buffer to convert it to a regular BGR format Mat.
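A minimal sketch of that in C++ (the file name and 1920x1080 dimensions are taken from the question; error handling is left out):

#include <fstream>
#include <opencv2/opencv.hpp>

int main() {
    const int width = 1920, height = 1080;
    // NV12 is 1.5 bytes per pixel: a full-resolution Y plane followed by
    // interleaved UV rows, so the Mat is height*3/2 rows of single bytes.
    cv::Mat nv12(height * 3 / 2, width, CV_8UC1);

    std::ifstream in("camera_capture_256_2020_10_07_11_11_02.yuv", std::ios::binary);
    in.read(reinterpret_cast<char*>(nv12.data), nv12.total()); // first frame only

    cv::Mat bgr;
    cv::cvtColor(nv12, bgr, cv::COLOR_YUV2BGR_NV12); // raw NV12 -> regular BGR Mat
    cv::imshow("nv12", bgr);
    cv::waitKey(0);
    return 0;
}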
I have a stream of raw images that comes from a network grayscale camera we are developing. In this case, the images are arrays of 8-bit pixels (640x480). Since this camera outputs more than 200 frames per second, I need to store these images as a WebM video as quickly as possible, so that no frames are lost.
What is the best way of doing that, using libvpx?
The fastest and easiest thing to do would be to provide the grayscale plane directly to the libvpx compression function vpx_codec_encode with VPX_IMG_FMT_I420. You'll have to supply two 2x2-subsampled color planes with it though - 320x240 in your case - and make every octet of those planes have the value 128.
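A sketch of that, assuming the encoder context has already been set up with vpx_codec_enc_init elsewhere (codec and frame_index are placeholders, and pulling the compressed packets / WebM muxing is omitted):

#include <cstdint>
#include <cstring>
extern "C" {
#include <vpx/vpx_encoder.h>
}

void encode_gray_frame(vpx_codec_ctx_t* codec, const uint8_t* gray,
                       int width, int height, int frame_index) {
    vpx_image_t img;
    vpx_img_alloc(&img, VPX_IMG_FMT_I420, width, height, 1);

    // Copy the 8-bit grayscale data into the Y plane, row by row.
    for (int row = 0; row < height; ++row)
        std::memcpy(img.planes[VPX_PLANE_Y] + row * img.stride[VPX_PLANE_Y],
                    gray + row * width, width);

    // Neutral chroma: 128 means "no colour", so the output stays grayscale.
    for (int row = 0; row < height / 2; ++row) {
        std::memset(img.planes[VPX_PLANE_U] + row * img.stride[VPX_PLANE_U], 128, width / 2);
        std::memset(img.planes[VPX_PLANE_V] + row * img.stride[VPX_PLANE_V], 128, width / 2);
    }

    vpx_codec_encode(codec, &img, frame_index, 1, 0, VPX_DL_REALTIME);
    vpx_img_free(&img);
}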
I have a small problem with the size of my buffers in a C++ program.
I grab YUYV images from a camera using V4L2 (an example is available here).
I want to take one image and put it into my own image structure.
Here is the buffer given by the V4L2 structure and its size
(uchar*)buffers_[buf.index].start, buf.bytesused
In my structure, I create a new buffer (mybuffer) with a size of width*height*byteSize (byteSize is 4, since I grab YUYV, i.e. YUV422, images).
The problem is that I was expecting buf to be the same size as the buffer I created, but it is not: for example, when I grab a 640x480 image, buf = 614400 and mybuffer = 1228800 (twice as big).
Does anyone have any idea why this is the case?
YUV422 uses 4 bytes per 2 pixels
In YUV422 mode the U and V values are shared between two pixels. For YUYV, the bytes in the image are ordered like Y0 U0 Y1 V0 Y2 U2 Y3 V2, etc.
Giving pixels like:
pixel 0: Y0 U0 V0
pixel 1: Y1 U0 V0
pixel 2: Y2 U2 V2
pixel 3: Y3 U2 V2
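For illustration, a sketch of unpacking a packed YUYV buffer into 3 bytes per pixel (a hypothetical helper, not part of V4L2; assumes width is even and src holds width*height*2 bytes):

#include <cstdint>
#include <vector>

std::vector<uint8_t> yuyv_to_yuv24(const uint8_t* src, int width, int height) {
    std::vector<uint8_t> dst(static_cast<size_t>(width) * height * 3);
    for (int i = 0; i < width * height; i += 2) {
        const uint8_t* p = src + i * 2;  // one 4-byte packet: Y0 U Y1 V
        uint8_t* q = dst.data() + static_cast<size_t>(i) * 3;
        q[0] = p[0]; q[1] = p[1]; q[2] = p[3]; // pixel i:   Y0 U V
        q[3] = p[2]; q[4] = p[1]; q[5] = p[3]; // pixel i+1: Y1 U V (chroma shared)
    }
    return dst;
}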
I'm using videoInput to interface with DirectShow and get pixel data from my webcam.
From another question I've asked, people have suggested that the pixel format is just appended arrays in the order of the Y, U, and V channels.
FourCC's website suggests that the pixel format does not actually follow this pattern, and is instead |Y0|U0|Y1|V0|Y2|U1|Y3|V1|
I'm working on a few functions that convert the YUY2 input image into RGB and YV12, and after having little to no success, thought that it might be an issue with how I'm interpreting the initial YUY2 image data.
Am I correct in assuming that the pixel data is in the format from the FourCC website, or are the Y, U and V channels separate arrays that have been concatenated (so the data is in the order of the channels, for example YYYYUUVV)?
In YUY2 each row is a sequence of 4-byte YUYV packets, each describing two adjacent pixels.
In YV12 there are 3 separate planes: first Y of size width*height, then V and then U, both of size (width/2) * (height/2).
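Since you mention converting YUY2 to YV12, here is a sketch of that repacking under those two layouts (a hypothetical helper; assumes even width and height, and subsamples chroma vertically by simply keeping only the even rows):

#include <cstdint>

// dstY is width*height bytes; dstV and dstU are (width/2)*(height/2) bytes
// each. In a single YV12 buffer they are laid out in that order: Y, V, U.
void yuy2_to_yv12(const uint8_t* src, int width, int height,
                  uint8_t* dstY, uint8_t* dstV, uint8_t* dstU) {
    for (int row = 0; row < height; ++row) {
        const uint8_t* line = src + row * width * 2;   // 2 bytes per pixel
        for (int col = 0; col < width; col += 2) {
            const uint8_t* p = line + col * 2;         // packet: Y0 U Y1 V
            dstY[row * width + col]     = p[0];
            dstY[row * width + col + 1] = p[2];
            if ((row & 1) == 0) {                      // chroma from even rows only
                int ci = (row / 2) * (width / 2) + col / 2;
                dstU[ci] = p[1];
                dstV[ci] = p[3];
            }
        }
    }
}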