I want to convert video streams coming from a webcam. Those streams are in a format called HDYC. I think it is a little unusual, so I have not been able to handle it so far.
My question is: how can I convert that format to RGB in C++ using FFmpeg? There are some constraints, though.
I don't want to create a file. In other words, the conversion has to work directly on the video stream from the webcam, and it is a real-time operation.
Thanks.
I am not sure why you tagged it with h.264, because HDYC is a flavor of the UYVY pixel format (same layout and subsampling), just with the ITU-R Rec. 709 color space.
So your question is really how to convert BT.709 YUV to RGB with FFmpeg. FFmpeg's libswscale can do this: its sws_scale does the conversion, and its sws_setColorspaceDetails lets you provide the color space details for the conversion.
/**
 * Scale the image slice in srcSlice and put the resulting scaled
 * slice in the image in dst. A slice is a sequence of consecutive
 * rows in an image.
 [...] */
int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
              const int srcStride[], int srcSliceY, int srcSliceH,
              uint8_t *const dst[], const int dstStride[]);

/**
 [...]
 * @param table the yuv2rgb coefficients describing the output yuv space, normally ff_yuv2rgb_coeffs[x]
 [...] */
int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4],
                             int srcRange, const int table[4], int dstRange,
                             int brightness, int contrast, int saturation);
Related
I have a MIPI camera that captures frames and stores them into the struct buffer that you can see below. Once a frame is stored I want to convert it into a cv::Mat, but the Mat ends up looking like the first picture (distorted).
The variable buf.index is just part of the V4L2 API; it tells me which buffer I'm using.
// The structure where the data is stored
struct buffer {
    void *start;
    size_t length;
};
struct buffer *buffers;

// buffer -> cv::Mat
cv::Mat im = cv::Mat(cv::Size(width, height), CV_8UC3, (uint8_t*)buffers[buf.index].start);
At first I thought that the data might be corrupted but storing the image with lodepng results in a nice image without any distortion.
unsigned char *out_buf = (unsigned char*)malloc(width * height * 3);
for (int pix = 0; pix < width * height; ++pix) {
    // skip the first of every 4 bytes, keep the next 3
    memcpy(out_buf + pix * 3, ((uint8_t*)buffers[buf.index].start) + 4 * pix + 1, 3);
}
lodepng_encode24_file(filename, out_buf, width, height);
I bet it's something really silly.
The picture you posted has oddly colored pixels, and the patterns look like there's more information than simply 24 bits per pixel.
After inspecting the data, it appears that V4L gives you four bytes per pixel, and the first byte is always 0xFF (let's call it X). Further, the channel order seems to be XRGB.
Create a cv::Mat with type CV_8UC4 to contain the data.
To use the picture in OpenCV, you need BGR order. cv::split the received data into its four color planes, which are X, R, G, B. Use cv::merge to reassemble the B, G, R planes into a picture that OpenCV can handle, or reassemble R, G, B to create a Mat for other purposes (the other library you seem to use).
I am using Video4Linux2 to open a connection to the camera attached to my machine. I can request either YUV or MJPEG data from the device. Increasing the requested resolution while requesting YUV slows the program past the refresh rate of the camera (presumably because there is too much data to transfer in that time), so I need to use the camera's MJPEG data instead. I have been stuck for a while and have found very few resources online on how to decode an MJPEG frame.
By the way, I have all of the following data:
unsigned char *data; // pointing to the data for the most current mjpeg frame from v4l2
size_t data_size; // the size (in bytes) of the mjpeg frame received from v4l2
unsigned char *r, *g, *b; // three heap allocated arrays in which to store the resulting data
// Can easily be altered to represent an array of structs holding all 3 components,
// as well as using yuv at different rates.
All I need is the ability to convert my mjpeg frame live into raw data, either RGB, or YUV.
I have heard of libraries like libjpeg, mjpegtools, nvjpeg, etc, however I have not been able to find much on how to use them to decode an mjpeg from where I am. Any help whatsoever would be greatly appreciated!
I figured it out via the sources linked in the comments. My working example is as follows:
// variables:
struct jpeg_decompress_struct cinfo;
struct jpeg_error_mgr jerr;
// data points to the mjpeg frame received from v4l2.
unsigned char *data;
size_t data_size;
// a *to be allocated* heap array to put the data for
// all the pixels after conversion to RGB.
unsigned char *pixels = nullptr;

// ... In the initialization of the program:
cinfo.err = jpeg_std_error(&jerr);
jpeg_create_decompress(&cinfo);

// ... Every frame:
if (data != nullptr && data_size > 0) {
    jpeg_mem_src(&cinfo, data, data_size);
    if (jpeg_read_header(&cinfo, TRUE) == JPEG_HEADER_OK) {
        jpeg_start_decompress(&cinfo);
        // output_width/output_height/output_components are only valid
        // after jpeg_start_decompress(); with an RGB output color space,
        // output_components is 3.
        unsigned int row_stride = cinfo.output_width * cinfo.output_components;
        if (pixels == nullptr)
            pixels = new unsigned char[cinfo.output_height * row_stride];
        while (cinfo.output_scanline < cinfo.output_height) {
            unsigned char *row = pixels + cinfo.output_scanline * row_stride;
            jpeg_read_scanlines(&cinfo, &row, 1);
        }
        jpeg_finish_decompress(&cinfo);
    }
}
If this still does not work for anyone trying to figure out the same thing, try incorporating the default Huffman tables: some cameras omit them from the MJPEG stream, as mentioned in the second comment.
https://github.com/jamieguinan/cti/blob/master/jpeg_misc.c#L234
https://github.com/jamieguinan/cti/blob/master/jpeghufftables.c
I am in C++.
Assume some mysterious function getData() returns all, and only, the pixel information of an image.
That is, a char* that points to just the pixel data, with no metadata (no width, height, or channels of any form).
Thus we have:
unsigned char *raw_data = getData();
Then we have another function that returns a structure containing the metadata.
eg:
struct Metadata {
    int width;
    int height;
    int channels;
    // other useful fields
};
I now need to prepend the object metadata in the correct way to create a valid image buffer.
So instead of [pixel1, pixel2, pixel3 ...]
I would have, for example [width, height, channels, pixel1, pixel2, pixel3...]
What is the correct order to prepend the metadata and are width, height and channels enough?
You can use the cv::Mat constructor to create an image from the data and metadata:
Mat::Mat(int rows, int cols, int type, void* data, size_t step=AUTO_STEP); // documentation here
cv::Mat image = cv::Mat(height, width, CV_8UC3, raw_data);
The type argument specifies the data format and the number of channels. For example, typical RGB image data is unsigned char with 3 channels, so its type is CV_8UC3.
The available OpenCV Mat types are defined in cvdef.h.
I am having an issue where the .png image that I want to load as a byte array using DevIL ends up without an alpha channel.
A completely black image also reports alpha channel values of 0.
This is my image loading function:
DevILCall(ilGenImages(1, &m_ImageID));
DevILCall(ilBindImage(m_ImageID));
ASSERT("Loading image: " + path);
DevILCall(ilLoadImage(path.c_str()));
GraphicComponents::Image image(
ilGetData(),
ilGetInteger(IL_IMAGE_HEIGHT),
ilGetInteger(IL_IMAGE_WIDTH),
ilGetInteger(IL_IMAGE_BITS_PER_PIXEL)
);
return image;
The Image object I am using is as follows:
struct Image
{
ILubyte * m_Image;
const unsigned int m_Height;
const unsigned int m_Width;
const unsigned int m_BPP;
Image(ILubyte imageData[ ], unsigned int height, unsigned int width, unsigned int bpp);
~Image();
};
And this is how I am printing out the image data for now:
for (unsigned int i = 0; i < image->m_Height * image->m_Width * 4; i += 4)
{
    LOG("Red:");
    LOG((int) image->m_Image[i]);
    LOG("Green:");
    LOG((int) image->m_Image[i+1]);
    LOG("Blue:");
    LOG((int) image->m_Image[i+2]);
    LOG("Alpha:");
    LOG((int) image->m_Image[i+3]);
}
I also tried using ilTexImage() to force the loaded image into RGBA format, but that doesn't seem to work either. The printing loop starts reading garbage values when I change the loop bound to 4 times the number of pixels in the image.
The image is also confirmed to have an alpha channel.
What might be going wrong here?
EDIT: ilGetInteger(IL_IMAGE_BPP) returns 3, which should mean RGB. When I use ilTexImage() to force 4 channels, ilGetInteger(IL_IMAGE_BPP) returns 4, but I still see garbage values on the standard output.
The problem was fixed by a simple ilConvertImage(IL_RGBA, IL_UNSIGNED_BYTE) call after loading the image.
I suppose DevIL loads images as RGB with unsigned byte values by default; to use anything else, you need to convert the loaded image with ilConvertImage().
I am trying to move image data from Magick++ to tesseract.
I have the PNG data and some info about it.
And the signature for the tesseract method is:
void SetImage(const unsigned char* imagedata, int width, int height, int bytes_per_pixel, int bytes_per_line);
The first three arguments I can supply just fine. But bytes_per_pixel and bytes_per_line I'm not so sure about. The image itself has 11564 pixels but the length of the data is only 356 bytes... It's mostly a white image with some text. 11564/356 = 32.48 which obviously is not the correct bytes per pixel. How can I get the right bytes / pixel information? It's ok to just get that for one image on my desktop or something and set it as a constant, all the images I'm processing will have the same format.
Then as far as bytes per line, would that just be image width in pixels * bytes per pixel?
bytes_per_pixel follows from the PNG's format: PNG images usually use 8, 24 or 32 bits per pixel, i.e. 1, 3 or 4 bytes per pixel. Note that the 356 bytes you measured is the length of the compressed PNG stream, not of the raw pixels; SetImage expects decompressed pixel data, so decode the image first.
bytes_per_line can then be computed as bits_per_pixel * width / 8, which for byte-aligned formats is simply bytes_per_pixel * width.