How to decode an MJPEG to raw RGB (or YUV) data - c++

I am using Video4Linux2 to open a connection to the camera connected to my machine. I can request either YUV or MJPEG data from the camera. When I request YUV at a higher resolution, the program falls behind the refresh rate of the camera (presumably because there is too much data to transfer in that amount of time), so I need to use the MJPEG data instead. I have been stuck for a while and have found very few resources online on how to decode an MJPEG frame.
By the way, I have all of the following data:
unsigned char *data; // pointing to the data for the most current mjpeg frame from v4l2
size_t data_size; // the size (in bytes) of the mjpeg frame received from v4l2
unsigned char *r, *g, *b; // three heap allocated arrays in which to store the resulting data
// Can easily be altered to represent an array of structs holding all 3 components,
// as well as using yuv at different rates.
All I need is the ability to convert my mjpeg frame live into raw data, either RGB, or YUV.
I have heard of libraries like libjpeg, mjpegtools, nvjpeg, etc, however I have not been able to find much on how to use them to decode an mjpeg from where I am. Any help whatsoever would be greatly appreciated!

I figured it out via the sources linked in the comments. My working example is as follows:
// variables:
struct jpeg_decompress_struct cinfo;
struct jpeg_error_mgr jerr;
unsigned int width, height;
// data points to the mjpeg frame received from v4l2.
unsigned char *data;
size_t data_size;
// a *to be allocated* heap array to put the data for
// all the pixels after conversion to RGB.
unsigned char *pixels;
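// ... In the initialization of the program (requires #include <jpeglib.h>;
// width and height must already hold the resolution negotiated with v4l2):
cinfo.err = jpeg_std_error(&jerr);
jpeg_create_decompress(&cinfo);
// 3 bytes per pixel: libjpeg outputs packed RGB by default for color JPEGs.
pixels = new unsigned char[width * height * 3];
// ... Every frame:
if (data != nullptr && data_size > 0) {
    jpeg_mem_src(&cinfo, data, data_size);
    // Only decode when the header was parsed successfully.
    if (jpeg_read_header(&cinfo, TRUE) == JPEG_HEADER_OK) {
        jpeg_start_decompress(&cinfo);
        while (cinfo.output_scanline < cinfo.output_height) {
            // Pointer to the start of the row that is decoded next.
            unsigned char *row[] = {pixels + cinfo.output_scanline * width * 3};
            jpeg_read_scanlines(&cinfo, row, 1);
        }
        jpeg_finish_decompress(&cinfo);
    }
}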
If this still does not work for anyone trying to figure out the same thing, try adding the Huffman tables to the frame before decoding. As the second comment says, some cameras need this. These sources show how:
https://github.com/jamieguinan/cti/blob/master/jpeg_misc.c#L234
https://github.com/jamieguinan/cti/blob/master/jpeghufftables.c
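As a rough way to tell whether a frame needs that treatment, one could scan its header for a DHT segment (marker 0xFFC4) before the start-of-scan marker (0xFFDA). The sketch below is only an illustration: hasHuffmanTables is a hypothetical helper name, and it assumes a single well-formed JPEG frame that starts with the SOI marker.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if a DHT segment (0xFFC4) appears in the
// JPEG header before the start-of-scan marker (0xFFDA).
bool hasHuffmanTables(const unsigned char *data, std::size_t size) {
    // Every JPEG frame starts with the SOI marker 0xFFD8.
    if (size < 4 || data[0] != 0xFF || data[1] != 0xD8) return false;
    std::size_t pos = 2;
    while (pos + 4 <= size) {
        if (data[pos] != 0xFF) return false;      // not at a marker: malformed header
        const unsigned char marker = data[pos + 1];
        if (marker == 0xFF) { ++pos; continue; }  // skip fill byte before a marker
        if (marker == 0xC4) return true;          // DHT found
        if (marker == 0xDA) return false;         // SOS reached without DHT
        // Segment length is big-endian and includes its own two bytes.
        const std::size_t len = (static_cast<std::size_t>(data[pos + 2]) << 8) | data[pos + 3];
        if (len < 2) return false;
        pos += 2 + len;
    }
    return false;
}
```

If this returns false for the frames coming from v4l2, inserting the standard tables as done in the linked sources is the likely next step.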

Related

uint8_t buffer to cv::Mat conversion results in distorted image

I have a Mipi camera that captures frames and stores them into the struct buffer that you can see below. Once the frame is stored I want to convert it into a cv::Mat, the thing is that the Mat ends up looking like the first pic.
The var buf.index is just part of the V4L2 API, useful to understand which buffer I'm using.
//The structure where the data is stored
struct buffer{
void *start;
size_t length;
};
struct buffer *buffers;
//buffer->mat
cv::Mat im = cv::Mat(cv::Size(width, height), CV_8UC3, ((uint8_t*)buffers[buf.index].start));
At first I thought that the data might be corrupted but storing the image with lodepng results in a nice image without any distortion.
unsigned char* out_buf = (unsigned char*)malloc( width * height * 3);
for(int pix = 0; pix < width*height; ++pix) {
memcpy(out_buf + pix*3, ((uint8_t*)buffers[buf.index].start)+4*pix+1, 3);
}
lodepng_encode24_file(filename, out_buf, width, height);
I bet it's something really silly.
The picture you posted has oddly colored pixels, and the patterns suggest there is more information than simply 24 bits per pixel.
After inspecting the data, it appears that V4L gives you four bytes per pixel, and the first byte is always 0xFF (let's call that X). Further, the channel order seems to be XRGB.
Create a cv::Mat using 8UC4 to contain the data.
To use the picture in OpenCV, you need BGR order. Use cv::split to separate the received data into its four planes, which are X, R, G, B. Then use cv::merge to reassemble the B, G, R planes into a picture that OpenCV can handle, or reassemble into R, G, B to create a Mat for other purposes (that other library you seem to use).
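The same reordering can also be sketched without OpenCV, as a plain loop over the buffer. This is not the split/merge approach described above but a hand-written equivalent; xrgbToBgr is a hypothetical name, and the code assumes tightly packed pixels of four bytes each in X, R, G, B order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies a packed XRGB buffer (4 bytes per pixel) into a
// packed BGR buffer (3 bytes per pixel), dropping the X byte.
std::vector<std::uint8_t> xrgbToBgr(const std::uint8_t *src, std::size_t pixelCount) {
    std::vector<std::uint8_t> bgr(pixelCount * 3);
    for (std::size_t i = 0; i < pixelCount; ++i) {
        const std::uint8_t *p = src + 4 * i;  // p[0]=X, p[1]=R, p[2]=G, p[3]=B
        bgr[3 * i + 0] = p[3];                // B
        bgr[3 * i + 1] = p[2];                // G
        bgr[3 * i + 2] = p[1];                // R
    }
    return bgr;
}
```

The returned vector could then be wrapped as cv::Mat(height, width, CV_8UC3, bgr.data()). That constructor does not copy the data, so the vector has to outlive the Mat.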

How to convert RGB Pixel Buffer to IRandomAccessStream^

I have structure like this:
struct PixelBuffer
{
unsigned int width,height,stride;
PixelFormat format;
unsigned char * buffer;
};
I want to convert it (the bytes that buffer points to) to an IRandomAccessStream^. How do I do that?
I am not sure whether PixelFormat makes a difference or whether one format is preferable, but let's say it is RGB.
Thanks!
Note
I need a SoftwareBitmap^, so my idea is to get an IRandomAccessStream^, then a BitmapDecoder^, and from that the SoftwareBitmap^. If there is a shortcut that I am not aware of, that would be great!
There is no need to go through a stream if a SoftwareBitmap^ is what you need. You can create it directly from the buffer like this:
vector<unsigned char> bufferBGRA; // Convert your bytes to BGRA
DataWriter ^writer = ref new DataWriter();
writer->WriteBytes(Platform::ArrayReference<BYTE>(
bufferBGRA.data(),
width * height * 4)); // 4 channels (BGRA)
IBuffer ^buff= writer->DetachBuffer();
// Create SoftwareBitmap from buff
SoftwareBitmap^ softwareBitmap = ref new SoftwareBitmap(BitmapPixelFormat::Bgra8, width, height);
softwareBitmap->CopyFromBuffer(buff);
And yes, in this case BGRA is the preferable format, since you need to convert your buffer to a format that BitmapPixelFormat supports, such as BGRA.
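The "convert your bytes to BGRA" step is left open in the snippet above. A minimal sketch of it follows, assuming the source buffer holds 3 bytes per pixel in R, G, B order and that stride is the number of bytes per source row, as the PixelBuffer struct in the question suggests. The name rgbToBgra is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: converts an RGB buffer with the given row stride (in
// bytes) into a tightly packed BGRA buffer with a fully opaque alpha channel.
std::vector<unsigned char> rgbToBgra(const unsigned char *rgb, unsigned width,
                                     unsigned height, unsigned stride) {
    std::vector<unsigned char> bgra(static_cast<std::size_t>(width) * height * 4);
    std::size_t out = 0;
    for (unsigned y = 0; y < height; ++y) {
        const unsigned char *row = rgb + static_cast<std::size_t>(y) * stride;
        for (unsigned x = 0; x < width; ++x) {
            bgra[out++] = row[3 * x + 2];  // B
            bgra[out++] = row[3 * x + 1];  // G
            bgra[out++] = row[3 * x + 0];  // R
            bgra[out++] = 255;             // A (opaque)
        }
    }
    return bgra;
}
```

The result has width * height * 4 bytes, which matches the size passed to WriteBytes in the answer.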

FFMPEG decoding artifacts between keyframes

Marked question as outdated as using the deprecated avcodec_decode_video2
I'm currently experiencing artifacts when decoding video using FFmpeg's API. On what I assume are intermediate frames, artifacts build up slowly, and only where there is active movement in the frame. They accumulate for 50-100 frames until, I assume, a keyframe resets them. Frames are then decoded correctly and the artifacts start to build again.
One thing that is bothering me is I have a few video samples that are 30fps(h264) that work correctly, but all of my 60fps videos(h264) experience the problem.
I don't currently have enough reputation to post an image, so hopefully this link will work.
http://i.imgur.com/PPXXkJc.jpg
int numBytes;
int frameFinished;
AVFrame* decodedRawFrame;
AVFrame* rgbFrame;
//Enum class for decoding results, used to break decode loop when a frame is gathered
DecodeResult retResult = DecodeResult::Fail;
decodedRawFrame = av_frame_alloc();
rgbFrame = av_frame_alloc();
if (!decodedRawFrame) {
fprintf(stderr, "Could not allocate video frame\n");
return DecodeResult::Fail;
}
numBytes = avpicture_get_size(PIX_FMT_RGBA, mCodecCtx->width,mCodecCtx->height);
uint8_t* buffer = (uint8_t *)av_malloc(numBytes*sizeof(uint8_t));
avpicture_fill((AVPicture *) rgbFrame, buffer, PIX_FMT_RGBA, mCodecCtx->width, mCodecCtx->height);
AVPacket packet;
while(av_read_frame(mFormatCtx, &packet) >= 0 && retResult != DecodeResult::Success)
{
// Is this a packet from the video stream?
if (packet.stream_index == mVideoStreamIndex)
{
// Decode video frame
int decodeValue = avcodec_decode_video2(mCodecCtx, decodedRawFrame, &frameFinished, &packet);
// Did we get a video frame?
if (frameFinished)// && rgbFrame->pict_type != AV_PICTURE_TYPE_NONE )
{
// Convert the image from its native format to RGB
int SwsFlags = SWS_BILINEAR;
// Accurate round clears up a problem where the start
// of videos have green bars on them
SwsFlags |= SWS_ACCURATE_RND;
struct SwsContext *ctx = sws_getCachedContext(NULL, mCodecCtx->width, mCodecCtx->height, mCodecCtx->pix_fmt, mCodecCtx->width, mCodecCtx->height,
PIX_FMT_RGBA, SwsFlags, NULL, NULL, NULL);
sws_scale(ctx, decodedRawFrame->data, decodedRawFrame->linesize, 0, mCodecCtx->height, rgbFrame->data, rgbFrame->linesize);
//if(count%5 == 0 && count < 105)
// DebugSavePPMImage(rgbFrame, mCodecCtx->width, mCodecCtx->height, count);
++count;
// Viewable frame is a struct to hold buffer and frame together in a queue
ViewableFrame frame;
frame.buffer = buffer;
frame.frame = rgbFrame;
mFrameQueue.push(frame);
retResult = DecodeResult::Success;
sws_freeContext(ctx);
}
}
// Free the packet that was allocated by av_read_frame
av_free_packet(&packet);
}
// Check for end of file leftover frames
if(retResult != DecodeResult::Success)
{
int result = av_read_frame(mFormatCtx, &packet);
if(result < 0)
isEoF = true;
av_free_packet(&packet);
}
// Free the YUV frame
av_frame_free(&decodedRawFrame);
I'm attempting to build a queue of decoded frames that I then use and free as needed. Is my separation of the frames causing the intermediate frames to be decoded incorrectly? I also break the decoding loop once I've successfully gathered a frame (DecodeResult::Success); most examples I've seen loop through the whole video.
All codec context, video stream information, and format contexts are set up exactly as shown in the main function of https://github.com/chelyaev/ffmpeg-tutorial/blob/master/tutorial01.c
Any suggestions would be greatly appreciated.
For reference, in case someone finds themselves in a similar position: apparently some older versions of FFmpeg have an issue when sws_scale converts an image without changing the dimensions of the final frame. The workaround is to add SWS_ACCURATE_RND to the flags used for the SwsContext:
int SwsFlags = SWS_BILINEAR; //Whatever you want
SwsFlags |= SWS_ACCURATE_RND; // Under the hood forces ffmpeg to use the same logic as if scaled
SWS_ACCURATE_RND has a performance penalty but for regular video it's probably not that noticeable. This will remove the splash of green, or green bars along the edges of textures if present.
I wanted to thank Multimedia Mike, and George Y, they were also right in that the way I was decoding the frame wasn't preserving the packets correctly and that was what caused the video artifacts building from previous frames.
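One way the loop in the question can lose packets, offered as a reading of the posted code rather than a confirmed diagnosis: the while condition calls av_read_frame before it checks retResult, so after a frame has been gathered one more packet is read and then thrown away. The sketch below reproduces only that control flow with a hypothetical FakeReader standing in for av_read_frame. No FFmpeg calls are involved.

```cpp
#include <vector>

// Hypothetical stand-in for av_read_frame: hands out packet ids 0, 1, 2, ...
struct FakeReader {
    int next;
    int count;
    explicit FakeReader(int n) : next(0), count(n) {}
    bool read(int &packet) {
        if (next >= count) return false;
        packet = next++;
        return true;
    }
};

// Same condition order as the question: read first, then test the flag.
// Returns the ids of the packets that actually reached the "decoder".
std::vector<int> decodeReadFirst(FakeReader &reader, int calls) {
    std::vector<int> decoded;
    for (int i = 0; i < calls; ++i) {
        bool gotFrame = false;
        int packet = 0;
        while (reader.read(packet) && !gotFrame) {
            decoded.push_back(packet);
            gotFrame = true;
        }
    }
    return decoded;
}

// Flag tested first: short-circuit evaluation skips the extra read.
std::vector<int> decodeFlagFirst(FakeReader &reader, int calls) {
    std::vector<int> decoded;
    for (int i = 0; i < calls; ++i) {
        bool gotFrame = false;
        int packet = 0;
        while (!gotFrame && reader.read(packet)) {
            decoded.push_back(packet);
            gotFrame = true;
        }
    }
    return decoded;
}
```

With six packets and three calls, the read-first version only decodes packets 0, 2 and 4, while the flag-first version decodes 0, 1 and 2. With an inter-frame codec such as H.264, a skipped packet would leave later frames referring to data the decoder never saw, which would fit artifacts that persist until the next keyframe.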

How do I write PNG files from an openGL screen?

So I have this script which reads the display data into a character array pixels:
typedef unsigned char uchar;
// we will store the image data here
uchar *pixels;
// the thingy we use to write files
FILE * shot;
// we get the width/height of the screen into this array
int screenStats[4];
// get the width/height of the window
glGetIntegerv(GL_VIEWPORT, screenStats);
// generate an array large enough to hold the pixel data
// (width*height*bytesPerPixel)
pixels = new unsigned char[screenStats[2]*screenStats[3]*3];
// read in the pixel data, TGA's pixels are BGR aligned
glReadPixels(0, 0, screenStats[2], screenStats[3], 0x80E0,
GL_UNSIGNED_BYTE, pixels);
Normally, I save this to a TGA file, but since these get monstrously large I was hoping to use PNG instead as I quickly run out of hard drive space doing it this way (my images are highly monotonous and easily compressible, so the potential gain is huge). So I'm looking at PNG writer but I'm open to other suggestions. The usage example they give at their website is this:
#include <pngwriter.h>
int main()
{
pngwriter image(200, 300, 1.0, "out.png");
image.plot(30, 40, 1.0, 0.0, 0.0); // print a red dot
image.close();
return 0;
}
As I'm somewhat new to image processing I'm a little confused about the form of my pixels array and how I would convert this to a form representable in the above format. As a reference, I've been using the following script to convert my files to TGA:
//////////////////////////////////////////////////
// Grab the OpenGL screen and save it as a .tga //
// Copyright (C) Marius Andra 2001 //
// http://cone3d.gz.ee EMAIL: cone3d#hot.ee //
//////////////////////////////////////////////////
// (modified by me a little)
int screenShot(int const num)
{
typedef unsigned char uchar;
// we will store the image data here
uchar *pixels;
// the thingy we use to write files
FILE * shot;
// we get the width/height of the screen into this array
int screenStats[4];
// get the width/height of the window
glGetIntegerv(GL_VIEWPORT, screenStats);
// generate an array large enough to hold the pixel data
// (width*height*bytesPerPixel)
pixels = new unsigned char[screenStats[2]*screenStats[3]*3];
// read in the pixel data, TGA's pixels are BGR aligned
glReadPixels(0, 0, screenStats[2], screenStats[3], 0x80E0,
GL_UNSIGNED_BYTE, pixels);
// open the file for writing. If unsuccessful, return 1
std::string filename = kScreenShotFileNamePrefix + Function::Num2Str(num) + ".tga";
shot=fopen(filename.c_str(), "wb");
if (shot == NULL)
return 1;
// this is the tga header it must be in the beginning of
// every (uncompressed) .tga
uchar TGAheader[12]={0,0,2,0,0,0,0,0,0,0,0,0};
// the header that is used to get the dimensions of the .tga
// header[1]*256+header[0] - width
// header[3]*256+header[2] - height
// header[4] - bits per pixel
// header[5] - ?
uchar header[6]={((int)(screenStats[2]%256)),
((int)(screenStats[2]/256)),
((int)(screenStats[3]%256)),
((int)(screenStats[3]/256)),24,0};
// write out the TGA header
fwrite(TGAheader, sizeof(uchar), 12, shot);
// write out the header
fwrite(header, sizeof(uchar), 6, shot);
// write the pixels
fwrite(pixels, sizeof(uchar),
screenStats[2]*screenStats[3]*3, shot);
// close the file
fclose(shot);
// free the memory
delete [] pixels;
// return success
return 0;
}
I don't normally like to just dump and bail on these forums but in this instance I'm simply stuck. I'm sure the conversion is close to trivial I just don't understand enough about image processing to get it done. If someone could provide a simple example for how to convert the pixels array into image.plot() in the PNG writer library, or provide a way of achieving this using a different library that would be great! Thanks.
Your current implementation does almost all the work. All you have to do is to write into the PNG file the pixel colors returned by OpenGL. Since there is no method in PNG Writer to pass an array of colors, you will have to write the pixels one by one.
Your call to glReadPixels() hides the requested color format. You should use one of the predefined constants (see the format argument) instead of 0x80E0. That value is GL_BGR, so each pixel arrives as blue/green/red bytes, which matches the BGR comment in your code.
Thus, your pixel-to-png code may look like this:
const std::size_t image_width( screenStats[2] );
const std::size_t image_height( screenStats[3] );
pngwriter image( image_width, image_height, /*…*/ );
for ( std::size_t y(0); y != image_height; ++y )
for ( std::size_t x(0); x != image_width; ++x )
{
// bytes are ordered blue, green, red because the format is GL_BGR
const unsigned char* bgr( pixels + 3 * (y * image_width + x) );
// PNGwriter coordinates start at 1; the double overload of plot() expects 0.0 to 1.0
image.plot( static_cast<int>(x) + 1, static_cast<int>(y) + 1,
bgr[2] / 255.0, bgr[1] / 255.0, bgr[0] / 255.0 );
}
image.close();
As an alternative to PNGwriter, you may have a look at libclaw or use libpng as is.
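If you go with a library that takes a whole pixel array, the buffer usually needs two adjustments first: glReadPixels returns rows bottom to top, and the 0x80E0 (GL_BGR) format puts blue first. A minimal sketch of that preparation step follows, assuming tightly packed rows with no padding. The name bgrBottomUpToRgbTopDown is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: converts a bottom-up BGR buffer (as read with GL_BGR)
// into a top-down RGB buffer with 3 bytes per pixel.
std::vector<unsigned char> bgrBottomUpToRgbTopDown(const unsigned char *pixels,
                                                   std::size_t width,
                                                   std::size_t height) {
    std::vector<unsigned char> out(width * height * 3);
    for (std::size_t y = 0; y < height; ++y) {
        // Output row y comes from source row (height - 1 - y).
        const unsigned char *srcRow = pixels + (height - 1 - y) * width * 3;
        unsigned char *dstRow = out.data() + y * width * 3;
        for (std::size_t x = 0; x < width; ++x) {
            dstRow[3 * x + 0] = srcRow[3 * x + 2];  // R
            dstRow[3 * x + 1] = srcRow[3 * x + 1];  // G
            dstRow[3 * x + 2] = srcRow[3 * x + 0];  // B
        }
    }
    return out;
}
```

The result could then be passed to an encoder that accepts packed top-down RGB, for example lodepng_encode24_file(filename, out.data(), width, height).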

How to convert Live video stream from YUV(HDYC) to RGB

I want to convert video streams from a webcam. The stream format is called HDYC. I think it is somewhat unusual, so I have not managed to handle it so far.
My question is: how do I convert that format to RGB in C++ using FFmpeg? There are some constraints.
I don't want to write a file. In other words, the conversion has to work directly on the video stream from the webcam, and it has to run in real time.
Thanks.
I am not sure why you tagged it with h.264, because HDYC is a flavor of UYVY pixel format, layout and subsampling, just with ITU-R Rec. 709 defined color space.
So your question is how do you convert BT.709 YUV to RGB with FFmpeg. FFmpeg's libswscale can do this: its sws_scale does the conversion, and its sws_setColorspaceDetails lets you provide color space details for the conversion.
/**
* Scale the image slice in srcSlice and put the resulting scaled
* slice in the image in dst. A slice is a sequence of consecutive
* rows in an image.
[...] */
int sws_scale(struct SwsContext *c, const uint8_t *const srcSlice[],
const int srcStride[], int srcSliceY, int srcSliceH,
uint8_t *const dst[], const int dstStride[]);
/**
[...]
* @param table the yuv2rgb coefficients describing the output yuv space, normally ff_yuv2rgb_coeffs[x]
[...] */
int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4],
int srcRange, const int table[4], int dstRange,
int brightness, int contrast, int saturation);
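To make concrete what such a conversion computes, here is a hand-written sketch of the per-pixel arithmetic. It does not use libswscale. It assumes the UYVY byte order (U, Y0, V, Y1 for each pair of pixels), limited-range BT.709 input (Y in 16..235, chroma centered on 128) and the commonly quoted BT.709 coefficients rounded to four decimals. The name uyvy709ToRgb is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rounds to nearest and clamps to the 0..255 range.
static std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, v + 0.5)));
}

// Hypothetical helper: converts packed UYVY (2 bytes per pixel, limited-range
// BT.709) to packed RGB (3 bytes per pixel). pixelCount is expected to be even.
std::vector<std::uint8_t> uyvy709ToRgb(const std::uint8_t *src, std::size_t pixelCount) {
    std::vector<std::uint8_t> rgb(pixelCount * 3);
    for (std::size_t i = 0; i + 1 < pixelCount; i += 2) {
        const std::uint8_t *p = src + 2 * i;  // p[0]=U, p[1]=Y0, p[2]=V, p[3]=Y1
        const double cb = p[0] - 128.0;
        const double cr = p[2] - 128.0;
        for (int k = 0; k < 2; ++k) {         // two pixels share one U/V pair
            const double y = 1.1644 * (p[1 + 2 * k] - 16.0);
            std::uint8_t *out = &rgb[3 * (i + k)];
            out[0] = clampToByte(y + 1.7927 * cr);                // R
            out[1] = clampToByte(y - 0.2132 * cb - 0.5329 * cr);  // G
            out[2] = clampToByte(y + 2.1124 * cb);                // B
        }
    }
    return rgb;
}
```

For a real-time stream, sws_scale with the color space set through sws_setColorspaceDetails is likely the better route. The loop above is mainly useful for checking results or for understanding what the coefficients do.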