Improve speed saving texture to file using DirectX 11 C++

I have a DirectX 11 based renderer, and I need to save a lot of rendered images to the hard disk. I have used SaveWICTextureToFile, but it takes 0.2 seconds to save each image.
Images are saved at a resolution of 1024x768.
Here is the code to save the images:
ComPtr<ID3D11Texture2D> backBuffer;
HRESULT hr = _swapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), reinterpret_cast<LPVOID*>(backBuffer.GetAddressOf()));
throwIfFail(hr, "Unable to get a buffer");
#ifdef LOG
auto end = high_resolution_clock::now();
wchar_t str[256];
auto tmp = end;
#endif
hr = SaveWICTextureToFile(_context.Get(), backBuffer.Get(), GUID_ContainerFormatJpeg, w.c_str()/*,&GUID_WICPixelFormat32bppBGRA*/);
//hr = SaveDDSTextureToFile(_context.Get(), backBuffer.Get(), w.c_str()/*,&GUID_WICPixelFormat32bppBGRA*/);
#ifdef LOG
end = high_resolution_clock::now();
wsprintf(str, L"DXRender::saveLastRenderToFile: %d \n", duration_cast<microseconds>(end - tmp).count());
OutputDebugString(str);
tmp = end;
#endif
throwIfFail(hr, "Unable to save buffer");
How can I reduce the time it takes to save each image?

I have tested the libJPEG and libPNG libraries to save images, and they work fine and are faster than SaveWICTextureToFile. The only trick is that you need to take the format of the DirectX texture into account.
Example:
Texture format: B8G8R8A8
Each channel is 8 bits wide, so a pixel occupies 4 bytes stored in the order B, G, R, A, and you read the n-th byte of a pixel to get a given channel.
Example:
auto lastText = _textureDesc;
_inputTexture->GetDesc(&_textureDesc);
unsigned char r, g, b;
int _textureRowSize = _mappedResource.RowPitch / sizeof(unsigned char);
int hIdx = 0;
for (int i = 0; i < _textureDesc.Height; ++i)
{
int wIdx = 0;
for (int j = 0; j < _textureDesc.Width; ++j)
{
r = (unsigned char)textPtr[hIdx + wIdx + 2];
g = (unsigned char)textPtr[hIdx + wIdx + 1];
b = (unsigned char)textPtr[hIdx + wIdx];
//do whatever you want with this values...
wIdx += 4;
}
hIdx += _textureRowSize;
}
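Since libJPEG and libPNG expect tightly packed rows, the loop above can be turned into a small conversion step. The following is only a sketch under stated assumptions: `bgraToPackedRgb` is a hypothetical helper, and `src` and `rowPitch` stand in for the `pData` and `RowPitch` members of the mapped resource.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a mapped B8G8R8A8 image to tightly packed R8G8B8.
// `src` and `rowPitch` stand in for the mapped resource's pData and RowPitch;
// the function name is hypothetical.
std::vector<std::uint8_t> bgraToPackedRgb(const std::uint8_t* src,
                                          std::size_t width,
                                          std::size_t height,
                                          std::size_t rowPitch)
{
    std::vector<std::uint8_t> rgb(width * height * 3);
    std::uint8_t* dst = rgb.data();
    for (std::size_t y = 0; y < height; ++y) {
        // Advance by rowPitch, not width * 4: rows may be padded.
        const std::uint8_t* row = src + y * rowPitch;
        for (std::size_t x = 0; x < width; ++x) {
            const std::uint8_t* px = row + x * 4;
            *dst++ = px[2]; // R
            *dst++ = px[1]; // G
            *dst++ = px[0]; // B (px[3], the alpha byte, is dropped)
        }
    }
    return rgb;
}
```

The returned buffer has `width * 3` bytes per row, which is the layout a three-component RGB encoder input usually expects.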

Related

Image Packing Using FreeImage C++ Library, Pixel Values of all images are not adding

I am trying to pack multiple images into a single image using a bin packing algorithm. To add the images to the output image, I collect the pixel values of each image and copy them into the empty frame, but this is not working. Are there any suggestions?
I have edited the question to include the code:
FIBITMAP *out_bmp = FreeImage_Allocate(4096, 4096, 32, 0, 0, 0);
BYTE *out_bits = FreeImage_GetBits(out_bmp);
int out_pitch = FreeImage_GetPitch(out_bmp);
// copy all the images to the final one
for (int i = 0; i < files.size(); i++) {
string s = "PathToFile" + files[i];
FIBITMAP* img0 = FreeImage_Load(FreeImage_GetFileType(s.c_str(), 0), s.c_str());
// make sure the input picture is 32-bits
if (FreeImage_GetBPP(img0) != 32) {
FIBITMAP *new_bmp = FreeImage_ConvertTo32Bits(img0);
FreeImage_Unload(img0);
img0 = new_bmp;
}
int img_pitch = FreeImage_GetPitch(img0);
BYTE *img_bits = FreeImage_GetBits(img0);
BYTE *out_bits_ptr = out_bits + out_pitch *
FreeImage_GetHeight(img0) + 4 * FreeImage_GetWidth(img0);
for (int y = 0; y < FreeImage_GetHeight(img0); y += 1) {
memcpy(out_bits_ptr, img_bits, FreeImage_GetWidth(img0) * 4);
out_bits_ptr += out_pitch;
img_bits += img_pitch;
}
}
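One thing worth checking in the loop above is that the destination pointer is computed from the current image's own height and width, not from a position chosen by the packer. A minimal sketch of the row-by-row copy with an explicit destination follows. It assumes the bin packer supplies `(dstX, dstY)` for each image; plain byte buffers stand in for the FreeImage bits and pitch, and `blit32` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a 32-bit source image into a 32-bit atlas at pixel (dstX, dstY).
// Returns false when the image would not fit. All names are hypothetical.
bool blit32(std::uint8_t* atlas, std::size_t atlasPitch,
            std::size_t atlasW, std::size_t atlasH,
            const std::uint8_t* img, std::size_t imgPitch,
            std::size_t imgW, std::size_t imgH,
            std::size_t dstX, std::size_t dstY)
{
    if (dstX + imgW > atlasW || dstY + imgH > atlasH) {
        return false;
    }
    for (std::size_t y = 0; y < imgH; ++y) {
        // Each row starts dstY + y rows down and dstX pixels (4 bytes each) in.
        std::uint8_t* dstRow = atlas + (dstY + y) * atlasPitch + dstX * 4;
        const std::uint8_t* srcRow = img + y * imgPitch;
        std::memcpy(dstRow, srcRow, imgW * 4);
    }
    return true;
}
```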

How to set variable FPS in libx264 and what encoder parameters to use?

I'm trying to encode webcam frames with libx264 in real time and have run into a problem: the resulting video has exactly the length I set, but the camera sometimes lags, so the real capture time is longer than the video length. As a result, the picture in the video changes too fast. I think this is due to the constant FPS in the x264 settings, so I need to make it dynamic somehow. Is that possible? If I am wrong about the FPS, what do I need to do to synchronize capturing and writing?
Also, I would like to know the optimal encoder parameters for streaming over the internet and for recording to disk (the client streams from a camera or screen, and the server records).
Here are a screenshot of the console logs and my code:
#include <stdint.h>
#include "stringf.h"
#include "Capture.h"
#include "x264.h"
int main( int argc, char **argv ){
Camera instance;
if(!instance.Enable(0)){printf("Camera not available\n");return 1;}
// Initializing metrics and buffer of frame
unsigned int width, height, size = instance.GetMetrics(width, height);
unsigned char *data = (unsigned char *)malloc(size);
// Setting encoder (I'm not sure about all parameters)
x264_param_t param;
x264_param_default_preset(&param, "ultrafast", "zerolatency");
param.i_threads = 1;
param.i_width = width;
param.i_height = height;
param.i_fps_num = 20;
param.i_fps_den = 1;
// Intra refresh:
param.i_keyint_max = 8;
param.b_intra_refresh = 1;
// Rate control:
param.rc.i_rc_method = X264_RC_CRF;
param.rc.f_rf_constant = 25;
param.rc.f_rf_constant_max = 35;
// For streaming:
param.b_repeat_headers = 1;
param.b_annexb = 1;
x264_param_apply_profile(&param, "baseline");
x264_t* encoder = x264_encoder_open(&param);
int seconds, expected_time, operation_start, i_nals, frame_size, frames_count;
expected_time = 1000/param.i_fps_num;
operation_start = 0;
seconds = 1;
frames_count = param.i_fps_num * seconds;
int *Timings = new int[frames_count];
x264_picture_t pic_in, pic_out;
x264_nal_t* nals;
x264_picture_alloc(&pic_in, X264_CSP_I420, param.i_width, param.i_height);
// Capture-Encode-Write loop
for(int i = 0; i < frames_count; i++){
operation_start = GetTickCount();
size = instance.GrabBGR(&data);
instance.BGRtoI420(data, &pic_in.img.plane[0], &pic_in.img.plane[1], &pic_in.img.plane[2], param.i_width, param.i_height);
frame_size = x264_encoder_encode(encoder, &nals, &i_nals, &pic_in, &pic_out);
if( frame_size > 0){
stringf::WriteBufferToFile("test.h264",std::string(reinterpret_cast<char*>(nals->p_payload), frame_size),1);
}
Timings[i] = GetTickCount() - operation_start;
}
while( x264_encoder_delayed_frames( encoder ) ){ // Flush delayed frames
frame_size = x264_encoder_encode(encoder, &nals, &i_nals, NULL, &pic_out);
if( frame_size > 0 ){stringf::WriteBufferToFile("test.h264",std::string(reinterpret_cast<char*>(nals->p_payload), frame_size),1);}
}
unsigned int total_time = 0;
printf("Expected operation time was %d ms per frame at %u FPS\n",expected_time, param.i_fps_num);
for(unsigned int i = 0; i < frames_count; i++){
total_time += Timings[i];
printf("Frame %u takes %d ms\n",(i+1), Timings[i]);
}
printf("Record takes %u ms\n",total_time);
free(data);
x264_encoder_close( encoder );
x264_picture_clean( &pic_in );
return 0;
}
The capture takes 1453 ms, but the output file plays for exactly 1 second.
So, in general, the video length must match the capture time rather than what the encoder "wants". How can I do this?
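One possible direction is to give each frame a timestamp taken from the capture clock instead of relying on a fixed frame rate. The x264 API has `b_vfr_input`, `i_timebase_num` and `i_timebase_den` in `x264_param_t`, and `i_pts` in `x264_picture_t`, for this purpose. The sketch below only shows the timestamp arithmetic; `ptsFromCaptureMs` and the millisecond clock (for example `GetTickCount()`) are assumptions, not part of the question's code.

```cpp
#include <cassert>
#include <cstdint>

// Convert a capture time in milliseconds into a presentation timestamp
// expressed in ticks of timebaseNum / timebaseDen seconds.
// The function name and the millisecond clock are assumptions.
std::int64_t ptsFromCaptureMs(std::uint64_t captureMs,
                              std::uint64_t firstFrameMs,
                              std::uint32_t timebaseNum,
                              std::uint32_t timebaseDen)
{
    const std::uint64_t elapsedMs = captureMs - firstFrameMs;
    // ticks = seconds / (num / den) = elapsedMs * den / (1000 * num)
    return static_cast<std::int64_t>(elapsedMs * timebaseDen /
                                     (1000ull * timebaseNum));
}
```

With a timebase of 1/1000, one would presumably set `param.b_vfr_input = 1`, `param.i_timebase_num = 1` and `param.i_timebase_den = 1000` before `x264_encoder_open`, then assign the result to `pic_in.i_pts` for every grabbed frame. As far as I can tell, a raw Annex B `.h264` file does not carry these per-frame timestamps, so they would also have to be written to a container format for a player to honor them.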

Problems with fcvDrawContouru8 for 64bit systems

I am using FastCV v1.7 to develop an image processing algorithm. Part of the process involves finding contours in an image, selecting a few of them, and then drawing only those contours.
This code block runs smoothly on 32-bit systems and produces the expected output, but on 64-bit systems the same code crashes unexpectedly during the loop that executes fcvDrawContouru8. The crash is unpredictable: sometimes the loop iterates 2 or 3 times, and sometimes it crashes on the first iteration. I can't work out whether the problem is with memory allocation on 64-bit or with FastCV itself. Any suggestions would be helpful.
uint8_t* dist_fcv = (uint8_t*)fcvMemAlloc(dist_8u.cols*dist_8u.rows*OPT_CV_ELEM_SIZE(OPT_CV_8UC1), FCV_ALIGN);
memset(dist_fcv, 0, dist_8u.cols*dist_8u.rows*OPT_CV_ELEM_SIZE(OPT_CV_8UC1));
uint32_t maxNumContours = MAX_CNT;
uint32_t sizeOfpBuffer = 0;
uint32_t maxPoints= ((2*dist_8u.cols) + (2 * dist_8u.rows));
uint32_t pNumContours = 0;
uint32_t pNumContourPoints[MAX_CNT] = {0};
uint32_t** pContourStartPointsfind = (uint32_t**)fcvMemAlloc(MAX_CNT*2*sizeof(uint32_t*),16);
sizeOfpBuffer = (MAX_CNT * 2 * maxPoints * sizeof(uint32_t));
uint32_t *pPointBuffer=(uint32_t *)malloc(sizeOfpBuffer);
memset(pPointBuffer,0,sizeOfpBuffer);
int32_t hierarchy[MAX_CNT][4];
void* cHandle = fcvFindContoursAllocate(dist_8u.cols);
fcvFindContoursExternalu8(textureless.data.ptr,
dist_8u.cols,
dist_8u.rows,
dist_8u.cols,
maxNumContours,
&pNumContours,
pNumContourPoints,
pContourStartPointsfind,
pPointBuffer,
sizeOfpBuffer,
hierarchy,
cHandle);
size_t n_TL = 0;
uint32_t** pContourStartPointsdraw = (uint32_t**)fcvMemAlloc(MAX_CNT*2*sizeof(uint32_t*),16);
uint32_t pNumDrawContourPoints[MAX_CNT] = {0};
uint32_t* dPointBuffer=(uint32_t *)malloc(sizeOfpBuffer);
uint32_t* start_contour = pPointBuffer;
uint32_t* start_contour_dPoint = dPointBuffer;
uint32_t** startFind_ptr = pContourStartPointsfind;
uint32_t** draw_ptr = pContourStartPointsdraw;
for (size_t i = 0; i < pNumContours; i++,startFind_ptr++)
{
int points_per_contour = pNumContourPoints[i];
double area = polyArea(start_contour,points_per_contour*2);
if(area < min_textureless_area)
{
start_contour = start_contour + points_per_contour*2;
continue;
}
*(draw_ptr) = *(startFind_ptr);
pNumDrawContourPoints[n_TL] = pNumContourPoints[i];
memcpy(start_contour_dPoint,start_contour,points_per_contour*2*sizeof(uint32_t));
start_contour_dPoint = start_contour_dPoint + points_per_contour*2;
start_contour = start_contour + points_per_contour*2;
n_TL++;
draw_ptr++;
}
uint32_t* holeflag = (uint32_t*)malloc(pNumContours*sizeof(uint32_t));
memset(holeflag,0,pNumContours*sizeof(uint32_t));
uint32_t bufferSize = 0;
start_contour_dPoint = dPointBuffer;
draw_ptr = pContourStartPointsdraw;
for(int i = 0; i < n_TL; i++)
{
int points_per_contour = pNumDrawContourPoints[i];
bufferSize = points_per_contour*2*sizeof(uint32_t);
fcvDrawContouru8(dist_fcv,
dist_8u.cols,
dist_8u.rows,
dist_8u.cols,
1,
holeflag,
&pNumDrawContourPoints[i],
(const uint32_t ** __restrict)(draw_ptr),
bufferSize,
start_contour_dPoint,
hierarchy,
1,1,i+1,0);
start_contour_dPoint = start_contour_dPoint + points_per_contour*2;
draw_ptr++;
}
free(pPointBuffer);
fcvFindContoursDelete(cHandle);
fcvMemFree(pContourStartPointsfind);

DirectShow ISampleGrabber: samples are upside-down and color channels reverse

I have to use MS DirectShow to capture video frames from a camera (I just want the raw pixel data).
I was able to build the Graph/Filter network (capture device filter and ISampleGrabber) and implement the callback (ISampleGrabberCB). I receive samples of appropriate size.
However, they are always upside down (flipped vertically that is, not rotated) and the color channels are BGR order (not RGB).
I tried setting the biHeight field in the BITMAPINFOHEADER to both positive and negative values, but it doesn't have any effect. According to the MSDN documentation, ISampleGrabber::SetMediaType() ignores the format block for video data anyway.
Here is what I see (recorded with a different camera, not DS), and what DirectShow ISampleGrabber gives me: The "RGB" is actually in red, green and blue respectively:
Sample of the code I'm using, slightly simplified:
// Setting the media type...
AM_MEDIA_TYPE* media_type = 0 ;
this->ds.device_streamconfig->GetFormat(&media_type); // The IAMStreamConfig of the capture device
// Find the BMI header in the media type struct
BITMAPINFOHEADER* bmi_header;
if (media_type->formattype == FORMAT_VideoInfo) {
bmi_header = &((VIDEOINFOHEADER*)media_type->pbFormat)->bmiHeader;
} else if (media_type->formattype == FORMAT_VideoInfo2) {
bmi_header = &((VIDEOINFOHEADER2*)media_type->pbFormat)->bmiHeader;
} else {
return false;
}
// Apply changes
media_type->subtype = MEDIASUBTYPE_RGB24;
bmi_header->biWidth = width;
bmi_header->biHeight = height;
// Set format to video device
this->ds.device_streamconfig->SetFormat(media_type);
// Set format for sample grabber
// bmi_header->biHeight = -(height); // tried this for either and both interfaces, no effect
this->ds.sample_grabber->SetMediaType(media_type);
// Connect filter pins
IPin* out_pin= getFilterPin(this->ds.device_filter, OUT, 0); // IBaseFilter interface for the capture device
IPin* in_pin = getFilterPin(this->ds.sample_grabber_filter, IN, 0); // IBaseFilter interface for the sample grabber filter
out_pin->Connect(in_pin, media_type);
// Start capturing by callback
this->ds.sample_grabber->SetBufferSamples(false);
this->ds.sample_grabber->SetOneShot(false);
this->ds.sample_grabber->SetCallback(this, 1);
// start recording
this->ds.media_control->Run(); // IMediaControl interface
I'm checking the return value of every function and don't get any errors.
I'm thankful for any hint or idea.
Things I already tried:
Setting the biHeight field to a negative value for either the capture device filter or the sample grabber or for both or for neither - doesn't have any effect.
Using IGraphBuilder to connect the pins - same problem.
Connecting the pins before changing the media type - same problem.
Checking if the media type was actually applied by the filter by querying it again - but it apparently is applied or at least stored.
Interpreting the image as total byte reversed (last byte first, first byte last) - then it would be flipped horizontally.
Checking if it's a problem with the video camera - when I test it with VLC (DirectShow capture) it looks normal.
My quick hack for this:
void Camera::OutputCallback(unsigned char* data, int len, void *instance_)
{
Camera *instance = reinterpret_cast<Camera*>(instance_);
int j = 0;
for (int i = len-4; i >= 0; i-=4) // include i == 0 so the first pixel is copied too
{
instance->buffer[j] = data[i];
instance->buffer[j + 1] = data[i + 1];
instance->buffer[j + 2] = data[i + 2];
instance->buffer[j + 3] = data[i + 3];
j += 4;
}
Transport::RTPPacket packet;
packet.payload = instance->buffer;
packet.payloadSize = len;
instance->receiver->Send(packet);
}
This is correct for the RGB32 color space; for other color spaces the code needs to be adjusted.
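For the RGB24 case from the question, a sketch that handles the two symptoms separately might look like this. It assumes the frame is a bottom-up DIB with bytes in B, G, R order and rows padded to `stride` bytes; `bottomUpBgrToTopDownRgb` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert a bottom-up BGR24 frame into a top-down, tightly packed RGB24 frame.
// `stride` is the source row size in bytes (DIB rows are padded to 4 bytes).
// The function name is hypothetical.
std::vector<std::uint8_t> bottomUpBgrToTopDownRgb(const std::uint8_t* src,
                                                  std::size_t width,
                                                  std::size_t height,
                                                  std::size_t stride)
{
    std::vector<std::uint8_t> out(width * height * 3);
    for (std::size_t y = 0; y < height; ++y) {
        // Source row 0 is the bottom line of the picture.
        const std::uint8_t* srcRow = src + (height - 1 - y) * stride;
        std::uint8_t* dstRow = out.data() + y * width * 3;
        for (std::size_t x = 0; x < width; ++x) {
            dstRow[x * 3 + 0] = srcRow[x * 3 + 2]; // R
            dstRow[x * 3 + 1] = srcRow[x * 3 + 1]; // G
            dstRow[x * 3 + 2] = srcRow[x * 3 + 0]; // B
        }
    }
    return out;
}
```

Unlike reversing the whole buffer, this keeps the left-to-right order of the pixels within each row.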
I noticed that the flipping disappears when using the I420 color space.
In addition, most current codecs (VP8, for example) use the I420 color space as their raw input/output format.
I wrote a simple frame-mirroring function for the I420 color space:
void Camera::OutputCallback(unsigned char* data, int len, uint32_t timestamp, void *instance_)
{
Camera *instance = reinterpret_cast<Camera*>(instance_);
Transport::RTPPacket packet;
packet.rtpHeader.ts = timestamp;
packet.payload = data;
packet.payloadSize = len;
if (instance->mirror)
{
Video::ResolutionValues rv = Video::GetValues(instance->resolution);
int k = 0;
// Y (luma) values
for (int i = 0; i != rv.height; ++i)
{
for (int j = rv.width; j != 0; --j)
{
// j runs from width down to 1, so j - 1 is the mirrored column
int l = ((rv.width * i) + j) - 1;
instance->buffer[k++] = data[l];
}
}
// U values
for (int i = 0; i != rv.height/2; ++i)
{
for (int j = (rv.width/2); j != 0; --j)
{
int l = (((rv.width / 2) * i) + j) - 1 + rv.height*rv.width;
instance->buffer[k++] = data[l];
}
}
// V values
for (int i = 0; i != rv.height / 2; ++i)
{
for (int j = (rv.width / 2); j != 0; --j)
{
int l = (((rv.width / 2) * i) + j) - 1 + rv.height*rv.width + (rv.width/2)*(rv.height/2);
instance->buffer[k++] = data[l];
}
}
packet.payload = instance->buffer;
}
instance->receiver->Send(packet);
}
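The three loops do the same thing on planes of different size, so the mirroring can also be written once and applied per plane. This is a sketch under the assumption of a tightly packed I420 frame (Y plane, then U, then V) with even width and height; `mirrorPlane` and `mirrorI420` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirror one 8-bit plane left to right.
void mirrorPlane(const std::uint8_t* src, std::uint8_t* dst,
                 std::size_t width, std::size_t height)
{
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* s = src + y * width;
        std::uint8_t* d = dst + y * width;
        for (std::size_t x = 0; x < width; ++x) {
            d[x] = s[width - 1 - x];
        }
    }
}

// Mirror a tightly packed I420 frame (Y plane, then U, then V).
// Assumes even width and height; names are hypothetical.
std::vector<std::uint8_t> mirrorI420(const std::vector<std::uint8_t>& frame,
                                     std::size_t width, std::size_t height)
{
    std::vector<std::uint8_t> out(frame.size());
    const std::size_t ySize = width * height;
    const std::size_t cSize = (width / 2) * (height / 2);
    mirrorPlane(frame.data(), out.data(), width, height);
    mirrorPlane(frame.data() + ySize, out.data() + ySize,
                width / 2, height / 2);
    mirrorPlane(frame.data() + ySize + cSize, out.data() + ySize + cSize,
                width / 2, height / 2);
    return out;
}
```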

How to compress YUYV raw data to JPEG using libjpeg?

I'm looking for an example of how to save a YUYV format frame to a JPEG file using the libjpeg library.
In typical computer APIs, "YUV" actually means YCbCr, and "YUYV" means "YCbCr 4:2:2" stored as Y0, Cb01, Y1, Cr01, Y2 ...
Thus, if you have a "YUV" image, you can save it to libjpeg using the JCS_YCbCr color space.
When you have a 422 image (YUYV) you have to duplicate the Cb/Cr values to the two pixels that need them before writing the scanline to libjpeg. Thus, this write loop will do it for you:
// "base" is an unsigned char const * with the YUYV data
// jrow is a libjpeg row of samples array of 1 row pointer
cinfo.image_width = width & ~1; // YUYV stores pixels in pairs, so force an even width
cinfo.image_height = height;
cinfo.input_components = 3;
cinfo.in_color_space = JCS_YCbCr;
jpeg_set_defaults(&cinfo);
jpeg_set_quality(&cinfo, 92, TRUE);
jpeg_start_compress(&cinfo, TRUE);
unsigned char *buf = new unsigned char[width * 3];
while (cinfo.next_scanline < height) {
for (int i = 0; i < cinfo.image_width; i += 2) {
buf[i*3] = base[i*2];
buf[i*3+1] = base[i*2+1];
buf[i*3+2] = base[i*2+3];
buf[i*3+3] = base[i*2+2];
buf[i*3+4] = base[i*2+1];
buf[i*3+5] = base[i*2+3];
}
jrow[0] = buf;
base += width * 2;
jpeg_write_scanlines(&cinfo, jrow, 1);
}
jpeg_finish_compress(&cinfo);
delete[] buf;
Use your favorite auto-ptr to avoid leaking "buf" if your error or write function can throw / longjmp.
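As one way to do that, the row expansion can return a `std::vector`, which releases its storage on its own when it goes out of scope. This is only a sketch: `expandYuyvRow` is a hypothetical helper, and it assumes an even `width`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expand one YUYV (Y0 Cb Y1 Cr) row into interleaved Y Cb Cr triples.
// `width` must be even; the function name is hypothetical.
std::vector<unsigned char> expandYuyvRow(const unsigned char* yuyv,
                                         std::size_t width)
{
    std::vector<unsigned char> row(width * 3);
    for (std::size_t i = 0; i + 1 < width; i += 2) {
        const unsigned char* p = yuyv + i * 2; // Y0 Cb Y1 Cr
        row[i * 3 + 0] = p[0]; // Y0
        row[i * 3 + 1] = p[1]; // Cb
        row[i * 3 + 2] = p[3]; // Cr
        row[i * 3 + 3] = p[2]; // Y1
        row[i * 3 + 4] = p[1]; // Cb, shared with the first pixel
        row[i * 3 + 5] = p[3]; // Cr, shared with the first pixel
    }
    return row;
}
```

The vector's `data()` pointer can then be assigned to `jrow[0]` before each `jpeg_write_scanlines` call.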
Providing YCbCr to libjpeg directly is preferable to converting to RGB, because it will store it directly in that format, thus saving a lot of conversion work. When the image comes from a webcam or other video source, it's also usually most efficient to get it in YCbCr of some sort (such as YUYV).
Finally, "U" and "V" mean something slightly different in analog component video, so the naming of YUV in computer APIs that really mean YCbCr is highly confusing.
libjpeg also has a raw data mode, whereby you can directly supply the raw downsampled data (which is almost what you have in the YUYV format). This is more efficient than duplicating the UV values only to have libjpeg downscale them again internally.
To do so, you use jpeg_write_raw_data instead of jpeg_write_scanlines, and by default it will process exactly 16 scanlines at a time. JPEG expects the U and V planes to be 2x downsampled by default. YUYV format already has the horizontal dimension downsampled but not the vertical, so I skip U and V every other scanline.
Initialization:
cinfo.image_width = /* width in pixels */;
cinfo.image_height = /* height in pixels */;
cinfo.input_components = 3;
cinfo.in_color_space = JCS_YCbCr;
jpeg_set_defaults(&cinfo);
cinfo.raw_data_in = true;
JSAMPLE y_plane[16][cinfo.image_width];
JSAMPLE u_plane[8][cinfo.image_width / 2];
JSAMPLE v_plane[8][cinfo.image_width / 2];
JSAMPROW y_rows[16];
JSAMPROW u_rows[8];
JSAMPROW v_rows[8];
for (int i = 0; i < 16; ++i)
{
y_rows[i] = &y_plane[i][0];
}
for (int i = 0; i < 8; ++i)
{
u_rows[i] = &u_plane[i][0];
}
for (int i = 0; i < 8; ++i)
{
v_rows[i] = &v_plane[i][0];
}
JSAMPARRAY rows[] { y_rows, u_rows, v_rows };
Compressing:
jpeg_start_compress(&cinfo, true);
while (cinfo.next_scanline < cinfo.image_height)
{
for (JDIMENSION i = 0; i < 16; ++i)
{
auto offset = (cinfo.next_scanline + i) * cinfo.image_width * 2;
for (JDIMENSION j = 0; j < cinfo.image_width; j += 2)
{
y_plane[i][j] = image.data[offset + j * 2 + 0];
y_plane[i][j + 1] = image.data[offset + j * 2 + 2];
if (i % 2 == 0)
{
u_plane[i / 2][j / 2] = image.data[offset + j * 2 + 1];
v_plane[i / 2][j / 2] = image.data[offset + j * 2 + 3];
}
}
}
jpeg_write_raw_data(&cinfo, rows, 16);
}
jpeg_finish_compress(&cinfo);
I was able to get about a 33% decrease in compression time with this method compared to the one in @JonWatte's answer. This solution isn't for everyone though; some caveats:
You can only compress images with dimensions that are a multiple of 8. If you have different-sized images, you will have to write code to pad in the edges. If you're getting the images from a camera though, they will most likely be this way.
The quality is somewhat impaired by the fact that I simply skip color values for alternating scanlines instead of something fancier like averaging them. For my application though, speed was more important than quality.
The way it's written right now it allocates a ton of memory on the stack. This was acceptable for me because my images were small (640x480) and enough memory was available.
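For the quality caveat, averaging the chroma of each pair of scanlines instead of skipping every other line could be sketched as below. It also keeps the planes on the heap rather than the stack. The code assumes a packed YUYV image with even width and height; `Planar420` and `yuyvToPlanar420` are hypothetical names, and I have not measured how much of the speed gain this gives back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Planar420 {
    std::vector<std::uint8_t> y, u, v;
};

// Split a packed YUYV image into Y, U and V planes with 4:2:0 chroma,
// averaging the chroma of each pair of scanlines (rounded to nearest).
// Assumes even width and height; names are hypothetical.
Planar420 yuyvToPlanar420(const std::uint8_t* yuyv,
                          std::size_t width, std::size_t height)
{
    Planar420 out;
    out.y.resize(width * height);
    out.u.resize((width / 2) * (height / 2));
    out.v.resize((width / 2) * (height / 2));
    const std::size_t stride = width * 2; // two bytes per pixel

    // Luma: every even byte of each line.
    for (std::size_t row = 0; row < height; ++row) {
        const std::uint8_t* line = yuyv + row * stride;
        for (std::size_t x = 0; x < width; ++x) {
            out.y[row * width + x] = line[x * 2];
        }
    }
    // Chroma: average the two lines that share one 4:2:0 sample.
    for (std::size_t row = 0; row + 1 < height; row += 2) {
        const std::uint8_t* top = yuyv + row * stride;
        const std::uint8_t* bottom = top + stride;
        for (std::size_t cx = 0; cx < width / 2; ++cx) {
            const std::size_t i = cx * 4; // start of a Y0 Cb Y1 Cr group
            const std::size_t dst = (row / 2) * (width / 2) + cx;
            out.u[dst] = static_cast<std::uint8_t>(
                (top[i + 1] + bottom[i + 1] + 1) / 2);
            out.v[dst] = static_cast<std::uint8_t>(
                (top[i + 3] + bottom[i + 3] + 1) / 2);
        }
    }
    return out;
}
```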
Documentation for libjpeg-turbo: https://raw.githubusercontent.com/libjpeg-turbo/libjpeg-turbo/master/libjpeg.txt