How to capture DirectX animations as videos - C++

I am trying to record an animation that I created using DirectX 11 as a video, which I can present whenever needed (without re-rendering). I am still learning about DirectX and the Windows API.
This is what I've done so far: I was able to capture animation frames using DirectXTK by following this post. After that I'm using OpenCV to collect the frames from disk and create a video. Is there a way to merge this process, so that I can append frames to a video file right after each image is captured?
Code for animation capture:
static int Frame_Number;

void D3D::screenCapture() {
    // Called once after each Present().
    // Get the device and immediate context that own the swap chain.
    // Note: GetDevice and GetImmediateContext both AddRef, so doing this every
    // frame leaks references; ideally fetch them once at initialization.
    HRESULT gd = m_swapChain->GetDevice(__uuidof(ID3D11Device), (void**)&m_device);
    assert(gd == S_OK);
    m_device->GetImmediateContext(&m_deviceContext);

    // Get a pointer to the back buffer.
    ID3D11Texture2D* backbufferTex = nullptr;
    HRESULT gb = m_swapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), (LPVOID*)&backbufferTex);
    assert(gb == S_OK);

    // Build the output file name for this frame.
    std::wstringstream Image_Directory;
    Image_Directory << L"path to directory/screenShots" << Frame_Number << L".JPG";

    // Save the frame as a JPEG. This GUID is the WIC JPEG container format
    // (also available as GUID_ContainerFormatJpeg from <wincodec.h>).
    REFGUID GUID_ContainerFormatJpeg{ 0x19e4a5aa, 0x5662, 0x4fc5, 0xa0, 0xc0, 0x17, 0x58, 0x2, 0x8e, 0x10, 0x57 };
    HRESULT hr = DirectX::SaveWICTextureToFile(m_deviceContext, backbufferTex, GUID_ContainerFormatJpeg, Image_Directory.str().c_str());
    assert(hr == S_OK);

    // Release the back buffer reference taken by GetBuffer so it doesn't leak.
    backbufferTex->Release();

    Frame_Number = Frame_Number + 1;
}
I call this function after I present the rendered scene to the screen. After that I use a Python script to create a video from the captured frames.
This is not optimal, especially when rendering many animations; it would take forever. I would like to eliminate the reading from and writing to disk. Is there a way to get the frames that currently go into SaveWICTextureToFile and push them into a video sequentially?
How could one accomplish this?
I would really appreciate any help or pointers.

Possible, but relatively hard; it takes many pages of code.
Here’s a tutorial written by Microsoft. You are going to need to change the following there.
Integrate with Direct3D. To do that, call MFCreateDXGIDeviceManager, then IMFDXGIDeviceManager::ResetDevice, then pass that IMFDXGIDeviceManager interface in the MF_SINK_WRITER_D3D_MANAGER attribute when creating the sink writer. Also, don’t forget to set MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS to TRUE; you don’t want software encoders, they are way too slow.
Replace the video codec: use MFVideoFormat_H264 instead of MFVideoFormat_WMV3, and a *.mp4 extension for the output file.
The sample code encodes video frames provided in system memory. Instead, you should supply your video frames in VRAM. Every time your D3D app renders a frame, create a new D3D texture, copy your render target into that new texture with CopyResource, then call MFCreateDXGISurfaceBuffer. This will create an IMFMediaBuffer object referencing a video frame in VRAM. You can then submit that video frame to the sink writer, and it should do the rest automagically. A rough sketch of these changes follows below.
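Here is a hedged sketch of those changes, not drop-in code from the tutorial. It assumes an ID3D11Device created with D3D11_CREATE_DEVICE_VIDEO_SUPPORT, a BGRA render target, MFStartup already called, and hypothetical helper names (CreateHardwareSinkWriter, WriteFrame) of my own:

#include <d3d11.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a sink writer that encodes H.264 to an .mp4 file using the GPU.
HRESULT CreateHardwareSinkWriter(ID3D11Device* device, UINT width, UINT height, UINT fps,
                                 ComPtr<IMFSinkWriter>& writer,
                                 ComPtr<IMFDXGIDeviceManager>& dxgiManager,
                                 DWORD& streamIndex)
{
    // 1. Wrap the D3D11 device in a DXGI device manager.
    UINT resetToken = 0;
    HRESULT hr = MFCreateDXGIDeviceManager(&resetToken, &dxgiManager);
    if (FAILED(hr)) return hr;
    hr = dxgiManager->ResetDevice(device, resetToken);
    if (FAILED(hr)) return hr;

    // 2. Hand the manager to the sink writer and ask for hardware transforms.
    ComPtr<IMFAttributes> attrs;
    MFCreateAttributes(&attrs, 2);
    attrs->SetUnknown(MF_SINK_WRITER_D3D_MANAGER, dxgiManager.Get());
    attrs->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);
    hr = MFCreateSinkWriterFromURL(L"output.mp4", nullptr, attrs.Get(), &writer);
    if (FAILED(hr)) return hr;

    // 3. H.264 output stream in an .mp4 container.
    ComPtr<IMFMediaType> outType;
    MFCreateMediaType(&outType);
    outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    outType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    outType->SetUINT32(MF_MT_AVG_BITRATE, 8000000);
    outType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(outType.Get(), MF_MT_FRAME_SIZE, width, height);
    MFSetAttributeRatio(outType.Get(), MF_MT_FRAME_RATE, fps, 1);
    hr = writer->AddStream(outType.Get(), &streamIndex);
    if (FAILED(hr)) return hr;

    // 4. Input is the RGB32 render target; the hardware converter handles RGB -> NV12.
    ComPtr<IMFMediaType> inType;
    MFCreateMediaType(&inType);
    inType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    inType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    inType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(inType.Get(), MF_MT_FRAME_SIZE, width, height);
    MFSetAttributeRatio(inType.Get(), MF_MT_FRAME_RATE, fps, 1);
    hr = writer->SetInputMediaType(streamIndex, inType.Get(), nullptr);
    if (FAILED(hr)) return hr;

    return writer->BeginWriting();
}

// Per frame: 'copy' is the texture you filled with CopyResource from the render target.
HRESULT WriteFrame(IMFSinkWriter* writer, DWORD streamIndex,
                   ID3D11Texture2D* copy, LONGLONG timestamp, LONGLONG duration)
{
    ComPtr<IMFMediaBuffer> buffer;
    HRESULT hr = MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), copy, 0, FALSE, &buffer);
    if (FAILED(hr)) return hr;

    // Some writers expect the buffer length to be set explicitly.
    ComPtr<IMF2DBuffer> buffer2d;
    if (SUCCEEDED(buffer.As(&buffer2d)))
    {
        DWORD length = 0;
        buffer2d->GetContiguousLength(&length);
        buffer->SetCurrentLength(length);
    }

    ComPtr<IMFSample> sample;
    MFCreateSample(&sample);
    sample->AddBuffer(buffer.Get());
    sample->SetSampleTime(timestamp);      // 100 ns units
    sample->SetSampleDuration(duration);
    return writer->WriteSample(streamIndex, sample.Get());
}

Call WriteFrame once per Present() with the copied texture and increasing timestamps, and call IMFSinkWriter::Finalize at the end to close the .mp4 file.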
If you manage to implement that correctly, the Media Foundation framework is going to use a proprietary hardware transform to convert RGBA textures into NV12 textures on the GPU, then use another proprietary hardware transform to encode NV12 into H.264, download the encoded video samples from VRAM to system RAM as soon as they are ready, and append these encoded samples to the MPEG-4 container file on disk.
Both of the above transforms are implemented by the GPU, in hardware. All three GPU vendors have hardware for that, and they ship Media Foundation transform DLLs that use their hardware as part of their GPU drivers.

Related

Handling Image data from IMFSourceReader and IMFSample

I am attempting to use the IMFSourceReader to read and decode a .mp4 file. I have configured the source reader to decode to MFVideoFormat_NV12 by setting a partial media type and calling IMFSourceReader::SetCurrentMediaType and loaded a video with dimensions of 1266x544.
While processing I receive the MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED flag with a new dimension of 1280x544 and a MF_MT_MINIMUM_DISPLAY_APERTURE of 1266x544.
I believe the expectation is then to use either the Video Resizer DSP or the Video Processor MFT. However, it is my understanding that the Video Processor MFT requires Windows 8.1 while I am on Windows 7, and the Video Resizer DSP does not support MFVideoFormat_NV12.
What is the correct way to crop out the extra data added by the source reader to display only the data within the minimum display aperture for MFVideoFormat_NV12?
The new media type says this: "the video is 1266x544 as you expected/requested, but I have to carry it in 1280x544 textures because this is how the GPU wants it to work".
Generally speaking, this does not require further scaling or cropping; you already have the frames you need. If you are reading them out of sample objects - which is what I believe you are trying to do - just use the increased stride (1280 bytes between consecutive rows).
If you are using this as a texture, presenting it somewhere or using it as part of rendering, you would just use the adjusted coordinates (0, 0) - (1266, 544), ignoring the remainder, as opposed to using the full texture.
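For the reading case, here is a minimal sketch (function and parameter names are mine): copy only the 1266x544 display aperture out of an NV12 frame carried with a 1280-byte stride. The data pointer and stride would come from IMF2DBuffer::Lock2D or IMFMediaBuffer::Lock plus MF_MT_DEFAULT_STRIDE; it assumes the UV plane starts immediately after the padded Y plane, which holds here because only the width is padded.

#include <cstdint>
#include <cstring>

void CopyNV12Aperture(const uint8_t* src, size_t srcStride,   // srcStride = 1280 here
                      uint8_t* dst, size_t dstStride,
                      size_t apertureWidth,                   // 1266
                      size_t apertureHeight)                  // 544
{
    // Y plane: one byte per pixel, apertureHeight rows.
    for (size_t y = 0; y < apertureHeight; ++y)
        memcpy(dst + y * dstStride, src + y * srcStride, apertureWidth);

    // Interleaved UV plane: half the rows, same number of bytes per row as Y.
    const uint8_t* srcUV = src + srcStride * apertureHeight;
    uint8_t* dstUV = dst + dstStride * apertureHeight;
    for (size_t y = 0; y < apertureHeight / 2; ++y)
        memcpy(dstUV + y * dstStride, srcUV + y * srcStride, apertureWidth);
}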

How to decode a video file straight to a Direct3D11 texture using Windows Media Foundation?

I'd like to decode the contents of a video file to a Direct3D11 texture and avoid the copies back and forth to CPU memory. Ideally, the library will play the audio itself and call back into my code whenever a video frame has been decoded.
On the surface, Windows Media Foundation's IMFPMediaPlayer (i.e. MFPCreateMediaPlayer() and IMFPMediaPlayer::CreateMediaItemFromURL()) seems like a good match, except that the player decodes straight to the app's HWND. The documentation implies that I can add a custom video sink, but I have not been able to find documentation or sample code on how to do that. Please point me in the right direction.
Currently, I am using libVLC to accomplish the above, but it only provides the video surface in CPU memory, which can become a bottleneck for my use-case.
Thanks.
Take a look at this source code from my project 'Stackoverflow': MFVideoEVR
This program shows how to set up the EVR (Enhanced Video Renderer) and how to provide video samples to it, using a Source Reader.
The key is to provide video samples, so you can use them for your purpose.
This program provides samples through IMFVideoSampleAllocator, which is for DirectX 9 textures. You need to change the source code to use IMFVideoSampleAllocatorEx instead: IMFVideoSampleAllocatorEx
About MFCreateVideoSampleAllocatorEx:
This function creates an allocator for DXGI video surfaces. The buffers created by this allocator expose the IMFDXGIBuffer interface.
So to retrieve the texture: IMFDXGIBuffer::GetResource
You can use this method to get a pointer to the ID3D11Texture2D interface of the surface. If the buffer is locked, the method returns MF_E_INVALIDREQUEST.
You will also have to manage sound through IMFSourceReader.
With this approach, there is no copy back to system memory.
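As a minimal sketch of that allocator path (the function name is mine, and it assumes you already have an IMFDXGIDeviceManager and the negotiated video media type):

#include <mfapi.h>
#include <mfidl.h>
#include <mfobjects.h>
#include <d3d11.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

HRESULT AllocateAndGetTexture(IMFDXGIDeviceManager* dxgiManager,
                              IMFMediaType* videoType,
                              ID3D11Texture2D** outTexture)
{
    ComPtr<IMFVideoSampleAllocatorEx> allocator;
    HRESULT hr = MFCreateVideoSampleAllocatorEx(IID_PPV_ARGS(&allocator));
    if (FAILED(hr)) return hr;

    hr = allocator->SetDirectXManager(dxgiManager);
    if (FAILED(hr)) return hr;

    // 2 initial / 8 maximum samples; extra attributes (bind flags, etc.) left default here.
    hr = allocator->InitializeSampleAllocatorEx(2, 8, nullptr, videoType);
    if (FAILED(hr)) return hr;

    ComPtr<IMFSample> sample;
    hr = allocator->AllocateSample(&sample);
    if (FAILED(hr)) return hr;

    ComPtr<IMFMediaBuffer> buffer;
    hr = sample->GetBufferByIndex(0, &buffer);
    if (FAILED(hr)) return hr;

    // The allocator's buffers expose IMFDXGIBuffer, which hands back the D3D11 texture.
    ComPtr<IMFDXGIBuffer> dxgiBuffer;
    hr = buffer.As(&dxgiBuffer);
    if (FAILED(hr)) return hr;

    return dxgiBuffer->GetResource(IID_PPV_ARGS(outTexture));
}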
PS: You don't say which video format you have (H.265, H.264, MPEG-2, others?). Media Foundation doesn't handle every video format natively.

DXGI Desktop Duplication: encoding frames to send them over the network

I'm trying to write an app which will capture a video stream of the screen and send it to a remote client. I've found out that the best way to capture a screen on Windows is to use DXGI Desktop Duplication API (available since Windows 8). Microsoft provides a neat sample which streams duplicated frames to screen. Now, I've been wondering what is the easiest, but still relatively fast way to encode those frames and send them over the network.
The frames come from AcquireNextFrame with a surface that contains the desktop bitmap and metadata which contains dirty and move regions that were updated. From here, I have a couple of options:
Extract a bitmap from the DirectX surface and then use an external library like ffmpeg to encode the series of bitmaps to H.264 and send it over RTSP. While straightforward, I fear that this method will be too slow, as it isn't taking advantage of any native Windows methods. Converting a D3D texture to an ffmpeg-compatible bitmap seems like unnecessary work.
From this answer: convert the D3D texture to an IMFSample and use Media Foundation's Sink Writer to encode the frame. I found this tutorial on video encoding, but I haven't yet found a way to immediately get the encoded frame and send it, instead of dumping all of them to a video file.
Since I haven't done anything like this before, I'm asking if I'm moving in the right direction. In the end, I want to have a simple, preferably low latency desktop capture video stream, which I can view from a remote device.
Also, I'm wondering if I can make use of the dirty and move regions provided by Desktop Duplication. Instead of encoding the frame, I could send them over the network and do the processing on the client side, but this means that my client has to have DirectX 11.1 or higher available, which is impossible if I want to stream to a mobile platform.
You can use the IMFTransform interface for H.264 encoding. Once you get an IMFSample from the ID3D11Texture2D, just pass it to IMFTransform::ProcessInput and get the encoded IMFSample from IMFTransform::ProcessOutput.
Refer to this example for encoding details.
Once you get the encoded IMFSamples, you can send them one by one over the network.
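A minimal sketch of that ProcessInput/ProcessOutput loop for a synchronous encoder MFT follows. It assumes 'encoder' has already been created and had its input/output media types set; note that hardware encoders are often asynchronous MFTs, which use the METransformNeedInput/METransformHaveOutput event model instead.

#include <mfapi.h>
#include <mftransform.h>
#include <mferror.h>
#include <wrl/client.h>
#include <vector>
using Microsoft::WRL::ComPtr;

HRESULT EncodeOneSample(IMFTransform* encoder, IMFSample* inputSample,
                        std::vector<ComPtr<IMFSample>>& encodedOut)
{
    HRESULT hr = encoder->ProcessInput(0, inputSample, 0);
    if (FAILED(hr)) return hr;

    // Drain output until the encoder asks for more input.
    for (;;)
    {
        MFT_OUTPUT_STREAM_INFO streamInfo = {};
        hr = encoder->GetOutputStreamInfo(0, &streamInfo);
        if (FAILED(hr)) return hr;

        MFT_OUTPUT_DATA_BUFFER output = {};
        ComPtr<IMFSample> outSample;

        // If the MFT does not provide its own samples, allocate one for it.
        if (!(streamInfo.dwFlags & (MFT_OUTPUT_STREAM_PROVIDES_SAMPLES |
                                    MFT_OUTPUT_STREAM_CAN_PROVIDE_SAMPLES)))
        {
            ComPtr<IMFMediaBuffer> buffer;
            hr = MFCreateSample(&outSample);
            if (FAILED(hr)) return hr;
            hr = MFCreateMemoryBuffer(streamInfo.cbSize, &buffer);
            if (FAILED(hr)) return hr;
            hr = outSample->AddBuffer(buffer.Get());
            if (FAILED(hr)) return hr;
            output.pSample = outSample.Get();
        }

        DWORD status = 0;
        hr = encoder->ProcessOutput(0, 1, &output, &status);
        if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
            return S_OK;                        // nothing more for now; feed the next frame
        if (FAILED(hr)) return hr;

        // Keep the encoded sample; it can now be sent over the network.
        encodedOut.push_back(output.pSample);
        if (output.pSample != outSample.Get())
            output.pSample->Release();          // release the MFT-provided sample
        if (output.pEvents) output.pEvents->Release();
    }
}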

Writing variable framerate videos in openCV

The steps I follow for writing a video file in openCV are as follows:
CvVideoWriter *writer =cvCreateVideoWriter(fileName, Codec ID, frameRate, frameSize); // Create Video Writer
cvWriteFrame(writer, frame); // Write frame
cvReleaseVideoWriter(&writer); // Release video writer
The above code snippet writes at a fixed frame rate. I need to write out variable frame rate videos. The approach I had used earlier with libx264 involved writing individual timestamps to each frame.
So, the question is: how do I write timestamps to a frame in OpenCV - what is the specific API? More generally, how do I create variable frame rate videos?
I don't think it is possible to do this with OpenCV directly without modifying the code to give access under the hood. You would need to use a different library such as libVLC, using the imem module to get your raw RGB frames from OpenCV into a file. This link provides an example using imem with raw images loaded from OpenCV; you would just need to change the :sout options to save to the file you want using your preferred codec.

Whole screen capture and render in DirectX [PERFORMANCE]

I need some way to get the screen data and pass it to a DX9 surface/texture in my application and render it at at least 25 fps at 1600x900 resolution; 30 would be better.
I tried BitBlt, but even then I am at 20 fps, and after loading the data into a texture and rendering it I am at 11 fps, which is far behind what I need.
GetFrontBufferData is out of question.
Here is something about using the Windows Media API, but I am not familiar with it. The sample saves data straight into a file; maybe it can be set up to give you individual frames, but I haven't found good enough documentation to try it on my own.
My code:
m_memDC.BitBlt(0, 0, m_Rect.Width(), m_Rect.Height(),   // m_Rect is the area to be captured
               &m_dc, m_Rect.left, m_Rect.top, SRCCOPY);
// at 20-25fps after this if I comment out the rest
// DC, HBITMAP setup and memory alloc is done once at the beginning
GetDIBits(m_hDc, (HBITMAP)m_hBmp.GetSafeHandle(),
          0L,                      // Start scan line
          (DWORD)m_Rect.Height(),  // # of scan lines
          m_lpData,                // LPBYTE
          (LPBITMAPINFO)m_bi,      // address of bitmapinfo
          (DWORD)DIB_RGB_COLORS);  // Use RGB for color table
// at 17-20fps
IDirect3DSurface9 *tmp;
m_pImageBuffer[0]->GetSurfaceLevel(0, &tmp);  // m_pImageBuffer is a texture of the same
                                              // size as the bitmap to prevent stretching
hr = D3DXLoadSurfaceFromMemory(tmp, NULL, NULL,
                               (LPVOID)m_lpData,
                               D3DFMT_X8R8G8B8,
                               m_Rect.Width() * 4,
                               NULL,
                               &r,             // SetRect(&r, 0, 0, m_Rect.Width(), m_Rect.Height());
                               D3DX_DEFAULT, 0);
// 12-14fps
IDirect3DSurface9 *frameS;
hr = m_pFrameTexture->GetSurfaceLevel(0, &frameS);  // texture that is rendered
pd3dDevice->StretchRect(tmp, NULL, frameS, NULL, D3DTEXF_NONE);
// 11fps
I found out that for a 512x512 square it runs at 30 fps (e.g. 490x450 at 20-25), so I tried dividing the screen, but that didn't seem to work well.
If there is something missing in the code, please say so instead of voting down. Thanks
Starting with Windows 8, there is a new desktop duplication API that can be used to capture the screen in video memory, including mouse cursor changes and which parts of the screen actually changed or moved. This is far more performant than any of the GDI or D3D9 approaches out there and is really well-suited to doing things like encoding the desktop to a video stream, since you never have to pull the texture out of GPU memory. The new API is available by enumerating DXGI outputs and calling DuplicateOutput on the screen you want to capture. Then you can enter a loop that waits for the screen to update and acquires each frame in turn.
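For reference, a rough sketch of that duplication loop (assuming you already have an ID3D11Device created on the adapter that owns the output, and an IDXGIOutput1 for the screen you want to capture; the function name is illustrative):

#include <d3d11.h>
#include <dxgi1_2.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

HRESULT CaptureLoop(ID3D11Device* device, IDXGIOutput1* output1)
{
    ComPtr<IDXGIOutputDuplication> duplication;
    HRESULT hr = output1->DuplicateOutput(device, &duplication);
    if (FAILED(hr)) return hr;

    for (;;)
    {
        DXGI_OUTDUPL_FRAME_INFO frameInfo = {};
        ComPtr<IDXGIResource> resource;

        // Wait up to 500 ms for the next desktop update.
        hr = duplication->AcquireNextFrame(500, &frameInfo, &resource);
        if (hr == DXGI_ERROR_WAIT_TIMEOUT) continue;   // nothing changed on screen
        if (FAILED(hr)) return hr;                     // e.g. DXGI_ERROR_ACCESS_LOST: re-create the duplication

        ComPtr<ID3D11Texture2D> frameTexture;
        hr = resource.As(&frameTexture);
        if (FAILED(hr)) return hr;

        // frameTexture stays in VRAM; copy it (CopyResource) or hand it to the
        // encoder before releasing the frame back to the duplication object.
        duplication->ReleaseFrame();
    }
}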
To encode the frames to a video, I'd recommend taking a look at Media Foundation. Take a look specifically at the Sink Writer for the simplest method of encoding the video frames. Basically, you just have to wrap the D3D textures you get for each video frame into IMFSample objects. These can be passed directly into the sink writer. See the MFCreateDXGISurfaceBuffer and MFCreateVideoSampleFromSurface functions for more information. For the best performance, typically you'll want to use a codec like H.264 that has good hardware encoding support (on most machines).
For full disclosure, I work on the team that owns the desktop duplication API at Microsoft, and I've personally written apps that capture the desktop (and video, games, etc.) to a video file at 60fps using this technique, as well as a lot of other scenarios. This is also used to do screen streaming, remote assistance, and lots more within Microsoft.
If you don't like the FrontBuffer, try the BackBuffer:
LPDIRECT3DSURFACE9 surface;
surface = GetBackBufferImageSurface(&fmt);
to save it to a file use
D3DXSaveSurfaceToFile(filename, D3DXIFF_JPG, surface, NULL, NULL);