Using DirectShow with Direct2D - c++

I have a Windows-only Direct2D application and would like to implement a video playback system for cutscenes. These files are MP4, but the format can be changed if need be.
It seems like DirectShow is the advised way to render video/audio on Windows.
Now how do I let DirectShow render the video frames to my Direct2D render target?
The VMR-9 filter looks like the best route, but I can't seem to find an elegant way of integrating it into my application.

There is no Direct2D/DirectShow interoperability layer in Windows. To fit these two technologies together you would have to copy data between the APIs in a rather inefficient way (and it would still take some time to develop that glue).
With H.264/HEVC MP4 video files you would be better off using Media Foundation to read and decode frames, then load them into Direct2D bitmaps and display them in your application. Performance-wise, it is possible to transfer video frames to Direct2D bitmaps via the GPU at reasonable cost and with reasonable development effort, and even if you take a shortcut and do the integration roughly and inefficiently, it will still be on par with DirectShow.
I recommend starting by reading and decoding video frames with the Media Foundation Source Reader API. Once you are familiar with fitting the technologies together, you can take the next step and optimize the transfer using GPU capacity and Direct3D/Direct2D interop.
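A rough sketch of the Source Reader route, assuming an MP4 with a decodable video stream; the file path is a placeholder and most error handling is trimmed:

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

HRESULT DecodeFrames(const wchar_t* path) // e.g. L"cutscene.mp4"
{
    HRESULT hr = MFStartup(MF_VERSION);

    // Let the reader insert a converter so we can request RGB32 output.
    IMFAttributes* attr = nullptr;
    MFCreateAttributes(&attr, 1);
    attr->SetUINT32(MF_SOURCE_READER_ENABLE_ADVANCED_VIDEO_PROCESSING, TRUE);

    IMFSourceReader* reader = nullptr;
    hr = MFCreateSourceReaderFromURL(path, attr, &reader);
    if (FAILED(hr)) return hr;

    // Ask for uncompressed RGB32 frames from the first video stream.
    IMFMediaType* type = nullptr;
    MFCreateMediaType(&type);
    type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    reader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                nullptr, type);

    for (;;)
    {
        DWORD flags = 0;
        LONGLONG pts = 0; // presentation time in 100 ns units
        IMFSample* sample = nullptr;
        hr = reader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                0, nullptr, &flags, &pts, &sample);
        if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
            break;
        if (sample)
        {
            // Lock the pixels (IMFSample::ConvertToContiguousBuffer +
            // IMFMediaBuffer::Lock) and copy them into an ID2D1Bitmap
            // via ID2D1Bitmap::CopyFromMemory for drawing.
            sample->Release();
        }
    }
    type->Release();
    reader->Release();
    attr->Release();
    MFShutdown();
    return hr;
}
```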

Related

Media Foundation: Custom Topology with Direct3D 11

I am building a video topology manually, which includes loading and configuring the mpeg2videoextension (decoder); otherwise the default topology loader fails to resolve the video stream automatically. I am using the default topology loader to resolve the rest of the topology.
Since I am loading the decoder manually, the docs say I am responsible for providing the decoder with the hardware acceleration manager (this decoder is D3D11 aware). If I create a DXGI device and then create the manager in code, I can pass the manager to the decoder, and it seems to work.
The docs also say, however, that "In a Media Session scenario, the video renderer creates the Direct3D 11 device."
If this is the case, how do I get a handle to that device? I assume I should be using that device in the device manager to pass into the decoder.
I'm going around in circles. All of the sample code uses IDirect3DDeviceManager9, and I am unable to get those samples to work. So I decided to use Direct3D 11, but I can't find any sample code that uses it.
Can someone point me in the right direction?
Microsoft does not give a good solution for this challenge. Indeed, the standard video renderer for Media Foundation is the EVR, and it is "aware" of Direct3D 9 only, so you cannot combine it with the decoder using a common DXGI device manager. Newer Microsoft applications use a different, Direct3D 11 aware renderer, which is not published as an API: you can take advantage of its rendering services through wrapping APIs such as a UWP or HTML5 media element playing video. The MPEG-2 decoder extension primarily targets those scenarios, leaving you with a problem if you are plugging it into older Media Foundation topologies.
I can think of a few solutions to this problem, none of which sound exactly perfect:
Stop using the EVR and use DX11VideoRenderer instead: Microsoft gives a starting point with this sample, and you are on your own to establish the required wiring to share the DXGI device manager (see the sketch after this list).
Use multiple Direct3D devices and transfer video frames between the two; there is graphics API interop to help make the transfer efficient, but overall this looks like wasteful work as of 2020, even though it is doable. This path is more or less acceptable if you can take the performance hit of transferring through system memory, which makes things a tad easier to implement.
Stop using the MPEG-2 decoder extension and implement your own hardware-assisted decoder on top of the lower-level DXVA2 API, without a software fallback, in which case you have more control over using GPU services and fitting to the renderer's device.
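For reference, a sketch of the device-manager wiring the question describes; 'd3d11Device' is your ID3D11Device and 'decoder' the manually loaded IMFTransform, both names illustrative:

```cpp
#include <mfapi.h>
#include <mftransform.h>
#include <d3d11.h>
#include <d3d10.h> // for ID3D10Multithread

HRESULT WireDeviceManager(ID3D11Device* d3d11Device, IMFTransform* decoder,
                          IMFDXGIDeviceManager** outManager)
{
    // D3D11-aware MFTs require multithread protection on the device.
    ID3D10Multithread* mt = nullptr;
    HRESULT hr = d3d11Device->QueryInterface(IID_PPV_ARGS(&mt));
    if (SUCCEEDED(hr)) { mt->SetMultithreadProtected(TRUE); mt->Release(); }

    UINT resetToken = 0;
    hr = MFCreateDXGIDeviceManager(&resetToken, outManager);
    if (FAILED(hr)) return hr;
    hr = (*outManager)->ResetDevice(d3d11Device, resetToken);
    if (FAILED(hr)) return hr;

    // Hand the manager to the decoder; D3D-aware MFTs accept it this way.
    return decoder->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER,
                                   reinterpret_cast<ULONG_PTR>(*outManager));
}
```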

UWP Hardware Video Decoding - DirectX12 vs Media Foundation

I would like to use DirectX 12 to load each frame of an H264 file into a texture and render it. There is, however, little to no information on doing this, and the Microsoft website has limited, superficial documentation.
Media Foundation has plenty of examples and offers hardware-enabled decoding. Is Media Foundation a wrapper around DirectX, or is it doing something else?
If not, how much less optimised would the Media Foundation equivalent be in comparison to a DX12 approach?
Essentially, what are the big differences between Media Foundation and DirectX 12 video decoding?
I am already using DirectX 12 in my engine so this is specifically regarding DX12.
Thanks in advance.
Hardware video decoding comes from the DXVA (DXVA2) API. Its DirectX 11 evolution is the D3D11 video device, part of the D3D11 API. Microsoft provides wrappers over hardware-accelerated decoders in the form of Media Foundation API primitives, such as the H.264 Video Decoder MFT. This decoder offers the use of hardware decoding capabilities as well as a fallback to software decoding.
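If you want to see those wrappers on your system, the MFT enumeration API can list them; a short desktop-API sketch (cleanup of the activation objects omitted):

```cpp
#include <mfapi.h>
#include <mftransform.h>

// Enumerate hardware H.264 video decoders registered with Media Foundation.
MFT_REGISTER_TYPE_INFO input = { MFMediaType_Video, MFVideoFormat_H264 };
IMFActivate** activates = nullptr;
UINT32 count = 0;
HRESULT hr = MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER,
                       MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
                       &input, nullptr, &activates, &count);
// Each IMFActivate can instantiate a decoder via ActivateObject;
// release them and CoTaskMemFree(activates) when done.
```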
Note that even though Media Foundation is available for UWP development, your options are limited, and you are not offered primitives like the mentioned transform directly. However, if you use higher-level APIs (the Media Foundation Source Reader API in particular), you can leverage hardware-accelerated video decoding in your UWP application.
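With the Source Reader you opt in to hardware transforms through its attributes; a sketch, where 'd3dManager' is assumed to be an IMFDXGIDeviceManager you created:

```cpp
// Ask the Source Reader to prefer hardware decoders and to decode
// onto the device behind 'd3dManager'.
IMFAttributes* attr = nullptr;
MFCreateAttributes(&attr, 2);
attr->SetUnknown(MF_SOURCE_READER_D3D_MANAGER, d3dManager);
attr->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);

IMFSourceReader* reader = nullptr;
HRESULT hr = MFCreateSourceReaderFromURL(L"clip.mp4", attr, &reader);
```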
Media Foundation offers interoperability with Direct3D 11, in particular for video encoding/decoding, but not with Direct3D 12. You will not be able to use Media Foundation and DirectX 12 together out of the box; you will have to implement Direct3D 11/12 interop to transfer the data between the APIs (or, where applicable, use shared access to the same GPU data).
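One possible interop route, sketched under the assumption that decoding happens on a D3D11 device while your engine owns the D3D12 device ('texture11' and 'device12' are illustrative; synchronization, e.g. a keyed mutex or fences, is up to you):

```cpp
// D3D11 side: the texture must have been created with
// D3D11_RESOURCE_MISC_SHARED_NTHANDLE (optionally with a keyed mutex).
IDXGIResource1* dxgiRes = nullptr;
texture11->QueryInterface(IID_PPV_ARGS(&dxgiRes));
HANDLE shared = nullptr;
dxgiRes->CreateSharedHandle(nullptr, DXGI_SHARED_RESOURCE_READ, nullptr, &shared);

// D3D12 side: open the same GPU resource on your engine's device.
ID3D12Resource* texture12 = nullptr;
device12->OpenSharedHandle(shared, IID_PPV_ARGS(&texture12));
```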
Alternatively, you can step down to the underlying ID3D12VideoDevice::CreateVideoDecoder, which is a further evolution of the mentioned DXVA2 and Direct3D 11 video decoding APIs, with similar usage.
Unfortunately, while Media Foundation is notorious for poor documentation and a hard start to development, Direct3D 12 video decoding has zero information, and you will have to enjoy the feeling of being a pioneer.
Either way, all of the mentioned are relatively thin wrappers over the hardware-assisted video decoding implementation, with the same great performance. I would recommend taking the Media Foundation path and implementing 11/12 interop if/when it becomes necessary.
You will get a lot of D3D12 errors caused by Media Foundation if you pass a D3D12 device to IMFDXGIDeviceManager::ResetDevice.
The errors can be avoided if you call IMFSourceReader::ReadSample slowly; it doesn't matter whether you use the method in sync or async mode. How slow it needs to be depends on the machine running the program. I use ::Sleep(1) between ReadSample calls when playing a stream from the network in sync mode, and ::Sleep(3) when playing a local MP4 file on my machine.
Don't ask who I am. My name is 'the pioneer'.

How to decode a video file straight to a Direct3D11 texture using Windows Media Foundation?

I'd like to decode the contents of a video file to a Direct3D11 texture and avoid the copies back and forth to CPU memory. Ideally, the library will play the audio itself and call back into my code whenever a video frame has been decoded.
On the surface, Windows Media Foundation's IMFPMediaPlayer (i.e. MFPCreateMediaPlayer() and IMFPMediaPlayer::CreateMediaItemFromURL()) seems like a good match, except that the player decodes straight to the app's HWND. The documentation implies that I can add a custom video sink, but I have not been able to find documentation or sample code on how to do that. Please point me in the right direction.
Currently, I am using libVLC to accomplish the above, but it only provides the video surface in CPU memory, which can become a bottleneck for my use-case.
Thanks.
Take a look at this source code from my project 'Stackoverflow': MFVideoEVR.
This program shows how to set up the EVR (enhanced video renderer) and how to provide video samples to it using a Source Reader.
The key is to provide the video samples yourself, so you can use them for your own purposes.
This program provides samples through IMFVideoSampleAllocator, which is for DirectX 9 textures. You need to change the source code to use IMFVideoSampleAllocatorEx instead.
About MFCreateVideoSampleAllocatorEx:
This function creates an allocator for DXGI video surfaces. The buffers created by this allocator expose the IMFDXGIBuffer interface.
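A sketch of creating that allocator, assuming 'manager' is your IMFDXGIDeviceManager and 'videoType' the negotiated IMFMediaType:

```cpp
// Create a DXGI-aware sample allocator in place of the DX9 one.
IMFVideoSampleAllocatorEx* allocator = nullptr;
HRESULT hr = MFCreateVideoSampleAllocatorEx(IID_PPV_ARGS(&allocator));
hr = allocator->SetDirectXManager(manager);

IMFAttributes* attr = nullptr;
MFCreateAttributes(&attr, 1);
attr->SetUINT32(MF_SA_D3D11_BINDFLAGS, D3D11_BIND_SHADER_RESOURCE);
hr = allocator->InitializeSampleAllocatorEx(2, 8, attr, videoType);
```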
So to retrieve the texture, use IMFDXGIBuffer::GetResource:
You can use this method to get a pointer to the ID3D11Texture2D interface of the surface. If the buffer is locked, the method returns MF_E_INVALIDREQUEST.
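In practice that looks like this, assuming 'sample' is a decoded IMFSample backed by a DXGI buffer:

```cpp
// Pull the ID3D11Texture2D out of the sample's first buffer.
IMFMediaBuffer* buffer = nullptr;
sample->GetBufferByIndex(0, &buffer);

IMFDXGIBuffer* dxgiBuffer = nullptr;
buffer->QueryInterface(IID_PPV_ARGS(&dxgiBuffer));

ID3D11Texture2D* texture = nullptr;
dxgiBuffer->GetResource(IID_PPV_ARGS(&texture));

// The texture may be a slice of a texture array; get the slice index too.
UINT subresource = 0;
dxgiBuffer->GetSubresourceIndex(&subresource);
```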
You will also have to manage sound through IMFSourceReader.
With this approach, there is no copy back to system memory.
PS: You don't mention the video format (H.265, H.264, MPEG-2, others?). Media Foundation doesn't handle every video format natively.

Rendering to video file by SFML

I have written a program using SFML Library (in C++) rendering simple 2D animation.
I would like to save the animation to a video file instead of drawing it on the screen.
Does SFML provide such functionality? Is there any other, portable way to do this? (portable between different OSes)
SFML does not have such a feature, especially since video processing is a whole world of its own. You can take a look at FFmpeg and GStreamer. Both libraries are cross-platform and can record, play back and stream video. If you want a specific codec, you can look at the codec's website and/or search for a good encoder.
Overall it's not an easy task, and depending on what you're trying to do, you could also think about grabbing the rendering directly with a third-party application, e.g. Open Broadcaster Software or (again) FFmpeg.
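A minimal sketch of the FFmpeg route, assuming the ffmpeg executable is installed and on the PATH: render each frame into an sf::RenderTexture and pipe raw RGBA pixels to ffmpeg's stdin. Note that copyToImage() is a GPU-to-CPU readback and is the slow part.

```cpp
#include <SFML/Graphics.hpp>
#include <cstdio>

#ifdef _WIN32
#define popen  _popen
#define pclose _pclose
#endif

int main()
{
    const unsigned w = 800, h = 600;
    sf::RenderTexture target;
    target.create(w, h);

    // Frame size and rate here must match the values on the command line.
    FILE* ffmpeg = popen(
        "ffmpeg -y -f rawvideo -pix_fmt rgba -s 800x600 -r 60 -i - "
        "-c:v libx264 -pix_fmt yuv420p out.mp4", "w");
    if (!ffmpeg) return 1;

    for (int frame = 0; frame < 600; ++frame) // 10 seconds at 60 fps
    {
        target.clear(sf::Color::Black);
        // ... draw the animation for this frame into 'target' ...
        target.display();

        sf::Image pixels = target.getTexture().copyToImage();
        std::fwrite(pixels.getPixelsPtr(), 1, std::size_t(w) * h * 4, ffmpeg);
    }
    pclose(ffmpeg);
    return 0;
}
```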

Displaying a video in DirectX

What is the best/easiest way to display a video (with sound!) in an application using XAudio2 and Direct3D 9/10?
At the very least it needs to be able to stream potentially large videos, and take care of the fact that the window's aspect ratio may differ from the video's (e.g. by adding letterboxing), although ideally I'd like the ability to embed the video into a 3D scene.
I could of course work out a way to load each frame into a texture, discarding/reusing the textures once rendered, and play the audio separately through XAudio2. However, as well as writing a loader for at least one format, I'd also have to deal with things like synchronising the video and audio components, so hopefully there is an easier solution available, or even a ready-made free one with a suitable license (commercial distribution in binary form; dynamic linking is fine in the case of, say, the LGPL).
In the Windows SDK there is a DirectShow example for rendering video to a texture. It handles audio output too.
But there are limitations, and I can't honestly call it easy.
Have you looked at Bink Video? It's what lots of games use for video playback. It works great, and you don't have to code all that video stuff yourself from scratch.