How to use Intel Hardware MJPEG Decoder MFT in Media Foundation SourceReader for a Windows desktop application? - c++

I'm developing a USB camera streaming desktop application using the Media Foundation Source Reader technique. The camera has USB 3.0 support and delivers 60 fps at 1080p in the MJPG video format.
I used the software MJPEG Decoder MFT to convert MJPG to YUY2 frames and then converted those into RGB32 frames to draw on the window. With this software decoder I can render only 30 fps instead of 60. I posted a question on this site and got a suggestion to use the Intel Hardware MJPEG Decoder MFT to solve the frame drop issue.
I then hit the error 0xC00D36B5 - MF_E_NOTACCEPTING when calling IMFTransform::ProcessInput(). To solve this error, MSDN suggests using the IMFTransform interface asynchronously, so I used the IMFMediaEventGenerator interface to call GetEvent for every input/output sample. I can successfully process only one input sample; after that, IMFMediaEventGenerator::GetEvent() continuously returns the MF_E_NO_EVENTS_AVAILABLE error (GetEvent() is synchronous).
I tried to configure an asynchronous callback for the Source Reader as well as the IMFTransform, but IMFAsyncCallback::Invoke is never invoked, hence I planned to use the GetEvent method.
Am I missing anything? If yes, can someone guide me on using the Intel hardware decoder in my project?

Intel Hardware MJPEG Decoder MFT is an asynchronous MFT, and if you are managing it directly, you are responsible for applying the asynchronous model. You seem to be doing this, but you don't provide enough information to nail the problem down. Yes, you are supposed to use the event model described in the ProcessInput and ProcessOutput sections of the article linked above. Since you get the first frame, you should debug further to make it work with smooth continuous processing.
When you use APIs like the Media Session or the Source Reader, Media Foundation itself deals with the MFTs. It is capable of both synchronous and asynchronous consumption as appropriate. In that case, however, you don't make IMFTransform calls yourself, and even from your vague description it appears you are doing it the wrong way.
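The event model for an asynchronous MFT can be sketched roughly as below. This is a minimal sketch, not a definitive implementation: it assumes the decoder is already created with its media types negotiated, and `ReadNextMjpegSample`/`RenderYuy2Sample` are hypothetical placeholders for your camera read and render code. Note that calling GetEvent with MF_EVENT_FLAG_NO_WAIT returns MF_E_NO_EVENTS_AVAILABLE when no event is queued yet, which matches the symptom described; passing flags = 0 waits instead.

```cpp
#include <mfapi.h>
#include <mftransform.h>
#include <mferror.h>

// Hypothetical helpers standing in for the application's read/render code.
IMFSample *ReadNextMjpegSample();
void RenderYuy2Sample(IMFSample *sample);

HRESULT RunDecoderLoop(IMFTransform *decoder)
{
    // Asynchronous MFTs must be unlocked before streaming.
    IMFAttributes *attrs = nullptr;
    if (SUCCEEDED(decoder->GetAttributes(&attrs)))
    {
        attrs->SetUINT32(MF_TRANSFORM_ASYNC_UNLOCK, TRUE);
        attrs->Release();
    }

    decoder->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, 0);
    decoder->ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, 0);

    IMFMediaEventGenerator *gen = nullptr;
    HRESULT hr = decoder->QueryInterface(IID_PPV_ARGS(&gen));
    if (FAILED(hr)) return hr;

    for (;;)
    {
        IMFMediaEvent *event = nullptr;
        // Flags = 0 blocks until an event is queued; MF_EVENT_FLAG_NO_WAIT
        // would return MF_E_NO_EVENTS_AVAILABLE instead of waiting.
        hr = gen->GetEvent(0, &event);
        if (FAILED(hr)) break;

        MediaEventType type = MEUnknown;
        event->GetType(&type);
        event->Release();

        if (type == METransformNeedInput)
        {
            // Feed exactly one sample per METransformNeedInput event.
            IMFSample *input = ReadNextMjpegSample();
            decoder->ProcessInput(0, input, 0);
            input->Release();
        }
        else if (type == METransformHaveOutput)
        {
            MFT_OUTPUT_DATA_BUFFER out = {};
            DWORD status = 0;
            // Async decoders typically allocate the output sample themselves.
            if (SUCCEEDED(decoder->ProcessOutput(0, 1, &out, &status)) && out.pSample)
            {
                RenderYuy2Sample(out.pSample);
                out.pSample->Release();
            }
            if (out.pEvents) out.pEvents->Release();
        }
    }
    gen->Release();
    return hr;
}
```

A real loop also needs to handle METransformDrainComplete and stream format changes, but the in/out event pairing above is the core of the model.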

Related

Media Foundation: Custom Topology with Direct3D 11

I am having to build a video topology manually, which includes loading and configuring the mpeg2videoextension (decoder). Otherwise the default topology loader fails to resolve the video stream automatically. I am using the default topology loader to resolve the rest of the topology.
Since I am loading the decoder manually, the docs say that I am responsible for supplying the decoder with the hardware acceleration manager (this decoder is D3D11 aware). If I create a DXGI device and then create the manager in code, I can pass the manager to the decoder, and it seems to work.
The docs also say, however that "In a Media Session scenario, the video renderer creates the Direct3D 11 device."
If this is the case, how do I get a handle to that device? I assume I should be using that device in the device manager to pass into the decoder.
I'm going around in circles. All of the sample code uses IDirect3DDeviceManager9. I am unable to get those samples to work. So I decided to use 11. But I can't find any sample code that uses 11.
Can someone point me in the right direction?
Microsoft does not give a good solution for this challenge. Indeed, the standard video renderer for Media Foundation is the EVR, and it is "aware" of Direct3D 9 only, so you cannot combine it with the decoder using a common DXGI device manager. Newer Microsoft applications use a different, Direct3D 11 aware renderer, which is not published as an API: you can take advantage of these rendering services as part of wrapping APIs such as UWP or the HTML5 media element playing video. The MPEG-2 decoder extension targets primarily these scenarios, leaving you with a problem if you are plugging it into older Media Foundation topologies.
I can think of a few solutions to this problem, none of which sounds exactly perfect:
Stop using the EVR and use DX11VideoRenderer instead: Microsoft gives a starting point with this sample, and you are on your own to establish the required wiring to share a DXGI device manager.
Use multiple Direct3D devices and transfer video frames between the two; there is graphics API interop to help do the transfer efficiently, but overall this looks like clumsy work as of 2020, even though it is doable. This path is more or less acceptable if you can take the performance hit of transferring through system memory, which makes things a tad easier to implement.
Stop using the MPEG-2 decoder extension and implement your own hardware-assisted decoder on top of the lower-level DXVA2 API, without a fallback to software; in this case you have more control over using GPU services and fitting to the renderer's device.
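Whichever renderer you end up with, the device-manager wiring mentioned in the question looks roughly like this. A minimal sketch, assuming the decoder MFT is already created; the function name and the decision to return the manager to the caller (so it can be shared with the renderer) are my own.

```cpp
#include <d3d11.h>
#include <mfapi.h>
#include <mftransform.h>

HRESULT WireDeviceManager(IMFTransform *decoder, IMFDXGIDeviceManager **outMgr)
{
    // A device with video support is required for D3D11-aware MFTs.
    ID3D11Device *device = nullptr;
    UINT flags = D3D11_CREATE_DEVICE_VIDEO_SUPPORT;
    HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
                                   flags, nullptr, 0, D3D11_SDK_VERSION,
                                   &device, nullptr, nullptr);
    if (FAILED(hr)) return hr;

    // Multithread protection is needed when the decoder shares the device.
    ID3D10Multithread *mt = nullptr;
    if (SUCCEEDED(device->QueryInterface(IID_PPV_ARGS(&mt))))
    {
        mt->SetMultithreadProtected(TRUE);
        mt->Release();
    }

    UINT resetToken = 0;
    IMFDXGIDeviceManager *mgr = nullptr;
    hr = MFCreateDXGIDeviceManager(&resetToken, &mgr);
    if (SUCCEEDED(hr))
        hr = mgr->ResetDevice(device, resetToken);

    // Hand the manager to the decoder; it can now decode to DXGI surfaces.
    if (SUCCEEDED(hr))
        hr = decoder->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER,
                                     reinterpret_cast<ULONG_PTR>(mgr));

    device->Release();
    *outMgr = mgr;  // caller shares this same manager with the renderer
    return hr;
}
```

The point of the shared manager is that decoder and renderer draw surfaces from the same Direct3D device; creating two separate managers is what forces the frame copies you are trying to avoid.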

Using DirectShow with Direct2D

I have a windows only Direct2D application and would like to implement a video playback system for cutscenes. These files are mp4 but the format can be changed, if need be.
It seems like DirectShow is the advised way to render video/audio on windows.
Now how do I let DirectShow render the video frames to my Direct2D render target?
The VMR-9 filter looks like the best route, but I can't seem to find an elegant way of integrating it into my application.
There is no Direct2D/DirectShow interoperability layer in Windows. To fit these two technologies together you would have to copy data between the APIs in a rather inefficient way (and it will still take some time to develop the fitting).
With H.264/HEVC MP4 video files you would be better off using Media Foundation to read and decode frames, then load them into Direct2D bitmaps and display them in your application. Performance-wise, it is possible to transfer video frames to Direct2D bitmaps via the GPU at reasonable cost and with reasonable development effort, and even if you take a shortcut and do the integration roughly and inefficiently, it will be on par with DirectShow.
I recommend starting by reading and decoding video frames with the Media Foundation Source Reader API. Once you get familiar with fitting the technologies together, the next step is to optimize the transfer by using GPU capacity and the interop between Direct3D and Direct2D.
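The first step (the "rough but simple" path) can be sketched as below: decode with the Source Reader to RGB32 frames in system memory, which map directly onto a Direct2D bitmap format. This is a sketch under assumptions; the file path handling and the actual `ID2D1Bitmap` upload are placeholders.

```cpp
#include <mfapi.h>
#include <mfreadwrite.h>

HRESULT DecodeLoop(const wchar_t *path)
{
    IMFSourceReader *reader = nullptr;
    HRESULT hr = MFCreateSourceReaderFromURL(path, nullptr, &reader);
    if (FAILED(hr)) return hr;

    // Ask the reader to decode and convert to RGB32, which matches a
    // DXGI_FORMAT_B8G8R8A8_UNORM Direct2D bitmap.
    IMFMediaType *type = nullptr;
    MFCreateMediaType(&type);
    type->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    type->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    reader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, type);
    type->Release();

    for (;;)
    {
        DWORD streamIndex = 0, flags = 0;
        LONGLONG timestamp = 0;
        IMFSample *sample = nullptr;
        hr = reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
                                &streamIndex, &flags, &timestamp, &sample);
        if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM)) break;
        if (!sample) continue;  // e.g. stream tick, no frame delivered

        IMFMediaBuffer *buffer = nullptr;
        sample->ConvertToContiguousBuffer(&buffer);
        BYTE *data = nullptr;
        DWORD length = 0;
        buffer->Lock(&data, nullptr, &length);
        // ID2D1Bitmap::CopyFromMemory with this data would go here.
        buffer->Unlock();
        buffer->Release();
        sample->Release();
    }
    reader->Release();
    return hr;
}
```

Audio would be read from the same Source Reader on its own stream, and the GPU path later replaces the system-memory buffer with DXGI surfaces.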

How to decode a video file straight to a Direct3D11 texture using Windows Media Foundation?

I'd like to decode the contents of a video file to a Direct3D11 texture and avoid the copies back and forth to CPU memory. Ideally, the library will play the audio itself and call back into my code whenever a video frame has been decoded.
On the surface, Windows Media Foundation's IMFPMediaPlayer (i.e. MFPCreateMediaPlayer() and IMFPMediaPlayer::CreateMediaItemFromURL()) seems like a good match, except that the player decodes straight to the app's HWND. The documentation implies that I can add a custom video sink, but I have not been able to find documentation or sample code on how to do that. Please point me in the right direction.
Currently, I am using libVLC to accomplish the above, but it only provides the video surface in CPU memory, which can become a bottleneck for my use-case.
Thanks.
Take a look at this source code from my project 'Stackoverflow': MFVideoEVR
This program shows how to set up the EVR (Enhanced Video Renderer) and how to provide video samples to it using a Source Reader.
The key is that you provide the video samples, so you can use them for your own purpose.
This program provides samples through IMFVideoSampleAllocator, which is for DirectX 9 textures. You need to change the source code to use IMFVideoSampleAllocatorEx instead: IMFVideoSampleAllocatorEx
About MFCreateVideoSampleAllocatorEx:
This function creates an allocator for DXGI video surfaces. The buffers created by this allocator expose the IMFDXGIBuffer interface.
So to retrieve the texture: IMFDXGIBuffer::GetResource
You can use this method to get a pointer to the ID3D11Texture2D interface of the surface. If the buffer is locked, the method returns MF_E_INVALIDREQUEST.
You will also have to handle the audio through IMFSourceReader.
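Pulling the texture out of a sample produced by such an allocator can be sketched like this (function name is my own; it assumes the frame lives in the sample's first buffer):

```cpp
#include <mfapi.h>
#include <d3d11.h>

HRESULT GetTextureFromSample(IMFSample *sample, ID3D11Texture2D **outTexture)
{
    IMFMediaBuffer *buffer = nullptr;
    HRESULT hr = sample->GetBufferByIndex(0, &buffer);
    if (FAILED(hr)) return hr;

    IMFDXGIBuffer *dxgiBuffer = nullptr;
    hr = buffer->QueryInterface(IID_PPV_ARGS(&dxgiBuffer));
    if (SUCCEEDED(hr))
    {
        // Fails with MF_E_INVALIDREQUEST if the buffer is currently locked.
        hr = dxgiBuffer->GetResource(IID_PPV_ARGS(outTexture));
        dxgiBuffer->Release();
    }
    buffer->Release();
    return hr;
}
```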
With this approach, there is no copy back to system memory.
PS: You don't mention the video format (H.265, H.264, MPEG-2, other?). Media Foundation doesn't handle all video formats natively.

DXGI Desktop Duplication: encoding frames to send them over the network

I'm trying to write an app which will capture a video stream of the screen and send it to a remote client. I've found out that the best way to capture a screen on Windows is to use DXGI Desktop Duplication API (available since Windows 8). Microsoft provides a neat sample which streams duplicated frames to screen. Now, I've been wondering what is the easiest, but still relatively fast way to encode those frames and send them over the network.
The frames come from AcquireNextFrame with a surface that contains the desktop bitmap and metadata which contains dirty and move regions that were updated. From here, I have a couple of options:
Extract a bitmap from the DirectX surface and then use an external library like ffmpeg to encode the series of bitmaps to H.264 and send it over RTSP. While straightforward, I fear that this method will be too slow, as it isn't taking advantage of any native Windows methods. Converting a D3D texture to an ffmpeg-compatible bitmap seems like unnecessary work.
From this answer: convert D3D texture to IMFSample and use MediaFoundation's SinkWriter to encode the frame. I found this tutorial of video encoding, but I haven't yet found a way to immediately get the encoded frame and send it instead of dumping all of them to a video file.
Since I haven't done anything like this before, I'm asking if I'm moving in the right direction. In the end, I want to have a simple, preferably low latency desktop capture video stream, which I can view from a remote device.
Also, I'm wondering if I can make use of dirty and move regions provided by Desktop Duplication. Instead of encoding the frame, I can send them over the network and do the processing on the client side, but this means that my client has to have DirectX 11.1 or higher available, which is impossible if I would want to stream to a mobile platform.
You can use the IMFTransform interface for H.264 encoding. Once you get an IMFSample from the ID3D11Texture2D, just pass it to IMFTransform::ProcessInput and get the encoded IMFSample from IMFTransform::ProcessOutput.
Refer to this example for encoding details.
Once you get the encoded IMFSamples, you can send them one by one over the network.
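The ProcessInput/ProcessOutput pairing above can be sketched as follows for a synchronous encoder MFT. A sketch under assumptions: `encoder` is already created and configured with input/output media types, and the function name is my own.

```cpp
#include <mfapi.h>
#include <mftransform.h>
#include <mferror.h>

HRESULT EncodeOne(IMFTransform *encoder, IMFSample *input,
                  IMFSample **outEncoded)
{
    *outEncoded = nullptr;
    HRESULT hr = encoder->ProcessInput(0, input, 0);
    if (FAILED(hr)) return hr;

    MFT_OUTPUT_DATA_BUFFER out = {};
    DWORD status = 0;
    // If the MFT allocates its own output samples (check the output stream
    // info flags), leave out.pSample null and ProcessOutput fills it in.
    hr = encoder->ProcessOutput(0, 1, &out, &status);
    if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
        return S_FALSE;  // encoder is buffering; no frame ready yet
    if (out.pEvents) out.pEvents->Release();
    if (SUCCEEDED(hr))
        *outEncoded = out.pSample;  // caller extracts the H.264 bitstream
    return hr;
}
```

The encoded sample's contiguous buffer contains the H.264 bitstream, which is what you would packetize and push over the network instead of writing to a file via the Sink Writer.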

Read frame by request with DirectShow

I'm trying to use DirectShow to capture video from a webcam. I assume I should use the SampleGrabber class. For now I see that DirectShow can only read frames continuously at some desired fps. Can DirectShow read frames on request?
The DirectShow pipeline sets up streaming video. Frames will continuously stream through the Sample Grabber and its callback, if you set one up. The callback itself adds minimal processing overhead, as long as you don't force a format change (forcing video to RGB in particular). It is up to you whether to process or skip a frame there.
On-request grabbing will take either the last known video frame streamed, or the next one to go through the Sample Grabber. This is the typical mode of operation.
Some devices offer the additional feature of taking a still on request. This is a rarer case, and it's described on MSDN here: Capturing an Image From a Still Image Pin:
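The "grab the next frame on request" logic can be sketched independently of the graph setup. A minimal sketch: the class stands in for a real `ISampleGrabberCB` implementation (the COM plumbing is omitted), and `BYTE`/`HRESULT` are defined as stand-ins for the Windows typedefs so the snippet is self-contained.

```cpp
#include <atomic>
#include <mutex>
#include <vector>

// Stand-ins for the Windows typedefs, so the sketch is self-contained.
using BYTE = unsigned char;
using HRESULT = long;
constexpr HRESULT S_OK = 0;

class FrameGrabber /* : public ISampleGrabberCB in the real filter graph */
{
    std::mutex m_lock;
    std::vector<BYTE> m_lastFrame;
    std::atomic<bool> m_requested{false};

public:
    // Called by DirectShow for every frame streaming through the graph.
    HRESULT BufferCB(double /*sampleTime*/, BYTE *buffer, long length)
    {
        if (!m_requested.exchange(false))
            return S_OK;  // nobody asked for a frame: skip cheaply
        std::lock_guard<std::mutex> guard(m_lock);
        m_lastFrame.assign(buffer, buffer + length);
        return S_OK;
    }

    // Application side: arm a grab of the next frame that streams through.
    void RequestFrame() { m_requested = true; }

    // Application side: take the grabbed frame, if one has arrived.
    bool TakeFrame(std::vector<BYTE> &out)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_lastFrame.empty()) return false;
        out.swap(m_lastFrame);
        m_lastFrame.clear();
        return true;
    }
};
```

Keeping the callback cheap for unrequested frames is the point: streaming continues at full rate, and the copy happens only for the frames you actually asked for.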
Some cameras can produce a still image separate from the capture
stream, and often the still image is of higher quality than the images
produced by the capture stream. The camera may have a button that acts
as a hardware trigger, or it may support software triggering. A camera
that supports still images will expose a still image pin, which is pin
category PIN_CATEGORY_STILL.
The recommended way to get still images from the device is to use the
Windows Image Acquisition (WIA) APIs. [...]
To trigger the still pin, use [...]