Using Media Foundation to encode DirectX surfaces - C++

I'm trying to use the Media Foundation API to encode a video, but I'm having problems pushing the samples to the SinkWriter.
I'm getting the frames to encode through the Desktop Duplication API. What I end up with is an ID3D11Texture2D with the desktop image in it.
I'm trying to create an IMFVideoSample containing this surface and then push that video sample to a SinkWriter.
I've tried going about this in different ways:
I called MFCreateVideoSampleFromSurface(texture, &pSample) where texture is the ID3D11Texture2D, filled in the SampleTime and SampleDuration and then passed the created sample to the SinkWriter.
SinkWriter returned E_INVALIDARG.
I tried creating the sample by passing nullptr as the first argument, creating the buffer myself using MFCreateDXGISurfaceBuffer, and then adding the resulting buffer to the sample.
That didn't work either.
I read through the MediaFoundation documentation and couldn't find detailed information on how to create the sample out of a DirectX texture.
I ran out of things to try.
Has anyone out there used this API before who can think of things I should check, or suggest a way to go about debugging this?

First of all, you should learn to use the mftrace tool.
Very likely, it will tell you the problem right away.
But my guess is that one of the following problems is likely.
Probably, some other attributes are required besides SampleTime / SampleDuration.
Probably, the SinkWriter needs a texture it can read on the CPU. To fix that, when a frame is available, create a staging texture of the same format and size, call CopyResource to copy the desktop into the staging texture, then pass that staging texture to MF.
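A minimal sketch of that staging copy, assuming `device`, `context`, and `desktopTexture` come from your D3D11 / Desktop Duplication setup (names are placeholders, error handling omitted):

```cpp
#include <d3d11.h>
#include <mfapi.h>

IMFSample* CreateSampleFromDesktopTexture(ID3D11Device* device,
                                          ID3D11DeviceContext* context,
                                          ID3D11Texture2D* desktopTexture,
                                          LONGLONG time, LONGLONG duration)
{
    // Staging copy: same format/size as the desktop texture, CPU-readable.
    D3D11_TEXTURE2D_DESC desc = {};
    desktopTexture->GetDesc(&desc);
    desc.Usage = D3D11_USAGE_STAGING;
    desc.BindFlags = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags = 0;

    ID3D11Texture2D* staging = nullptr;
    device->CreateTexture2D(&desc, nullptr, &staging);
    context->CopyResource(staging, desktopTexture);  // do this before ReleaseFrame

    // Wrap the copy in an MF buffer and sample for the SinkWriter.
    IMFMediaBuffer* buffer = nullptr;
    MFCreateDXGISurfaceBuffer(__uuidof(ID3D11Texture2D), staging, 0, FALSE, &buffer);

    // A zero current length is a frequent cause of E_INVALIDARG downstream.
    IMF2DBuffer* buffer2d = nullptr;
    buffer->QueryInterface(IID_PPV_ARGS(&buffer2d));
    DWORD length = 0;
    buffer2d->GetContiguousLength(&length);
    buffer->SetCurrentLength(length);

    IMFSample* sample = nullptr;
    MFCreateSample(&sample);
    sample->AddBuffer(buffer);
    sample->SetSampleTime(time);          // 100 ns units
    sample->SetSampleDuration(duration);  // 100 ns units
    return sample;
}
```

Note the SetCurrentLength call: in my experience, a buffer whose current length was never set is a common reason the SinkWriter rejects the sample with E_INVALIDARG.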
Even if you use a hardware encoder so the CPU never tries to read the texture data, I don’t think it’s a good idea to directly pass your desktop texture to MF.
When you set a D3D texture on a sample, no data is copied anywhere; the sample merely retains the texture.
MF works asynchronously; its topology nodes may buffer several samples if they want to.
Desktop Duplication, however, gives you data synchronously: you may only access the texture between the AcquireNextFrame and ReleaseFrame calls.

Related

How to decode a video file straight to a Direct3D11 texture using Windows Media Foundation?

I'd like to decode the contents of a video file to a Direct3D11 texture and avoid the copies back and forth to CPU memory. Ideally, the library will play the audio itself and call back into my code whenever a video frame has been decoded.
On the surface, Windows Media Foundation's IMFPMediaPlayer (i.e. MFPCreateMediaPlayer() and IMFPMediaPlayer::CreateMediaItemFromURL()) seems like a good match, except that the player decodes straight to the app's HWND. The documentation implies that I can add a custom video sink, but I have not been able to find documentation or sample code on how to do that. Please point me in the right direction.
Currently, I am using libVLC to accomplish the above, but it only provides the video surface in CPU memory, which can become a bottleneck for my use-case.
Thanks.
Take a look at this source code from my project 'Stackoverflow': MFVideoEVR.
This program shows how to set up the EVR (Enhanced Video Renderer) and how to provide video samples to it using a Source Reader.
The key is to provide video samples, so you can use them for your purpose.
This program provides samples through IMFVideoSampleAllocator, which is for DirectX 9 textures. You need to change the source code to use IMFVideoSampleAllocatorEx instead.
About MFCreateVideoSampleAllocatorEx:
This function creates an allocator for DXGI video surfaces. The buffers created by this allocator expose the IMFDXGIBuffer interface.
So to retrieve the texture, use IMFDXGIBuffer::GetResource:
You can use this method to get a pointer to the ID3D11Texture2D interface of the surface. If the buffer is locked, the method returns MF_E_INVALIDREQUEST.
You will also have to manage sound through IMFSourceReader.
With this approach, there is no copy back to system memory.
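Something along these lines, as a rough sketch; `dxgiManager` and `videoType` stand in for your own setup, and error handling is omitted:

```cpp
#include <mfapi.h>
#include <mfidl.h>
#include <d3d11.h>

ID3D11Texture2D* AllocateSampleTexture(IMFDXGIDeviceManager* dxgiManager,
                                       IMFMediaType* videoType,
                                       IMFSample** outSample)
{
    IMFVideoSampleAllocatorEx* allocator = nullptr;
    MFCreateVideoSampleAllocatorEx(IID_PPV_ARGS(&allocator));
    allocator->SetDirectXManager(dxgiManager);

    // Ask for D3D11 render-target textures, one buffer per sample.
    IMFAttributes* attrs = nullptr;
    MFCreateAttributes(&attrs, 2);
    attrs->SetUINT32(MF_SA_D3D11_BINDFLAGS, D3D11_BIND_RENDER_TARGET);
    attrs->SetUINT32(MF_SA_BUFFERS_PER_SAMPLE, 1);
    allocator->InitializeSampleAllocatorEx(2, 8, attrs, videoType);

    allocator->AllocateSample(outSample);

    // The sample's buffer exposes IMFDXGIBuffer, which hands back the texture.
    IMFMediaBuffer* buffer = nullptr;
    (*outSample)->GetBufferByIndex(0, &buffer);
    IMFDXGIBuffer* dxgiBuffer = nullptr;
    buffer->QueryInterface(IID_PPV_ARGS(&dxgiBuffer));

    ID3D11Texture2D* texture = nullptr;
    dxgiBuffer->GetResource(IID_PPV_ARGS(&texture));
    return texture;
}
```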
PS: You don't mention the video format (H.265, H.264, MPEG-2, others?). Media Foundation doesn't handle every video format natively.

OpenGL texture upload using PBOs?

I'm developing an OpenGL application using OpenGL 2.1 and want to upload textures from a thread.
What I have done so far:
Create a second context and share between the two
Upload texture data in a thread
Everything is working fine, except that I notice a small "lag" when the texture upload happens! I know this is because the driver has to synchronize the two contexts. The problem is that I want to stream the textures, not update them later: I just want to load textures in the background while displaying an "almost smooth" loading animation, without stalling the whole application.
That's when I searched and found that PBOs can be used for DMA transfers of pixel data. Is it possible to use a PBO for texture upload? If so, how?
You don't need a second context to upload the texture data async. Just make sure you don't use the buffer right after triggering the upload, or it will stall waiting for the copy to finish.
Here's an example of this process: http://www.songho.ca/opengl/gl_pbo.html#unpack
And here's a bit more info about what PBOs are and how they should be used: http://www.opengl.org/wiki/Pixel_Buffer_Object
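In short, the flow looks roughly like this (a sketch assuming GLEW for extension loading; `pixels`, `tex`, and `pbo` are placeholders):

```cpp
#include <GL/glew.h>
#include <cstring>

void uploadViaPbo(GLuint tex, GLuint pbo, const void* pixels, int width, int height)
{
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
    // Orphan the old storage so the driver doesn't stall on a busy buffer.
    glBufferData(GL_PIXEL_UNPACK_BUFFER, width * height * 4, nullptr, GL_STREAM_DRAW);

    // Map, fill with the new frame's RGBA data, unmap.
    void* dst = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
    std::memcpy(dst, pixels, width * height * 4);
    glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

    // With a PBO bound, the last argument is a byte offset into the buffer;
    // the actual transfer can then proceed asynchronously (DMA).
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
}
```

The key point is the last one: as the answer says, just don't read from or draw with the texture immediately after triggering the upload, or the driver will stall waiting for the copy to finish.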

AVFoundation on OSX: OpenGL texture from video WITHOUT needing access to pixel data

I've read a lot of posts describing how people use AVAssetReader or AVPlayerItemVideoOutput to get video frames as raw pixel data from a video file, which they then use to upload to an OpenGL texture. However, this seems to create the needless step of decoding the video frames with the CPU (as opposed to the graphics card), as well as creating unnecessary copies of the pixel data.
Is there a way to let AVFoundation own all aspects of the video playback process, but somehow also provide access to an OpenGL texture ID it created, which can just be drawn into an OpenGL context as necessary? Has anyone come across anything like this?
In other words, something like this pseudo code:
initialization:
open movie file, providing an opengl context;
get opengl texture id;
every opengl loop:
draw texture id;
If you use the Video Decode Acceleration framework on OS X, it will give you a CVImageBufferRef when you "display" decoded frames, which you can call CVOpenGLTextureGetName(...) on to use as a native texture handle in OpenGL.
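One common route (a sketch only; `cglContext`, `cglPixelFormat`, and the per-frame `imageBuffer` are assumed to come from your own setup) is to wrap the decoded CVImageBufferRef through a Core Video OpenGL texture cache:

```cpp
#include <CoreVideo/CoreVideo.h>
#include <OpenGL/OpenGL.h>
#include <OpenGL/gl.h>

// Called once, with the CGL context/pixel format backing your GL context.
CVOpenGLTextureCacheRef MakeTextureCache(CGLContextObj cglContext,
                                         CGLPixelFormatObj cglPixelFormat)
{
    CVOpenGLTextureCacheRef cache = NULL;
    CVOpenGLTextureCacheCreate(kCFAllocatorDefault, NULL,
                               cglContext, cglPixelFormat, NULL, &cache);
    return cache;
}

// Called per decoded frame with the CVImageBufferRef from the decoder.
void BindFrame(CVOpenGLTextureCacheRef cache, CVImageBufferRef imageBuffer)
{
    CVOpenGLTextureRef frameTex = NULL;
    CVOpenGLTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache,
                                               imageBuffer, NULL, &frameTex);
    glBindTexture(CVOpenGLTextureGetTarget(frameTex),
                  CVOpenGLTextureGetName(frameTex));
    // ... draw a textured quad here ...
    CFRelease(frameTex);
    CVOpenGLTextureCacheFlush(cache, 0);  // lets the cache recycle textures
}
```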
This of course is lower level than your question, but it is definitely possible for certain video formats. This is the only technique that I have personal experience with. However, I believe QTMovie also has similar functionality at a much higher level, and would likely provide the full range of features you are looking for.
I wish I could comment on AVFoundation, but I have not done any development work on OS X since 10.6. I imagine the process ought to be similar, though, since it should be layered on top of Core Video.

SDL 1.3: How to render video without displaying it?

So what I need is simple: imagine we have no GUI at all, just SSH access to some Linux box where we're going to build and host our app. That app would generate a video stream. We have an SDL app with an OpenGL shader in it. All we want is to get the rendering (as we would normally have in the SDL window) as a char* (with size W*H*3). How can we do such a thing? How can we make SDL render not onto its GUI window but into some swappable pointer?
To be of any use, OpenGL should be hardware accelerated, so first check if your server does have a GPU that meets your requirements. If you're on a rented virtual server or some standard root server, then you very likely don't have a GPU.
If you have a GPU, then there are two possible methods:
Method 1 -- the easy one
You'll (unfortunately) have to configure and start an X server for it, and this X server must also be on the current virtual terminal (i.e. it must be the active thing on the graphics card). Then you give the user who'll be running the video generator access to that X display (read man xauth and what it references).
The next step is independent of SDL; it's an OpenGL thing: create a Framebuffer Object onto which the desired graphics is rendered. A PBuffer would work as well, and I'd actually prefer it in this situation; however, I've found Framebuffer Objects to be more reliable than PBuffers on current Linux and its drivers.
Then render to this Framebuffer Object or PBuffer as usual and retrieve the content using glReadPixels.
Method 2 -- the flexible one
On the low level this is quite similar to Method 1, but things get abstracted for you: get VirtualGL http://www.virtualgl.org/ to perform the actual OpenGL rendering on the GPU. Instead of starting your application on a secondary X server, you make direct use of the VirtualGL server, sending it the GLX stream and getting a JPEG image stream back. You could also use a secondary X server running a virtual framebuffer and take a continuous screen capture of that. Or, probably most elegant: write your own X.Org video driver that passes the video to the video streamer directly.
You cannot directly render to a byte array in OpenGL.
There are two ways to work with this. The first way is the simplest and doesn't require context gimmickery, and the second way does.
So first, the simple way.
In order for OpenGL to work, you need to have a window. That doesn't mean the window needs to be visible, but you need to create one to get a valid OpenGL context. Therefore Step 1: Create a window and minimize it.
Now, in order to get valid rendering, the pixels in the framebuffer must pass the "pixel ownership test." When rendering to the framebuffer that holds the screen itself, pixels of the window that are not actually visible on screen fail the pixel ownership test. So the values of those pixels are undefined if you use glReadPixels.
However, this only pertains to the default framebuffer that is associated with the window. Framebuffer objects always pass the pixel ownership test. Therefore, Step 2: Create a framebuffer object and the associated renderbuffers for your needs.
From there, it's pretty simple. Just render as normal and do a glReadPixels when you want to get the data. Pixel buffer objects can be used to asynchronously transfer pixel data if performance is a concern. Step 3: Render and use glReadPixels to get the data.
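Put together, steps 2 and 3 look roughly like this (a sketch assuming GLEW and a `drawScene()` placeholder for your own rendering; error checks omitted):

```cpp
#include <GL/glew.h>
#include <vector>

void drawScene();  // your usual rendering, declared elsewhere

std::vector<unsigned char> renderOffscreen(int width, int height)
{
    GLuint fbo = 0, colorRb = 0;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);

    // A renderbuffer serves as the color attachment; no window pixels are
    // involved, so the pixel ownership test always passes.
    glGenRenderbuffers(1, &colorRb);
    glBindRenderbuffer(GL_RENDERBUFFER, colorRb);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_RGB8, width, height);
    glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                              GL_RENDERBUFFER, colorRb);

    glViewport(0, 0, width, height);
    drawScene();

    // Read the result back as the W*H*3 byte array the question asks for.
    std::vector<unsigned char> pixels(width * height * 3);
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, pixels.data());

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteRenderbuffers(1, &colorRb);
    glDeleteFramebuffers(1, &fbo);
    return pixels;
}
```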
The second way is more widely available (FBOs require extension support or OpenGL 3.0), but more platform-specific.
Instead of creating an FBO in step 2, you instead have Step 2: use glXCreatePbuffer to create a pbuffer. A pbuffer is an off-screen render target that acts like the default framebuffer. You call glXMakeContextCurrent to tell OpenGL to render to the pbuffer instead of the default framebuffer.
Steps 1 and 3 are the same as above.
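For reference, a rough sketch of the pbuffer variant (GLX 1.3+; `display` and `context` are assumed to exist already, error handling omitted):

```cpp
#include <GL/glx.h>

GLXPbuffer createPbuffer(Display* display, int width, int height)
{
    const int fbAttribs[] = {
        GLX_RENDER_TYPE,   GLX_RGBA_BIT,
        GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,
        GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8, GLX_BLUE_SIZE, 8,
        None
    };
    int count = 0;
    GLXFBConfig* configs =
        glXChooseFBConfig(display, DefaultScreen(display), fbAttribs, &count);

    const int pbAttribs[] = {
        GLX_PBUFFER_WIDTH, width, GLX_PBUFFER_HEIGHT, height, None
    };
    return glXCreatePbuffer(display, configs[0], pbAttribs);
}

// Redirect rendering to the pbuffer instead of the default framebuffer:
//   glXMakeContextCurrent(display, pbuffer, pbuffer, context);
// then render and glReadPixels exactly as in the FBO variant above.
```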

GPU memory allocation for video

Is it possible to allocate some memory on the GPU without CUDA?
I'm adding some more details...
I need to get the video frames decoded from VLC and run some compositing functions on the video; I'm doing so using the new SDL rendering capabilities.
All works fine until I have to send the decoded data to the SDL texture... that part of the code is handled by standard malloc, which is slow for video operations.
Right now I'm not even sure that using the GPU for video will actually help me.
Let's be clear: are you trying to accomplish real-time video processing? Since your latest update changed the problem considerably, I'm adding another answer.
The "slowness" you are experiencing could be due to several reasons. To get the "real-time" effect (in the perceptual sense), you must be able to process the frame and display it within 33 ms (approximately, for a 30 fps video). This means you must decode the frame, run the compositing functions (as you call them) on it, and display it on the screen within this time frame.
If the compositing functions are too CPU-intensive, then you might consider writing a GPU program to speed up this task. But the first thing you should do is determine exactly where the bottleneck of your application is. You could strip your application down temporarily to let it decode the frames and display them on the screen (without executing the compositing functions), just to see how it goes. If it's slow, then the decoding process could be using too much CPU/RAM (maybe a bug on your side?).
I have used FFmpeg and SDL for a similar project once and I was very happy with the result. This tutorial shows how to build a basic video player using both libraries. Basically, it opens a video file, decodes the frames, and renders them on a surface for display.
You can do this via Direct3D 11 Compute Shaders or OpenCL. These are similar in spirit to CUDA.
Yes, it is. You can allocate memory on the GPU through OpenGL textures.
Only indirectly through a graphics framework.
You can use OpenGL which is supported by virtually every computer.
You could use a vertex buffer to store your data. Vertex buffers are usually used to store points for rendering, but you can easily use it to store an array of any kind. Unlike textures, their capacity is only limited by the amount of graphics memory available.
http://www.songho.ca/opengl/gl_vbo.html has a good tutorial on how to read and write data to vertex buffers, you can ignore everything about drawing the vertex buffer.
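For illustration, a minimal sketch of that idea (function names are my own, GLEW assumed): upload an arbitrary byte array into a VBO, later read it back, no drawing involved.

```cpp
#include <GL/glew.h>

GLuint createGpuBuffer(const void* data, size_t size)
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // Allocates `size` bytes of graphics memory and copies `data` into it.
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    return vbo;
}

void readGpuBuffer(GLuint vbo, void* dst, size_t size)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // Copies the buffer contents back into system memory.
    glGetBufferSubData(GL_ARRAY_BUFFER, 0, size, dst);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
```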