Does DirectX11 Have Native Support for Rendering to a Video File? - c++

I'm working on a project that needs to write several minutes of DX11 swapchain output to a video file (of any format). I've found lots of resources for writing a completed frame to a texture file with DX11, but the only thing I found relating to a video render output is using FFMPEG to stream the rendered frame, which uses an encoding pattern that doesn't fit my render pipeline and discards the frame immediately after streaming it.
I'm unsure what code I could post that would help answer this, but it might help to know that in this scenario I have a composite Shader Resource View + Render Target View that contains all of the data (in RGBA format) that would be needed for the frame presented to the screen. Currently, it is presented to the screen as a window, but I need to also provide a method to encode the frame (and thousands of subsequent frames) into a video file. I'm using Vertex, Pixel, and Compute shaders in my rendering pipeline.

Found the answer thanks to a friend offline and Simon Mourier's reply! Check out this guide for a nice tutorial on using the Media Foundation API and the Media Sink to encode a data buffer to a video file:
https://learn.microsoft.com/en-us/windows/win32/medfound/tutorial--using-the-sink-writer-to-encode-video
Other docs in the same section describe useful info like the different encoding types and what input they need.
In my case, the best way to go about rendering my composite RTV to a video file was creating a CPU-Accessible buffer, copying the composite resource to it, then accessing the CPU buffer as an array of pixel colors, which media sink understands.

Related

Is there a direct way to render/encode Vulkan output as an ffmpeg video file?

I'm about to generate 2D and 3D music animations and render them to video using C++. I was thinking about using OpenGL, but I've read that, unfortunately, it is being discontinued in favour of Vulkan, which seems to offer higher performance using a GPU, but is also a lower-level API, making it more difficult to learn. I still have almost no knowledge in both OpenGL and Vulkan, beginning to learn now.
My question is:
is there a way to encode the Vulkan render output (showing a window or not) into a video file, preferentially through FFPMEG? If so, how could I do that?
Requisites:
Speed: the decrease in performance should be nearly that of encoding the video only, not much more than that (e.g. by having to save lossless frames as images first and then encoding a video from them).
Controllable FPS and resolution: the video fps and frame resolution can be freely chosen.
Reliability, reproducibility: running a code that gives a same Vulkan output twice should result in 2 equal videos independently of the system, i.e. no dropping frames, async problems (I want to sync with audio) or whatsoever. The chosen video fps should stay fixed (e.g. 60 fps), no matter if the computer can render 300 or 3 fps.
What I found out so far:
An example of taking "screenshots" from Vulkan output: it writes to a ppm image at the end, which is a binary uncompressed image file.
An encoder for rendering videos from OpenGL output, which is what I want, but using OpenGL in that case.
That Khronos includes in the Vulkan API a video subset.
A video tool to decode, demux, process videos using FFMPEG and Vulkan.
That is possible to render the output into a buffer without the need of a screen to display it.
First of all, ffmpeg is a framework used for video encoding and decoding. Second, if you have no experience with any of the GPU rendering API you should start with OpenGL. Vulkan is very low-level and complicated. OpenGL will be here for a very long time and will not be immediately replaced with Vulkan.
The off-screen rendering option you mentioned is probably the best one. It doesn't really matter though, you can also use the image from the framebuffer. The image is just a matrix of RGBA pixels. You need these data as the input for the video encoding. Please take a look at how ffmpeg works. You need to send the rendered frame data in the encoder which produces video packets that are stored in a video file. You need to chose a container (mp4, mkv, avi,...) and video format (h265, av1, vp9,...). You can of course implement a frame limiter and render the scene with a constant framerate or just pick the frames that have a constant timestep.
The performance problem happens, when you transfer the data from RAM to GPU memory and vice versa. For example, when downloading the rendered image from the buffer and passing it to the CPU encoder. Therefore, the most optimal approach would be with Vulkan, using the new video extension and directly sending the rendered frames in the HW accelerated encoder without any transfers from the GPU memory. You can also run the encoder in a different thread to make it work asynchronously.
But honestly, it's not trivial. The most simple solution (not realtime) for you to create a video from 3D render would be to:
Create a fixed FPS game loop
Make screenshots of the scene by downloading the framebuffer data in OGL or Vulkan
Process the frames by ffmpeg binary to create a video file
Another hack would be to use a screen recording software (OBS, Fraps, etc.) to create the video form your 3D app.

How to decode a video file straight to a Direct3D11 texture using Windows Media Foundation?

I'd like to decode the contents of a video file to a Direct3D11 texture and avoid the copies back and forth to CPU memory. Ideally, the library will play the audio itself and call back into my code whenever a video frame has been decoded.
On the surface, the Windows Media Foundation's IMFPMediaPlayer (ie MFPCreateMediaPlayer() and IMFPMediaPlayer::CreateMediaItemFromURL()) seem like a good match, except that the player decodes straight to the app's HWND. The documentation implies that I can add a custom video sink, but I have not been able to find documentation nor sample code on how to do that. Please point me in the right direction.
Currently, I am using libVLC to accomplish the above, but it only provides the video surface in CPU memory, which can become a bottleneck for my use-case.
Thanks.
Take a look at this source code from my project 'Stackoverflow' : MFVideoEVR
This program shows how to setup EVR (enhanced video renderer), and how to provide video samples to it, using a Source Reader.
The key is to provide video samples, so you can use them for your purpose.
This program provides samples through IMFVideoSampleAllocator. It is for DirectX9 texture. You need to change source code, and to use IMFVideoSampleAllocatorEx, instead : IMFVideoSampleAllocatorEx
About MFCreateVideoSampleAllocatorEx :
This function creates an allocator for DXGI video surfaces. The buffers created by this allocator expose the IMFDXGIBuffer interface.
So to retreive texture : IMFDXGIBuffer::GetResource
You can use this method to get a pointer to the ID3D11Texture2D interface of the surface. If the buffer is locked, the method returns MF_E_INVALIDREQUEST.
You will also have to manage sound through IMFSourceReader.
With this approach, there is no copy back to system memory.
PS : You don't talk about video format (h265, h264, mpeg2, others ??). MediaFoundation doesn't handle all video format, natively.

Video players questions

Given that FFmpeg is the leading multimedia framework and most of the video/audio players uses it, I'm wondering somethings about audio/video players using FFmpeg as intermediate.
I'm studying and I want to know how audio/video players works and I have some questions.
I was reading the ffplay source code and I saw that ffplay handles the subtitle stream. I tried to use a mkv file with a subtitle on it and doesn't work. I tried using arguments such as -sst but nothing happened. - I was reading about subtitles and how video files uses it (or may I say containers?). I saw that there's two ways putting a subtitle: hardsubs and softsubs - roughly speaking hardsubs mode is burned and becomes part of the video, and softsubs turns a stream of subtitles (I might be wrong - please, correct me).
The question is: How does they handle this? I mean, when the subtitle is part of the video there's nothing to do, the video stream itself shows the subtitle, but what about the softsubs? how are they handled? (I heard something about text subs as well). - How does the subtitle appears on the screen and can be configured changing fonts, size, colors, without encoding everything again?
I was studying some video players source codes and some or most of them uses OpenGL as renderer of the frame and others uses (such as Qt's QWidget) (kind of or for sure) canvas. - What is the most used and which one is fastest and better? OpenGL with shaders and stuffs? Handling YUV or RGB and so on? How does that work?
It might be a dump question but what is the format that AVFrame returns? For example, when we want to save frames as images first we need the frame and then we convert, from which format we are converting from? Does it change according with the video codec or it's always the same?
Most of the videos I've been trying to handle is using YUV720P, I tried to save the frames as png and I need to convert to RGB first. I did a test with the players and I put at the same frame and I took also screenshots and compared. The video players shows the frames more colorful. I tried the same with ffplay that uses SDL (OpenGL) and the colors (quality) of the frames seems to be really low. What might be? What they do? Is it shaders (or a kind of magic? haha).
Well, I think that is it for now. I hope you help me with that.
If this isn't the correct place, please let me know where. I haven't found another place in Stack Exchange communities.
There are a lot of question in one post:
How are 'soft subtitles' handled
The same way as any other stream :
read packets from a stream to the container
Give the packet to a decoder
Use the decoded frame as you wish. Here with most containers supporting subtitles the presentation time will be present. All you need at this time is get the text and burn it onto the image at the same presentation time. There are a lot of ways to print the text on the video, with ffmpeg or another library
What is the most used renderer and which one is fastest and better?
most used depend on the underlying system. For instance Qt only wrap native renderers, and even has a openGL version
You can only be as fast as the underlying system allows. Does it support ouble-buffering? Can it render in your decoded pixel format or do you have to perform color conversion before? This topic is too broad
Better only depend on the use case. this is too broad
what is the format that AVFrame returns?
It is a raw format (enum AVPixelFormat), and depends on the codec. There is a list of YUV and RGB FOURCCs which cover most formats in ffmpeg. Programmatically you can access the table AVCodec::pix_fmts to obtain the pixel format a specific codec support.

AVFoundation on OSX: OpenGL texture from video WITHOUT needing access to pixel data

I've read a lot of posts describing how people use AVAssetReader or AVPlayerItemVideoOutput to get video frames as raw pixel data from a video file, which they then use to upload to an OpenGL texture. However, this seems to create the needless step of decoding the video frames with the CPU (as opposed to the graphics card), as well as creating unnecessary copies of the pixel data.
Is there a way to let AVFoundation own all aspects of the video playback process, but somehow also provide access to an OpenGL texture ID it created, which can just be drawn into an OpenGL context as necessary? Has anyone come across anything like this?
In other words, something like this pseudo code:
initialization:
open movie file, providing an opengl context;
get opengl texture id;
every opengl loop:
draw texture id;
If you were to use the Video Decode Acceleration Framework on OS X, it will give you a CVImageBufferRef when you "display" decoded frames, which you can call CVOpenGLTextureGetName (...) on to use as a native texture handle in OpenGL software.
This of course is lower level than your question, but it is definitely possible for certain video formats. This is the only technique that I have personal experience with. However, I believe QTMovie also has similar functionality at a much higher level, and would likely provide the full range of features you are looking for.
I wish I could comment on AVFoundation, but I have not done any development work on OS X since 10.6. I imagine the process ought to be similar though, it should be layered on top of CoreVideo.

How to overlay direct3d in directshow

I am looking for a tutorial or documentation on how to overlay direct3d on top of a video (webcam) feed in directshow.
I want to provide a virtual web cam (a virtual device that looks like a web cam to the system (ie. so that it be used where ever a normal webcam could be used like IM video chats)
I want to capture a video feed from a webcam attached to the computer.
I want to overlay a 3d model on top of the video feed and provide that as the output.
I had planned on doing this in directshow only because it looked possible to do this in it. If you have any ideas about possible alternatives, I am all ears.
I am writing c++ using visual studio 2008.
Use the Video Mixing Renderer Filter to render the video to a texture, then render it to the scene as a full screen quad. After that you can render the rest of the 3D stuff on top and then present the scene.
Are you after a filter that sits somewhere in the graph that renders D3D stuff over the video?
If so then you need to look at deriving a filter from CTransformFilter. Something like the EZRGB example will give you something to work from. Basically once you have this sorted your filter needs to do the Direct 3D rendering and, literally, insert the resulting image into the direct show stream. Alas you can't render the Direct3D directly to a direct show video frame so you will have to do your rendering then lock the front/back buffer and copy the 3D data out and into the direct show stream. This isn't ideal as it WILL be quite slow (compared to standard D3D rendering) but its the best you can do, to my knowledge.
Edit: In light of your update what you want is quite complicated. You need to create a source filter (You should look at the CPushSource example) to begin with. Once you've done that you will need to register it as a video capture source. Basically you need to do this by using the IFilterMapper2::RegisterFilter call in your DLLRegisterServer function and pass in a class ID of "CLSID_VideoInputDeviceCategory". Adding the Direct3D will be as I stated above.
All round you want to spend as much time reading through the DirectShow samples in the windows SDK and start modifying them to do what YOU want them to do.