Is there a direct way to render/encode Vulkan output as an ffmpeg video file? - c++

I'm about to generate 2D and 3D music animations and render them to video using C++. I was thinking about using OpenGL, but I've read that, unfortunately, it is being discontinued in favour of Vulkan, which seems to offer higher performance using a GPU, but is also a lower-level API, making it more difficult to learn. I still have almost no knowledge in both OpenGL and Vulkan, beginning to learn now.
My question is:
is there a way to encode the Vulkan render output (showing a window or not) into a video file, preferentially through FFPMEG? If so, how could I do that?
Requisites:
Speed: the decrease in performance should be nearly that of encoding the video only, not much more than that (e.g. by having to save lossless frames as images first and then encoding a video from them).
Controllable FPS and resolution: the video fps and frame resolution can be freely chosen.
Reliability, reproducibility: running a code that gives a same Vulkan output twice should result in 2 equal videos independently of the system, i.e. no dropping frames, async problems (I want to sync with audio) or whatsoever. The chosen video fps should stay fixed (e.g. 60 fps), no matter if the computer can render 300 or 3 fps.
What I found out so far:
An example of taking "screenshots" from Vulkan output: it writes to a ppm image at the end, which is a binary uncompressed image file.
An encoder for rendering videos from OpenGL output, which is what I want, but using OpenGL in that case.
That Khronos includes in the Vulkan API a video subset.
A video tool to decode, demux, process videos using FFMPEG and Vulkan.
That is possible to render the output into a buffer without the need of a screen to display it.

First of all, ffmpeg is a framework used for video encoding and decoding. Second, if you have no experience with any of the GPU rendering API you should start with OpenGL. Vulkan is very low-level and complicated. OpenGL will be here for a very long time and will not be immediately replaced with Vulkan.
The off-screen rendering option you mentioned is probably the best one. It doesn't really matter though, you can also use the image from the framebuffer. The image is just a matrix of RGBA pixels. You need these data as the input for the video encoding. Please take a look at how ffmpeg works. You need to send the rendered frame data in the encoder which produces video packets that are stored in a video file. You need to chose a container (mp4, mkv, avi,...) and video format (h265, av1, vp9,...). You can of course implement a frame limiter and render the scene with a constant framerate or just pick the frames that have a constant timestep.
The performance problem happens, when you transfer the data from RAM to GPU memory and vice versa. For example, when downloading the rendered image from the buffer and passing it to the CPU encoder. Therefore, the most optimal approach would be with Vulkan, using the new video extension and directly sending the rendered frames in the HW accelerated encoder without any transfers from the GPU memory. You can also run the encoder in a different thread to make it work asynchronously.
But honestly, it's not trivial. The most simple solution (not realtime) for you to create a video from 3D render would be to:
Create a fixed FPS game loop
Make screenshots of the scene by downloading the framebuffer data in OGL or Vulkan
Process the frames by ffmpeg binary to create a video file
Another hack would be to use a screen recording software (OBS, Fraps, etc.) to create the video form your 3D app.

Related

Windows Enhanced Video Renderer (EVR): Layer multiple 1080p Videos with transparency?

I am looking for ways to layer multiple 1080p Videos with transparency on Windows in C++ and DirectX or Opengl. The videos will start at different moments in time. Ideally the videos can be blended with another render target with other game content, so the resulting video texture should contain transparent pixels.
Can this be done with EVR and hardware acceleration? Which codecs are supported? http://en.wikipedia.org/wiki/Media_Foundation mentions transparency, but does not answer my questions. It sounds as if all the videos have to start at the same time and the resulting video texture has no transparency.
TIA
Christoph
This are my research results from around 03/14 with no definitive answer to this problem.
I did not try the mentioned possibility in Media Foundation, since it sounded as if the result has no transparency.
I was able to use a second gray scale video to mask the rgb video inside a shader. This can be done with a separate video stream, but syncing is needed. Moreover it is possible to encode a video with two frames side by side, but many HW accelerated video codecs do not allow this, WMF being the exception. Performance is not great but I was able to play 3 1080p30 videos simultaneously.
On a side note, to my surprise Flash was able to play 5+ 1080p30 videos with transparency simultaneously. The flash video codec allows alpha values, but I managed only inside flash to use them.

AVFoundation on OSX: OpenGL texture from video WITHOUT needing access to pixel data

I've read a lot of posts describing how people use AVAssetReader or AVPlayerItemVideoOutput to get video frames as raw pixel data from a video file, which they then use to upload to an OpenGL texture. However, this seems to create the needless step of decoding the video frames with the CPU (as opposed to the graphics card), as well as creating unnecessary copies of the pixel data.
Is there a way to let AVFoundation own all aspects of the video playback process, but somehow also provide access to an OpenGL texture ID it created, which can just be drawn into an OpenGL context as necessary? Has anyone come across anything like this?
In other words, something like this pseudo code:
initialization:
open movie file, providing an opengl context;
get opengl texture id;
every opengl loop:
draw texture id;
If you were to use the Video Decode Acceleration Framework on OS X, it will give you a CVImageBufferRef when you "display" decoded frames, which you can call CVOpenGLTextureGetName (...) on to use as a native texture handle in OpenGL software.
This of course is lower level than your question, but it is definitely possible for certain video formats. This is the only technique that I have personal experience with. However, I believe QTMovie also has similar functionality at a much higher level, and would likely provide the full range of features you are looking for.
I wish I could comment on AVFoundation, but I have not done any development work on OS X since 10.6. I imagine the process ought to be similar though, it should be layered on top of CoreVideo.

GPU memory allocation for video

Is it possible to allocate some memory on the GPU without cuda?
i'm adding some more details...
i need to get the video frame decoded from VLC and have some compositing functions on the video; I'm doing so using the new SDL rendering capabilities.
All works fine until i have to send the decoded data to the sdl texture... that part of code is handled by standard malloc which is slow for video operations.
Right now i'm not even sure that using gpu video will actually help me
Let's be clear: are you are trying to accomplish real time video processing? Since your latest update changed the problem considerably, I'm adding another answer.
The "slowness" you are experiencing could be due to several reasons. In order get the "real-time" effect (in the perceptual sense), you must be able to process the frame and display it withing 33ms (approximately, for a 30fps video). This means you must decode the frame, run the compositing functions (as you call) on it, and display it on the screen within this time frame.
If the compositing functions are too CPU intensive, then you might consider writing a GPU program to speed up this task. But the first thing you should do is determine where the bottleneck of your application is exactly. You could strip your application momentarily to let it decode the frames and display them on the screen (do not execute the compositing functions), just to see how it goes. If its slow, then the decoding process could be using too much CPU/RAM resources (maybe a bug on your side?).
I have used FFMPEG and SDL for a similar project once and I was very happy with the result. This tutorial shows to do a basic video player using both libraries. Basically, it opens a video file, decodes the frames and renders them on a surface for displaying.
You can do this via Direct3D 11 Compute Shaders or OpenCL. These are similar in spirit to CUDA.
Yes, it is. You can allocate memory in the GPU through OpenGL textures.
Only indirectly through a graphics framework.
You can use OpenGL which is supported by virtually every computer.
You could use a vertex buffer to store your data. Vertex buffers are usually used to store points for rendering, but you can easily use it to store an array of any kind. Unlike textures, their capacity is only limited by the amount of graphics memory available.
http://www.songho.ca/opengl/gl_vbo.html has a good tutorial on how to read and write data to vertex buffers, you can ignore everything about drawing the vertex buffer.

Combining Direct3D, Axis to make multiple IP camera GUI

Right now, what I'm trying to do is to make a new GUI, essentially a software using directX (more exact, direct3D), that display streaming images from Axis IP cameras.
For the time being I figured that the flow for the entire program would be like this:
1. Get the Axis program to get streaming images
2. Pass the images to the Direct3D program.
3. Display the program, on the screen.
Currently I have made a somewhat basic Direct3D app that loads and display video frames from avi videos(for testing). I dunno how to load images directly from videos using DirectX, so I used OpenCV to save frames from the video and have DX upload them up. Very slow.
Right now I have some unclear things:
1. How to Get an Axis program that works in C++ (gonna look up examples later, prolly no big deal)
2. How to upload images directly from the Axis IP camera program.
So guys, do you have any recommendations or suggestions on how to make my program work more efficiently? Anything just let me know.
Well you may find it faster to use directshow and add a custom renderer at the far end that, directly, copies the decompressed video data directly to a Direct3D texture.
Its well worth double buffering that texture. ie have texture 0 displaying and texture 1 being uploaded too and then swap the 2 over when a new frame is available (ie display texture 1 while uploading to texture 0).
This way you can de-couple the video frame rate from the rendering frame rate which makes dropped frames a little easier to handle.
I use in-place update of Direct3D textures (using IDirect3DTexture9::LockRect) and it works very fast. What part of your program works slow?
For capture images from Axis cams you may use iPSi c++ library: http://sourceforge.net/projects/ipsi/
It can be used for capturing images and control camera zoom and rotation (if available).

Displaying a video in DirectX

What is the best/easiest way to display a video (with sound!) in an application using XAudio2 and Direct3D9/10?
At the very least it needs to be able to stream potentially larger videos, and take care of the fact that the windows aspect ratio may differ from the videos (eg by adding letter boxes), although ideally Id like the ability to embed the video into a 3D scene.
I could of course work out a way to load each frame into a texture, discarding/reusing the textures once rendered, and playing the audio separately through XAudio2, however as well as writing a loader for at least one format, ive also got to deal with stuff like synchronising the video and audio components, so hopefully there is an eaier solution available or even a ready made free one with a suitable lisence (commercial distribution in binary form, dynamic linking is fine in the case of say LGPL).
In Windows SDK, there is a DirectShow example for rendering video to texture. It handles audio output too.
But there are limitations and I can't honestly call it easy.
Have you looked at Bink video? Its what lots of games use for video playback. Works great and you don't have to code all that video stuff yourself from scratch.