How to render YUV420P frames using DirectX? - c++

I'm using FFMPEG C++ to decode a video file.
I decoded the frames under YUV420P format in System Memory.
Next step, I want to render these frames by using DirectX.
I searched many times, and I know that I have to use the texture2D, but there is not any code sample for my case.
How to create a texture2D from YUV420 (or RGB instead) from System Memory and render it to the screen.
And the performance is also important, so I need the best way to do it.
Someone can show me the sample code?
Many Thanks!

Related

Is there a direct way to render/encode Vulkan output as an ffmpeg video file?

I'm about to generate 2D and 3D music animations and render them to video using C++. I was thinking about using OpenGL, but I've read that, unfortunately, it is being discontinued in favour of Vulkan, which seems to offer higher performance using a GPU, but is also a lower-level API, making it more difficult to learn. I still have almost no knowledge in both OpenGL and Vulkan, beginning to learn now.
My question is:
is there a way to encode the Vulkan render output (showing a window or not) into a video file, preferentially through FFPMEG? If so, how could I do that?
Requisites:
Speed: the decrease in performance should be nearly that of encoding the video only, not much more than that (e.g. by having to save lossless frames as images first and then encoding a video from them).
Controllable FPS and resolution: the video fps and frame resolution can be freely chosen.
Reliability, reproducibility: running a code that gives a same Vulkan output twice should result in 2 equal videos independently of the system, i.e. no dropping frames, async problems (I want to sync with audio) or whatsoever. The chosen video fps should stay fixed (e.g. 60 fps), no matter if the computer can render 300 or 3 fps.
What I found out so far:
An example of taking "screenshots" from Vulkan output: it writes to a ppm image at the end, which is a binary uncompressed image file.
An encoder for rendering videos from OpenGL output, which is what I want, but using OpenGL in that case.
That Khronos includes in the Vulkan API a video subset.
A video tool to decode, demux, process videos using FFMPEG and Vulkan.
That is possible to render the output into a buffer without the need of a screen to display it.
First of all, ffmpeg is a framework used for video encoding and decoding. Second, if you have no experience with any of the GPU rendering API you should start with OpenGL. Vulkan is very low-level and complicated. OpenGL will be here for a very long time and will not be immediately replaced with Vulkan.
The off-screen rendering option you mentioned is probably the best one. It doesn't really matter though, you can also use the image from the framebuffer. The image is just a matrix of RGBA pixels. You need these data as the input for the video encoding. Please take a look at how ffmpeg works. You need to send the rendered frame data in the encoder which produces video packets that are stored in a video file. You need to chose a container (mp4, mkv, avi,...) and video format (h265, av1, vp9,...). You can of course implement a frame limiter and render the scene with a constant framerate or just pick the frames that have a constant timestep.
The performance problem happens, when you transfer the data from RAM to GPU memory and vice versa. For example, when downloading the rendered image from the buffer and passing it to the CPU encoder. Therefore, the most optimal approach would be with Vulkan, using the new video extension and directly sending the rendered frames in the HW accelerated encoder without any transfers from the GPU memory. You can also run the encoder in a different thread to make it work asynchronously.
But honestly, it's not trivial. The most simple solution (not realtime) for you to create a video from 3D render would be to:
Create a fixed FPS game loop
Make screenshots of the scene by downloading the framebuffer data in OGL or Vulkan
Process the frames by ffmpeg binary to create a video file
Another hack would be to use a screen recording software (OBS, Fraps, etc.) to create the video form your 3D app.

Video players questions

Given that FFmpeg is the leading multimedia framework and most of the video/audio players uses it, I'm wondering somethings about audio/video players using FFmpeg as intermediate.
I'm studying and I want to know how audio/video players works and I have some questions.
I was reading the ffplay source code and I saw that ffplay handles the subtitle stream. I tried to use a mkv file with a subtitle on it and doesn't work. I tried using arguments such as -sst but nothing happened. - I was reading about subtitles and how video files uses it (or may I say containers?). I saw that there's two ways putting a subtitle: hardsubs and softsubs - roughly speaking hardsubs mode is burned and becomes part of the video, and softsubs turns a stream of subtitles (I might be wrong - please, correct me).
The question is: How does they handle this? I mean, when the subtitle is part of the video there's nothing to do, the video stream itself shows the subtitle, but what about the softsubs? how are they handled? (I heard something about text subs as well). - How does the subtitle appears on the screen and can be configured changing fonts, size, colors, without encoding everything again?
I was studying some video players source codes and some or most of them uses OpenGL as renderer of the frame and others uses (such as Qt's QWidget) (kind of or for sure) canvas. - What is the most used and which one is fastest and better? OpenGL with shaders and stuffs? Handling YUV or RGB and so on? How does that work?
It might be a dump question but what is the format that AVFrame returns? For example, when we want to save frames as images first we need the frame and then we convert, from which format we are converting from? Does it change according with the video codec or it's always the same?
Most of the videos I've been trying to handle is using YUV720P, I tried to save the frames as png and I need to convert to RGB first. I did a test with the players and I put at the same frame and I took also screenshots and compared. The video players shows the frames more colorful. I tried the same with ffplay that uses SDL (OpenGL) and the colors (quality) of the frames seems to be really low. What might be? What they do? Is it shaders (or a kind of magic? haha).
Well, I think that is it for now. I hope you help me with that.
If this isn't the correct place, please let me know where. I haven't found another place in Stack Exchange communities.
There are a lot of question in one post:
How are 'soft subtitles' handled
The same way as any other stream :
read packets from a stream to the container
Give the packet to a decoder
Use the decoded frame as you wish. Here with most containers supporting subtitles the presentation time will be present. All you need at this time is get the text and burn it onto the image at the same presentation time. There are a lot of ways to print the text on the video, with ffmpeg or another library
What is the most used renderer and which one is fastest and better?
most used depend on the underlying system. For instance Qt only wrap native renderers, and even has a openGL version
You can only be as fast as the underlying system allows. Does it support ouble-buffering? Can it render in your decoded pixel format or do you have to perform color conversion before? This topic is too broad
Better only depend on the use case. this is too broad
what is the format that AVFrame returns?
It is a raw format (enum AVPixelFormat), and depends on the codec. There is a list of YUV and RGB FOURCCs which cover most formats in ffmpeg. Programmatically you can access the table AVCodec::pix_fmts to obtain the pixel format a specific codec support.

Create video vom rendered OpenGL frames

i am searching for a way to create a video from a row of frames i have rendered with OpenGL and transfered to ram as int array using the glGetTexImage function. is it possible to achieve this directly in ram (~10 secs video) or do i have to save each frame to the harddisk and encode the video afterwards?
i have found this sample http://cekirdek.pardus.org.tr/~ismail/ffmpeg-docs/api-example_8c-source.html in an other SO question but is this still the best way to do it?
today i got a hint to use this example http://git.videolan.org/?p=ffmpeg.git;a=blob;f=doc/examples/decoding_encoding.c;h=cb63294b142f007cf78436c5d81c08a49f3124be;hb=HEAD to see how h264 could be achieved. the problem when you want to encode frames rendered by opengl is that they are in RGB(A) and most codecs require YUV. for the convertion you can use swscale (ffmpeg) - an example how to use it can be found here http://web.me.com/dhoerl/Home/Tech_Blog/Entries/2009/1/22_Revised_avcodec_sample.c.html
as ananthonline stated the direct encoding of the frames is very cpu intensive but you can also write your frames with ffmpeg as rawvideo format, which supports the rgb24 pixelformat, and convert it offline with the cmd commands of ffmpeg.
If you want a cross-platform way of doing it, you're probably going to have to use ffmpeg/libavcodec but on Windows, you can write an AVI quite simply using the resources here.

How to read .avi files C++

I want to read in an .avi video file for a program that I am making. I have the file location saved as a string. Is there any good tutorials on using .avi files in c++ or does anyone know who to read one in? Is it the same as normal files?
I have a previously asked SO question that goes into better detail but here is what I want to do:
I am making a program that will detect faces (though OpenCV) As of now I have been given a video processor program that will detect each face on a frame, and return the frame as a image and the CvRec of the faces. I want to take these faces and test them to validate that they are all actually faces.
After I have all the faces (tested) I want to then take the images and test them together. I test the faces on each frame for size and distance changes. If the faces pass this for a frame length of two seconds, then I want to crop the face and make it the subject of each frame.
After each frame is cropped I then want to save the new video file for the user.
Hopefully that helps. If anyone needs a better explanation please let me know.
First of all, a little background.
What is AVI?
AVI stands for Audio Video Interleave. It is a special case of the RIFF (Resource Interchange File Format). AVI is defined by Microsoft and it is the most common format for audio/video data.
I assume you would want to read a avi file and decode the compressed video frames. AVI file is just like any other normal file and you can use fread()(in C) or iostream(in C++) to open an avi file and read it contents. But the contents of an avi file are video frames in a compressed format. The compression allows video content of bigger sizes to be efficiently packed in less memory space.To make any sense of this compressed data you would have to decode the encoded data format.You will have to study the standard which describes how AVI encoding is done and then extract and decode the frames. this raw video data now when fed to a video device will be displayed in video format.
It seems you are staying within OpenCV so things are easy. If OpenCV is compiled properly it is capable of delegating io/coding/decoding to other libraries. Quicktime and others for example, but best is to use ffmpeg. You open, read and decode everything using the OpenCV API which gives you the video frame by frame.
Make sure your OpenCV is compiled with ffmpeg support and then read the OpenCV tutorial on how to read/write AVI files. It's really easy.
Getting OpenCV to be built with ffmpeg support might be hard though. You might want to switch to an older version of OpenCV if you can't get ffmpeg running with the current one.
Personally i would not spent time trying to read the video by yourself and delegate the task to OpenCV. That's how it is supposed to be used.

How to efficiently display custom video stream in OpenGL?

I have a custom library that can decode to RGBA or any other format.
What is the best way to marry it with OpenGL to decode onto texture so that it won't drop frames ?
Or is there a better way completely skipping textures?
Edit:
Full HD video streamed over net. So performance is an issue. 30 Hz. Recorded.
glTexSubImage2D() is quick and easy. You may be able to get more throughput with a PBO-based pipeline, at the expense of more latency.
Looks like glover may have some example code, as well as Ye Olde NeHe #35.