Writing an OpenMAX IL component, where to start - c++

I want to grab the video output of my Raspberry Pi and pass it to a kind of Adalight ambient lighting system.
XBMC's player for the Pi, omxplayer, uses the OpenMAX API for decoding and other functions.
Looking into the code, I found the following:
m_omx_tunnel_sched.Initialize(&m_omx_sched, m_omx_sched.GetOutputPort(), &m_omx_render, m_omx_render.GetInputPort());
As far as I understand, this sets up a pipeline between the video scheduler and the renderer: [S]-->[R].
My idea is to write a grabber component and plug it right into the pipeline: [S]-->[G]-->[R]. The grabber would extract the pixels from the framebuffer and pass them to a daemon which drives the LEDs.
Now I am about to dig into the OpenMAX API, which seems to be pretty weird. Where should I start? Is this a feasible approach?
Best Regards

If you want the decoded data, then just don't send it to the renderer. Instead of rendering, take the data and do whatever you want with it. The decoded data should be taken from the output port of the video_decode OpenMAX IL component. You will probably also need to set the correct output pixel format, so set the component's output port to the format you need; the conversion is then done by the GPU (YUV or RGB565 are available).
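For example, requesting a specific pixel format on the decoder's output port looks roughly like this. This is only a sketch: the port index 131 (assumed to be the video_decode output port on the Pi), the RGB565 choice and the header paths are assumptions, and all error checking is omitted.

#include <cstring>
#include <IL/OMX_Core.h>      // header locations depend on your include paths
#include <IL/OMX_Component.h>

// Ask the decoder's output port for RGB565 so the GPU does the colour conversion.
void set_output_format(OMX_HANDLETYPE decoder)
{
    OMX_PARAM_PORTDEFINITIONTYPE portdef;
    memset(&portdef, 0, sizeof(portdef));
    portdef.nSize = sizeof(OMX_PARAM_PORTDEFINITIONTYPE);
    portdef.nVersion.s.nVersionMajor = OMX_VERSION_MAJOR;
    portdef.nVersion.s.nVersionMinor = OMX_VERSION_MINOR;
    portdef.nVersion.s.nRevision = OMX_VERSION_REVISION;
    portdef.nVersion.s.nStep = OMX_VERSION_STEP;
    portdef.nPortIndex = 131;  // assumed output port of video_decode

    OMX_GetParameter(decoder, OMX_IndexParamPortDefinition, &portdef);
    portdef.format.video.eColorFormat = OMX_COLOR_Format16bitRGB565;  // or OMX_COLOR_FormatYUV420PackedPlanar
    OMX_SetParameter(decoder, OMX_IndexParamPortDefinition, &portdef);
}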

First, I think you should attach a buffer to the output port of the camera component, do whatever you want with that frame on the CPU, and then send the frame through a buffer attached to the input port of the renderer. It's not going to be a trivial task, since there is little documentation about OpenMAX on the Raspberry Pi (a rough ilclient sketch follows the links below).
Best place to start:
https://jan.newmarch.name/RPi/
Best place to have on hands:
http://home.nouwen.name/RaspberryPi/documentation/ilcomponents/index.html
Next best place: source code scattered across the internet.
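For orientation, driving an IL component through the Broadcom ilclient helper (the one used by the hello_pi samples) looks roughly like this. It is only a sketch: the component name, flags and port index 130 are taken from the hello_video sample rather than from the grabber you want to build, the format setup is skipped, and error handling is left out.

#include "bcm_host.h"
#include "ilclient.h"   // ships with the hello_pi samples

int main(void)
{
    bcm_host_init();
    ILCLIENT_T *client = ilclient_init();
    OMX_Init();

    COMPONENT_T *decode = NULL;
    // Create the component with input buffers enabled so we can feed it ourselves.
    ilclient_create_component(client, &decode, (char*)"video_decode",
        (ILCLIENT_CREATE_FLAGS_T)(ILCLIENT_DISABLE_ALL_PORTS | ILCLIENT_ENABLE_INPUT_BUFFERS));

    ilclient_change_component_state(decode, OMX_StateIdle);
    // In a real program, set the input format (OMX_IndexParamVideoPortFormat) before this.
    ilclient_enable_port_buffers(decode, 130, NULL, NULL, NULL);  // 130: decoder input port
    ilclient_change_component_state(decode, OMX_StateExecuting);

    // Grab an input buffer, fill it with bitstream data, and hand it back.
    OMX_BUFFERHEADERTYPE *buf = ilclient_get_input_buffer(decode, 130, 1);
    // ... memcpy compressed data into buf->pBuffer, set buf->nFilledLen ...
    OMX_EmptyThisBuffer(ILC_GET_HANDLE(decode), buf);

    return 0;
}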
Good luck.

Related

Is there a direct way to render/encode Vulkan output as an ffmpeg video file?

I'm about to generate 2D and 3D music animations and render them to video using C++. I was thinking about using OpenGL, but I've read that, unfortunately, it is being discontinued in favour of Vulkan, which seems to offer higher performance using a GPU, but is also a lower-level API, making it more difficult to learn. I still have almost no knowledge of either OpenGL or Vulkan; I am just beginning to learn them now.
My question is:
Is there a way to encode the Vulkan render output (showing a window or not) into a video file, preferably through FFmpeg? If so, how could I do that?
Requisites:
Speed: the decrease in performance should be nearly that of encoding the video only, not much more than that (e.g. it should not require saving lossless frames as images first and then encoding a video from them).
Controllable FPS and resolution: the video fps and frame resolution can be freely chosen.
Reliability, reproducibility: running code that gives the same Vulkan output twice should result in two identical videos, independent of the system, i.e. no dropped frames, no async problems (I want to sync with audio), and so on. The chosen video fps should stay fixed (e.g. 60 fps), no matter whether the computer can render 300 fps or 3 fps.
What I found out so far:
An example of taking "screenshots" from Vulkan output: it writes to a ppm image at the end, which is a binary uncompressed image file.
An encoder for rendering videos from OpenGL output, which is what I want, but using OpenGL in that case.
That Khronos includes a video subset in the Vulkan API.
A video tool to decode, demux and process videos using FFmpeg and Vulkan.
That it is possible to render the output into a buffer (off-screen), without needing a screen to display it.
First of all, ffmpeg is a framework used for video encoding and decoding. Second, if you have no experience with any of the GPU rendering APIs, you should start with OpenGL. Vulkan is very low-level and complicated. OpenGL will be here for a very long time and will not be immediately replaced by Vulkan.
The off-screen rendering option you mentioned is probably the best one. It doesn't really matter, though; you can also use the image from the framebuffer. The image is just a matrix of RGBA pixels. You need this data as the input for the video encoding. Please take a look at how ffmpeg works: you send the rendered frame data into the encoder, which produces video packets that are stored in a video file. You need to choose a container (mp4, mkv, avi, ...) and a video codec (h265, av1, vp9, ...). You can of course implement a frame limiter and render the scene at a constant framerate, or just pick the frames that fall on a constant timestep.
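To make that flow concrete, here is a rough sketch using the ffmpeg libraries (libavcodec only; muxing the packets into a container with libavformat is left out). The codec choice, the pixel format and the missing error handling are all simplifications for illustration.

extern "C" {
#include <libavcodec/avcodec.h>
}

// Open an H.264 encoder with a fixed timestep (matches the fixed-fps requirement).
AVCodecContext* open_encoder(int width, int height, int fps)
{
    const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVCodecContext* ctx = avcodec_alloc_context3(codec);
    ctx->width = width;
    ctx->height = height;
    ctx->time_base = AVRational{1, fps};
    ctx->framerate = AVRational{fps, 1};
    ctx->pix_fmt = AV_PIX_FMT_YUV420P;  // most H.264 encoders want YUV, so convert RGBA first (e.g. sws_scale)
    avcodec_open2(ctx, codec, nullptr);
    return ctx;
}

// Push one frame in, drain any packets the encoder has ready.
void encode_frame(AVCodecContext* ctx, AVFrame* frame /* nullptr to flush at the end */)
{
    avcodec_send_frame(ctx, frame);
    AVPacket* pkt = av_packet_alloc();
    while (avcodec_receive_packet(ctx, pkt) == 0) {
        // pkt->data / pkt->size go to the muxer, which writes the container file
        av_packet_unref(pkt);
    }
    av_packet_free(&pkt);
}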
The performance problem happens when you transfer data between RAM and GPU memory, for example when downloading the rendered image from the buffer and passing it to a CPU encoder. Therefore, the optimal approach would be with Vulkan, using the new video extension and sending the rendered frames directly into the HW-accelerated encoder without any transfers out of GPU memory. You can also run the encoder in a different thread to make it work asynchronously.
But honestly, it's not trivial. The simplest solution (not real-time) to create a video from a 3D render would be to:
Create a fixed FPS game loop
Make screenshots of the scene by downloading the framebuffer data in OGL or Vulkan
Process the frames with the ffmpeg binary to create a video file (see the sketch after this list)
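A minimal sketch of that last step, assuming RGBA frames and a POSIX popen (use _popen/_pclose on Windows); the resolution, framerate and x264 settings here are placeholders:

#include <cstdio>
#include <vector>

int main()
{
    const int w = 1280, h = 720, fps = 60;
    // ffmpeg reads raw RGBA frames from stdin and writes an H.264 mp4.
    FILE* ff = popen("ffmpeg -y -f rawvideo -pix_fmt rgba -s 1280x720 -r 60 -i - "
                     "-c:v libx264 -pix_fmt yuv420p out.mp4", "w");

    std::vector<unsigned char> frame(w * h * 4);
    for (int i = 0; i < 10 * fps; ++i) {  // 10 seconds of video
        // ... fill 'frame' from the OpenGL/Vulkan framebuffer readback here ...
        fwrite(frame.data(), 1, frame.size(), ff);
    }
    pclose(ff);
    return 0;
}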
Another hack would be to use screen recording software (OBS, Fraps, etc.) to create the video from your 3D app.

How to decode a video file straight to a Direct3D11 texture using Windows Media Foundation?

I'd like to decode the contents of a video file to a Direct3D11 texture and avoid the copies back and forth to CPU memory. Ideally, the library will play the audio itself and call back into my code whenever a video frame has been decoded.
On the surface, Windows Media Foundation's IMFPMediaPlayer (i.e. MFPCreateMediaPlayer() and IMFPMediaPlayer::CreateMediaItemFromURL()) seems like a good match, except that the player decodes straight to the app's HWND. The documentation implies that I can add a custom video sink, but I have not been able to find documentation or sample code on how to do that. Please point me in the right direction.
Currently, I am using libVLC to accomplish the above, but it only provides the video surface in CPU memory, which can become a bottleneck for my use-case.
Thanks.
Take a look at this source code from my project 'Stackoverflow': MFVideoEVR
This program shows how to set up the EVR (Enhanced Video Renderer) and how to provide video samples to it using a Source Reader.
The key is to provide video samples, so you can use them for your own purpose.
This program provides samples through IMFVideoSampleAllocator, which is for DirectX9 textures. You need to change the source code to use IMFVideoSampleAllocatorEx instead: IMFVideoSampleAllocatorEx
About MFCreateVideoSampleAllocatorEx:
This function creates an allocator for DXGI video surfaces. The buffers created by this allocator expose the IMFDXGIBuffer interface.
So, to retrieve the texture: IMFDXGIBuffer::GetResource
You can use this method to get a pointer to the ID3D11Texture2D interface of the surface. If the buffer is locked, the method returns MF_E_INVALIDREQUEST.
You will also have to manage sound through IMFSourceReader.
With this approach, there is no copy back to system memory.
PS: You don't mention the video format (h265, h264, mpeg2, others?). Media Foundation doesn't handle all video formats natively.
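A rough sketch of the texture-retrieval part, assuming the Source Reader was created with the MF_SOURCE_READER_D3D_MANAGER attribute pointing at an IMFDXGIDeviceManager (COM initialisation and all error handling omitted):

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <d3d11.h>

void read_frames(IMFSourceReader* reader)  // assumed created with a D3D11 device manager attached
{
    for (;;) {
        DWORD stream = 0, flags = 0;
        LONGLONG timestamp = 0;
        IMFSample* sample = nullptr;
        reader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                           0, &stream, &flags, &timestamp, &sample);
        if (flags & MF_SOURCE_READERF_ENDOFSTREAM) break;
        if (!sample) continue;  // stream ticks, format changes, etc.

        IMFMediaBuffer* buffer = nullptr;
        sample->GetBufferByIndex(0, &buffer);

        IMFDXGIBuffer* dxgiBuffer = nullptr;
        buffer->QueryInterface(IID_PPV_ARGS(&dxgiBuffer));

        ID3D11Texture2D* texture = nullptr;
        dxgiBuffer->GetResource(IID_PPV_ARGS(&texture));  // GPU surface, no copy to system memory

        // ... use 'texture' in your renderer, then release everything ...

        texture->Release();
        dxgiBuffer->Release();
        buffer->Release();
        sample->Release();
    }
}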

HD Video Calling in Unity3D

I am an amateur in video/image processing, but I am trying to create an app for HD video calling. I hope someone can see where I may be going wrong and guide me onto the right path. Here is what I am doing and what I think I understand; please correct me if you know better.
I am using OpenCV currently to grab an image from my webcam in a DLL. (I will be using this image for other things later)
Currently, the image that OpenCV gives me is a cv::Mat. I resized this and converted it to a byte array the size of a 720p image, which is about 3 million bytes.
I pass this pointer back to my C# code, where I can then render it onto a texture.
Now I created a TCP socket, connected the server and client, and started transmitting the previously captured image byte array. I am able to transmit the byte array over to the client, where I then use the GPU to render it to a texture.
Currently, there is a big delay of about 400-500 ms. This is after I tried compressing the buffer with GZipStream in Unity, which compressed the byte array from about 3 million bytes to 1.5 million. I am trying to get this as small and as fast as possible, but this is where I am completely lost. I saw that Skype requires only a 1.2 Mbps connection for 720p video calling at 22 fps. I have no idea how they achieve such small frames, but of course I don't need it to be that small. It just needs to be at least decent.
Please give me a lecture on how this can be done! And let me know if you need anything else from me.
I found a link that may be very useful to anyone working on something similar. https://www.cs.utexas.edu/~teammco/misc/udp_video/
https://github.com/chenxiaoqino/udp-image-streaming/

DXGI Desktop Duplication: encoding frames to send them over the network

I'm trying to write an app which will capture a video stream of the screen and send it to a remote client. I've found out that the best way to capture a screen on Windows is to use DXGI Desktop Duplication API (available since Windows 8). Microsoft provides a neat sample which streams duplicated frames to screen. Now, I've been wondering what is the easiest, but still relatively fast way to encode those frames and send them over the network.
The frames come from AcquireNextFrame with a surface that contains the desktop bitmap and metadata which contains dirty and move regions that were updated. From here, I have a couple of options:
Extract a bitmap from the DirectX surface and then use an external library like ffmpeg to encode the series of bitmaps to H.264 and send it over RTSP. While this is straightforward, I fear that this method will be too slow, as it isn't taking advantage of any native Windows methods. Converting the D3D texture to an ffmpeg-compatible bitmap seems like unnecessary work.
From this answer: convert the D3D texture to an IMFSample and use Media Foundation's SinkWriter to encode the frame. I found this tutorial on video encoding, but I haven't yet found a way to get each encoded frame immediately and send it, instead of dumping all of them to a video file.
Since I haven't done anything like this before, I'm asking if I'm moving in the right direction. In the end, I want to have a simple, preferably low latency desktop capture video stream, which I can view from a remote device.
Also, I'm wondering if I can make use of dirty and move regions provided by Desktop Duplication. Instead of encoding the frame, I can send them over the network and do the processing on the client side, but this means that my client has to have DirectX 11.1 or higher available, which is impossible if I would want to stream to a mobile platform.
You can use the IMFTransform interface for H.264 encoding. Once you get an IMFSample from the ID3D11Texture2D, just pass it to IMFTransform::ProcessInput and get the encoded IMFSample from IMFTransform::ProcessOutput.
Refer to this example for encoding details.
Once you get the encoded IMFSamples, you can send them one by one over the network.
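For a rough idea of that loop only: creating the encoder MFT (e.g. via MFTEnumEx with MFT_CATEGORY_VIDEO_ENCODER), negotiating the input/output media types and draining at end of stream are left out, and it assumes the MFT expects the caller to provide the output sample and buffer.

#include <mfapi.h>
#include <mftransform.h>
#include <mferror.h>

void encode_one(IMFTransform* encoder, IMFSample* inputSample,
                IMFSample* outputSample /* with an attached IMFMediaBuffer */)
{
    // Feed one uncompressed sample (wrapping the D3D11 texture) to the encoder.
    encoder->ProcessInput(0, inputSample, 0);

    MFT_OUTPUT_DATA_BUFFER out = {};
    out.dwStreamID = 0;
    out.pSample = outputSample;

    DWORD status = 0;
    HRESULT hr = encoder->ProcessOutput(0, 1, &out, &status);
    if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
        return;  // the encoder is still buffering; feed more frames
    if (SUCCEEDED(hr)) {
        // out.pSample now holds an H.264-encoded sample: lock its buffer and
        // send the bytes over the network (e.g. prefixed with size and timestamp).
    }
}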

How do you set a live video feed as an OpenGL RenderTarget or framebuffer?

I would like to take a live video feed from one or two video cameras to do split-screen stuff and render on top of them. How can I capture the video input?
I found some old code that uses pbuffers. Is this still the optimal way of doing it?
I guess a lot depends on the connection interface, whether it is USB or FireWire or whatever?
Thanks!
OpenCV has an abstraction layer that can handle web/video cameras.
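For example, a sketch of grabbing frames with OpenCV's VideoCapture and uploading them into an existing OpenGL texture each frame; the texture creation and GL context setup are assumed done elsewhere, and GL_BGR may need glext.h / GL_BGR_EXT on some platforms.

#include <opencv2/opencv.hpp>
#include <GL/gl.h>

// Pull the next camera frame and upload it so it can be used like any other texture
// (split screen, rendering on top, ...). Assumes the frame data is continuous.
void update_video_texture(cv::VideoCapture& cam, GLuint tex)
{
    cv::Mat frame;
    if (!cam.read(frame))  // grab the next frame (8-bit BGR by default)
        return;

    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB,
                 frame.cols, frame.rows, 0,
                 GL_BGR, GL_UNSIGNED_BYTE, frame.data);
}

// usage: cv::VideoCapture cam(0); // device index 0; USB/FireWire backends are handled by OpenCV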