I have recently purchased an Orbbec Astra camera, which uses the same technology and produces the same style depth map as a Microsoft Kinect.
What would be the correct file format in which to save the depth map frames, and how would I go about saving the recorded videos?
I have been able to load a stream but am not sure what format the frames should be saved in so that I can load them for testing at a later stage and still have all the same information.
I am using OpenNI2, OpenCV 3.1.0 and C++.
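For context, this is roughly how I am reading the frames at the moment (a simplified sketch; depthStream is an openni::VideoStream I have already created and started on the Astra):

#include <OpenNI.h>
#include <opencv2/opencv.hpp>

// depthStream is assumed to be an openni::VideoStream that is already started
cv::Mat readDepthFrame(openni::VideoStream& depthStream)
{
    openni::VideoFrameRef frame;
    depthStream.readFrame(&frame);

    // OpenNI depth pixels are 16-bit values (millimetres), so wrap them as CV_16UC1
    cv::Mat depth(frame.getHeight(), frame.getWidth(), CV_16UC1,
                  const_cast<void*>(frame.getData()));

    // clone so the pixel data outlives the VideoFrameRef
    return depth.clone();
}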
I'm working on a project that needs to write several minutes of DX11 swapchain output to a video file (of any format). I've found lots of resources for writing a completed frame to a texture file with DX11, but the only thing I found relating to a video render output is using FFMPEG to stream the rendered frame, which uses an encoding pattern that doesn't fit my render pipeline and discards the frame immediately after streaming it.
I'm unsure what code I could post that would help answer this, but it might help to know that in this scenario I have a composite Shader Resource View + Render Target View that contains all of the data (in RGBA format) that would be needed for the frame presented to the screen. Currently, it is presented to the screen as a window, but I need to also provide a method to encode the frame (and thousands of subsequent frames) into a video file. I'm using Vertex, Pixel, and Compute shaders in my rendering pipeline.
Found the answer thanks to a friend offline and Simon Mourier's reply! Check out this guide for a nice tutorial on using the Media Foundation API and the Media Sink to encode a data buffer to a video file:
https://learn.microsoft.com/en-us/windows/win32/medfound/tutorial--using-the-sink-writer-to-encode-video
Other docs in the same section describe useful info like the different encoding types and what input they need.
In my case, the best way to go about rendering my composite RTV to a video file was creating a CPU-accessible buffer, copying the composite resource to it, and then accessing the CPU buffer as an array of pixel colors, which the Media Sink understands.
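The readback part looked roughly like this (a trimmed-down sketch with error handling omitted; compositeTex, device and context stand in for my actual composite ID3D11Texture2D, device and immediate context):

#include <d3d11.h>

// copy the composite render target into a CPU-readable staging texture and map it
void ReadBackComposite(ID3D11Device *device, ID3D11DeviceContext *context,
                       ID3D11Texture2D *compositeTex)
{
    D3D11_TEXTURE2D_DESC desc;
    compositeTex->GetDesc(&desc);
    desc.Usage = D3D11_USAGE_STAGING;
    desc.BindFlags = 0;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags = 0;

    ID3D11Texture2D *staging = nullptr;
    device->CreateTexture2D(&desc, nullptr, &staging);

    // copy the GPU-only composite resource into the CPU-accessible copy
    context->CopyResource(staging, compositeTex);

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    context->Map(staging, 0, D3D11_MAP_READ, 0, &mapped);
    // mapped.pData is rows of RGBA pixels, each row mapped.RowPitch bytes apart;
    // copy them row by row (the pitch can be wider than width * 4) into the
    // buffer that gets handed to the sink writer as a media sample
    context->Unmap(staging, 0);
    staging->Release();
}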
I'm about to generate 2D and 3D music animations and render them to video using C++. I was thinking about using OpenGL, but I've read that, unfortunately, it is being discontinued in favour of Vulkan, which seems to offer higher performance on the GPU but is also a lower-level API, making it more difficult to learn. I still have almost no knowledge of either OpenGL or Vulkan and am only beginning to learn them now.
My question is:
Is there a way to encode the Vulkan render output (with or without showing a window) into a video file, preferably through FFmpeg? If so, how could I do that?
Requirements:
Speed: the performance cost should be roughly that of encoding the video alone, not much more than that (e.g. not having to save lossless frames as images first and then encode a video from them).
Controllable FPS and resolution: the video fps and frame resolution can be freely chosen.
Reliability, reproducibility: running code that produces the same Vulkan output twice should result in two identical videos, independently of the system, i.e. no dropped frames, no async problems (I want to sync with audio), nothing of the sort. The chosen video fps should stay fixed (e.g. 60 fps), no matter whether the computer can render 300 fps or 3 fps.
What I found out so far:
An example of taking "screenshots" from Vulkan output: it writes a PPM image at the end, which is an uncompressed binary image format.
An encoder for rendering videos from OpenGL output, which is what I want, except that it uses OpenGL.
That Khronos includes a video subset in the Vulkan API.
A video tool to decode, demux and process videos using FFmpeg and Vulkan.
That it is possible to render the output into a buffer without needing a screen to display it.
First of all, ffmpeg is a framework used for video encoding and decoding. Second, if you have no experience with any of the GPU rendering APIs, you should start with OpenGL. Vulkan is very low-level and complicated. OpenGL will be here for a very long time and will not be immediately replaced by Vulkan.
The off-screen rendering option you mentioned is probably the best one. It doesn't really matter though; you can also use the image from the framebuffer. The image is just a matrix of RGBA pixels. You need this data as the input for the video encoding. Please take a look at how ffmpeg works. You need to send the rendered frame data to the encoder, which produces video packets that are stored in a video file. You need to choose a container (mp4, mkv, avi, ...) and video format (h265, av1, vp9, ...). You can of course implement a frame limiter and render the scene with a constant framerate, or just pick the frames that have a constant timestep.
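To make that concrete, here is a rough sketch of the libavcodec side only (assumed names and parameters; it expects frames already converted to YUV420P and writes a raw H.264 stream that you would mux into a container afterwards):

extern "C" {
#include <libavcodec/avcodec.h>
}
#include <cstdio>

int main()
{
    const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);
    ctx->width = 1280;
    ctx->height = 720;
    ctx->time_base = {1, 60};          // fixed 60 fps timeline
    ctx->framerate = {60, 1};
    ctx->pix_fmt = AV_PIX_FMT_YUV420P;
    ctx->gop_size = 60;                // roughly one key frame per second
    avcodec_open2(ctx, codec, nullptr);

    AVFrame *frame = av_frame_alloc();
    frame->format = ctx->pix_fmt;
    frame->width = ctx->width;
    frame->height = ctx->height;
    av_frame_get_buffer(frame, 0);

    AVPacket *pkt = av_packet_alloc();
    std::FILE *out = std::fopen("render.h264", "wb");   // raw stream; mux into mp4/mkv afterwards

    for (int i = 0; i < 600; ++i) {    // e.g. 10 seconds at 60 fps
        av_frame_make_writable(frame);
        // TODO: convert the rendered RGBA frame to YUV420P into frame->data here
        //       (libswscale's sws_scale is the usual way)
        frame->pts = i;                // timestamps in time_base units -> constant framerate

        avcodec_send_frame(ctx, frame);
        while (avcodec_receive_packet(ctx, pkt) == 0) {
            std::fwrite(pkt->data, 1, pkt->size, out);
            av_packet_unref(pkt);
        }
    }

    avcodec_send_frame(ctx, nullptr);  // flush the encoder
    while (avcodec_receive_packet(ctx, pkt) == 0) {
        std::fwrite(pkt->data, 1, pkt->size, out);
        av_packet_unref(pkt);
    }

    std::fclose(out);
    av_packet_free(&pkt);
    av_frame_free(&frame);
    avcodec_free_context(&ctx);
}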
The performance problem happens when you transfer the data from RAM to GPU memory and vice versa, for example when downloading the rendered image from the buffer and passing it to a CPU encoder. Therefore, the optimal approach would be with Vulkan, using the new video extension and sending the rendered frames directly to the HW-accelerated encoder without any transfers from GPU memory. You can also run the encoder in a different thread to make it work asynchronously.
But honestly, it's not trivial. The simplest solution (not realtime) for creating a video from a 3D render would be to:
Create a fixed-FPS game loop
Take screenshots of the scene by downloading the framebuffer data in OpenGL or Vulkan
Process the frames with the ffmpeg binary to create a video file (a minimal sketch combining these steps follows below)
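For example, something like this (a rough sketch assuming the ffmpeg binary is on your PATH; it pipes raw RGBA frames straight into ffmpeg instead of saving image files):

#include <cstdio>
#include <vector>
#include <GL/gl.h>   // or your usual loader; with Vulkan you would copy to a host-visible buffer instead

// start the ffmpeg binary and feed it raw RGBA frames over a pipe
FILE* openEncoder(int width, int height, int fps)
{
    char cmd[512];
    std::snprintf(cmd, sizeof(cmd),
        "ffmpeg -y -f rawvideo -pixel_format rgba -video_size %dx%d -framerate %d "
        "-i - -vf vflip -c:v libx264 -pix_fmt yuv420p out.mp4",
        width, height, fps);          // -vf vflip because glReadPixels returns bottom-up rows
    return popen(cmd, "w");           // _popen on Windows
}

// call once per iteration of the fixed-FPS loop, after rendering the frame
void writeFrame(FILE* encoder, int width, int height)
{
    std::vector<unsigned char> pixels(static_cast<size_t>(width) * height * 4);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    std::fwrite(pixels.data(), 1, pixels.size(), encoder);
}

// when finished: pclose(encoder);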
Another hack would be to use screen-recording software (OBS, Fraps, etc.) to create the video from your 3D app.
I initialized the Kinect sensor using NuiInitialize(NUI_INITIALIZE_FLAG_USES_SKELETON) to get the skeletal data.
I'm working on an augmented reality project where I display a virtual ball/cube in the video feed that the Kinect generates, while gathering the skeletal data in the background.
I will get the coordinates of the hands and render the cube with respect to the hand.
However, I can't find a way to have the video feed and the skeletal data together.
NuiInitialize(NUI_INITIALIZE_FLAG_USES_COLOR) gives you color data, but you can only initialize the camera once, so it seems to be either the video feed or the skeleton coordinates.
I tried to find a solution but couldn't find any.
Note: I don't have any use for RGB except for the preview, so that I can see the virtual object, since I'll be using the skeleton data to get the hand coordinates.
Found the Answer:
NuiInitialize(NUI_INITIALIZE_FLAG_USES_COLOR|NUI_INITIALIZE_FLAG_USES_SKELETON);
This allows the use of both sets of data.
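In other words, something along these lines (a minimal sketch based on the Kinect SDK 1.x calls, error checks omitted):

#include <Windows.h>
#include <NuiApi.h>

HANDLE g_colorStream = NULL;

void initKinect()
{
    // initialize once with both flags, then open each data source
    NuiInitialize(NUI_INITIALIZE_FLAG_USES_COLOR | NUI_INITIALIZE_FLAG_USES_SKELETON);

    HANDLE colorEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    NuiImageStreamOpen(NUI_IMAGE_TYPE_COLOR, NUI_IMAGE_RESOLUTION_640x480,
                       0, 2, colorEvent, &g_colorStream);

    HANDLE skeletonEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
    NuiSkeletonTrackingEnable(skeletonEvent, 0);
}

void pollFrames()
{
    // color frame for the preview
    NUI_IMAGE_FRAME imageFrame;
    if (SUCCEEDED(NuiImageStreamGetNextFrame(g_colorStream, 0, &imageFrame)))
    {
        // ... copy the color pixels into the preview ...
        NuiImageStreamReleaseFrame(g_colorStream, &imageFrame);
    }

    // skeleton frame for the hand coordinates
    NUI_SKELETON_FRAME skeletonFrame = {0};
    if (SUCCEEDED(NuiSkeletonGetNextFrame(0, &skeletonFrame)))
    {
        // ... use skeletonFrame.SkeletonData[i].SkeletonPositions[NUI_SKELETON_POSITION_HAND_RIGHT] ...
    }
}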
Some background:
I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. This essentially outputs a set of bitmaps, each of which can last for a long time (for example 30 seconds), to a video. I have set up a filter graph which uses Async_reader with a Wave Parser for audio and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
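For reference, my FillBuffer is roughly of this shape (simplified; the member names are placeholders for my actual frame bookkeeping):

#include <streams.h>   // DirectShow base classes

HRESULT CMyPushPin::FillBuffer(IMediaSample *pSample)
{
    BYTE *pData = NULL;
    pSample->GetPointer(&pData);
    long cbData = pSample->GetSize();

    // copy the current bitmap's bits into the sample buffer
    memcpy(pData, m_currentBitmapBits, min(cbData, m_currentBitmapSize));

    // each bitmap covers a long interval, e.g. 30 seconds
    REFERENCE_TIME rtStart = m_rtCurrentStart;                 // 100 ns units
    REFERENCE_TIME rtStop  = rtStart + m_rtBitmapDuration;
    pSample->SetTime(&rtStart, &rtStop);
    pSample->SetSyncPoint(TRUE);   // every sample is marked as a key frame here

    m_rtCurrentStart = rtStop;
    return S_OK;                   // S_FALSE would end the stream
}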
The problem:
When I attempt to seek through the resulting video, I have to wait until a bitmap's start time occurs before it is displayed. For example, if I'm currently seeing bitmap 4 and skip back to the time at which bitmap 2 is displayed, the video output will not change until the third bitmap starts. Initially I wondered if I wasn't allowing FillBuffer to be called enough (as at the moment it's only called once per bitmap), however I have since noted that when the audio track is very short (just a second long, perhaps), I can seek through the video as expected. Is there another way I should be introducing audio into the filter graph? Do I need to perform some kind of indexing once the WMV has been rendered? I'm at a bit of a loss...
You may need to do indexing as a post-processing step. Try indexing the file with Windows Media File Editor from the Windows Media Encoder SDK and see if this improves seeking.
Reducing the key frame interval in the encoder profile may also improve seeking. This can be done in Windows Media Profile Editor from the same SDK. Note that this will increase file size.
Right now, what I'm trying to do is make a new GUI, essentially software using DirectX (more exactly, Direct3D), that displays streaming images from Axis IP cameras.
For the time being I figured that the flow for the entire program would be like this:
1. Get the Axis program to get streaming images
2. Pass the images to the Direct3D program.
3. Display the images on the screen.
Currently I have made a somewhat basic Direct3D app that loads and displays video frames from AVI videos (for testing). I don't know how to load images directly from videos using DirectX, so I used OpenCV to save frames from the video and have DirectX upload them. Very slow.
Right now I have some unclear things:
1. How to get an Axis program that works in C++ (I'm going to look up examples later, probably no big deal)
2. How to upload images directly from the Axis IP camera program.
So, do you have any recommendations or suggestions on how to make my program work more efficiently? Anything at all, just let me know.
You may find it faster to use DirectShow and add a custom renderer at the far end that copies the decompressed video data directly to a Direct3D texture.
It's well worth double-buffering that texture, i.e. have texture 0 displaying and texture 1 being uploaded to, and then swap the two over when a new frame is available (i.e. display texture 1 while uploading to texture 0).
This way you can decouple the video frame rate from the rendering frame rate, which makes dropped frames a little easier to handle.
I use in-place updates of Direct3D textures (using IDirect3DTexture9::LockRect) and it works very fast. Which part of your program is slow?
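For what it's worth, the update itself is just something like this (a sketch; it assumes a 32-bit texture created with D3DUSAGE_DYNAMIC and a tightly packed source frame):

#include <d3d9.h>
#include <cstring>

// copy one decoded frame into the texture, row by row
// (the texture pitch can be wider than width * 4)
void UpdateTexture(IDirect3DTexture9 *tex, const BYTE *frame,
                   int width, int height)
{
    D3DLOCKED_RECT locked;
    if (FAILED(tex->LockRect(0, &locked, NULL, D3DLOCK_DISCARD)))
        return;

    const int srcPitch = width * 4;
    BYTE *dst = static_cast<BYTE *>(locked.pBits);
    for (int y = 0; y < height; ++y)
        memcpy(dst + y * locked.Pitch, frame + y * srcPitch, srcPitch);

    tex->UnlockRect(0);
}

With two such textures you can do the double-buffered swap mentioned above: upload into the one that is not being displayed and flip them once the copy has finished.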
For capturing images from Axis cameras you may use the iPSi C++ library: http://sourceforge.net/projects/ipsi/
It can be used to capture images and to control camera zoom and rotation (if available).