Set Kinect v2 frame rates (rgb, depth, skeleton) to 25 fps - c++

I'm working on a project which requires all three streams from the Kinect v2 sensor (RGB, Depth, and Skeleton) to be captured, processed and streamed at 25 fps.
My program works with default settings and all three streams seem to be operating at 30fps. Is there a method to reduce this to 25 fps?.
I'm working on a C++ environment.

The KINECT SDK offers no way of setting the frame rate. Moreover, the RGB frame rate is not fixed and may drop down to 15 fps (from 30 fps) in low light conditions. Your approach of adding delay does not change the native frame rate of the devices. All you are doing is selectively dropping some frames based on timing. If regular spacing between frame capture times is important to your application you should be implementing an interpolation method instead.

Related

How to detect camera frame loss using Windows media API like Media Foundation or DirectShow?

I am writing an application for Windows that runs a CUDA accelerated HDR algorithm. I've set up an external image signal processor device that presents as a UVC device, and delivers 60 frames per second to the Windows machine over USB 3.0.
Every "even" frame is a more underexposed frame, and every "odd" frame is a more overexposed frame, which allows my CUDA code perform a modified Mertens exposure fusion algorithm to generate a high quality, high dynamic range image.
Very abstract example of Mertens exposure fusion algorithm here
My only problem is that I don't know how to know when I'm missing frames, since the only camera API I have interfaced with on Windows (Media Foundation) doesn't make it obvious that a frame I grab with IMFSourceReader::ReadSample isn't the frame that was received after the last one I grabbed.
Is there any way that I can guarantee that I am not missing frames, or at least easily and reliably detect when I have, using a Windows available API like Media Foundation or DirectShow?
It wouldn't be such a big deal to miss a frame and then have to purposefully "skip" the next frame in order to grab the next oversampled or undersampled frame to pair with the last frame we grabbed, but I would need to know how many frames were actually missed since a frame was last grabbed.
Thanks!
There is IAMDroppedFrames::GetNumDropped method in DirectShow and chances are that it can be retrieved through Media Foundation as well (never tried - they are possibly obtainable with a method similar to this).
The GetNumDropped method retrieves the total number of frames that the filter has dropped since it started streaming.
However I would question its reliability. The reason is that with these both APIs, the attribute which is more or less reliable is a time stamp of a frame. Capture devices can flexibly reduce frame rate for a few reasons, including both external like low light conditions and internal like slow blocking processing downstream in the pipeline. This makes it hard to distinguish between odd and even frames, but time stamp remains accurate and you can apply frame rate math to convert to frame indices.
In your scenario I would however rather detect large gaps in frame times to identify possible gap and continuity loss, and from there run algorithm that compares exposure on next a few consecutive frames to get back to sync with under-/overexposition. Sounds like a more reliable way out.
After all this exposure problem is highly likely to be pretty much specific to the hardware you are using.
Normally MFSampleExtension_Discontinuity is here for this. When you use IMFSourceReader::ReadSample, check this.

Sampling rate deviation and sound playing position

When you set soundcard rate to, for example, 44100, you cannot guarantee actual rate be equal 44100. In my case traffic measurements between application and ALSA (in samples/sec) gave me value of 44066...44084.
This should not be related to resampling issues: even only-48000 hardware must "eat" data at 44100 rate in "44100" mode.
The problem occurs when i try to draw a cursor over waveform while this waveform is playing. I calculate cursor position using "ideal" sampling rate read from WAV-file (22050, ..., 44100, ..., 48000) and the milliseconds spent after playing start, using following C++ function:
long long getCurrentTimeMs(void)
{
boost::posix_time::ptime now = boost::posix_time::microsec_clock::local_time();
boost::posix_time::ptime epoch_start(boost::gregorian::date(1970,1,1));
boost::posix_time::time_duration dur = now - epoch_start;
return dur.total_milliseconds();
}
QTimer is used to generate frames for cursor animation, but i do not depend on QTimer precision, because i ask time by getCurrentTimeMs() (assiming it is precise enough) every frame, so i can work with varying framerate.
After 2-3 minutes of playing i see a little difference between what i hear and what i see - the cursor position is greater than playing position for something like 1/20 of second or so.
When i measure traffic that go through ALSA's callback i get mean value of 44083.7 samples/sec. Then i use this value in the screen drawing function as an actual rate. Now the problem disappears. The program is cross-platform, so i will test this measurements on windows and another soundcard later.
But is there a better way to sync sound and screen? Is there some not very CPU-consuming way of asking soundcard about actual playing sample number, for example?
This is a known effect, which is for example in Windows addressed by Rate Matching, described here Live Sources.
On playback, the effect is typically addressed by using audio hardware as "clock" and synchronizing to audio playback instead of "real" clock. That is, for example, with audio sampling rate 44100, next video frame of 25 fps video is presented in sync with 44100/25 sample playback rather than using 1/25 system time increment. This compensates for the imprecise effective playback rate.
On capture, the hardware itself acts as if it is delivering data at exactly requested rate. I think the best you can do is to measure effective rate and resample audio from effecive to correct sampling rate.

Playing video at frame rates that are not multiples of the refresh rate.

I'm working on an application to stream video to OpenGL textures. My first thought was to lock the rendering loop to 60hz, so to play a video at 30fps or 60fps I would update the texture on every other frame or every frame respectively. How do computers play videos at other frame rates when monitors are at 60hz, or for that matter if a monitor is at 75 hz how do they play 30fps video?
For most consumer devices, you get something like 3:2 pulldown, which basically copies the source video frames unevenly. Specifically, in a 24 Hz video being shown on a 60 Hz display, the frames are alternately doubled and tripled. For your use case (video in OpenGL textures), this is likely the best way to do it, as it avoids tearing.
If you have enough compute ability to run actual resampling algorithms, you can convert any frame rate to any other frame rate. Your choice of algorithm defines how smooth the conversion looks, and different algorithms will work best in different scenarios.
Too much smoothness may cause things like the 120 Hz "soap opera" effect
[1]
[2]:
We have been trained by growing up watching movies at 24 FPS to expect movies to have a certain look and feel to them that is an artifact of that particular frame rate.
When these movies are [processed], the extra sharpness and clearness can make the movies look wrong to viewers, even though the video quality is actually closer to real.
This is commonly called the Soap Opera Effect, because some feel it makes these expensive movies look like cheap shot-on-video soap operas (because the videotape format historically used on soap operas worked at 30 FPS).
Essentially you're dealing with a resampling problem. Your original data was sampled at 30Hz or 60Hz, and you've to resample it to another sample rate. The very same algorithms apply. Most of the time you'll find articles about audio signal resampling. Just think each pixel's color channel to be a individual waveform you want to resample.

OpenCV small screen showing delay?

I am using OpenCV to write an app (in C++ on Windows 7) that uses the cv.camshift() function to track an object on the screen. I noticed that the my camera window (my application window showing what the camera sees) has a little delay with respect to very rapid motions. The delay seems to be about 0.1 seconds - very small, but noticable. I am developing an application that is very sensitive to these delays. In order to rule out my coding error, I tried to use one of the OpenCV sample video apps that shows what the camera sees on the screen and it also had this tiny delay. Interestingly, when I look at what my camera sees through Skype, there seems to be virtually no delay at all. Is there anything I can do to make OpenCV operate faster to get rid of this tiny delay?
CamShift detects motion using meanShift - the mean motion of the object center. This has to be calculated over more than one frame. For a frame rate of 30 Hz, a depth of 3 frames would be 0.1 seconds.

Getting and Setting Camera Settings

I have been searching around and can't find an example of how to get and set the camera capturing settings. For example the capturing resolution, fps, color balance, etc. I have only seen examples of how to change the settings when saving the captured video but I want to be able to find all the camera's capturing modes and choose which one I want. For example, I am using the PS3eye webcam and in the test program it allows you to change the settings (320x240 at 15,30,60,120 fps, 640x480 at 15,30,60,75 fps). So is there a function in OpenCV for getting all the camera's capture modes and choosing one? I remember in OpenFrameworks there was a function to change these settings but I would like to know how to do it in OpenCV.
Here is the code for OpenFrameworks with OpenCV that does sort of what I want:
vidGrabber.setDeviceID( 4 );
vidGrabber.setDesiredFrameRate( 30 ); //I want this
vidGrabber.videoSettings();
vidGrabber.setVerbose(true);
vidGrabber.initGrabber(320,240); //And this
cvSetCaptureProperty()
with these flags:
CV_CAP_PROP_FRAME_WIDTH - width of frames in the video stream (only for cameras)
CV_CAP_PROP_FRAME_HEIGHT - height of frames in the video stream (only for cameras)
CV_CAP_PROP_FPS - frame rate (only for cameras)
I built a Directshow camera library which able to acquire the camera resolutions and properties.
https://github.com/kcwongjoe/directshow_camera
The fps is usually depended on your machine, resolution and exposure time. To change the fps, you can modify the exposure time in a fixed resolution.