Directshow returns wrong Frame rate FPS - c++

I want to have the frame rate of a media file using DirectShow.
Currently, I use the following methods which seem inaccurate in some cases:
I add a SourceFilter to my graph, enum its pins, then call a pPin->ConnectionMediaType(&compressedMediaFormat) and extract AvgTimePerFrame out from it. As far as I understand it is the average time per frame expressed in 100 nanoseconds. So, I just divide 10,000,000 / AvgTimePerFrame to get the average FPS of the file.
For those media files which have almost the same frame time for all frames, I get a correct FPS. But for those, which have different frame times for different frames this method returns very inaccurate results.
A correct way to get that would be to get the duration and frame count of the file and calculate the average FPS out of it (frameCount / duration). This is a costly operation however as I understand because calculating the exact number of frames requires passing through the whole file.
I wonder if there is a way to get that frame rate information more accurately?

The media files don't have to be of fixed frame rate, in general - there might be variable frame rate. The metadata of the file still has some frame rate related information which, in this case, might be inaccurate. When you start accessing the file, you have the quickly available metadata information about the frame rate. Indeed, to get the full picture you are supposed to read all frames and process their time stamps.
Even though in many it is technically possible to quickly read just time stamps of frames without reading the actual data, DirectShow demultiplexers/parsers have no method defined to obtain the information, so you would have to read and count the frames to get the accurate information.
You don't need to decompress video for that though, and you can also remove clock from the filter graph when doing this, so that counting frames does not require streaming data in real time (frames would be streamed at maximal rate in that case).

Related

How to detect camera frame loss using Windows media API like Media Foundation or DirectShow?

I am writing an application for Windows that runs a CUDA accelerated HDR algorithm. I've set up an external image signal processor device that presents as a UVC device, and delivers 60 frames per second to the Windows machine over USB 3.0.
Every "even" frame is a more underexposed frame, and every "odd" frame is a more overexposed frame, which allows my CUDA code perform a modified Mertens exposure fusion algorithm to generate a high quality, high dynamic range image.
Very abstract example of Mertens exposure fusion algorithm here
My only problem is that I don't know how to know when I'm missing frames, since the only camera API I have interfaced with on Windows (Media Foundation) doesn't make it obvious that a frame I grab with IMFSourceReader::ReadSample isn't the frame that was received after the last one I grabbed.
Is there any way that I can guarantee that I am not missing frames, or at least easily and reliably detect when I have, using a Windows available API like Media Foundation or DirectShow?
It wouldn't be such a big deal to miss a frame and then have to purposefully "skip" the next frame in order to grab the next oversampled or undersampled frame to pair with the last frame we grabbed, but I would need to know how many frames were actually missed since a frame was last grabbed.
Thanks!
There is IAMDroppedFrames::GetNumDropped method in DirectShow and chances are that it can be retrieved through Media Foundation as well (never tried - they are possibly obtainable with a method similar to this).
The GetNumDropped method retrieves the total number of frames that the filter has dropped since it started streaming.
However I would question its reliability. The reason is that with these both APIs, the attribute which is more or less reliable is a time stamp of a frame. Capture devices can flexibly reduce frame rate for a few reasons, including both external like low light conditions and internal like slow blocking processing downstream in the pipeline. This makes it hard to distinguish between odd and even frames, but time stamp remains accurate and you can apply frame rate math to convert to frame indices.
In your scenario I would however rather detect large gaps in frame times to identify possible gap and continuity loss, and from there run algorithm that compares exposure on next a few consecutive frames to get back to sync with under-/overexposition. Sounds like a more reliable way out.
After all this exposure problem is highly likely to be pretty much specific to the hardware you are using.
Normally MFSampleExtension_Discontinuity is here for this. When you use IMFSourceReader::ReadSample, check this.

Change FPS on video capture from file with opencv

I'm reading a video file, and it's slower than the actual FPS of the file (59 FPS #1080p) even if i'm not doing any processing to the image:
using namespace cv;
using namespace std;
// Global variables
UMat frame; //current frame
int main(int argc, char** argv)
{
VideoCapture capture("myFile.MP4");
namedWindow("Frame");
capture.set(CAP_PROP_FPS, 120); //not changing anything
cout>>capture.get(CAP_PROP_FPS);
while (charCheckForEscKey != 27) {
capture >>frame;
if (frame.empty())
break;
imshow("Frame", frame);
}
}
Even if I tried to set CAP_PROP_FPS to 120 it doesn't change the fps of the file and when I get(CAP_PROP_FPS) I still get 59.9...
When I read the video the actual outcome is more or less 54 FPS (even using UMat).
Is there a way to read the file at a higher FPS rate ?
I asked he question on the opencv Q&A website as well :http://answers.opencv.org/question/117482/change-fps-on-video-capture-from-file/
Is it just because my computer is too slow ?
TL;DR FPS is irrelevant to the problem, probably a performance issue
What FPS is used for? Before you can display a single frame of video, you have to read the data (from an HDD, DVD, Network, Internet or whatever) and decode it. Both of these operations take time, the amount of which differs from system to system, depending on the speed of HDD/Internet, processor speed etc. If we just display each frame as soon as it's ready, the resulting movie speed will, therefore, vary from system to system. That is usually not what we want, so along with the sequence of video frames we get the "frames per second" value (a. k. a. FPS), which tells us how soon we shall display each consecutive frame (once every 1/30-th of a second for 30 FPS, once every 1/60-th of a second for 60 FPS, etc.) If the frame is ready to be displayed but it's too early, we can wait till its time comes. If it's time to display a frame but it's not ready (on an underpowered/too busy system), there's not much we can do (maybe drop frames in some situations). To see the effect for yourself, try changing the FPS value for x2, saving the file and display it with VLC: for the same amount of data and same number of frame, you will notice that the speed of your video has doubled and the time - halved. Try writing each frame twice for your x2 FPS - you will see that the playback speed is back to normal (with double the number of frames and meaningless increase of the file size).
What FPS is not used for? When processing (not displaying) a video, we are not limited by the original FPS, and the processing goes as fast as it can. If your PC can process 1000 frames per second - good, if 1500 - even better. Needless to say, changing the FPS value in the file won't improve your CPU/HDD speed, so if you were only able to process 54 frames per second, you are still going to be able to only process 54 frames per second.
But how can VLC display faster? Assuming you didn't forget to switch from Debug to Release build before measuring time, there are still a number of possibilities: VLC is probably better optimized for the particular task of video playback (OpenCV is not really that fast at some tasks, plus it has to convert each frame to a more general Mat/UMat structure), multithreading (including "double buffering", as mentioned in the comments) is another possible reason, maybe caching as well (e. g. reading a block of data containing many frames from the HDD at once instead of reading and processing frames one by one).

Playing video at frame rates that are not multiples of the refresh rate.

I'm working on an application to stream video to OpenGL textures. My first thought was to lock the rendering loop to 60hz, so to play a video at 30fps or 60fps I would update the texture on every other frame or every frame respectively. How do computers play videos at other frame rates when monitors are at 60hz, or for that matter if a monitor is at 75 hz how do they play 30fps video?
For most consumer devices, you get something like 3:2 pulldown, which basically copies the source video frames unevenly. Specifically, in a 24 Hz video being shown on a 60 Hz display, the frames are alternately doubled and tripled. For your use case (video in OpenGL textures), this is likely the best way to do it, as it avoids tearing.
If you have enough compute ability to run actual resampling algorithms, you can convert any frame rate to any other frame rate. Your choice of algorithm defines how smooth the conversion looks, and different algorithms will work best in different scenarios.
Too much smoothness may cause things like the 120 Hz "soap opera" effect
[1]
[2]:
We have been trained by growing up watching movies at 24 FPS to expect movies to have a certain look and feel to them that is an artifact of that particular frame rate.
When these movies are [processed], the extra sharpness and clearness can make the movies look wrong to viewers, even though the video quality is actually closer to real.
This is commonly called the Soap Opera Effect, because some feel it makes these expensive movies look like cheap shot-on-video soap operas (because the videotape format historically used on soap operas worked at 30 FPS).
Essentially you're dealing with a resampling problem. Your original data was sampled at 30Hz or 60Hz, and you've to resample it to another sample rate. The very same algorithms apply. Most of the time you'll find articles about audio signal resampling. Just think each pixel's color channel to be a individual waveform you want to resample.

DirectShow filter graph using WMASFWriter creates video which is too short

I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. This essentially outputs a set of bitmaps to a video. I have set up a filter graph which uses Async_reader with a Wave Parser for audio and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
Each bitmap can last for several seconds so in the FillBuffer function I'm calling SetTime on the passed IMediaSample with a start and end time several seconds apart. This works fine when rendering to the screen but writing to disk results in a file which is too short in duration. It seems like the last bitmap is being ignored when writing a WMV (it is shown as the video ends rather than lasting for the intended duration). This is the case both with my filter and a modified pushsource filter (in which the frame length has been increased).
I've seen additional odd behaviour in that it was not possible to have a video that wasn't a multiple of 10 seconds in length at one point whilst I was trying to make this work. I'm not sure what this was, but I though I'd mention it incase it's relevant.
I think the end time is simply ignored. Normally video samples only have a start time because they are a point in time. If there is movement in the video, the movement is fluent, though the video are just points in time.
I think the solution is simple. Because video stays the same until the next frame is received, you can just add a dummy frame at the end of your video. You can simply repeat the previous frame.

C/C++ library for seekable movie format

I'm doing some processing on some very large video files (often up to 16MP), and I need a way to store these videos in a format that allows seeking to specific frames (rather than to times, like ffmpeg). I was planning on just rolling my own format that concatenates all of the individually zlib compressed frames together, and then appends an index on the end that links frame numbers to file byte indices. Before I go about this though, I just wanted to check to make sure I'm not duplicating the functionality of another format/library. Has anyone heard of a format/library that allows lossless compression and random access of videos?
The reason it is hard to seek to a specific frame in most video codecs is that most frames depend on another frame or frames, so frames must be decoded as a group. For this reason, most libraries will only let you seek to the closest I-frame (Intra-frame - independently decodable frame). To actually produce an image from a non-I-frame, data from other frames is required, so you have to decode a number of frames worth of data.
The only ways I have seen this problem solved involve creating an index of some kind on the file. In other words, make a pass through the file and create an index of what frame corresponds to a certain time or section of the file. Since the seeking functions of most libraries are only able to seek to an I frame so you may have to seek to the closest I-frame and then decode from there to the exact frame you want.
If space is not of high importance, I would suggest doing it like you say, but use JPEG compression instead of zlib as it will give you a lot higher compression ratio since it exploits the fact you are dealing with image data.
If space is an issue, P frames (depend on previous frame/frames) can greatly reduce the size of the file. I would not mess with B frames (depend on previous and future frame/frames) since they make it much harder to get things right.
I have solved the problem of seeking to a specific frame in the presence of B and P frames in the past using ffmpeg (libavformat) to demux the video into packets (1 frame's worth of data per packet) and concatenate these into a single file. The important thing is to keep and index into that file so you can find packet bounds for a given frame. If the frame is an I-frame, you can just feed that frame's data into an ffmpeg decoder and it can be decoded. If the frame is a B or P frame, you have to go back to the last I-frame and decode forward from there. This can be quite tricky to get right, especially for B-frames since they are often sent in a different order than how they are displayed.
Some formats allow you to change the number of key frames per second.
For example, I've used ffmpeg to encode to flv at 25 frames per second with 25 key frames per second, and then used a player that was fine in moving to key frames. Basically this allowed me to do frame by frame seeking.
Also the last time I checked quicktime can do frame by frame seek without having to have each frame being a key frame.
May not be applicable to you but that's my thoughts.