DirectShow filter graph using WMASFWriter creates video which is too short - c++

I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. This essentially outputs a set of bitmaps to a video. I have set up a filter graph which uses Async_reader with a Wave Parser for audio and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
Each bitmap can last for several seconds so in the FillBuffer function I'm calling SetTime on the passed IMediaSample with a start and end time several seconds apart. This works fine when rendering to the screen but writing to disk results in a file which is too short in duration. It seems like the last bitmap is being ignored when writing a WMV (it is shown as the video ends rather than lasting for the intended duration). This is the case both with my filter and a modified pushsource filter (in which the frame length has been increased).
I've seen additional odd behaviour in that it was not possible to have a video that wasn't a multiple of 10 seconds in length at one point whilst I was trying to make this work. I'm not sure what this was, but I though I'd mention it incase it's relevant.

I think the end time is simply ignored. Normally video samples only have a start time because they are a point in time. If there is movement in the video, the movement is fluent, though the video are just points in time.
I think the solution is simple. Because video stays the same until the next frame is received, you can just add a dummy frame at the end of your video. You can simply repeat the previous frame.

Related

How to detect camera frame loss using Windows media API like Media Foundation or DirectShow?

I am writing an application for Windows that runs a CUDA accelerated HDR algorithm. I've set up an external image signal processor device that presents as a UVC device, and delivers 60 frames per second to the Windows machine over USB 3.0.
Every "even" frame is a more underexposed frame, and every "odd" frame is a more overexposed frame, which allows my CUDA code perform a modified Mertens exposure fusion algorithm to generate a high quality, high dynamic range image.
Very abstract example of Mertens exposure fusion algorithm here
My only problem is that I don't know how to know when I'm missing frames, since the only camera API I have interfaced with on Windows (Media Foundation) doesn't make it obvious that a frame I grab with IMFSourceReader::ReadSample isn't the frame that was received after the last one I grabbed.
Is there any way that I can guarantee that I am not missing frames, or at least easily and reliably detect when I have, using a Windows available API like Media Foundation or DirectShow?
It wouldn't be such a big deal to miss a frame and then have to purposefully "skip" the next frame in order to grab the next oversampled or undersampled frame to pair with the last frame we grabbed, but I would need to know how many frames were actually missed since a frame was last grabbed.
Thanks!
There is IAMDroppedFrames::GetNumDropped method in DirectShow and chances are that it can be retrieved through Media Foundation as well (never tried - they are possibly obtainable with a method similar to this).
The GetNumDropped method retrieves the total number of frames that the filter has dropped since it started streaming.
However I would question its reliability. The reason is that with these both APIs, the attribute which is more or less reliable is a time stamp of a frame. Capture devices can flexibly reduce frame rate for a few reasons, including both external like low light conditions and internal like slow blocking processing downstream in the pipeline. This makes it hard to distinguish between odd and even frames, but time stamp remains accurate and you can apply frame rate math to convert to frame indices.
In your scenario I would however rather detect large gaps in frame times to identify possible gap and continuity loss, and from there run algorithm that compares exposure on next a few consecutive frames to get back to sync with under-/overexposition. Sounds like a more reliable way out.
After all this exposure problem is highly likely to be pretty much specific to the hardware you are using.
Normally MFSampleExtension_Discontinuity is here for this. When you use IMFSourceReader::ReadSample, check this.

How to insert a key frame(Iframe) to a h.264 video stream in ffmpeg C++ api?

I have a real time video stream, and want to cut some video clips from it by accurate timestamp(pts).
When I receiver an avpacket, I decode it, and do something and cache the avpacket. I don't want to re-encode all avpackets, it cost cpu resource.
There are many gop structure in H.264 stream, usually we should cut the video begin at the key frame, and end at the key frame. Otherwise the front some frames in the video clip would display error.
Now I use av_write_frame to make avpacket to video. But sometimes the length of gop is very long, such as it could be 250, 8.3s(30 frame per second). It means the distance between two I-frame could be 250 frames. The video clip is short, I don't want to add too many unused frames.
How should I do? I think i should insert a i-frame at the start position of video clip. Could I change a p-frame to i-frame?
Thanks your reading!
This is not possible in the generic case, but may be in specific cases. Even then, there are no open source/free tools to do this, and I am unaware of any commercial tools. The reason I say it is not possible in the generic case is each frame can reference up to 16 other frames. So you can not just replace a single frame, You will need to replace all referenced frames. Doing this will likely take almost as much CPU as encoding the whole GOP.

Capturing an AVI video with DirectShow

I'm trying to capture an AVI video, using DirectShow AVIMux and FileWriter Filters.
When I connect SampleGrabber filter instead of the AVIMux, I can clearly see that the stream is 30 fps, however upon capturing the video, each frame is duplicated 4 time and I get a 120 frames instead of 30. The movie is 4 times slower than it should be and only the first frame in the set of 4 is a Key Frame.
I tried the same experiment with 8 fps and for each image I received, I had 15 frames in the video. And in case of 15 fps, I got each frame 8 times.
I tried both writing the code in C++ and testing it with Graph Edit Plus.
Is there any way I can control it? Maybe some restrictions on the AVIMux filter?
You don't specify your capture format which could have some bearing on the problem, but generally it sounds like the graph when writing to file has some bottleneck which prevents the stream from continuing to flow at 30fps. The camera is attempting to produce frames at 30fps, and it will do so as long as buffers are recycled for it to fill.
But here the buffers aren't available because the file writer is busy getting them onto the disk. The capture filter is starved and in this situation it increments the "dropped frame" counter which travels with each captured frame. AVIMux uses this count to insert an indicator into the AVI file which says in effect "a frame should have been available here to write to file, but isn't; at playback time repeat the last frame". So the file should have placeholders for 30 frames per second - some filled with actual frames, and some "dropped frames".
Also, you don't mention whether you're muxing in audio, which would be acting as a reference clock for the graph to maintain audio-video sync. When capture completes if also using an audio stream, AVIMux alters the framerate of the video stream to make the duration of the two streams equal. You can check whether AVIMux has altered the framerate of the video stream by dumping the AVI file header (or maybe right click on the file in explorer and look at the properties).
If I had to hazard a guess as to the root of the problem, I'd wager the capture driver has a bug in calculating the dropped frame count which is in turn messing up AVIMux. Does this happen with a different camera?

Using Async_reader and Wave Parser in DirectShow filter graph results in video seeking issues

Some background:
I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. This essentially outputs a set of bitmaps, each of which can last for a long time (for example 30 seconds), to a video. I have set up a filter graph which uses Async_reader with a Wave Parser for audio and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
The problem:
When I attempt to seek through the resulting video, I have to wait until a bitmap's start time occurs before it is displayed. For example, if I'm currently seeing bitmap 4 and skip back to the time which bitmap 2 is displayed the video output will not change until the third bitmap starts. Initially I wondered if I wasn't allowing FillBuffer to be called enough (as at the moment it's only once per bitmap) however I have since noted that when the audio track is very short (just a second long perhaps), I can seek through the video as expected. Is there a another way I should be introducing audio into the filter graph? Do I need to perform some kind of indexing when the WMV has been rendered? I'm at a bit of a loss...
You may need to do indexing as a post-processing step. Try indexing it with Windows Media File Editor from Windows Media Encoder SDK and see if this improves seeking.
Reducing key frame interval in the encoder profile may improve seeking. This can be done in Windows Media Profile Editor from the SDK. Note that this will cause file size increase.

C/C++ library for seekable movie format

I'm doing some processing on some very large video files (often up to 16MP), and I need a way to store these videos in a format that allows seeking to specific frames (rather than to times, like ffmpeg). I was planning on just rolling my own format that concatenates all of the individually zlib compressed frames together, and then appends an index on the end that links frame numbers to file byte indices. Before I go about this though, I just wanted to check to make sure I'm not duplicating the functionality of another format/library. Has anyone heard of a format/library that allows lossless compression and random access of videos?
The reason it is hard to seek to a specific frame in most video codecs is that most frames depend on another frame or frames, so frames must be decoded as a group. For this reason, most libraries will only let you seek to the closest I-frame (Intra-frame - independently decodable frame). To actually produce an image from a non-I-frame, data from other frames is required, so you have to decode a number of frames worth of data.
The only ways I have seen this problem solved involve creating an index of some kind on the file. In other words, make a pass through the file and create an index of what frame corresponds to a certain time or section of the file. Since the seeking functions of most libraries are only able to seek to an I frame so you may have to seek to the closest I-frame and then decode from there to the exact frame you want.
If space is not of high importance, I would suggest doing it like you say, but use JPEG compression instead of zlib as it will give you a lot higher compression ratio since it exploits the fact you are dealing with image data.
If space is an issue, P frames (depend on previous frame/frames) can greatly reduce the size of the file. I would not mess with B frames (depend on previous and future frame/frames) since they make it much harder to get things right.
I have solved the problem of seeking to a specific frame in the presence of B and P frames in the past using ffmpeg (libavformat) to demux the video into packets (1 frame's worth of data per packet) and concatenate these into a single file. The important thing is to keep and index into that file so you can find packet bounds for a given frame. If the frame is an I-frame, you can just feed that frame's data into an ffmpeg decoder and it can be decoded. If the frame is a B or P frame, you have to go back to the last I-frame and decode forward from there. This can be quite tricky to get right, especially for B-frames since they are often sent in a different order than how they are displayed.
Some formats allow you to change the number of key frames per second.
For example, I've used ffmpeg to encode to flv at 25 frames per second with 25 key frames per second, and then used a player that was fine in moving to key frames. Basically this allowed me to do frame by frame seeking.
Also the last time I checked quicktime can do frame by frame seek without having to have each frame being a key frame.
May not be applicable to you but that's my thoughts.