encoding camera with audio source in realtime with WMAsfWriter - jitter problem - c++

I build a DirectShow graph consisting of my video capture filter
(grabbing the screen) and the default audio input filter, both connected
through a splitter to the WM ASF Writer output filter and to a VMR9 renderer.
This means I want realtime audio/video encoding to disk
together with a preview. The problem is that no matter which WM profile I
choose (even a very low resolution profile), the output video file
always "jitters" - every few frames there is a delay. The audio is fine -
there is no jitter in the audio. The CPU usage is low (< 10%), so I don't believe
this is a lack of CPU resources. I think I'm time-stamping my frames correctly.
What could be the reason?
Below is a link to a recorded video explaining the problem:
http://www.youtube.com/watch?v=b71iK-wG0zU
Thanks
Dominik Tomczak

I have had this problem in the past. Your problem is the volume of data being written to disk. Writing to a faster drive is a simple and effective solution. The other thing I've done is place a video compressor into the graph. You need to make sure both input streams are using the same reference clock. I have had a lot of problems using this compressor scheme while keeping a good preview: my preview's frame rate dies even if I use an Infinite Pin Tee rather than a Smart Tee, although the result written to disk is fine. It's also worth noting that the beefier the machine I ran this on, the less of an issue it was, so the compressor may not actually buy you much over sticking a faster hard disk in the machine.
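To make the reference clock point concrete, here is a minimal sketch, assuming an already-built IGraphBuilder; the graph manager normally distributes a single clock on its own, but setting it explicitly guarantees that the video and audio branches run against the same timebase (UseSingleClock is an illustrative helper name):

    #include <dshow.h>

    // Force every filter in an already-built graph onto one shared clock
    // (here the system reference clock). Error handling is minimal.
    HRESULT UseSingleClock(IGraphBuilder *pGraph)
    {
        IReferenceClock *pClock = NULL;
        HRESULT hr = CoCreateInstance(CLSID_SystemClock, NULL, CLSCTX_INPROC_SERVER,
                                      IID_IReferenceClock, (void**)&pClock);
        if (FAILED(hr)) return hr;

        IMediaFilter *pMediaFilter = NULL;
        hr = pGraph->QueryInterface(IID_IMediaFilter, (void**)&pMediaFilter);
        if (SUCCEEDED(hr))
        {
            // SetSyncSource on the filter graph manager propagates the clock
            // to every filter in the graph, so audio and video share it.
            hr = pMediaFilter->SetSyncSource(pClock);
            pMediaFilter->Release();
        }
        pClock->Release();
        return hr;
    }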

I don't think this is the issue. The volume of data written is less than 1 MB/s (the average compression rate during encoding). I found the trigger - when I build the graph without audio input (the WM ASF Writer has only a video input pin) and my video capture pin is connected through a Smart Tee to the preview pin and to the WM ASF Writer video input pin, there is no glitch in the output movie. I reckon this is a problem with audio-to-video synchronization in my graph. The same happens when I build the graph in GraphEdit: without audio, no glitch; with audio, there is a glitch every 1 s. I wonder whether I am time stamping my frames wrongly, but I think I'm doing it correctly. What is the general solution for audio-to-video synchronization in DirectShow graphs?
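For reference, the usual time-stamping pattern in a push source looks like this - a sketch assuming a CSourceStream-derived pin from the DirectShow base classes and a fixed frame rate; the class and member names (CScreenCapturePin, m_iFrameNumber, m_rtFrameLength) are illustrative:

    // Inside a CSourceStream-derived output pin (DirectShow base classes).
    // m_rtFrameLength is the frame duration in 100 ns units, e.g. UNITS / 30 for 30 fps.
    HRESULT CScreenCapturePin::FillBuffer(IMediaSample *pSample)
    {
        // ... copy the captured screen bits into the sample buffer here ...

        REFERENCE_TIME rtStart = m_iFrameNumber * m_rtFrameLength;
        REFERENCE_TIME rtStop  = rtStart + m_rtFrameLength;

        // Times are relative to the stream start; the muxer/renderer schedules
        // them against the graph clock, which audio and video must share.
        pSample->SetTime(&rtStart, &rtStop);
        pSample->SetSyncPoint(TRUE);   // every uncompressed frame is a key frame

        m_iFrameNumber++;
        return S_OK;
    }

For a live capture filter, stamping with the actual graph-clock time at capture (minus the stream start time) instead of a synthetic frame counter often lines up better with the audio capture filter, which stamps its samples that way.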

Related

How to detect camera frame loss using Windows media API like Media Foundation or DirectShow?

I am writing an application for Windows that runs a CUDA accelerated HDR algorithm. I've set up an external image signal processor device that presents as a UVC device, and delivers 60 frames per second to the Windows machine over USB 3.0.
Every "even" frame is a more underexposed frame, and every "odd" frame is a more overexposed frame, which allows my CUDA code perform a modified Mertens exposure fusion algorithm to generate a high quality, high dynamic range image.
Very abstract example of Mertens exposure fusion algorithm here
My only problem is that I don't know how to tell when I'm missing frames, since the only camera API I have interfaced with on Windows (Media Foundation) doesn't make it obvious whether a frame I grab with IMFSourceReader::ReadSample is the frame that was received immediately after the last one I grabbed.
Is there any way that I can guarantee that I am not missing frames, or at least easily and reliably detect when I have, using a Windows available API like Media Foundation or DirectShow?
It wouldn't be such a big deal to miss a frame and then have to purposefully "skip" the next frame in order to grab the next overexposed or underexposed frame to pair with the last frame we grabbed, but I would need to know how many frames were actually missed since the last frame was grabbed.
Thanks!
There is the IAMDroppedFrames::GetNumDropped method in DirectShow, and chances are the same information can be retrieved through Media Foundation as well (never tried - it is possibly obtainable with a method similar to this).
The GetNumDropped method retrieves the total number of frames that the filter has dropped since it started streaming.
However, I would question its reliability. The reason is that with both of these APIs, the attribute that is more or less reliable is the time stamp of a frame. Capture devices can flexibly reduce the frame rate for several reasons, both external, like low light conditions, and internal, like slow blocking processing downstream in the pipeline. This makes it hard to distinguish between odd and even frames, but the time stamp remains accurate, and you can apply frame rate math to convert it to a frame index.
In your scenario, however, I would rather detect large gaps in frame times to identify a possible discontinuity, and from there run an algorithm that compares exposure across the next few consecutive frames to get back in sync with the under-/overexposure pattern. That sounds like a more reliable way out.
After all, this exposure problem is quite likely to be specific to the hardware you are using.
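The frame rate math is simple; a sketch assuming a fixed nominal capture rate and the 100 ns timestamps that Media Foundation reports (function names are illustrative):

    #include <cstdint>
    #include <cmath>

    // Given consecutive sample timestamps in 100 ns units (as returned by
    // IMFSourceReader::ReadSample) and the nominal capture rate, estimate how
    // many frames were dropped in between. Pure arithmetic, no MF calls here.
    int EstimateDroppedFrames(int64_t prevTimestamp, int64_t currTimestamp, double fps)
    {
        const double frameDuration = 10000000.0 / fps;            // 100 ns units per frame
        double gap = static_cast<double>(currTimestamp - prevTimestamp);
        int framesElapsed = static_cast<int>(std::llround(gap / frameDuration));
        return framesElapsed > 1 ? framesElapsed - 1 : 0;         // 1 elapsed frame means no drop
    }

    // An odd number of dropped frames means the even/odd exposure pairing has
    // flipped, so the next frame should be skipped to re-pair.
    bool PairingFlipped(int droppedFrames) { return (droppedFrames % 2) != 0; }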
Normally MFSampleExtension_Discontinuity is there for this. When you use IMFSourceReader::ReadSample, check for this attribute on the returned sample.
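A sketch of that check, assuming an already-configured IMFSourceReader and with error handling trimmed:

    #include <mfapi.h>
    #include <mfreadwrite.h>

    // Read one video sample and report whether the source flagged a discontinuity,
    // i.e. a gap since the previous sample delivered on this stream.
    HRESULT ReadSampleCheckingGap(IMFSourceReader *pReader, IMFSample **ppSample, BOOL *pDiscontinuity)
    {
        DWORD streamIndex = 0, flags = 0;
        LONGLONG timestamp = 0;
        HRESULT hr = pReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                         0, &streamIndex, &flags, &timestamp, ppSample);
        *pDiscontinuity = FALSE;
        if (SUCCEEDED(hr) && *ppSample)
        {
            UINT32 discontinuity = 0;
            if (SUCCEEDED((*ppSample)->GetUINT32(MFSampleExtension_Discontinuity, &discontinuity)))
                *pDiscontinuity = (discontinuity != 0);
        }
        return hr;
    }

Combined with the timestamp returned by the same call, the frame-gap estimate above tells you how many frames you would need to skip to restore the exposure pairing.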

Video on SmartEyeGlass

Is it possible to play a simple and short video on smart eye glasses?
I know that it can play audio and that it can show images one after the other. It should not be too much work from there, but I am just guessing.
There is no direct support for video playback, but as Ahmet says, you can approach this by showing bitmaps as fast as possible.
The playback speed depends on the connection, so it is recommended to use the high-performance mode (a WiFi connection) to achieve the highest frame rate (setPowerMode).
Also take a look at showBitmapWithCallback, which provides a callback right after the previous frame gets rendered, so you can show the next one.
Yes, it is possible. You can grab frames of the video and display them one after another, as bitmaps.
That should give you a video playback view on the SmartEyeglass.

Best way to load in a video and to grab images using c++

I am looking for a fast way to load a video file and to create images from it at certain intervals (every second, every minute, every hour, etc.).
I tried using DirectShow, but it just ran too slowly for me to open the video file, move to a certain location, get the data and save it out to an image, even if I disabled the reference clock. I tried OpenCV, but it has trouble opening the AVI file unless I know the exact codec information; if there is a way to get the codec information out of OpenCV, I may give it another shot. I tried FFmpeg, but I don't have as much control over it as I would wish.
Any advice would be greatly appreciated. This is being developed on a Windows box since it has to be hosted on a Windows box.
MPEG-4 is not an intra-coded format, so you can't just jump to a random frame and decode it on its own, as most frames only encode the differences from one or more other frames. I suspect your decoding is slow because when you land on such a frame, several other frames it depends on have to be decoded first.
One way to improve performance would be to determine which frames are keyframes (or sometimes also called 'sync' points) and limit your decoding to those frames, since these can be decoded on their own.
I'm not very familiar with DirectShow capabilities, but I would expect it has some API to expose sync points.
Also, I should mention that the QuickTime SDK on Windows is possibly another good option for decoding frames from movies. You should first check that your AVI movies play correctly in the QuickTime Player. The QT SDK does expose sync points; see the section "Finding Interesting Times" in the QT SDK documentation.
ffmpeg's libavformat might work for ya...
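If you go that route, the usual pattern is to seek to the key frame at or before the target time and decode forward from there; a sketch using libavformat/libavcodec (error handling and pixel format conversion via libswscale are omitted):

    extern "C" {
    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>
    }

    // Open a file, seek to the key frame at or before 'seconds', decode one frame.
    AVFrame *GrabFrameAt(const char *path, double seconds)
    {
        AVFormatContext *fmt = NULL;
        if (avformat_open_input(&fmt, path, NULL, NULL) < 0) return NULL;
        avformat_find_stream_info(fmt, NULL);

        int vs = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
        AVStream *stream = fmt->streams[vs];

        const AVCodec *codec = avcodec_find_decoder(stream->codecpar->codec_id);
        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        avcodec_parameters_to_context(ctx, stream->codecpar);
        avcodec_open2(ctx, codec, NULL);

        // AVSEEK_FLAG_BACKWARD lands on the key frame at or before the target,
        // so decoding can start without any missing reference frames.
        int64_t ts = av_rescale_q((int64_t)(seconds * AV_TIME_BASE),
                                  av_make_q(1, AV_TIME_BASE), stream->time_base);
        av_seek_frame(fmt, vs, ts, AVSEEK_FLAG_BACKWARD);

        AVPacket *pkt = av_packet_alloc();
        AVFrame *frame = av_frame_alloc();
        while (av_read_frame(fmt, pkt) >= 0)
        {
            if (pkt->stream_index == vs &&
                avcodec_send_packet(ctx, pkt) >= 0 &&
                avcodec_receive_frame(ctx, frame) >= 0)
            {
                av_packet_unref(pkt);
                break;                    // first decodable frame after the seek
            }
            av_packet_unref(pkt);
        }
        av_packet_free(&pkt);
        avcodec_free_context(&ctx);
        avformat_close_input(&fmt);
        return frame;                     // caller frees with av_frame_free()
    }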

on-line recording with ffmpeg

Is this possible? Has someone tried doing on-line recording of audio and video (of the screen) with ffmpeg? I have read everything Google can find about ffmpeg on the net. The recording variant I tried loads the CPU to 100%, yet it still can't convert frames at a speed matching how fast frames are being recorded; the audio comes out fine, but the video loses frames.
Recording audio/video of the screen is possible with ffmpeg. People do this for the purposes of screen casting. Performance of this depends on the hardware in use, the codecs used and various other factors.
See this post (or this one) for some further advice and command line use.
This pretty much depends on the codec used, the frame size/complexity and obviously the capabilities of the computer doing the compression. You can try a low complexity codec like MJPEG, which might improve your experience.
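As an illustration (assuming a Windows machine; the audio device name, frame rate and quality setting are placeholders you need to adapt), a gdigrab screen capture encoded with MJPEG looks roughly like this:

    ffmpeg -f gdigrab -framerate 25 -i desktop ^
           -f dshow -i audio="<your capture device>" ^
           -c:v mjpeg -q:v 5 -c:a pcm_s16le output.avi

You can list the available capture device names first with ffmpeg -list_devices true -f dshow -i dummy.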

YUV/PCM Visualizer to measure Lip Sync

I have two dump files of raw video and raw audio from an encoder, and I want to be able to measure the "lip sync". Imagine a video of a hammer striking an anvil. I want to go frame by frame and see that when the hammer finally hits the anvil, there is a spike in amplitude on the audio track.
Because of the speed at which everything happens, I cannot merely listen to the audio; I need to see the waveform in the time domain.
Are there any tools out there that will let me see both the video and audio?
If you are concerned with validating a decoder, then generally, from a validation perspective, the goal is to check audio and video PTS values against a common real-time clock.
Raw YUV and PCM files do not include timestamps. If you know the frame rate and sample rate, you can use a raw YUV file viewer (I wrote my own) to figure out the time (from the start of the file) of a given frame in the video, and a tool like Audacity to figure out the time from the start of the file to the start of the tone in the audio file. This still may not tell you the whole story, since tools usually embed a delay between the audio and video in the TS/PS file. Or you can hook up an oscilloscope and go old school.
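The arithmetic involved is simple; a sketch assuming raw YUV 4:2:0 video and 16-bit mono PCM audio (adjust the constants for your actual formats):

    #include <cstdint>

    // Time and byte offset of video frame N in a raw YUV 4:2:0 dump.
    double  VideoFrameTime(int64_t frameIndex, double fps)      { return frameIndex / fps; }
    int64_t VideoFrameOffset(int64_t frameIndex, int w, int h)  { return frameIndex * (int64_t)w * h * 3 / 2; }

    // Time and byte offset of a sample in a raw 16-bit mono PCM dump.
    double  AudioSampleTime(int64_t sampleIndex, double rate)   { return sampleIndex / rate; }
    int64_t AudioSampleOffset(int64_t sampleIndex)              { return sampleIndex * 2; }

    // Lip-sync error: positive means the audio event lags the video event.
    double LipSyncError(double audioEventTime, double videoEventTime)
    {
        return audioEventTime - videoEventTime;
    }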