DirectShow video playing too fast when the audio pin is rendering data - C++

I'm working on a custom Windows DirectShow source filter based on CSource, with a CSourceStream for each pin. There are two pins: video output and audio output. Both pins work fine when rendered individually in GraphEdit and similar tools such as GraphStudio, with correct time stamps, frame rates and sound. I'm rendering the video to the Video Mixing Renderer (VMR7 or VMR9).
However, when I render both pins, the video plays back too fast while the audio still sounds correct. The video plays back roughly 50% too fast, although I suspect that is only capped by the decoding speed.
The timestamps on the samples are the same in both cases. If I render the audio stream to a null renderer (the one in qedit.dll), the video stream plays back at the correct frame rate. The filter is a 32-bit filter running on a Win7 x64 system.
When I added IMediaSeeking support I found that the seek bar for the audio stream behaved quite bizarrely. However, the problem occurs even without IMediaSeeking support.
Any suggestions for what could be causing this, or for how to investigate further?
The output types from the audio and video pins are pasted below:
Major type: Video
Subtype: RGB24
Format: Type VideoInfo
Video Size: 1024 x 576 pixels, 24 bit
Image Size: 1769472 bytes
Compression: RGB
Source: width 0, height 0
Target: width 0, height 0
Bitrate: 0 bits/sec.
Error rate: 0 bits/sec.
Avg. display time: 41708 µsec.

Major type: Video
Subtype: RGB32
Format: Type VideoInfo
Video Size: 1024 x 576 pixels, 32 bit
Image Size: 2359296 bytes
Compression: RGB
Source: width 0, height 0
Target: width 0, height 0
Bitrate: 0 bits/sec.
Error rate: 0 bits/sec.
Avg. display time: 41708 µsec.

Major type: Audio
Subtype: PCM audio
Sample Size: 3
Format: Type WaveFormatEx
Wave Format: Unknown
Channels: 1
Samples/sec.: 48000
Avg. bytes/sec.: 144000
Block align: 3
Bits/sample: 24

I realised the problem straight after posting the question - a case of debugging by framing the question correctly.
The audio stream had completely bogus time stamps. The audio and video streams played back fine individually, but did not sync with each other at all when played together.
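For anyone hitting the same symptom, the fix is to stamp each audio sample with media times derived from the running count of PCM samples delivered, so both pins share one monotonically increasing timeline. Below is a minimal, hedged sketch of what FillBuffer on the audio CSourceStream might look like; the member names m_samplesDelivered, m_sampleRate and m_bytesPerSample are hypothetical, not taken from the original filter.
// Hedged sketch: time-stamping PCM audio in a CSourceStream-derived pin.
// m_samplesDelivered, m_sampleRate and m_bytesPerSample are assumed members.
HRESULT CMyAudioPin::FillBuffer(IMediaSample *pSample)
{
    BYTE *pData = NULL;
    HRESULT hr = pSample->GetPointer(&pData);
    if (FAILED(hr)) return hr;

    long cbBuffer = pSample->GetSize();
    long numSamples = cbBuffer / m_bytesPerSample;
    // ... fill pData with numSamples PCM samples here ...

    // Media times are in 100 ns units, derived from the sample count so the
    // audio timeline stays consistent with the video timestamps.
    REFERENCE_TIME rtStart = (m_samplesDelivered * 10000000LL) / m_sampleRate;
    REFERENCE_TIME rtStop  = ((m_samplesDelivered + numSamples) * 10000000LL) / m_sampleRate;
    pSample->SetTime(&rtStart, &rtStop);
    pSample->SetSyncPoint(TRUE);
    pSample->SetActualDataLength(numSamples * m_bytesPerSample);

    m_samplesDelivered += numSamples;
    return S_OK;
}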

Related

How to calculate MPEG-1/2 frame duration

I am trying to make a .mp4 file from the audio and video RTP packets of an IP camera.
When the audio format in the camera is configured as MPEG-2 ADTS, I receive RTP packets with a payload size of 144 bytes, but these are made up of two audio frames of 72 bytes each.
However, when I configure the format as MPEG-1, the payload is made up of only one audio frame.
What is the reason for this difference? Can I get this info from some bits of the payload, as I do for the bitrate, sample rate, etc.? I have read that the theoretical packet size is 144 bytes, so how can I retrieve the frame size and the number of frames in the packet?
Besides, in order to calculate the theoretical frame duration, I am using the following formula:
time = frame size (in bytes) * 8 / bitrate
This works well for MPEG-2 with different combinations of bitrate and sample rate. However, it does not seem to work for MPEG-1. Am I doing something wrong here?
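For what it's worth, the same duration can also be computed from the number of samples per frame, which sidesteps the bitrate entirely. A small hedged sketch comparing both forms is below; the constants (144-byte frame, 48 kbit/s, 24 kHz, 576 samples per frame, as in MPEG-2 LSF Layer III) are illustrative assumptions, not values read from the camera in the question.
// Hedged sketch: two equivalent ways to compute an MPEG audio frame duration.
// All constants below are illustrative assumptions.
#include <cstdio>

int main() {
    const double frame_size_bytes  = 144.0;
    const double bitrate_bps       = 48000.0;   // bits per second
    const double sample_rate_hz    = 24000.0;
    const double samples_per_frame = 576.0;     // MPEG-2 LSF Layer III; MPEG-1 Layer II/III use 1152

    // The formula from the question: duration = frame size * 8 / bitrate.
    double from_bitrate = frame_size_bytes * 8.0 / bitrate_bps;
    // Equivalent form that does not depend on the bitrate at all.
    double from_samples = samples_per_frame / sample_rate_hz;

    std::printf("%.4f s vs %.4f s\n", from_bitrate, from_samples);  // both print 0.0240
    return 0;
}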

Set Kinect v2 frame rates (rgb, depth, skeleton) to 25 fps

I'm working on a project which requires all three streams from the Kinect v2 sensor (RGB, depth, and skeleton) to be captured, processed and streamed at 25 fps.
My program works with the default settings, and all three streams seem to be operating at 30 fps. Is there a way to reduce this to 25 fps?
I'm working in a C++ environment.
The Kinect SDK offers no way of setting the frame rate. Moreover, the RGB frame rate is not fixed and may drop from 30 fps to 15 fps in low-light conditions. Your approach of adding a delay does not change the native frame rate of the devices; all you are doing is selectively dropping frames based on timing. If regular spacing between frame capture times is important to your application, you should implement an interpolation method instead.
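If simple timestamp-driven dropping is good enough for your case, the rough sketch below shows the idea of resampling an approximately 30 fps stream to 25 fps. The two callbacks (nextFrameTimeMs, which blocks until a frame arrives and returns its timestamp, and deliverFrame) are hypothetical hooks into your own capture loop; nothing here is a Kinect SDK call.
// Hedged sketch: downsampling a ~30 fps capture stream to 25 fps by timestamps.
#include <cstdint>
#include <functional>

void resampleTo25Fps(const std::function<int64_t()>& nextFrameTimeMs,
                     const std::function<void()>& deliverFrame)
{
    const double outIntervalMs = 1000.0 / 25.0;   // 40 ms between delivered frames
    double nextDueMs = -1.0;

    for (;;) {
        int64_t t = nextFrameTimeMs();            // assumed to block until a new frame
        if (nextDueMs < 0.0)
            nextDueMs = static_cast<double>(t);

        // Keep the first frame at or after each 40 ms boundary; drop the rest.
        if (static_cast<double>(t) >= nextDueMs) {
            deliverFrame();
            nextDueMs += outIntervalMs;
        }
    }
}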

openh264 Repeating frame

I'm using the openh264 library in a C++ program to convert a set of images into an H.264-encoded mp4 file. These images represent updates to the screen during a session recording.
Let's say a set contains 2 images: an initial screen grab of the desktop, and another one 30 seconds later when the clock changes.
Is there a way for the stream to represent a 30-second-long video using only these 2 images?
Right now I'm brute-forcing this by encoding the first frame multiple times to fill the gap. Is there a more efficient and/or faster way of doing this?
Of course. Set a frame rate of 1/30 fps and you end up with 1 frame every 30 seconds. It doesn't even have to be done in the H.264 stream itself - it can also be done when it gets muxed into an mp4 file afterwards, for example.
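As a rough illustration, the hedged sketch below initializes the encoder and spaces the two frames' timestamps 30 seconds apart, leaving it to the mp4 muxer to turn that gap into a display duration. The type and function names come from openh264's wels/codec_api.h, but the parameter values, and the assumption that your muxer honours the timestamps as given, are mine.
// Hedged sketch: two frames encoded with timestamps 30 s apart.
#include <wels/codec_api.h>
#include <cstring>

bool encodeTwoFrames(unsigned char* yuv0, unsigned char* yuv1, int width, int height)
{
    ISVCEncoder* encoder = NULL;
    if (WelsCreateSVCEncoder(&encoder) != 0 || !encoder) return false;

    SEncParamBase param;
    memset(&param, 0, sizeof(param));
    param.iUsageType     = SCREEN_CONTENT_REAL_TIME;  // screen-recording use case
    param.iPicWidth      = width;
    param.iPicHeight     = height;
    param.iTargetBitrate = 500000;                    // illustrative value
    param.fMaxFrameRate  = 1.0f;                      // low nominal rate; spacing comes from timestamps
    if (encoder->Initialize(&param) != 0) return false;

    SSourcePicture pic;
    memset(&pic, 0, sizeof(pic));
    pic.iPicWidth    = width;
    pic.iPicHeight   = height;
    pic.iColorFormat = videoFormatI420;
    pic.iStride[0] = width;
    pic.iStride[1] = pic.iStride[2] = width / 2;

    unsigned char* frames[2]  = { yuv0, yuv1 };
    long long timestampsMs[2] = { 0, 30000 };         // 30 s gap -> 30 s display duration

    for (int i = 0; i < 2; ++i) {
        pic.pData[0] = frames[i];
        pic.pData[1] = frames[i] + width * height;
        pic.pData[2] = frames[i] + width * height * 5 / 4;
        pic.uiTimeStamp = timestampsMs[i];

        SFrameBSInfo info;
        memset(&info, 0, sizeof(info));
        if (encoder->EncodeFrame(&pic, &info) != cmResultSuccess) return false;
        // ... hand info's NAL units plus the timestamp to the mp4 muxer here ...
    }

    encoder->Uninitialize();
    WelsDestroySVCEncoder(encoder);
    return true;
}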

FFmpeg: How to estimate number of samples in audio stream?

I'm currently writing a small C++ application that uses the FFmpeg libraries (especially avformat and swresample) to decode audio files.
Now I need the total number of samples in an audio stream. I know the exact number can only be found by actually decoding all the frames; I just need an estimate. What is the preferred method here? How can I find out the duration of a file?
There's some good info in this question about how to get info out of ffmpeg: FFMPEG Can't Display The Duration Of a Video.
To work out the number of samples in an audio stream, you need three basic bits of info:
The duration (in seconds)
The sample rate (in samples per second)
The number of channels in the stream (e.g. 2 for stereo)
Once you have that info, the total number of samples in your stream is simply [duration] * [rate] * [channels].
Note that this is not equivalent to bytes, as the samples are likely to be at least 16 bits each, and possibly 24.
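If it helps, here is a minimal hedged sketch of that estimate using libavformat; error handling is trimmed, and the file path is assumed to come from argv[1].
// Hedged sketch: estimate the sample count of the first audio stream in a file.
extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/avutil.h>
#include <libavutil/rational.h>
}
#include <cstdio>
#include <cstdint>

int main(int argc, char** argv)
{
    if (argc < 2) return 1;
    AVFormatContext* fmt = NULL;
    if (avformat_open_input(&fmt, argv[1], NULL, NULL) < 0) return 1;
    if (avformat_find_stream_info(fmt, NULL) < 0) return 1;

    int idx = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, NULL, 0);
    if (idx < 0) return 1;
    AVStream* st = fmt->streams[idx];

    // Duration in seconds: prefer the stream's own duration, fall back to the container's.
    double seconds = (st->duration != AV_NOPTS_VALUE)
        ? st->duration * av_q2d(st->time_base)
        : fmt->duration / (double)AV_TIME_BASE;

    int rate = st->codecpar->sample_rate;
    int64_t perChannel = (int64_t)(seconds * rate);   // multiply by the channel count for the interleaved total
    std::printf("~%lld samples per channel (%.2f s @ %d Hz)\n",
                (long long)perChannel, seconds, rate);

    avformat_close_input(&fmt);
    return 0;
}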
I believe what you need is the formula AUDIO_RATE / FRAME_RATE. For instance, if the audio rate is 48000 and the video frame rate is, say, 50 fps, then 48000 / 50 = 960 samples per video frame.
The buffer size calculation comes later, as samples_per_frame * nChannels * (audio_bits / 8).
The audio bit depth is usually 16 bits (24 or 32 bits are also possible). So for 8-channel audio at 16-bit 48 kHz, you'll need 960 * 8 * 2 = 15360 bytes per audio frame.
The official way to do this last calculation is to use the av_samples_get_buffer_size() function:
av_samples_get_buffer_size(NULL, nChannels, SamplesPerFrame, audio_st->codec->sample_fmt, 0)
For example,
av_samples_get_buffer_size(NULL, 8, 960, audio_st->codec->sample_fmt, 0)
will also return 15360 (for experts: yes, I'm assuming the format is pcm_s16le).
So this answers the first part of your question. Hope that helps.
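For completeness, here is a self-contained hedged version of that last calculation, hard-coding AV_SAMPLE_FMT_S16 in place of audio_st->codec->sample_fmt to reflect the pcm_s16le assumption above.
// Hedged sketch: buffer size for one audio frame of 8-channel 16-bit 48 kHz audio at 50 fps video.
extern "C" {
#include <libavutil/samplefmt.h>
}
#include <cstdio>

int main()
{
    const int channels = 8;
    const int samplesPerFrame = 48000 / 50;   // audio rate / video frame rate = 960
    int bytes = av_samples_get_buffer_size(NULL, channels, samplesPerFrame,
                                           AV_SAMPLE_FMT_S16, 0);
    std::printf("%d bytes per audio frame\n", bytes);   // prints 15360
    return 0;
}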

Writing 4:2:0 YUV raw data into an AVI file via DirectShow in C++

I'm trying to write some 4:2:0 raw data received from a capture card into an AVI file. For every pixel, the char buffer contains 2 bytes (16 bits). The order of the data is the same as FOURCC UYVY: YUV 4:2:2 (a Y sample at every pixel, U and V sampled at every second pixel horizontally on each line). A macropixel contains 2 pixels in 1 u_int32.
First I tried the OpenCV VideoWriter, but it is simply too slow for this huge amount of video data (I'm capturing 2 video streams, each in 1080p25 format), so I switched to the "Video for Windows" library. But even that doesn't keep up with the file writing in real time. My last chance is DirectShow. I want to use the AVI Mux and File Writer filters to store my raw data as an AVI file, but I'm not sure how to "give" the AVI Mux my raw data (a char array) which contains just video data in UYVY order and no audio. Maybe you can give me some advice. This is what I've got so far:
CoInitialize(NULL);

// Build the filter graph and the capture graph builder.
IGraphBuilder *pGraph = NULL;
CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, IID_IGraphBuilder, (void **)&pGraph);
IMediaControl *pMediaControl = NULL;
pGraph->QueryInterface(IID_IMediaControl, (void **)&pMediaControl);
ICaptureGraphBuilder2 *pCapture = NULL;
CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC, IID_ICaptureGraphBuilder2, (void **)&pCapture);
pCapture->SetFiltergraph(pGraph);   // attach the builder to the graph built above

// AVI Mux + File Writer, created and connected by the capture graph builder.
IBaseFilter *pMux = NULL;
pCapture->SetOutputFileName(&MEDIASUBTYPE_Avi, L"Test.avi", &pMux, NULL);

// pCap is still uninitialized here - it needs to point to a source filter,
// added to the graph, that actually delivers the UYVY buffers before
// RenderStream can connect it to the AVI Mux.
IBaseFilter *pCap = NULL;
pCapture->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video, pCap, NULL, pMux);
Thanks a lot and regards,
Valentin
(As you mentioned 10 fps in a previous question, which I assume to be the effective frame rate.) Are you writing dual 1920x1080 streams at 12 bits per pixel and 10 fps into a file? That is roughly 1920 x 1080 x 1.5 bytes x 10 fps x 2 streams, i.e. about 60 megabytes per second, so you might simply be hitting your HDD's write throughput limit.
Choosing a different API is not going to help if your HDD is not fast enough. You need to either compress the data, lower the resolution or FPS, or use faster drives.