Media Foundation H264 decoder not working properly - C++

I'm creating an application for video conferencing using Media Foundation, and I'm having an issue decoding the H264 video frames I receive over the network.
The Design
Currently my network source queues a token on every RequestSample call, unless there is a stored sample available. If a sample arrives over the network and no token is available, the sample is stored in a linked list; otherwise it is queued with the MEMediaSample event. I also have the decoder set to low latency.
My Issue
When running the topology with my network source I immediately see the first frame rendered to the screen. I then experience a long pause, after which the live stream begins to play perfectly. After a few seconds the stream appears to pause, but then you notice that it's just looping through the same frame over and over, mixing in a live frame every couple of seconds that disappears immediately before it goes back to displaying the old loop.
Why is this happening? I'm by no means an expert in H264, or Media Foundation for that matter, but I've been trying to fix this issue for weeks with no success, and I have no idea where the problem might be. Please help me!
The timestamp is created by starting at 0 and adding the duration to it for every new sample. The other data is retrieved from an IMFSampleGrabberSinkCallback.
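In code, the timestamping amounts to something like this (a simplified sketch; in the real code the duration comes from the grabber callback, 33 ms is just an example value):

// Sketch of the timestamping described above (Media Foundation uses 100-ns units).
LONGLONG rtNext = 0;                         // running presentation time, starts at 0
const LONGLONG rtDuration = 333333;          // ~33 ms in 100-ns units (example value)

pSample->SetSampleTime(rtNext);              // IMFSample::SetSampleTime
pSample->SetSampleDuration(rtDuration);      // IMFSample::SetSampleDuration
rtNext += rtDuration;                        // next sample starts where this one ends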
I've also posted some of my MFTrace output on the MSDN Media Foundation forums (link).
I mentioned there that the presentation clock doesn't seem to change in the trace, but I'm unsure whether that's the cause or how to fix it.

EDIT:
Could you share the video and a full mftrace log for this issue? It's not clear to me what really happens: do you see the live video after a while?
The current log does not contain enough information to trace sample processing. From your description it looks like only keyframes are rendered. Also, the duration is odd for the rendered keyframe:
Sample #00A74970, Time 6733ms, Duration 499ms. <- Duration is not 33ms.
I would like to see what happened to that sample.
In any case, if you are using the standard encoder and decoder, the issue is most likely in your media source and how it buffers frames. An incorrect circular buffer implementation, perhaps? You may want to try caching a second or two of samples before you start handing them to the decoder.
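A minimal sketch of that pre-buffering idea inside a custom media stream, assuming members named m_eventQueue (IMFMediaEventQueue*), m_samples and m_tokens (std::deque of CComPtr) and m_started (bool) - the names are made up, not from any published sample:

static const size_t kPreBufferSamples = 30;   // ~1 second at 30 fps

void MyNetworkStream::DeliverIfReady()
{
    // Hold everything back until the pre-buffer has filled once.
    if (!m_started && m_samples.size() < kPreBufferSamples)
        return;
    m_started = true;

    // Pair queued RequestSample tokens with buffered samples, oldest first.
    while (!m_tokens.empty() && !m_samples.empty())
    {
        CComPtr<IMFSample> sample = m_samples.front();  m_samples.pop_front();
        CComPtr<IUnknown>  token  = m_tokens.front();   m_tokens.pop_front();

        if (token)
            sample->SetUnknown(MFSampleExtension_Token, token);

        // Hand the sample to the pipeline.
        m_eventQueue->QueueEventParamUnk(MEMediaSample, GUID_NULL, S_OK, sample);
    }
}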

Related

What units should a transcoding converter using ffmpeg report progress in - %, etc.?

I'm going to make a converter to H.265 with ffmpeg, based on this documentation: http://www.ffmpeg.org/doxygen/trunk/transcoding_8c-example.html
I want to add information about the progress, but I have no idea what number I can use to show that, for example as a percentage.
Please help. :)
What about offering several variants and letting the user choose with an argument?
I think elapsed time and estimated time remaining are more informative than a percentage - for example, so you can leave the machine or the window to work and come back to check on it later.
Also, the current frame rate of the conversion is informative; it gives hints about whether the bitrate etc. should be adjusted if it's too slow.
So you can measure how long the encoding has taken so far, estimate the processing frame rate, and from that how much remains.
ffmpeg itself displays the current time or current frame of the processed video along with the video's duration.
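For example, inside the transcoding loop of that example you could compute something like this (a sketch; it assumes ifmt_ctx, the decoded frame and a start_time you recorded yourself are in scope, and that stream 0 is the video stream):

#include <chrono>
#include <cstdio>
// (plus the libavformat/libavutil headers the transcoding example already includes)

// Media time processed so far, in seconds, versus total input duration.
double processed_s = frame->pts * av_q2d(ifmt_ctx->streams[0]->time_base);
double total_s     = ifmt_ctx->duration / (double)AV_TIME_BASE;
double percent     = 100.0 * processed_s / total_s;

// Wall-clock time spent so far; start_time was recorded before the loop.
double wall_s = std::chrono::duration<double>(
                    std::chrono::steady_clock::now() - start_time).count();
double speed  = processed_s / wall_s;                 // media seconds per wall-clock second
double eta_s  = (total_s - processed_s) / speed;

printf("%5.1f%%  speed=%.2fx  eta=%.0fs\r", percent, speed, eta_s);
fflush(stdout);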

Convert frames to video on demand

I'm working on a C++ project that generates frames to be converted to a video later.
The project currently dumps all frames as jpg or png files into a folder, and then I run ffmpeg manually to generate an mp4 video file.
This project runs on a web server, and an iOS/Android app (under development) will call this web server to have the video generated and downloaded.
The web service is pretty much done and working fine.
I don't like this approach for obvious reasons like a server dependency, cost etc...
I successfully created a POC that exposes the frame generator lib to Android, and I got it to save the frames in a folder; my next step now is to convert them to video. I considered using one of the ffmpeg-for-Android/iOS libraries and just calling it when the frames are done.
Although it seems like I fixed half of the problem, I found a new one: each frame, depending on the configuration, could end up being 200 KB+ in size, so depending on the number of frames it will take a lot of space on the user's device.
I'm sure this will become a huge problem very easily.
So I believe that the ideal solution would be to generate the mp4 file on demand as each frame is created, so in the end no storage space would be taken, as I wouldn't need to save a file for each frame.
The problem is that I don't know how to do that. I don't know much about ffmpeg; I know it's open source, but I have no idea how to reference it from the frame generator and generate the video "on demand".
I heard about libav as well but again, same problem...
I would really appreciate any suggestion on how to do it. What I need is basically a way to generate an mp4 video file given a list of frames.
thanks for any help!
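For reference, the libavformat/libavcodec path for "frames in, mp4 out" looks roughly like the sketch below. It is only a sketch under assumptions - fixed-size YUV420P frames at a fixed frame rate, an FFmpeg build with an H.264 encoder, error handling stripped, and FillFrame() standing in for your frame generator - the official muxing/encoding examples in the FFmpeg sources are the authoritative reference.

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
}

void write_mp4(const char* path, int width, int height, int fps, int total_frames)
{
    AVFormatContext* fmt = nullptr;
    avformat_alloc_output_context2(&fmt, nullptr, nullptr, path);

    const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVStream* stream = avformat_new_stream(fmt, nullptr);
    AVCodecContext* enc = avcodec_alloc_context3(codec);
    enc->width = width;  enc->height = height;
    enc->pix_fmt = AV_PIX_FMT_YUV420P;
    enc->time_base = {1, fps};
    if (fmt->oformat->flags & AVFMT_GLOBALHEADER)
        enc->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    avcodec_open2(enc, codec, nullptr);
    avcodec_parameters_from_context(stream->codecpar, enc);
    stream->time_base = enc->time_base;

    avio_open(&fmt->pb, path, AVIO_FLAG_WRITE);
    avformat_write_header(fmt, nullptr);

    AVFrame* frame = av_frame_alloc();
    frame->format = enc->pix_fmt;  frame->width = width;  frame->height = height;
    av_frame_get_buffer(frame, 0);
    AVPacket* pkt = av_packet_alloc();

    auto drain = [&](AVFrame* f) {
        avcodec_send_frame(enc, f);                       // f == nullptr flushes the encoder
        while (avcodec_receive_packet(enc, pkt) == 0) {
            av_packet_rescale_ts(pkt, enc->time_base, stream->time_base);
            pkt->stream_index = stream->index;
            av_interleaved_write_frame(fmt, pkt);
            av_packet_unref(pkt);
        }
    };

    for (int i = 0; i < total_frames; ++i) {
        av_frame_make_writable(frame);
        // FillFrame(frame, i);   // placeholder: copy your generated pixels into frame->data
        frame->pts = i;
        drain(frame);
    }
    drain(nullptr);                                       // flush

    av_write_trailer(fmt);
    avcodec_free_context(&enc);
    av_frame_free(&frame);
    av_packet_free(&pkt);
    avio_closep(&fmt->pb);
    avformat_free_context(fmt);
}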

DirectShow video stream ends immediately (m_pMediaSample is NULL)

I have a DirectShow video renderer derived from CBaseVideoRenderer. The renderer is used in a graph that receives data from a live source (BDA). It looks like the connections are established properly, but the video rendering ends immediately because there is no sample. However, audio rendering works, i.e., I can hear the sound while DoRenderSample of my renderer is never called.
Stepping through the code in the debugger, I found out that in CBaseRenderer::StartStreaming the stream ends immediately, because the member m_pMediaSample is NULL. If I replace my renderer with the EVR, it shows frames, i.e., the stream does not end before the first frame for the EVR, but only for my renderer.
Why is that, and how can I fix it? I implemented (following the sample from http://www.codeproject.com/Articles/152317/DirectShow-Filters-Development-Part-Video-Render) what I understand to be the basic interface (CheckMediaType, SetMediaType and DoRenderSample), so I do not see any way to influence what is happening here...
Edit: This is the graph as seen from the ROT:
What I am basically trying to do is capture a DVB stream that uses VIDEOINFOHEADER2, which is not supported by the standard Sample Grabber. Although the channel is a public German TV channel without encryption, could it be that this is a DRM issue?
Edit 2: I have attached my renderer to another source (a Blackmagic Intensity Shuttle). It seems that the source causes the issue, because I do get samples in the other graph.
Edit 3: Following Roman's suggestion, I have created a transform filter. The graph looks like
and unfortunately has the same problem, i.e., I do not get any samples (Transform is not called).
You have presumably chosen the wrong path for fetching video frames out of the media pipeline. So you are implementing a "network renderer", something that terminates the pipeline in order to send the data onward over the network.
A renderer which accepts the feed sounds appropriate. Implementing a custom renderer, however, is an atypical task, and there is not much information around on it. Additionally, a fully featured renderer typically has to deal with sample scheduling and end-of-stream delivery - things that are relatively easy to break when you customize it by inheriting from the base classes. That is, while the approach sounds good, you might want to compare it to another option you have, which is...
A combination of Sample Grabber + Null Renderer, two standard filters which you can attach your callback to in order to get frames while keeping the pipeline properly terminated. The problem here is that the standard Sample Grabber does not support VIDEOINFOHEADER2. With another video decoder you could possibly have the feed decoded into VIDEOINFOHEADER, which is one option. Improving the Sample Grabber itself is another solution: DirectX SDK Extras February 2005 (dxsdk_feb2005_extras.exe) included a filter similar to the standard Sample Grabber, called Grabber, at \DirectShow\Samples\C++\DirectShow\Filters\Grabber. It is/was available in source code and came with a good description text file. It is relatively easy to extend so that it accepts VIDEOINFOHEADER2 and makes the payload data available to your application.
The easiest way to get data out of a DirectShow graph, if you're not going to use MultiMedia Streaming, is probably to write your own TransInPlace filter, a sub-variety of a Transform filter. Then connect this filter to the desired stream of data you wish to monitor, and then run, pause, seek, or otherwise control the graph. The data, as it passes through the transform filter, can be manipulated however you want. We call this kind of filter a "sample grabber". Microsoft released a limited-functionality sample grabber with DX8.0. This filter is limited because it doesn't deal with DV Data or mediatypes with a format of VideoInfo2. It doesn't allow the user to receive prerolled samples. (What's a preroll sample? See the DX8.1 docs.) Its "OneShot" mode also has some problems.
To add to this, the Grabber sample is pretty simple itself - perhaps 1000 lines of code altogether, including comments.
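Following that description, a minimal pass-through filter that also accepts VideoInfo2 could look roughly like this (a sketch built on the DirectShow base classes in streams.h; the class name, the placeholder CLSID and the "inspect the data here" part are made up):

#include <streams.h>

class CFrameTap : public CTransInPlaceFilter
{
public:
    CFrameTap(LPUNKNOWN pUnk, HRESULT* phr)
        : CTransInPlaceFilter(NAME("Frame Tap"), pUnk, GUID_NULL /* placeholder CLSID */, phr) {}

    // Accept video, including the VIDEOINFOHEADER2 formats the stock Sample Grabber rejects.
    HRESULT CheckInputType(const CMediaType* mt) override
    {
        if (*mt->Type() != MEDIATYPE_Video)
            return VFW_E_TYPE_NOT_ACCEPTED;
        if (*mt->FormatType() == FORMAT_VideoInfo || *mt->FormatType() == FORMAT_VideoInfo2)
            return S_OK;
        return VFW_E_TYPE_NOT_ACCEPTED;
    }

    // Called for every sample passing through; the data is left untouched.
    HRESULT Transform(IMediaSample* pSample) override
    {
        BYTE* pData = nullptr;
        pSample->GetPointer(&pData);
        long cb = pSample->GetActualDataLength();
        // Inspect or copy pData / cb here, e.g. hand it to your application.
        return S_OK;
    }
};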
It looks like your decoder or splitter isn't demuxing the video frames. Look further up the chain to see what filters are supplying your renderer pin with data; chances are it's only recognising audio.
Try dropping the file into GraphEdit (there's a better one on the web, BTW) and see what filters it creates.
Then look at the samples in the DirectShow SDK.

Correcting live IMFMediaSource time stamps

I have two cameras, listed below, that I am trying to use in a Media Foundation topology. Here is a summary of my topology:
Webcam --> MJPG Decoder --> Custom MFT --> H264 Encoder --> MP4 File Sink
The problem with this is that the generated MP4 file has incorrect duration and time scale tags, both for the MP4 container and the H264 stream. I can easily correct this with a tool like MP4Box or YAMB, but my eventual goal is to stream the video.
One potential cause I have identified is that the samples generated by the camera sources do not start at time 0. According to bullet #2 in http://msdn.microsoft.com/en-us/library/windows/desktop/ms700134(v=vs.85).aspx#live_sources, timestamps of a live source should start at 0.
Along this line, I've tried the following to "correct" the sample timestamps:
Re-based the sample time in my custom MFT, using IMFSample::SetSampleTime.
Created a wrapper for the IMFMediaSource and IMFMediaStream objects, which catches and corrects the time stamps associated with the MEMediaSample and MEStreamTick events.
In both of these cases, the media session throws an error 0xC00D4A44 (MF_E_SINK_NO_SAMPLES_PROCESSED), and the resulting MP4 file ends abruptly after the "mdat" atom declaration.
Cameras
Logitech BCC950 ConferenceCam
Thinkpad W520 Integrated Camera
Systems used (both have same issue):
Windows 7 Professional x64
Windows 8 x86
Questions:
Is there some other cause I have overlooked for incorrect video duration/time scale?
If not, is there a correct approach for how to re-base sample timestamps?
Try resetting the MFSampleExtension_Discontinuity flag for every sample:
pSample->SetUINT32( MFSampleExtension_Discontinuity, FALSE );
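For illustration, re-basing plus that flag reset inside the custom MFT might look like this (a sketch, not verified against this exact issue; m_rtFirst is an assumed LONGLONG member initialized to -1):

LONGLONG rtSample = 0;
if (SUCCEEDED(pSample->GetSampleTime(&rtSample)))
{
    if (m_rtFirst < 0)
        m_rtFirst = rtSample;                      // remember the stream's origin
    pSample->SetSampleTime(rtSample - m_rtFirst);  // re-based time, starting at 0
}
pSample->SetUINT32(MFSampleExtension_Discontinuity, FALSE);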

Frame accurate synchronizing of subtitle files with MPEG video using DirectShow

This is a problem I have been dealing with for a while, and haven't been able to get a good answer (even from Microsoft). I'm using the generic dump filter to write hardware compressed MPEG files out to disk. In the graph, I also have a SampleGrabber filter that gets called on every frame. From the SampleGrabber callback, I get a subtitle, along with the DirectShow timestamp and write them out to a SAMI (.smi) subtitle file. This all seems to be working, as the SAMI file contains the correct subtitles for every frame. However, I have a few problems:
The first few (usually 3 or 4) DirectShow timestamps are all 0. If I'm getting callbacks from the SampleGrabber, shouldn't these timestamps be incrementing?
When I begin playback, the first timestamp shown is about 10-20 subtitles into the SAMI file. I'd assume the first frame would show the first timestamp in the file.
This is probably related to #2, but the subtitles are not synchronized to the appropriate frames in the file. They can sometimes be up to 40 frames late.
I'm using DirectShow via C++, capturing with a Hauppauge HVR-1800 under Windows XP SP3 (with latest drivers 09/08/2008), and playing back under Media Player Classic 6.4.9.0. Any ideas are welcome.
Are you using the incoming IMediaSample's GetTime or GetMediaTime? GetTime is what you want, as it represents the stream's presentation time.
Be sure to also check the incoming IMediaSample's IsPreroll method. Preroll samples should be ignored, as they will be output again during playback. Another thing I would do is make sure that your Sample Grabber is as far downstream in your filter graph as it can be, preferably after any demuxers and renderers.
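For example, the checks might look like this inside the callback (a sketch; CGrabberCB and WriteSamiEntry are placeholders for your own callback class and SAMI writer):

// Sketch of an ISampleGrabberCB::SampleCB callback applying the advice above.
HRESULT CGrabberCB::SampleCB(double /*SampleTime*/, IMediaSample* pSample)
{
    // Skip preroll samples; they are delivered again during playback.
    if (pSample->IsPreroll() == S_OK)
        return S_OK;

    REFERENCE_TIME rtStart = 0, rtEnd = 0;
    HRESULT hr = pSample->GetTime(&rtStart, &rtEnd);   // presentation time, not GetMediaTime
    if (hr == VFW_E_SAMPLE_TIME_NOT_SET)
        return S_OK;                                   // no usable timestamp on this sample

    LONGLONG ms = rtStart / 10000;                     // 100-ns units -> milliseconds
    WriteSamiEntry(ms);                                // placeholder for your SAMI writer
    return S_OK;
}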
Also see the article on TimeStamps in the DirectShow documentation. It outlines the other caveats of using timestamps.
Of course, even after all of the tips above, there is still no absolute guarantee as to how a particular DirectShow filter is going to behave.