I've implemented a custom IMFMediaSink for use with the sink writer. It works OK and receives H.264 video samples. I don't have any container; I'm consuming raw H.264 video samples. I have not implemented a custom writer; I'm using the MFCreateSinkWriterFromMediaSink API to wrap my custom media sink into a framework-provided writer.
I'm unable to implement graceful shutdown: IMFSinkWriter::Finalize() never returns. When I implemented IMFSinkWriterCallback, IMFSinkWriter::Finalize() returned immediately, but my IMFSinkWriterCallback::OnFinalize was never called.
The problem reproduces in 100% of tests, with both NVENC and the Microsoft software encoder.
Writer attributes:
MF_LOW_LATENCY = TRUE
MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS = TRUE (1)
MF_READWRITE_DISABLE_CONVERTERS = FALSE (2)
MF_SINK_WRITER_DISABLE_THROTTLING = TRUE
MF_SINK_WRITER_D3D_MANAGER
MF_SINK_WRITER_ASYNC_CALLBACK
(1) Tried both, same result.
(2) I need the converters because NVENC only supports YUV and I have RGB textures on input.
Output media type (it's fixed; I'm using the built-in handler created by the MFCreateSimpleTypeHandler API):
MF_MT_MAJOR_TYPE = MFMediaType_Video
MF_MT_SUBTYPE = MFVideoFormat_H264
MF_MT_INTERLACE_MODE = MFVideoInterlace_Progressive
MF_MT_AVG_BITRATE = 40*1000*1000
MF_MT_FRAME_SIZE = { 3840, 2160 }
MF_MT_FRAME_RATE = { 60, 1 }
MF_MT_PIXEL_ASPECT_RATIO = { 1, 1 }
Input media type:
MF_MT_MAJOR_TYPE = MFMediaType_Video
MF_MT_SUBTYPE = MFVideoFormat_RGB32
MF_MT_INTERLACE_MODE = MFVideoInterlace_Progressive
MF_MT_FRAME_SIZE = { 3840, 2160 }
MF_MT_FRAME_RATE = { 60, 1 }
MF_MT_PIXEL_ASPECT_RATIO = { 1, 1 }
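For reference, here's a sketch of how those two types could be built in code (values copied from the lists above; ComPtr used for brevity, error handling omitted):
// Sketch only: the output (H.264) and input (RGB32) types listed above.
ComPtr<IMFMediaType> outType, inType;
MFCreateMediaType(&outType);
outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
outType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
outType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
outType->SetUINT32(MF_MT_AVG_BITRATE, 40 * 1000 * 1000);
MFSetAttributeSize(outType.Get(), MF_MT_FRAME_SIZE, 3840, 2160);
MFSetAttributeRatio(outType.Get(), MF_MT_FRAME_RATE, 60, 1);
MFSetAttributeRatio(outType.Get(), MF_MT_PIXEL_ASPECT_RATIO, 1, 1);

MFCreateMediaType(&inType);
inType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
inType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
inType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
MFSetAttributeSize(inType.Get(), MF_MT_FRAME_SIZE, 3840, 2160);
MFSetAttributeRatio(inType.Get(), MF_MT_FRAME_RATE, 60, 1);
MFSetAttributeRatio(inType.Get(), MF_MT_PIXEL_ASPECT_RATIO, 1, 1);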
When not using IMFSinkWriterCallback, here's the call stack at the time of the hang:
ntdll.dll!_NtWaitForSingleObject@12 ()
KernelBase.dll!WaitForSingleObjectEx()
mfreadwrite.dll!CMFSinkWriter::InternalFinalize(void)
mfreadwrite.dll!CMFSinkWriter::Finalize(void)
MFTrace doesn't show anything related to Finalize, even with -k All:
13700,3C60 19:01:25.79566 CMFTransformDetours::ProcessOutput @02EA6E3C failed hr=0xC00D6D72 MF_E_TRANSFORM_NEED_MORE_INPUT
13700,2A98 19:01:25.80250 CMFTransformDetours::ProcessOutput @1A6CEF38 Stream ID 0, Sample @1C244F30, Time 1216ms, Duration 16ms, Buffers 1, Size 12441600B, MFSampleExtension_CleanPoint=1;MFSampleExtension_Interlaced=0
13700,2098 19:01:25.80254 CMFTransformDetours::ProcessInput @02EA6E3C Stream ID 0, Sample @1C244F30, Time 1216ms, Duration 16ms, Buffers 1, Size 12441600B, MFSampleExtension_CleanPoint=1;MFSampleExtension_Interlaced=0
13700,2A98 19:01:25.80256 CMFTransformDetours::ProcessOutput @1A6CEF38 failed hr=0xC00D6D72 MF_E_TRANSFORM_NEED_MORE_INPUT
13700,2A98 19:01:25.80266 CMFTransformDetours::ProcessMessage @1A6CEF38 Message type=0x00000001 MFT_MESSAGE_COMMAND_DRAIN, param=00000000
13700,2A98 19:01:25.80267 CMFTransformDetours::ProcessOutput @1A6CEF38 failed hr=0xC00D6D72 MF_E_TRANSFORM_NEED_MORE_INPUT
13700,2098 19:01:25.81669 CMFTransformDetours::ProcessOutput @02EA6E3C Stream ID 0, Sample @1FB68CF8, Time 1216ms, Duration 16ms, Buffers 1, Size 680B, {2B5D5457-5547-4F07-B8C8-B4A3A9A1DAAC}=1;{73A954D4-09E2-4861-BEFC-94BD97C08E6E}=12166667 (0,12166667);{9154733F-E1BD-41BF-81D3-FCD918F71332}=65535;{973704E6-CD14-483C-8F20-C9FC0928BAD5}=1;MFSampleExtension_CleanPoint=0;{B2EFE478-F979-4C66-B95E-EE2B82C82F36}=16 (0,16)
13700,82C 19:01:25.81674 CMFTransformDetours::ProcessOutput @02EA6E3C failed hr=0xC00D6D72 MF_E_TRANSFORM_NEED_MORE_INPUT
13700,82C 19:01:25.81674 CMFTransformDetours::ProcessMessage @02EA6E3C Message type=0x00000001 MFT_MESSAGE_COMMAND_DRAIN, param=00000000
13700,82C 19:01:25.81674 CMFTransformDetours::ProcessOutput @02EA6E3C failed hr=0xC00D6D72 MF_E_TRANSFORM_NEED_MORE_INPUT
13700,1F54 19:01:27.24237 CKernel32ExportDetours::OutputDebugStringA @ D3D11 WARNING: Process is terminating. Using simple reporting. Please call ReportLiveObjects() at runtime for standard reporting. [ STATE_CREATION WARNING #0: UNKNOWN]
13700,1F54 19:01:27.24255 CKernel32ExportDetours::OutputDebugStringA @ D3D11 WARNING: Live Producer at 0x0311D91C, Refcount: 13. [ STATE_CREATION WARNING #0: UNKNOWN]
Warnings about live D3D resources are expected as I terminated the process after the hang.
Any ideas what's going on? I think the writer is probably waiting for the SPS/PPS magic blobs to arrive, but that never happens. Is there a way to instruct the H.264 encoder to output SPS/PPS somewhere?
You've implemented a custom IMFMediaSink, so I suppose you've also implemented IMFStreamSink.
Doing this the usual way with Media Foundation, you have a circular COM reference between IMFMediaSink and IMFStreamSink. That's why the Shutdown method on the IMFMediaSink interface exists.
If a program that uses your custom media sink does not call Shutdown at the right place, there will be memory leaks.
On your IMFSinkWriterCallback problem, we don't have enough information to find where the problem is.
It is also not clear what you mean by "custom IMFMediaSink" and "IMFSinkWriter". Are you also implementing an IMFSinkWriter?
EDIT1
Just two things:
MFCreateSinkWriterFromMediaSink
Call CoInitialize(Ex) and MFStartup before calling this function.
When you are done using the media sink, call the media sink's IMFMediaSink::Shutdown method. (The sink writer does not shut down the media sink.) Release the sink writer before calling Shutdown on the media sink.
Do you release your interfaces correctly?
IMFSinkWriter::Finalize
Internally, this method calls IMFStreamSink::PlaceMarker to place end-of-segment markers for each stream on the media sink.
Do you handle this marker (MFSTREAMSINK_MARKER_ENDOFSEGMENT)?
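For context, Finalize waits for each stream to acknowledge that marker with an MEStreamSinkMarker event. A hypothetical fragment of a custom IMFStreamSink (sketch only; m_spEventQueue is assumed to be an IMFMediaEventQueue created with MFCreateEventQueue) might acknowledge it like this:
// Sketch: the sink writer's Finalize() blocks until MEStreamSinkMarker is
// fired for the end-of-segment marker it placed on each stream sink.
STDMETHODIMP CMyStreamSink::PlaceMarker(
    MFSTREAMSINK_MARKER_TYPE eMarkerType,
    const PROPVARIANT* pvarMarkerValue,
    const PROPVARIANT* pvarContextValue )
{
    // In a real sink the marker is queued behind any pending samples and the
    // event is fired only after those samples have been written; in the
    // simplest case (nothing pending) it can be acknowledged immediately.
    return m_spEventQueue->QueueEventParamVar(
        MEStreamSinkMarker, GUID_NULL, S_OK, pvarContextValue );
}
If the sink never fires MEStreamSinkMarker for the end-of-segment marker, Finalize() has nothing to wake it up and blocks forever, which matches the call stack above.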
We don't know how you handle CriticalSection/Event/CircularReference, so it's hard to find the problem.
EDIT2
Is there a way to instruct the H.264 encoder to output SPS/PPS somewhere?
Normally, for the H.264 video format, you need to get the MF_MT_MPEG_SEQUENCE_HEADER attribute (BLOB type) when SetCurrentMediaType is called on your IMFStreamSink (assuming you implement IMFMediaTypeHandler).
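For illustration, a hypothetical fragment of a hand-written type handler (not the simple handler from the question) reading that blob; names are illustrative and it assumes a ComPtr member m_spCurrentType and std::vector:
// The encoder's negotiated output type carries the SPS/PPS as an
// MF_MT_MPEG_SEQUENCE_HEADER blob, which can be read when the type is set.
STDMETHODIMP CMyTypeHandler::SetCurrentMediaType(IMFMediaType* pMediaType)
{
    UINT32 cbSeqHdr = 0;
    if (SUCCEEDED(pMediaType->GetBlobSize(MF_MT_MPEG_SEQUENCE_HEADER, &cbSeqHdr)) && cbSeqHdr > 0)
    {
        std::vector<BYTE> seqHdr(cbSeqHdr);
        pMediaType->GetBlob(MF_MT_MPEG_SEQUENCE_HEADER, seqHdr.data(), cbSeqHdr, &cbSeqHdr);
        // seqHdr now contains the SPS/PPS for the H.264 stream.
    }
    m_spCurrentType = pMediaType;   // keep a reference to the current type
    return S_OK;
}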
EDIT3
Could you provide the real one (this is how I think the app should be):
I don't remember if your custom sink creates an MP4 file. If it does, in IMFSinkWriter::Finalize you have to generate the ftyp/moov atoms.
EDIT4
You can also read this: Video Recording Hangs on IMFSinkWriter->Finalize();
With no source code, this is the only answer I can give.
I've made a C++ program that lives in GKE and takes some videos as input using ffmpeg, then does something with that input using OpenGL (not relevant), then finally encodes those edited videos as a single output. Normally the program works perfectly fine on my local machine: it encodes just as I want it to, with no warnings or valgrind errors whatsoever. Then, after encoding the video, I want my program to upload it to Google Cloud Storage. This is where the problem comes in. I have tried two methods: first, using curl to upload to the cloud with a signed URL; second, mounting the Google storage bucket using gcsfuse (I was already mounting the bucket to access the inputs in question). Both methods yielded undefined, weird behaviours, ranging from outputting a 0-byte or 44-byte file, to (this is the most common one) encoding the correct file size of ~500 MB but with a 0-second-long video, to outputting a 0.4-second video, to just encoding the desired output normally (really rare).
From the logs I can't see anything unusual; everything seems to work fine, and neither ffmpeg nor valgrind gives any errors or warnings. Even when I use curl to upload the video to the cloud, the output is perfectly fine when it is first encoded (before sending it with curl), but the video gets messed up when curl uploads it to the cloud.
I'm using the muxing.c example of ffmpeg to encode my video with the only difference being:
void video_encoder::fill_yuv_image(AVFrame *frame, struct SwsContext *sws_context) {
    const int in_linesize[1] = { 4 * width };
    //uint8_t* dest[4] = { rgb_data, NULL, NULL, NULL };
    sws_context = sws_getContext(
        width, height, AV_PIX_FMT_RGBA,
        width, height, AV_PIX_FMT_YUV420P,
        SWS_BICUBIC, 0, 0, 0);
    sws_scale(sws_context, (const uint8_t * const *)&rgb_data, in_linesize, 0,
              height, frame->data, frame->linesize);
}
rgb_data is the data I got after editing the inputs. Again, this works fine and I don't think there are any errors here.
I'm not sure where the error is, and since the code is huge I can't provide a reproducible example. I'm just looking for someone to point me in the right direction.
Running the cloud's output in mplayer yields this result (this is when the video is the right size but is 0 seconds long, the most common case):
MPlayer 1.4 (Debian), built with gcc-11 (C) 2000-2019 MPlayer Team
do_connect: could not connect to socket
connect: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.
Playing /media/c36c2633-d4ee-4d37-825f-88ae54b86100.
libavformat version 58.76.100 (external)
libavformat file format detected.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f2cba1168e0]moov atom not found
LAVF_header: av_open_input_stream() failed
libavformat file format detected.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7f2cba1168e0]moov atom not found
LAVF_header: av_open_input_stream() failed
RAWDV file format detected.
VIDEO: [DVSD] 720x480 24bpp 29.970 fps 0.0 kbps ( 0.0 kbyte/s)
X11 error: BadMatch (invalid parameter attributes)
Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
[vdpau] Error when calling vdp_device_create_x11: 1
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
libavcodec version 58.134.100 (external)
[dvvideo @ 0x7f2cb987a380]Requested frame threading with a custom get_buffer2() implementation which is not marked as thread safe. This is not supported anymore, make your callback thread-safe.
Selected video codec: [ffdv] vfm: ffmpeg (FFmpeg DV)
==========================================================================
Load subtitles in /media/
==========================================================================
Opening audio decoder: [libdv] Raw DV Audio Decoder
Unknown/missing audio format -> no sound
ADecoder init failed :(
Opening audio decoder: [ffmpeg] FFmpeg/libavcodec audio decoders
[dvaudio @ 0x7f2cb987a380]Decoder requires channel count but channels not set
Could not open codec.
ADecoder init failed :(
ADecoder init failed :(
Cannot find codec for audio format 0x56444152.
Audio: no sound
Starting playback...
[dvvideo @ 0x7f2cb987a380]could not find dv frame profile
Error while decoding frame!
[dvvideo @ 0x7f2cb987a380]could not find dv frame profile
Error while decoding frame!
V: 0.0 2/ 2 ??% ??% ??,?% 0 0
Exiting... (End of file)
Edit: Since the code runs on a VM, I'm using xvfb-run to start my application, but again, even when using xvfb-run it works completely fine when not encoding to the cloud.
Apparently (I'm assuming for security reasons), Google Cloud Storage does not allow multiple continuous operations on a file, just a single read/write operation. So I found a workaround: encode the video to a local file inside the pod and then do a copy operation to the cloud.
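For completeness, a rough sketch of that workaround in code (paths are hypothetical; the bucket is assumed to still be mounted via gcsfuse):
#include <filesystem>

int main()
{
    const std::filesystem::path localOutput  = "/tmp/output.mp4";      // pod-local encode target (assumed path)
    const std::filesystem::path bucketOutput = "/mnt/gcs/output.mp4";  // gcsfuse mount point (assumed path)

    // ... run the ffmpeg-based encoder against localOutput here ...

    // Single write pass to the bucket once the file is complete.
    std::filesystem::copy_file(localOutput, bucketOutput,
                               std::filesystem::copy_options::overwrite_existing);
    return 0;
}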
When using the Source Reader I can use it to get decoded YUV samples using an mp4 file source (example code).
How can I do the opposite with a webcam source? Use the Source Reader to provide encoded H264 samples? My webcam supports the RGB24 and I420 pixel formats, and I can get H264 samples if I manually wire up the H264 MFT transform. But it seems as if the Source Reader should be able to take care of the transform for me. I get an error whenever I attempt to set an MF_MT_SUBTYPE of MFVideoFormat_H264 on the Source Reader.
Sample snippet is shown below and the full example is here.
// Get the first available webcam.
CHECK_HR(MFCreateAttributes(&videoConfig, 1), "Error creating video configuration.");
// Request video capture devices.
CHECK_HR(videoConfig->SetGUID(
MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID), "Error initialising video configuration object.");
CHECK_HR(videoConfig->SetGUID(MF_MT_SUBTYPE, WMMEDIASUBTYPE_I420),
"Failed to set video sub type to I420.");
CHECK_HR(MFEnumDeviceSources(videoConfig, &videoDevices, &videoDeviceCount), "Error enumerating video devices.");
CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->GetAllocatedString(MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &webcamFriendlyName, &nameLength),
"Error retrieving video device friendly name.\n");
wprintf(L"First available webcam: %s\n", webcamFriendlyName);
CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->ActivateObject(IID_PPV_ARGS(&pVideoSource)),
"Error activating video device.");
CHECK_HR(MFCreateAttributes(&pAttributes, 1),
"Failed to create attributes.");
// Adding this attribute creates a video source reader that will handle
// colour conversion and avoid the need to manually convert between RGB24 and RGB32 etc.
CHECK_HR(pAttributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, 1),
"Failed to set enable video processing attribute.");
CHECK_HR(pAttributes->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");
// Create a source reader.
CHECK_HR(MFCreateSourceReaderFromMediaSource(
pVideoSource,
pAttributes,
&pVideoReader), "Error creating video source reader.");
MFCreateMediaType(&pSrcOutMediaType);
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264), "Error setting video sub type.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_AVG_BITRATE, 240000), "Error setting average bit rate.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_INTERLACE_MODE, 2), "Error setting interlace mode.");
CHECK_HR(pVideoReader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, NULL, pSrcOutMediaType),
"Failed to set media type on source reader.");
CHECK_HR(pVideoReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pFirstOutputType),
"Error retrieving current media type from first video stream.");
std::cout << "Source reader output media type: " << GetMediaTypeDescription(pFirstOutputType) << std::endl << std::endl;
Output:
bind returned success
First available webcam: Logitech QuickCam Pro 9000
Failed to set media type on source reader. Error: C00D5212.
finished.
The Source Reader does not look like a suitable API here. It is an API for implementing "half of the pipeline", which includes the necessary decoding but not encoding. The other half is the Sink Writer API, which is capable of handling encoding and which can encode H.264.
Your other option, unless you are developing a UWP project, is the Media Session API, which implements a pipeline end to end.
Even though technically (in theory) you could have an encoding MFT as part of a Source Reader pipeline, the Source Reader API itself is not flexible enough to add encoding-style transforms based on the requested media types.
So, one solution could be to have a Source Reader read with the necessary decoding (for example, up to RGB32 or NV12 video frames), then a Sink Writer manage the encoding, with an appropriate media sink on its end (or a Sample Grabber as the media sink). Another solution is to put the Media Foundation primitives into a Media Session pipeline, which can manage both the decoding and encoding parts connected together.
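To illustrate the first suggestion, here's a rough sketch (not the poster's code; it assumes pVideoReader is the Source Reader from the question reconfigured to deliver NV12, uses ComPtr for brevity, and the frame size/rate are placeholder values):
// Create a Sink Writer that muxes H.264 into an MP4 file.
ComPtr<IMFSinkWriter> writer;
MFCreateSinkWriterFromURL(L"capture.mp4", NULL, NULL, &writer);

// Output type: what the Sink Writer's H.264 encoder should produce.
ComPtr<IMFMediaType> outType;
MFCreateMediaType(&outType);
outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
outType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
outType->SetUINT32(MF_MT_AVG_BITRATE, 240000);
outType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
MFSetAttributeSize(outType.Get(), MF_MT_FRAME_SIZE, 640, 480);   // placeholder
MFSetAttributeRatio(outType.Get(), MF_MT_FRAME_RATE, 30, 1);     // placeholder
MFSetAttributeRatio(outType.Get(), MF_MT_PIXEL_ASPECT_RATIO, 1, 1);

DWORD sinkStream = 0;
writer->AddStream(outType.Get(), &sinkStream);

// Input type: whatever the Source Reader currently delivers (NV12 here).
ComPtr<IMFMediaType> inType;
pVideoReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &inType);
writer->SetInputMediaType(sinkStream, inType.Get(), NULL);
writer->BeginWriting();

// Pump decoded samples from the reader into the writer until end of stream.
while (true)
{
    DWORD actualIndex = 0, flags = 0;
    LONGLONG timestamp = 0;
    ComPtr<IMFSample> sample;
    pVideoReader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
        0, &actualIndex, &flags, &timestamp, &sample);
    if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
        break;
    if (sample)
        writer->WriteSample(sinkStream, sample.Get());
}
writer->Finalize();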
Now, your use case is clearer.
For me, your MFWebCamRtp is the best-optimized way of doing WebCam Source Reader -> Encoding -> RTP Streaming.
But you are experiencing presentation clock issues, synchronization issues, or unsynchronized audio/video issues. Am I right?
So you tried the Sample Grabber Sink, and now the Source Reader, like I suggested to you. Of course, you might think that a Media Session would be able to do it better.
I think so, but extra work will be needed.
Here is what I would do in your case (a rough code sketch of this topology follows the steps):
Code a custom RTP Sink
Create a topology with webcam source, h264 encoder, your custom RTP Sink
Add your topology to a MediaSession
Use the MediaSession to run the pipeline
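A rough sketch of those steps (names are placeholders: pSource is the webcam IMFMediaSource, pPD/pSD its presentation and stream descriptors, pEncoderActivate an H.264 encoder IMFActivate obtained e.g. from MFTEnumEx, and pRtpStreamSink your custom RTP sink's stream sink; error handling omitted):
ComPtr<IMFMediaSession> mediaSession;
MFCreateMediaSession(NULL, &mediaSession);

ComPtr<IMFTopology> topology;
MFCreateTopology(&topology);

ComPtr<IMFTopologyNode> srcNode, encNode, outNode;

// Source node for the webcam's video stream.
MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &srcNode);
srcNode->SetUnknown(MF_TOPONODE_SOURCE, pSource.Get());
srcNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pPD.Get());
srcNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, pSD.Get());
topology->AddNode(srcNode.Get());

// Transform node hosting the H.264 encoder.
MFCreateTopologyNode(MF_TOPOLOGY_TRANSFORM_NODE, &encNode);
encNode->SetObject(pEncoderActivate.Get());
topology->AddNode(encNode.Get());

// Output node for the custom RTP stream sink.
MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &outNode);
outNode->SetObject(pRtpStreamSink.Get());
topology->AddNode(outNode.Get());

// Wire source -> encoder -> sink and hand the topology to the session.
srcNode->ConnectOutput(0, encNode.Get(), 0);
encNode->ConnectOutput(0, outNode.Get(), 0);
mediaSession->SetTopology(0, topology.Get());

// In production, wait for MESessionTopologyStatus (ready) before starting.
PROPVARIANT varStart;
PropVariantInit(&varStart);
mediaSession->Start(&GUID_NULL, &varStart);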
If you want a network-stream sink sample, see this: MFSkJpegHttpStreamer
This is old, but it's a good start. This program also uses Winsock, like yours.
You should be aware that the RTP protocol uses UDP, which is a very good way to have synchronization issues... Definitely your main problem, I guess.
Here is what I think: you are trying to compensate for the weaknesses of the RTP protocol (UDP) with Media Foundation's audio/video synchronization management. I think you will just fail with this approach.
I think your main problem is the RTP protocol.
EDIT
No I'm not having synchronisation issues. The Source Reader and Sample Grabber both provide correct timestamps which I can use in the RTP header. Likewise no problems with RTP/UDP etc. that's the bit I do know about. My questions are originating from a desire to understand the most efficient (least amount of plumbing code) and flexible solution. And yes it does look like a custom sink writer is the optimal solution.
Again things are clearer. If you need help with a custom RTP sink, I'll be there.
Here is how I'm trying to configure the ASF media sink:
// Create media type
ComPtr<IMFMediaType> videoOutputType;
Try(MFCreateMediaType(&videoOutputType));
Try(MFSetAttributeSize(videoOutputType.Get(), MF_MT_FRAME_SIZE, 400, 300));
Try(videoOutputType->SetUINT32(MF_MT_AVG_BITRATE, 626000));
Try(videoOutputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
Try(videoOutputType->SetUINT32(MF_MT_VIDEO_ROTATION, 0));
Try(MFSetAttributeRatio(videoOutputType.Get(), MF_MT_FRAME_RATE, 30000, 1001));
Try(MFSetAttributeRatio(videoOutputType.Get(), MF_MT_PIXEL_ASPECT_RATIO, 1, 1));
Try(videoOutputType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive));
Try(videoOutputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_WMV3));
// Create profile
ComPtr<IMFASFProfile> asfProfile;
Try(MFCreateASFProfile(&asfProfile));
ComPtr<IMFASFStreamConfig> streamConfig;
Try(asfProfile->CreateStream(videoOutputType.Get(), &streamConfig));
Try(streamConfig->SetStreamNumber(0));
Try(asfProfile->SetStream(streamConfig.Get()));
// Create media sink
ComPtr<IMFMediaSink> asfMediaSink;
ComPtr<IMFByteStream> outputByteStream(new NetworkOutputByteStream(stream));
Try(MFCreateASFStreamingMediaSink(outputByteStream.Get(), &asfMediaSink));
// Set content info
ComPtr<IMFASFContentInfo> asfContentInfo;
Try(asfMediaSink.As(&asfContentInfo));
Try(asfContentInfo->SetProfile(asfProfile.Get()));
// Create sink writer
Try(MFCreateSinkWriterFromMediaSink(asfMediaSink.Get(), NULL, &this->sinkWriter));
But the SetProfile method is returning the following error: E_INVALIDARG (one or more arguments are invalid). So I assume that I am configuring it in a bad way. How can I do it right? I'm not sure how to use the ASF media sink because I can't find any good samples for it.
I can say that there are at least two big mistakes in your code:
1. You index streams from 0:
streamConfig->SetStreamNumber(0)
This is a mistake. In Tutorial: 1-Pass Windows Media Encoding it is written that:
if (wStreamNumber < 1 || wStreamNumber > 127 )
{
return MF_E_INVALIDSTREAMNUMBER;
}
In ASF there are at most 128 streams, but the stream with index 0 is reserved for format needs. You must use an index greater than 0 (a corrected snippet follows these two points).
2. You try to create the media type by filling in attributes yourself. That is not a good idea: firstly, you do not know all the attributes needed by the media sink; secondly, you are creating a media type for the Windows Media Video encoder, which is originally a DMO encoder adapted for Media Foundation. It needs special codec private data added to the media type via MF_MT_USER_DATA (see Configuring a WMV Encoder). This means the media sink will look for that data for the Windows Media codec, but will not find it.
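For the first point, a minimal correction to the snippet from the question (sketch only):
// ASF stream numbers run from 1 to 127; 0 is reserved.
ComPtr<IMFASFStreamConfig> streamConfig;
Try(asfProfile->CreateStream(videoOutputType.Get(), &streamConfig));
Try(streamConfig->SetStreamNumber(1)); // was 0 in the original code
Try(asfProfile->SetStream(streamConfig.Get()));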
These are the two mistakes that stand out to me. I think you should research the tutorials on MSDN.
Regards.
I have an issue when finalizing a video recording into an .mp4 using Media Foundation, where the call to IMFSinkWriter->Finalize() hangs forever. It doesn't always happen, and can happen on almost any machine (seen on Windows Server, 7, 8, 10). Flush() is called on the audio and video streams beforehand, and no new samples are added between Flush and Finalize. Any ideas on what could cause Finalize to hang forever?
Things I've tried:
Logging all HRESULTs to check for any issues (I was already checking them before proceeding to the next line of code). Everything comes back as S_OK; I'm not seeing any issues.
Added an IMFSinkWriterCallback on the stream to get callbacks when the stream processes markers (adding markers every 10 samples) and when it finishes Finalize(). I haven't been able to reproduce the hang since adding this, but it should give the best information about what's going on once I do.
Searched code samples online to see how others are setting up the Sink Writer and how Finalize() is used. I didn't find many samples, and my code looks similar to the ones that were found.
Looked at the encoders available and used by each system, including the version of the encoder DLL. Encoders varied between the AMD H.264 Hardware MFT Encoder and the H264 Encoder MFT on machines that could reproduce the issue. Versions didn't seem to matter, and some of the machines were up to date with video drivers.
Here are some code samples without any HRESULT checking (checking doubled the amount of code, so I took it out).
Building the sink sample:
CComPtr<IMFAttributes> pAttr;
::MFCreateAttributes( &pAttr, 4 );
pAttr->SetGUID( MF_TRANSCODE_CONTAINERTYPE, GetFileContainerType() );
pAttr->SetUINT32( MF_LOW_LATENCY, FALSE ); // Allows better multithreading
pAttr->SetUINT32( MF_SINK_WRITER_DISABLE_THROTTLING, TRUE ); // Does not block
pAttr->SetUINT32( MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE );
m_pCallback.Attach( new MFSinkWriterCallback() );
pAttr->SetUnknown( MF_SINK_WRITER_ASYNC_CALLBACK, m_pCallback );
::MFCreateSinkWriterFromURL( m_strFilename.c_str(), NULL, pAttr, &m_pSink );
if ( m_pVideoInputType && m_pVideoOutputType )
{
m_pSink->AddStream( m_pVideoOutputType, &m_dwVideoStreamId );
// Attributes for encoding?
CComPtr<IMFAttributes> pAttrVideo;
// Not sure if these are needed
//::MFCreateAttributes( &pAttrVideo, 5 );
m_pSink->SetInputMediaType( m_dwVideoStreamId, m_pVideoInputType, pAttrVideo );
}
if ( m_pAudioInputType && m_pAudioOutputType )
{
m_pSink->AddStream( m_pAudioOutputType, &m_dwAudioStreamId );
// Attributes for encoding?
CComPtr<IMFAttributes> pAttrAudio;
// Not sure if these are needed
//::MFCreateAttributes( &pAttrAudio, 2 );
//pAttrAudio->SetGUID( MF_MT_SUBTYPE, MFAudioFormat_AAC );
//pAttrAudio->SetUINT32( MF_MT_AUDIO_BITS_PER_SAMPLE, 16 );
m_pSink->SetInputMediaType( m_dwAudioStreamId, m_pAudioInputType, pAttrAudio );
}
m_pSink->BeginWriting();
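The MFSinkWriterCallback attached above isn't shown in the question; a minimal hypothetical sketch of such a callback (it just signals an event from OnFinalize so the shutdown path can wait with a timeout instead of blocking) could look like this:
// Hypothetical minimal IMFSinkWriterCallback (requires <mfreadwrite.h>).
class MFSinkWriterCallback : public IMFSinkWriterCallback
{
public:
    MFSinkWriterCallback()
        : m_cRef( 1 )
        , m_hFinalized( ::CreateEvent( NULL, TRUE, FALSE, NULL ) )
    {}
    virtual ~MFSinkWriterCallback() { ::CloseHandle( m_hFinalized ); }

    // IUnknown
    STDMETHODIMP QueryInterface( REFIID riid, void** ppv )
    {
        if ( riid == __uuidof( IMFSinkWriterCallback ) || riid == IID_IUnknown )
        {
            *ppv = static_cast<IMFSinkWriterCallback*>( this );
            AddRef();
            return S_OK;
        }
        *ppv = NULL;
        return E_NOINTERFACE;
    }
    STDMETHODIMP_( ULONG ) AddRef() { return ::InterlockedIncrement( &m_cRef ); }
    STDMETHODIMP_( ULONG ) Release()
    {
        ULONG cRef = ::InterlockedDecrement( &m_cRef );
        if ( cRef == 0 ) delete this;
        return cRef;
    }

    // IMFSinkWriterCallback
    STDMETHODIMP OnFinalize( HRESULT hrStatus ) { ::SetEvent( m_hFinalized ); return S_OK; }
    STDMETHODIMP OnMarker( DWORD dwStreamIndex, LPVOID pvContext ) { return S_OK; }

    // Helper: wait for OnFinalize after calling Finalize() on the sink writer.
    bool WaitForFinalize( DWORD dwTimeoutMs )
    { return ::WaitForSingleObject( m_hFinalized, dwTimeoutMs ) == WAIT_OBJECT_0; }

private:
    volatile LONG m_cRef;
    HANDLE        m_hFinalized;
};
With MF_SINK_WRITER_ASYNC_CALLBACK set as above, Finalize() returns immediately and completion is reported through OnFinalize, so the shutdown path can wait on the event with a timeout rather than blocking inside Finalize().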
Stopping the recording sample:
if ( m_dwVideoStreamId != (DWORD)-1 )
{
m_pSink->Flush( m_dwVideoStreamId );
}
if ( m_dwAudioStreamId != (DWORD)-1 )
{
m_pSink->Flush( m_dwAudioStreamId );
}
m_pSink->Finalize();
There are a lot of situations where a Media Foundation application can hang:
Calling MFShutdown/CoUninitialize while Media Foundation objects are still in use.
Using a GUI and making bad use of the Windows message pump in a multithreaded application.
Bad use of MTA/STA components.
Bad use of critical sections/wait-for-event functions.
Forgetting to call the EndXXX() function when a BeginXXX() function is used.
Bad use of callback functions.
Forgetting to call AddRef when necessary, and releasing an object still used by another thread.
A bug in Media Foundation (there are some on Windows 7).
and so on...
When I say minimal source code, I mean: isolate the source code that does the encoding process, and provide it on GitHub if it's too large. It's better if we can compile and try the source code, because deadlocks are difficult to find.
I am continuing work on my DirectShow application and am just putting the finishing touches on it. The program goes through a video file in 1-second intervals, capturing the current buffer from the SampleGrabber and processing it before moving on. However, I noticed that I was getting duplicate images in my tests, and found out that DirectShow had not moved through the video by that 1-second interval fast enough. My question is whether there is a way to check when DirectShow is ready for me to call the SampleGrabber to get the current frame and process it. At the moment I call sleep for 1 second, but there has to be a better method. Thank you in advance for the help.
EDIT
I just tried running a check to see if the video's position is equal to the next position that I would like to grab and process. That decreased the number of duplicate frames but I still see them showing up in chunks.
I always let the DS framework handle the processing rate:
In the main application thread, configure the sample grabber callback; then, when the callback is triggered, you get the media sample as well as the sample time. At that point you can process the sample if the appropriate interval, i.e. 1 second, has elapsed.
What do you mean you call sleep for a second, and from where (which thread) do you call it?
If you're doing this from inside the callback, you are effectively blocking the DirectShow pipeline. Perhaps if you could explain your setup in more detail I could be more helpful.
/// Callback that is triggered per media sample
/// Note this all happens in the DirectShow streaming thread!
STDMETHODIMP SampleCB( double dSampleTime, IMediaSample * pSample )
{
// check timestamp and if one second has elapsed process sample accordingly
// else do nothing
...
// Get byte pointer
BYTE* pbData(NULL);
HRESULT hr = pSample->GetPointer(&pbData);
if (FAILED(hr))
{
return hr;
}
...
}
P.S. If you want to process the samples as fast as possible, you can set the sample timestamp to NULL in your callback.
// Set the time to NULL to allow for fast rendering, since the
// video renderer controls the rendering rate according
// to the timestamps
pSample->SetTime(NULL, NULL);
Try setting your graph's reference clock to NULL; see the sketch below. It will allow you to:
process the file as fast as possible
and will relieve you of the issues you have.
Of course, it won't work if you are rendering the file to the screen at the same time.
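A minimal sketch of that (pGraph is assumed to be your existing IGraphBuilder):
// Run the graph without a reference clock so samples are not paced by timestamps.
IMediaFilter* pMediaFilter = NULL;
if (SUCCEEDED(pGraph->QueryInterface(IID_IMediaFilter, (void**)&pMediaFilter)))
{
    pMediaFilter->SetSyncSource(NULL); // NULL clock: process as fast as possible
    pMediaFilter->Release();
}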