IMFTransform::ProcessInput() and MF_E_TRANSFORM_NEED_MORE_INPUT - c++

I have code that decodes AAC-encoded audio using IMFTransform. It works well for various test inputs. But I observed that in some cases IMFTransform::ProcessOutput() returns MF_E_TRANSFORM_NEED_MORE_INPUT when, according to my reading of the MS documentation, it should return a valid data sample.
Basically the code has the following structure:
IMFTransform* transformer;
MFT_OUTPUT_DATA_BUFFER output_data_buffer;
...
bool try_to_get_output = false;
for (;;) {
    if (try_to_get_output) {
        // Try to get the output sample.
        try_to_get_output = false;
        output_data_buffer.dwStatus = 0;
        ...
        hr = transformer->ProcessOutput(... &output_data_buffer);
        if (SUCCEEDED(hr)) {
            // Process the sample.
            if (output_data_buffer.dwStatus & MFT_OUTPUT_DATA_BUFFER_INCOMPLETE) {
                // We have more data.
                try_to_get_output = true;
            }
        } else if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
            Log("Unnecessary ProcessOutput()");
        } else {
            // Process other errors.
        }
        continue;
    }
    // Send more encoded AAC data to the MFT.
    hr = transformer->ProcessInput(...);
}
What happens is that ProcessOutput() sets MFT_OUTPUT_DATA_BUFFER_INCOMPLETE in MFT_OUTPUT_DATA_BUFFER.dwStatus, but then the following ProcessOutput() always returns MF_E_TRANSFORM_NEED_MORE_INPUT, contradicting the documentation.
Again, so far it seems harmless and things work. But then what exactly does the AAC decoder want to tell the caller by setting MFT_OUTPUT_DATA_BUFFER_INCOMPLETE?

This might be a small glitch in the decoder implementation. Quite possibly, if you happened to drain the MFT at that point it would spit out some data, so the incomplete flag might indicate, a bit confusingly, that some data exists even though it is not immediately accessible.
However, the overall idea is to keep calling ProcessOutput, pulling output data for as long as possible, until you get MF_E_TRANSFORM_NEED_MORE_INPUT, and only then proceed with feeding new input (or draining), as sketched below. That is, I would say MF_E_TRANSFORM_NEED_MORE_INPUT is much more important than MFT_OUTPUT_DATA_BUFFER_INCOMPLETE. After all, this is what Microsoft's own code built on top of MFTs does.
Also keep in mind that the AAC decoder is an "old", "first generation" MFT, so over the years its updates may have diverged a bit from the current docs.
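In code, the pump would look roughly like this (a minimal sketch: transformer is the question's variable, ConsumeSample is a hypothetical helper, and allocating output_data_buffer.pSample for MFTs that don't provide their own samples is omitted):
// Keep pulling output until the MFT explicitly asks for more input.
for (;;) {
    MFT_OUTPUT_DATA_BUFFER output_data_buffer = {};
    DWORD status = 0;
    // Note: output_data_buffer.pSample must be a caller-allocated
    // IMFSample here unless the MFT provides its own samples.
    HRESULT hr = transformer->ProcessOutput(0, 1, &output_data_buffer, &status);
    if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
        break;  // now feed more input via ProcessInput(), or drain
    if (FAILED(hr))
        break;  // handle MF_E_TRANSFORM_STREAM_CHANGE etc. here
    ConsumeSample(output_data_buffer.pSample);  // hypothetical helper
    // MFT_OUTPUT_DATA_BUFFER_INCOMPLETE in dwStatus is deliberately
    // ignored; we loop and call ProcessOutput() again either way.
}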


C++/C FFmpeg artifact build up across video frames

Context:
I am building a recorder for capturing video and audio in separate threads (using Boost thread groups) using FFmpeg 2.8.6 on Ubuntu 16.04. I followed the demuxing_decoding example here: https://www.ffmpeg.org/doxygen/2.8/demuxing_decoding_8c-example.html
Video capture specifics:
I am reading H264 off a Logitech C920 webcam and writing the video to a raw file. The issue I notice with the video is that there seems to be a build-up of artifacts across frames until a particular frame resets. Here are my frame-grabbing and decoding functions:
// Used for injecting decoding functions for different media types, allowing
// for a generic decode loop
typedef std::function<int(AVPacket*, int*, int)> PacketDecoder;

/**
 * Decodes a video packet.
 * If the decoding operation is successful, returns the number of bytes decoded,
 * else returns the result of the decoding process from ffmpeg
 */
int decode_video_packet(AVPacket *packet,
                        int *got_frame,
                        int cached){
    int ret = 0;
    int decoded = packet->size;
    *got_frame = 0;

    //Decode video frame
    ret = avcodec_decode_video2(video_decode_context,
                                video_frame, got_frame, packet);
    if (ret < 0) {
        //FFmpeg users should use av_err2str
        char errbuf[128];
        av_strerror(ret, errbuf, sizeof(errbuf));
        std::cerr << "Error decoding video frame " << errbuf << std::endl;
        decoded = ret;
    } else {
        if (*got_frame) {
            video_frame->pts = av_frame_get_best_effort_timestamp(video_frame);

            //Write to log file
            AVRational *time_base = &video_decode_context->time_base;
            log_frame(video_frame, time_base,
                      video_frame->coded_picture_number, video_log_stream);

#if( DEBUG )
            std::cout << "Video frame " << ( cached ? "(cached)" : "" )
                      << " coded:" << video_frame->coded_picture_number
                      << " pts:" << video_frame->pts << std::endl;
#endif

            /*Copy decoded frame to destination buffer:
             *This is required since rawvideo expects non aligned data*/
            av_image_copy(video_dest_attr.video_destination_data,
                          video_dest_attr.video_destination_linesize,
                          (const uint8_t **)(video_frame->data),
                          video_frame->linesize,
                          video_decode_context->pix_fmt,
                          video_decode_context->width,
                          video_decode_context->height);

            //Write to rawvideo file
            fwrite(video_dest_attr.video_destination_data[0],
                   1,
                   video_dest_attr.video_destination_bufsize,
                   video_out_file);

            //Unref the refcounted frame
            av_frame_unref(video_frame);
        }
    }
    return decoded;
}
/**
 * Grabs frames in a loop and decodes them using the specified decoding function
 */
int process_frames(AVFormatContext *context,
                   PacketDecoder packet_decoder) {
    int ret = 0;
    int got_frame;
    AVPacket packet;

    //Initialize packet, set data to NULL, let the demuxer fill it
    av_init_packet(&packet);
    packet.data = NULL;
    packet.size = 0;

    // read frames from the file
    for (;;) {
        ret = av_read_frame(context, &packet);
        if (ret < 0) {
            if (ret == AVERROR(EAGAIN)) {
                continue;
            } else {
                break;
            }
        }

        //Convert timing fields to the decoder timebase
        unsigned int stream_index = packet.stream_index;
        av_packet_rescale_ts(&packet,
                             context->streams[stream_index]->time_base,
                             context->streams[stream_index]->codec->time_base);

        AVPacket orig_packet = packet;
        do {
            ret = packet_decoder(&packet, &got_frame, 0);
            if (ret < 0) {
                break;
            }
            packet.data += ret;
            packet.size -= ret;
        } while (packet.size > 0);
        av_free_packet(&orig_packet);

        if(stop_recording == true) {
            break;
        }
    }

    //Flush cached frames
    std::cout << "Flushing frames" << std::endl;
    packet.data = NULL;
    packet.size = 0;
    do {
        packet_decoder(&packet, &got_frame, 1);
    } while (got_frame);

    av_log(0, AV_LOG_INFO, "Done processing frames\n");
    return ret;
}
Questions:
How do I go about debugging the underlying issue?
Is it possible that running the decoding code in a thread other than the one in which the decoding context was opened is causing the problem?
Am I doing something wrong in the decoding code?
Things I have tried/found:
I found this thread that is about the same problem here: FFMPEG decoding artifacts between keyframes
(I cannot post samples of my corrupted frames due to privacy issues, but the image linked to in that question depicts the same issue I have)
However, the answer to the question is posted by the OP without specific details about how the issue was fixed. The OP only mentions that he wasn't 'preserving the packets correctly', but nothing about what was wrong or how to fix it. I do not have enough reputation to post a comment seeking clarification.
I was initially passing the packet into the decoding function by value, but switched to passing by pointer on the off chance that the packet freeing was being done incorrectly.
I found another question about debugging decoding issues, but couldn't find anything conclusive: How is video decoding corruption debugged?
I'd appreciate any insight. Thanks a lot!
[EDIT] In response to Ronald's answer, I am adding a little more information that wouldn't fit in a comment:
I am only calling decode_video_packet() from the thread processing video frames; the other thread processing audio frames calls a similar decode_audio_packet() function. So only one thread calls the function. I should mention that I have set the thread_count in the decoding context to 1, failing which I would get a segfault in malloc.c while flushing the cached frames.
I can see this being a problem if process_frames and the frame decoder function were run on separate threads, which is not the case. Is there a specific reason why it would matter whether the freeing is done within the function or after it returns? I believe the freeing function is passed a copy of the original packet because multiple decode calls would be required for an audio packet, in case the decoder doesn't decode the entire audio packet.
A general problem is that the corruption does not occur all the time. I can debug better if it is deterministic. Otherwise, I can't even say if a solution works or not.
A few things to check:
are you running multiple threads that are calling decode_video_packet()? If you are: don't do that! FFmpeg has built-in support for multi-threaded decoding, and you should let FFmpeg do threading internally and transparently (see the sketch at the end of this answer).
you are calling av_free_packet() right after calling the frame decoder function, but at that point it may not yet have had a chance to copy the contents. You should probably let decode_video_packet() free the packet instead, after calling avcodec_decode_video2().
General debugging advice:
run it without any threading and see if that works;
if it does, and with threading it fails, use thread debuggers such as tsan or helgrind to help in finding race conditions that point to your code.
it can also help to know whether the output you're getting is reproducible (this suggests a non-threading-related bug in your code) or changes from one run to the next (this suggests a race condition in your code).
And yes, the periodic clean-ups are because of keyframes.
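To illustrate the threading point: a minimal sketch of letting FFmpeg thread internally, reusing the question's video_decode_context and assuming a decoder obtained from avcodec_find_decoder (this is a sketch, not a drop-in fix):
// Configure FFmpeg's built-in threading once, before avcodec_open2(),
// then keep all decode calls on a single application thread.
video_decode_context->thread_count = 0;  // 0 = auto-detect core count
video_decode_context->thread_type  = FF_THREAD_FRAME | FF_THREAD_SLICE;
if (avcodec_open2(video_decode_context, decoder, NULL) < 0) {
    std::cerr << "Failed to open codec" << std::endl;
}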

Windows MFT (Media Foundation Transform) decoder not returning proper sample time or duration

To decode an H264 stream with the Windows Media Foundation Transform, the workflow is currently something like this:
IMFSample* sample;
sample->SetTime(time_in_ns);
sample->SetDuration(duration_in_ns);
sample->AddBuffer(buffer);

// Feed IMFSample to decoder
mDecoder->ProcessInput(0, sample, 0);

// Get output from decoder.
/* create outputsample that will receive content */ { ... }
MFT_OUTPUT_DATA_BUFFER output = {0};
output.pSample = outputsample;
DWORD status = 0;
HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
if (output.pEvents) {
    // We must release this, as per the IMFTransform::ProcessOutput()
    // MSDN documentation.
    output.pEvents->Release();
    output.pEvents = nullptr;
}
if (hr == MF_E_TRANSFORM_STREAM_CHANGE) {
    // Type change, probably geometric aperture change.
    // Reconfigure decoder output type, so that GetOutputMediaType()
} else if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT) {
    // Not enough input to produce output.
} else if (!output.pSample) {
    return S_OK;
} else {
    // Process output
}
When we have fed all data to the MFT decoder, we must drain it:
mDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, 0);
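In code, the drain step then looks roughly like this (a sketch reusing mDecoder and outputsample from above; error handling trimmed):
// Tell the decoder no more input is coming, then pump the output
// side until it reports that it needs more input (i.e. it is empty).
mDecoder->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, 0);
for (;;) {
    MFT_OUTPUT_DATA_BUFFER output = {0};
    output.pSample = outputsample;   // caller-allocated, as above
    DWORD status = 0;
    HRESULT hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if (output.pEvents) {
        output.pEvents->Release();
        output.pEvents = nullptr;
    }
    if (hr == MF_E_TRANSFORM_NEED_MORE_INPUT)
        break;                       // fully drained
    if (FAILED(hr))
        break;                       // real error, handle elsewhere
    // consume output.pSample (decoded frame)
}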
Now, one thing with the WMF H264 decoder is that it will typically not output anything before having been fed over 30 compressed H264 frames, regardless of the size of the H264 sliding window. Latency is very high...
I'm encountering an issue that is very troublesome.
Consider a video made only of keyframes, with only 15 frames, each being 2 s long, where the first frame has a non-zero presentation time (this stream is from live content, so the first frame is typically stamped with epoch time).
So without draining the decoder, nothing will come out of the decoder, as it hasn't received enough frames.
However, once the decoder is drained, the decoded frame will come out. HOWEVER, the MFT decoder has set all durations to 33.6ms only and the presentation time of the first sample coming out is always 0.
The original duration and presentation time have been lost.
If you provide over 30 frames to the h264 decoder, then both duration and pts are valid...
I haven't yet found a way to get the WMF decoder to output samples with the proper value.
It appears that if you have to drain the decoder before it has output any samples by itself, then it's totally broken...
Has anyone experienced such problems? How did you get around it?
Thank you in advance
Edit: a sample of the video is available on http://people.mozilla.org/~jyavenard/mediatest/fragmented/1301869.mp4
Playing this video with Firefox causes it to play extremely quickly due to the problems described above.
I'm not sure that your workflow is correct. I think you should do something like this:
do
{
    ...
    hr = mDecoder->ProcessInput(0, sample, 0);
    if(FAILED(hr))
        break;
    ...
    hr = mDecoder->ProcessOutput(0, 1, &output, &status);
    if(FAILED(hr) && hr != MF_E_TRANSFORM_NEED_MORE_INPUT)
        break;
}
while(hr == MF_E_TRANSFORM_NEED_MORE_INPUT);

if(SUCCEEDED(hr))
{
    // You have a valid decoded frame here
}
The idea is to keep calling ProcessInput/ProcessOutput while ProcessOutput returns MF_E_TRANSFORM_NEED_MORE_INPUT, which means that the decoder needs more input. I think that with this loop you won't need to drain the decoder.

How to edit SIM800l library to ensure that a call is established

I use a SIM800L to make calls from an Arduino UNO with AT commands. Using this library, I make calls with the gprsTest.callUp(number) function. The problem is that it returns true even if the number is wrong or there is no credit.
It is clear from this part of the code in the GPRS_Shield_Arduino.cpp library why this happens: it doesn't check the response to ATD<number>;.
bool GPRS::callUp(char *number)
{
    //char cmd[24];
    if(!sim900_check_with_cmd("AT+COLP=1\r\n","OK\r\n",CMD)) {
        return false;
    }
    delay(1000);
    //TODO (HACERR): remove SPRINTF to save memory ???
    //sprintf(cmd,"ATD%s;\r\n", number);
    //sim900_send_cmd(cmd);
    sim900_send_cmd("ATD");
    sim900_send_cmd(number);
    sim900_send_cmd(";\r\n");
    return true;
}
The responses to ATD<number>; on the software serial connection are:
If the number is wrong:
ERROR
If there is no credit:
MO CONNECTED // instant response
+COLP: "003069XXXXXXXX",129,"",0,"" // after 3 sec
OK
If it is calling and there is no answer:
MO RING // instant response, it is ringing
NO ANSWER // after some sec
If it is calling and the caller hangs up:
MO RING // instant response
NO CARRIER // after some sec
If the receiver has no carrier:
ATD6985952400;
NO CARRIER
If it is calling, the callee answers and then hangs up:
MO RING
MO CONNECTED
+COLP: "69XXXXXXXX",129,"",0,""
OK
NO CARRIER
The question is how to make use of the different responses for each case in gprsTest.callUp(number), or at least how to return true only if it is ringing.
At first glance this library code seems better than the worst I have seen, but it still has some issues. The most severe is its final result code handling.
The sim900_check_with_cmd function is conceptually almost there; however, checking only for OK is in no way acceptable. It should check for every single possible final result code the modem might send.
From your output examples you have the following final result codes
OK
ERROR
NO CARRIER
NO ANSWER
but there exist a few more as well. You can look at the code for atinout for an example of an is_final_result_code function (you can also compare to isFinalResponseError and isFinalResponseSuccess[1] in ST-Ericsson's U300 RIL).
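For illustration, a minimal is_final_result_code along those lines (a sketch only: the exact list in atinout differs slightly; BUSY and NO DIALTONE come from V.250):
#include <stdbool.h>
#include <string.h>

/* Sketch: true if 'line' is a final result code per V.250.
 * Lines are assumed complete and terminated with "\r\n". */
static bool is_final_result_code(const char *line)
{
    static const char *const finals[] = {
        "OK\r\n", "ERROR\r\n", "NO CARRIER\r\n", "NO ANSWER\r\n",
        "BUSY\r\n", "NO DIALTONE\r\n",
    };
    for (size_t i = 0; i < sizeof(finals) / sizeof(finals[0]); i++) {
        if (strcmp(line, finals[i]) == 0)
            return true;
    }
    /* "+CME ERROR: ..." and "+CMS ERROR: ..." are also final. */
    return strncmp(line, "+CME ERROR:", 11) == 0
        || strncmp(line, "+CMS ERROR:", 11) == 0;
}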
The unconditional return true; at the end of GPRS::callUp is an error, but it might be deliberate, for lack of a better API design that would let the calling client check the intermediate result codes. Even so, that is the wrong way to do it.
The library really should do all of the stateful command line invocation and final result code parsing, with no exceptions. Doing only parts of that in the library and leaving some of it up to the client is just bad design.
When clients want to inspect or act on intermediate result codes or information text that comes between the command line and the final result code, the correct way to do it is to let the library "deframe" everything it receives from the modem into individual complete lines, and for everything that is not a final result code provide this to the client through a callback function.
The following is from an unfinished update to my atinout program:
bool send_commandline(
    const char *cmdline,
    const char *prefix,
    void (*handler)(const char *response_line, void *ptr),
    void *ptr,
    FILE *modem)
{
    int res;
    char response_line[1024];

    DEBUG(DEBUG_MODEM_WRITE, ">%s\n", cmdline);
    res = fputs(cmdline, modem);
    if (res < 0) {
        error(ERR "failed to send '%s' to modem (res = %d)", cmdline, res);
        return false;
    }
    /*
     * Adding a tiny delay here to avoid losing input data which
     * sometimes happens when immediately jumping into reading
     * responses from the modem.
     */
    sleep_milliseconds(200);
    do {
        const char *line;
        line = fgets(response_line, (int)sizeof(response_line), modem);
        if (line == NULL) {
            error(ERR "EOF from modem");
            return false;
        }
        DEBUG(DEBUG_MODEM_READ, "<%s\n", line);
        if (prefix[0] == '\0') {
            handler(response_line, ptr);
        } else if (STARTS_WITH(response_line, prefix)) {
            handler(response_line + strlen(prefix) + strlen(" "), ptr);
        }
    } while (! is_final_result(response_line));
    return strcmp(response_line, "OK\r\n") == 0;
}
You can use that as a basis for implementing proper handling. If you want to get error responses out of the function, add an additional callback argument and change the ending to
success = strcmp(response_line, "OK\r\n") == 0;
if (!success) {
    error_handler(response_line, ptr);
}
return success;
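To tie this back to the original question: with such a callback-based send_commandline, detecting ringing becomes a matter of watching for the intermediate MO RING line. A sketch under those assumptions (dial_handler and dial are hypothetical names; same includes as the previous sketch, plus stdio.h for FILE):
/* Sketch: detect ringing via the intermediate "MO RING" line.
 * send_commandline is the function shown above. */
static void dial_handler(const char *response_line, void *ptr)
{
    bool *ringing = (bool *)ptr;
    if (strncmp(response_line, "MO RING", 7) == 0)
        *ringing = true;   /* intermediate result code: call is ringing */
}

static bool dial(FILE *modem)
{
    bool ringing = false;
    /* Empty prefix: hand every non-final line to the handler. */
    bool ok = send_commandline("ATD69XXXXXXXX;\r", "", dial_handler,
                               &ringing, modem);
    return ok && ringing;
}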
Tip: Read all of chapter 5 in the V.250 specification; it will teach you almost everything you need to know about command lines, result codes and response handling. For instance, a command line should be terminated with \r only, not \r\n.
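Applied to the library code above, the dial command would then be sent like this (same sim900_send_cmd helper, only the terminator changes):
// Per V.250, terminate the command line with '\r' only.
sim900_send_cmd("ATD");
sim900_send_cmd(number);
sim900_send_cmd(";\r");   // was ";\r\n"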
[1] Note that CONNECT is not a final result code, it is an intermediate result code, so the name isFinalResponseSuccess is strictly speaking not 100% correct.

How to read YUV8 data from avi file?

I have an avi file that contains uncompressed gray video data. I need to extract frames from it. The size of the file is 22 GB.
How do I do that?
I have already tried ffmpeg, but it gives me a "could not find codec parameters for video stream" message - because there is no codec at work, just frames.
Since OpenCV just uses ffmpeg to read video, that rules out OpenCV as well.
The only path that seems to be left is to try and dig into the raw data, but I do not know how.
Edit: this is the code I use to read from the file with OpenCV. The failure occurs inside the second if. Running the ffmpeg binary on the file also fails with the message above (could not find codec parameters etc.)
/* register all formats and codecs */
av_register_all();

/* open input file, and allocate format context */
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
    fprintf(stderr, "Could not open source file %s\n", src_filename);
    ret = 1;
    goto end;
}
fmt_ctx->seek2any = true;

/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
    fprintf(stderr, "Could not find stream information\n");
    ret = 1;
    goto end;
}
Edit:
Here is the sample code I have tried for the extraction: pastebin. The result I get is an unchanging buffer after every call to AVIStreamRead.
If you do not need cross-platform functionality, the Video for Windows (VFW) API is a good alternative (http://msdn.microsoft.com/en-us/library/windows/desktop/dd756808(v=vs.85).aspx). I will not post an entire code block, since there is quite a lot to do, but you should be able to figure it out from the reference link. Basically, you do an AVIFileOpen, then get the video stream via AVIFileGetStream with streamtypeVIDEO, or alternatively do both at once with AVIStreamOpenFromFile, and then read samples from the stream with AVIStreamRead. If you get to a point where you fail, I can try to help, but it should be pretty straightforward.
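A rough sketch of that flow (my names, error handling trimmed; treat it as an outline rather than a drop-in solution):
#include <windows.h>
#include <vfw.h>      // link against vfw32.lib
#include <vector>

// Sketch: open the first video stream of an AVI and read raw samples.
void ReadAviFrames(LPCTSTR path)
{
    AVIFileInit();
    PAVISTREAM stream = NULL;
    if (AVIStreamOpenFromFile(&stream, path, streamtypeVIDEO, 0,
                              OF_READ, NULL) == 0) {
        AVISTREAMINFO info = {};
        AVIStreamInfo(stream, &info, sizeof(info));
        LONG bytes = 0, samples = 0;
        for (LONG s = info.dwStart;
             s < (LONG)(info.dwStart + info.dwLength); ++s) {
            // Query the sample size first (NULL buffer), then read it.
            if (AVIStreamRead(stream, s, 1, NULL, 0, &bytes, &samples) == 0
                && bytes > 0) {
                std::vector<BYTE> frame(bytes);
                AVIStreamRead(stream, s, 1, frame.data(), bytes,
                              &bytes, &samples);
                // frame now holds one uncompressed sample
            }
        }
        AVIStreamRelease(stream);
    }
    AVIFileExit();
}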
Also, I am not sure why ffmpeg is failing; I have been doing raw AVI reading with ffmpeg without any codecs involved. Can you post the call to ffmpeg that actually fails?
EDIT:
For the issue that you are seeing when the read data size is 0: the AVI file has N slots for frames in each second, where N is the fps of the video. In real life the samples won't come at exactly that speed (e.g. IP surveillance cameras), so the actual data sample indexes can be non-contiguous, like 1,5,11,..., and VFW inserts empty samples between them (that is where you read a sample with a zero size). What you have to do is call AVIStreamRead with NULL as the buffer and 0 as the size until bRead is not 0, or you run past the last sample. When you get an actual size, you can call AVIStreamRead again on that sample index with the buffer pointer and size. I usually work with compressed video, so I don't use the suggested size, but at least according to your code snippet I would do something like this:
...
bRead = 0;
do
{
    aviOpRes = AVIStreamRead(ppavi, smpS, 1, NULL, 0, &bRead, &smpN);
} while (bRead == 0 && ++smpS < si.dwLength + si.dwStart);
if (smpS >= si.dwLength + si.dwStart)
    break;
PUCHAR tempBuffer = new UCHAR[bRead];
aviOpRes = AVIStreamRead(ppavi, smpS, 1, tempBuffer, bRead, &bRead, &smpN);
/* do whatever you need */
delete [] tempBuffer;
...
EDIT 2:
Since this may come in handy to someone, or help you make a choice between VFW and FFMPEG, I also updated your FFMPEG example so that it parses the same file (sorry for the code quality, since it lacks error checking, but I guess you can see the logical flow):
/* register all formats and codecs */
av_register_all();

AVFormatContext* fmt_ctx = NULL;
/* open input file, and allocate format context */
const char *src_filename = "E:\\Output.avi";
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
    fprintf(stderr, "Could not open source file %s\n", src_filename);
    abort();
}

/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
    fprintf(stderr, "Could not find stream information\n");
    abort();
}

/* video stream is usually 0, but it is still better to look it up
   in case it's not present */
unsigned int video_stream_index = 0;
for(; video_stream_index < fmt_ctx->nb_streams; ++video_stream_index)
{
    if(fmt_ctx->streams[video_stream_index]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
        break;
}
if(video_stream_index == fmt_ctx->nb_streams)
    abort();

AVPacket *packet = new AVPacket;
while(av_read_frame(fmt_ctx, packet) == 0)
{
    if ((unsigned int)packet->stream_index == video_stream_index)
        printf("Sample nr %lld\n", (long long)packet->pts);  // pts is int64_t
    av_free_packet(packet);
}
Basically you open the context and read packets from it. You will get both audio and video packets, so you should check whether the packet belongs to the stream of interest. FFMPEG will save you the trouble of dealing with empty frames and will give you only those samples that have data in them.

Detect end of Video with IMediaSeeking

I am playing a video to get some screens using DirectShow.
I am doing this in a loop by calling IMediaControl->Run, IVMRWindowlessControl->GetCurrentImage and then IMediaSeeking->SetPositions.
The problem is that I cannot detect when the video is over. IMediaSeeking->SetPositions always returns the same value (S_FALSE). IMediaControl->Run also always returns S_FALSE. I have also tried IMediaEvent->GetEvent after the call to IMediaControl->Run to check for EC_COMPLETE, but it (always) returns EC_CLOCK_CHANGED instead.
How can I detect the end of the video? Thanks
UPDATE: Doing something like
long eventCode = 0;
LONG_PTR ptrParam1 = 0;
LONG_PTR ptrParam2 = 0;
long timeoutMs = INFINITE;
while (SUCCEEDED(pEvent->GetEvent(&eventCode, &ptrParam1, &ptrParam2, timeoutMs)))
{
    if (eventCode == EC_COMPLETE)
    {
        break;
    }
    // Free the event data.
    hr = pEvent->FreeEventParams(eventCode, ptrParam1, ptrParam2);
    if (FAILED(hr))
    {
        break;
    }
}
blocks after a few events: 0x53 (EC_VMR_RENDERDEVICE_SET), 0x0D (EC_CLOCK_CHANGED), 0x0E (EC_PAUSED); the next call to GetEvent blocks, and the video is rendered (played frame by frame) in my IVideoWindow
You should be doing IMediaEvent->GetEvent; however, note that you will be receiving various events, not only EC_CLOCK_CHANGED. Keep receiving and you are eventually going to get EC_COMPLETE. Step 6: Handle Graph Events on MSDN explains this in detail.
Check the state of the filter graph with IMediaControl::GetState and see if it is stopped. You can also get the duration of the video from IMediaSeeking::GetDuration, which you may find helpful.
Another option is to use event signaling. This event processing can be off-threaded.
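For example, a non-blocking poll along those lines might look like this (a sketch: pEvent is the question's IMediaEvent, and pControl is an assumed IMediaControl on the same graph):
// Sketch: poll the filter graph for completion without blocking.
bool video_finished(IMediaEvent *pEvent, IMediaControl *pControl)
{
    long eventCode = 0;
    LONG_PTR p1 = 0, p2 = 0;
    // Timeout of 0: GetEvent returns immediately if no event is queued.
    while (SUCCEEDED(pEvent->GetEvent(&eventCode, &p1, &p2, 0))) {
        bool done = (eventCode == EC_COMPLETE);
        pEvent->FreeEventParams(eventCode, p1, p2);
        if (done)
            return true;
    }
    // Fallback: a stopped graph also indicates the end.
    OAFilterState state;
    if (SUCCEEDED(pControl->GetState(0, &state)) && state == State_Stopped)
        return true;
    return false;
}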