C++ FFMPEG not writing AVCC box information - c++

I'm trying to encode raw H264 into an mp4 container using the FFMPEG API in C++. It all works fine, however the AVCC box is empty, and it returns the error:
[iso file] Box "avcC" size 8 invalid
If I then use the command line tool on the output file:
ffmpeg -i output.mp4 -vcodec copy fixed.mp4
The output file works and AVCC is populated with the required information. I'm at a loss as to why this command line argument works but I'm unable to produce the same result using the API.
What I do in the C++ code (also do things in between the function calls):
outputFormat_ = av_guess_format( "mp4", NULL, NULL ); //AV_CODEC_H264
formatContext_ = avformat_alloc_context();
formatContext_->oformat = outputFormat_;
...
AVDictionary *opts = NULL;
char tmpstr[50]; sprintf(tmpstr, "%i", muxRate * KILOBYTESTOBYTES);
av_dict_set(&opts, "muxrate", tmpstr, 0);
avformat_write_header( formatContext_, &opts);
av_write_trailer(formatContext_);
The output of this is correct, except it's missing the AVCC information. Adding this manually (and fixing the box lengths accordingly) lets me play back the video fine. Any idea why the API calls are not generating the AVCC info?
For reference, here's the chars from the mp4 before the fix:
.avc1.........................€.8.H...H..........................................ÿÿ....avcC....stts
and after:
avc1.........................€.8.H...H..........................................ÿÿ...!avcC.B€(ÿá..gB€(Ú.à.—•...hÎ<€....stts

I had the problem with empty AVCC boxes with my MP4 files too. It turned out I was setting CODEC_FLAG_GLOBAL_HEADER flag on the AVCodecContext instance after calling avcodec_open2. The flag should be set before calling avcodec_open2.

Solved it. The data that was required was the SPS and PPS (sequence and picture parameter sets) that the avcC box stores. As the raw H264 stream was in Annex B format, they were present at the start of every I-frame, in the NAL units starting 0x00 0x00 0x00 0x01 0x67 (SPS) and 0x00 0x00 0x00 0x01 0x68 (PPS). So what was needed was to copy that information into the extradata field of the AVStream's codec context:
codecContext = stream->codec;
...
// videoSeqHeader_ contains the SPS and PPS NAL unit data
// extradata must be allocated with the av_malloc() family and padded
// (older FFmpeg releases name the constant FF_INPUT_BUFFER_PADDING_SIZE)
codecContext->extradata = (uint8_t*)av_mallocz( videoSeqHeader_.size() + AV_INPUT_BUFFER_PADDING_SIZE );
memcpy( codecContext->extradata, videoSeqHeader_.data(), videoSeqHeader_.size() );
codecContext->extradata_size = (int)videoSeqHeader_.size();
This resulted in the AVCC box being correctly populated.
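For anyone wondering how a buffer like videoSeqHeader_ might be built, here is a minimal sketch of pulling the SPS and PPS out of an Annex B buffer. The helper name extractSpsPps is made up for illustration and is not part of FFmpeg. It assumes every NAL unit is preceded by the 4-byte start code 0x00 0x00 0x00 0x01 (3-byte start codes are not handled) and it keeps the start codes in the output, as in the answer above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an FFmpeg function): returns the SPS (NAL type 7)
// and PPS (NAL type 8) units found in an Annex B buffer, start codes included,
// concatenated in stream order.
// Assumption: every NAL unit is preceded by the 4-byte start code 00 00 00 01.
std::vector<std::uint8_t> extractSpsPps(const std::vector<std::uint8_t>& annexB)
{
    // Collect the offset of every 4-byte start code.
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i + 3 < annexB.size(); ++i) {
        if (annexB[i] == 0 && annexB[i + 1] == 0 &&
            annexB[i + 2] == 0 && annexB[i + 3] == 1) {
            starts.push_back(i);
        }
    }

    std::vector<std::uint8_t> out;
    for (std::size_t k = 0; k < starts.size(); ++k) {
        const std::size_t begin = starts[k];
        const std::size_t end = (k + 1 < starts.size()) ? starts[k + 1] : annexB.size();
        if (begin + 4 >= end) {
            continue; // start code with no NAL header byte after it
        }
        // The low 5 bits of the first byte after the start code are the NAL type.
        const std::uint8_t nalType = annexB[begin + 4] & 0x1F;
        if (nalType == 7 || nalType == 8) {
            out.insert(out.end(), annexB.begin() + begin, annexB.begin() + end);
        }
    }
    return out;
}
```

Fed the first bytes of an I-frame (0x67... SPS, 0x68... PPS, 0x65... slice), this returns only the two parameter-set units, which can then be copied into extradata.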

Related

Media Foundation set video interlacing and decode

I have an MOV file and I want to decode it and have all frames as separate images.
So I try to configure an uncompressed media type in the following way:
// configure the source reader
IMFSourceReader* m_pReader;
MFCreateSourceReaderFromURL(filePath, NULL, &m_pReader);
// get the compressed media type
IMFMediaType* pFileVideoMediaType;
m_pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pFileVideoMediaType);
// create new media type for uncompressed type
IMFMediaType* pTypeUncomp;
MFCreateMediaType(&pTypeUncomp);
// copy all settings from compressed to uncompressed type
pFileVideoMediaType->CopyAllItems(pTypeUncomp);
// set the uncompressed video attributes
pTypeUncomp->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB8);
pTypeUncomp->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE);
pTypeUncomp->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
// set the new uncompressed type to source reader
m_pReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, pTypeUncomp);
// get the full uncompressed media type
m_pReader->GetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pTypeUncomp);
I noticed that even though I explicitly set MF_MT_INTERLACE_MODE to MFVideoInterlace_Progressive, the final configuration still has the old mode, MFVideoInterlace_MixedInterlaceOrProgressive.
Afterwards, I loop through all samples and look at their size:
IMFSample* videoSample = nullptr;
IMFMediaBuffer* mbuffer = nullptr;
LONGLONG llTimeStamp;
DWORD streamIndex, flags;
m_pReader->ReadSample(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
0, // Flags.
&streamIndex, // Receives the actual stream index.
&flags, // Receives status flags.
&llTimeStamp, // Receives the time stamp.
&videoSample); // Receives the sample or NULL.
videoSample->ConvertToContiguousBuffer(&mbuffer);
BYTE* videoData = nullptr;
DWORD sampleBufferLength = 0;
mbuffer->Lock(&videoData, nullptr, &sampleBufferLength);
cout << sampleBufferLength << endl;
And I get quite different sizes for the samples: from 31 bytes to 18000 bytes.
Even changing the format to MFVideoFormat_RGB32 does not affect the sample sizes.
This question seems to have the same issue but the solution is not fixing it.
Any help on why I can't change the interlacing and how to properly decode video frames and get image data out of samples?
Many thanks in advance.
In order to make SourceReader convert the samples to RGB you need to create it like this:
IMFAttributes* pAttr = NULL;
MFCreateAttributes(&pAttr, 1);
pAttr->SetUINT32(MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS, TRUE);
pAttr->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE);
IMFSourceReader* m_pReader;
throwIfFailed(MFCreateSourceReaderFromURL(filePath, pAttr, &m_pReader), "Can't create source reader from url");
pAttr->Release();
Later, you shouldn't break out of the read loop when MF_SOURCE_READERF_CURRENTMEDIATYPECHANGED occurs. After that, all samples will have the same size.
Alternatively, you can use the MFVideoFormat_NV12 subtype, in which case you don't need to specify the MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING attribute when creating the reader.
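A quick sanity check that the reader really delivers decoded frames is to compare sampleBufferLength with the size implied by the frame dimensions. The sketch below uses made-up helper names and assumes tightly packed rows; real buffers can be larger when the stride is padded, so treat the result as a lower bound rather than an exact match.

```cpp
#include <cstdint>

// Hypothetical helpers, not Media Foundation APIs. Both assume tightly
// packed rows (no stride padding).

// RGB32 stores 4 bytes per pixel.
std::uint32_t rgb32FrameBytes(std::uint32_t width, std::uint32_t height)
{
    return width * height * 4;
}

// NV12 stores a full-resolution Y plane followed by one interleaved UV plane
// that is subsampled by 2 in both directions (12 bits per pixel overall).
std::uint32_t nv12FrameBytes(std::uint32_t width, std::uint32_t height)
{
    const std::uint32_t chromaWidth = (width + 1) / 2;
    const std::uint32_t chromaHeight = (height + 1) / 2;
    return width * height + 2 * chromaWidth * chromaHeight;
}
```

Sample sizes that vary from tens of bytes to a few kilobytes, as in the question, are far below either figure, which suggests the samples are still compressed.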

AVCodecContext::channel_layout 0 for WAV files

I have been successfully loading compressed audio files using FFmpeg and querying their channel_layouts using some code I've written:
AVFormatContext* fmtCxt = nullptr;
avformat_open_input( &fmtCxt, "###/440_sine.wav", nullptr, nullptr );
avformat_find_stream_info( fmtCxt, nullptr );
int ret = av_find_best_stream( fmtCxt, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0 );
AVCodecContext* codecCxt = fmtCxt->streams[ret]->codec;
AVCodec* codec = avcodec_find_decoder( codecCxt->codec_id );
avcodec_open2( codecCxt, codec, nullptr );
std::cout << "Channel Layout: " << codecCxt->channel_layout << std::endl;
av_dump_format( fmtCxt, 0, "###/440_sine.wav", 0 );
I've removed all error checking for brevity. However for Microsoft WAV files (mono or stereo) the AVCodecContext::channel_layout member is always 0 - despite ffprobe and av_dump_format(..) both returning valid information:
Input #0, wav, from '###/440_sine.wav':
Duration: 00:00:00.01, bitrate: 740 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels, s16, 705 kb/s
Also codecCxt->channels returns the correct value. Using a flac file (with exactly the same audio data generated from the same application), gives a channel_layout of 0x4 (AV_CH_FRONT_CENTER).
Your WAV file uses FFmpeg's pcm_s16le codec, which carries no channel layout information; only the number of channels is available. A lot of explanations can be found here
You get the correct channel_layout with the flac file because FFmpeg's flac codec fills this field. You can find the correspondence table in the libavcodec/flac.c file, in the flac_channel_layouts array.
If you need to fill channel_layout manually, you can call:
codecCxt->channel_layout = av_get_default_channel_layout( codecCxt->channels );
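If you only want to fall back to a default when the decoder leaves the field at 0, the logic can be sketched without FFmpeg as below. The helper name layoutOrDefault and the constants are made up for illustration; the bit values mirror FFmpeg's AV_CH_FRONT_LEFT, AV_CH_FRONT_RIGHT and AV_CH_FRONT_CENTER masks, and only mono and stereo are covered.

```cpp
#include <cstdint>

// Bit values mirror FFmpeg's AV_CH_FRONT_LEFT / AV_CH_FRONT_RIGHT /
// AV_CH_FRONT_CENTER masks; the names here are illustrative only.
constexpr std::uint64_t kFrontLeft   = 0x1;
constexpr std::uint64_t kFrontRight  = 0x2;
constexpr std::uint64_t kFrontCenter = 0x4;

// Hypothetical helper: keep the layout the decoder reported, otherwise
// guess one from the channel count. Returns 0 when no guess is possible.
std::uint64_t layoutOrDefault(std::uint64_t reported, int channels)
{
    if (reported != 0) {
        return reported; // the codec filled the field (e.g. flac)
    }
    switch (channels) {
        case 1:  return kFrontCenter;             // mono
        case 2:  return kFrontLeft | kFrontRight; // stereo
        default: return 0;                        // unknown: leave unset
    }
}
```

With the mono WAV file from the question this yields 0x4, the same value the flac decoder reports.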

How to read YUV8 data from avi file?

I have an AVI file that contains uncompressed gray video data. I need to extract frames from it. The file is 22 GB.
How do I do that?
I have already tried ffmpeg, but it gives me a "could not find codec parameters for video stream" message, because there is no codec at work, just frames.
Since OpenCV just uses ffmpeg to read video, that rules out OpenCV as well.
The only path that seems to be left is to try and dig into the raw data, but I do not know how.
Edit: this is the code I use to read from the file with opencv. The failure occurs inside the second if. Running the ffmpeg binary on the file also fails with the message above (could not find codec parameters etc.)
/* register all formats and codecs */
av_register_all();
/* open input file, and allocate format context */
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
ret = 1;
goto end;
}
fmt_ctx->seek2any = true;
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
ret = 1;
goto end;
}
Edit:
Here is sample code I have tried to make the extraction: pastebin. The result I get is an unchanging buffer after every call to AVIStreamRead.
If you do not need cross-platform functionality, the Video for Windows (VFW) API is a good alternative (http://msdn.microsoft.com/en-us/library/windows/desktop/dd756808(v=vs.85).aspx). I will not post an entire code block, since there's quite a lot to do, but you should be able to figure it out from the reference link. Basically, you call AVIFileOpen, then get the video stream via AVIFileGetStream with streamtypeVIDEO, or alternatively do both at once with AVIStreamOpenFromFile, and then read samples from the stream with AVIStreamRead. If you get to a point where you fail I can try to help, but it should be pretty straightforward.
Also, I'm not sure why ffmpeg is failing; I have been doing raw AVI reading with ffmpeg without any codecs involved. Can you post which call to ffmpeg actually fails?
EDIT:
Regarding the issue you are seeing where the size of the data read is 0: the AVI file has N slots for frames in each second, where N is the fps of the video. In real life the samples won't arrive exactly at that rate (e.g. IP surveillance cameras), so the actual sample indexes can be non-contiguous, like 1, 5, 11, ..., and VFW inserts empty samples between them (that is where the zero-size reads come from). What you have to do is call AVIStreamRead with NULL as the buffer and 0 as the size until bRead is not 0 or you run past the last sample. When you get an actual size, you can call AVIStreamRead again on that sample index with the buffer pointer and size. I usually work with compressed video, so I don't use the suggested size, but going by your code snippet I would do something like this:
...
bRead = 0;
do
{
aviOpRes = AVIStreamRead(ppavi,smpS,1,NULL,0,&bRead,&smpN);
} while (bRead == 0 && ++smpS < si.dwLength + si.dwStart);
if(smpS >= si.dwLength + si.dwStart)
break;
PUCHAR tempBuffer = new UCHAR[bRead];
aviOpRes = AVIStreamRead(ppavi,smpS,1,tempBuffer,bRead,&bRead,&smpN);
/* do whatever you need */
delete[] tempBuffer;
...
EDIT 2:
Since this may come in handy to someone, or to yourself when choosing between VFW and FFMPEG, I also updated your FFMPEG example so that it parses the same file (sorry for the code quality, since it lacks error checking, but I guess you can see the logical flow):
/* register all formats and codecs */
av_register_all();
AVFormatContext* fmt_ctx = NULL;
/* open input file, and allocate format context */
const char *src_filename = "E:\\Output.avi";
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
abort();
}
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
abort();
}
int video_stream_index = 0; /* video stream is usually 0, but it is still better to look it up in case it's not present */
for(; video_stream_index < fmt_ctx->nb_streams; ++video_stream_index)
{
if(fmt_ctx->streams[video_stream_index]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
break;
}
if(video_stream_index == fmt_ctx->nb_streams)
abort();
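AVPacket *packet = new AVPacket;
while(av_read_frame(fmt_ctx, packet) == 0)
{
if (packet->stream_index == video_stream_index)
printf("Sample nr %lld\n", (long long)packet->pts); /* pts is int64_t, so %d would be wrong */
av_free_packet(packet);
}
delete packet; /* release the packet struct allocated with new */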
Basically you open the context and read packets from it. You will get both audio and video packets so you should check if the packet belongs to the stream of interest. FFMPEG will save you the trouble with empty frames and give only those samples that have data in them.

streaming H.264 over RTP with libavformat

I've been trying over the past week to implement H.264 streaming over RTP, using x264 as an encoder and libavformat to pack and send the stream. Problem is, as far as I can tell it's not working correctly.
Right now I'm just encoding random data (x264_picture_alloc) and extracting NAL frames from libx264. This is fairly simple:
x264_picture_t pic_out;
x264_nal_t* nals;
int num_nals;
int frame_size = x264_encoder_encode(this->encoder, &nals, &num_nals, this->pic_in, &pic_out);
if (frame_size <= 0)
{
return frame_size;
}
// push NALs into the queue
for (int i = 0; i < num_nals; i++)
{
// create a NAL storage unit
NAL nal;
nal.size = nals[i].i_payload;
nal.payload = new uint8_t[nal.size];
memcpy(nal.payload, nals[i].p_payload, nal.size);
// push the storage into the NAL queue
{
// lock and push the NAL to the queue
boost::mutex::scoped_lock lock(this->nal_lock);
this->nal_queue.push(nal);
}
}
nal_queue is used for safely passing frames over to a Streamer class which will then send the frames out. Right now it's not threaded, as I'm just testing to try to get this to work. Before encoding individual frames, I've made sure to initialize the encoder.
But I don't believe x264 is the issue, as I can see frame data in the NALs it returns back.
Streaming the data is accomplished with libavformat, which is first initialized in a Streamer class:
Streamer::Streamer(Encoder* encoder, string rtp_address, int rtp_port, int width, int height, int fps, int bitrate)
{
this->encoder = encoder;
// initialize the AV context
this->ctx = avformat_alloc_context();
if (!this->ctx)
{
throw runtime_error("Couldn't initialize AVFormat output context");
}
// get the output format
this->fmt = av_guess_format("rtp", NULL, NULL);
if (!this->fmt)
{
throw runtime_error("Unsuitable output format");
}
this->ctx->oformat = this->fmt;
// try to open the RTP stream
snprintf(this->ctx->filename, sizeof(this->ctx->filename), "rtp://%s:%d", rtp_address.c_str(), rtp_port);
if (url_fopen(&(this->ctx->pb), this->ctx->filename, URL_WRONLY) < 0)
{
throw runtime_error("Couldn't open RTP output stream");
}
// add an H.264 stream
this->stream = av_new_stream(this->ctx, 1);
if (!this->stream)
{
throw runtime_error("Couldn't allocate H.264 stream");
}
// initialize codec
AVCodecContext* c = this->stream->codec;
c->codec_id = CODEC_ID_H264;
c->codec_type = AVMEDIA_TYPE_VIDEO;
c->bit_rate = bitrate;
c->width = width;
c->height = height;
c->time_base.den = fps;
c->time_base.num = 1;
// write the header
av_write_header(this->ctx);
}
This is where things seem to go wrong. av_write_header above seems to do absolutely nothing; I've used wireshark to verify this. For reference, I use Streamer streamer(&enc, "10.89.6.3", 49990, 800, 600, 30, 40000); to initialize the Streamer instance, with enc being a reference to an Encoder object used to handle x264 previously.
Now when I want to stream out a NAL, I use this:
// grab a NAL
NAL nal = this->encoder->nal_pop();
cout << "NAL popped with size " << nal.size << endl;
// initialize a packet
AVPacket p;
av_init_packet(&p);
p.data = nal.payload;
p.size = nal.size;
p.stream_index = this->stream->index;
// send it out
av_write_frame(this->ctx, &p);
At this point, I can see RTP data appearing over the network, and it looks like the frames I've been sending, even including a little copyright blob from x264. But, no player I've used has been able to make any sense of the data. VLC quits wanting an SDP description, which apparently isn't required.
I then tried to play it through gst-launch:
gst-launch udpsrc port=49990 ! rtph264depay ! decodebin ! xvimagesink
This will sit waiting for UDP data, but when it is received, I get:
ERROR: element /GstPipeline:pipeline0/GstRtpH264Depay:rtph264depay0: No RTP
format was negotiated. Additional debug info:
gstbasertpdepayload.c(372): gst_base_rtp_depayload_chain ():
/GstPipeline:pipeline0/GstRtpH264Depay:rtph264depay0: Input buffers
need to have RTP caps set on them. This is usually achieved by setting
the 'caps' property of the upstream source element (often udpsrc or
appsrc), or by putting a capsfilter element before the depayloader and
setting the 'caps' property on that. Also see
http://cgit.freedesktop.org/gstreamer/gst-plugins-good/tree/gst/rtp/README
As I'm not using GStreamer to stream itself, I'm not quite sure what it means with RTP caps. But, it makes me wonder if I'm not sending enough information over RTP to describe the stream. I'm pretty new to video and I feel like there's some key thing I'm missing here. Any hints?
H.264 is an encoding standard. It specifies how video data is compressed and stored in a format that can be decompressed into a video stream at a later point.
RTP is a transmission protocol. It specifies the format and order of packets that can carry audio-video data encoded by an arbitrary encoder.
GStreamer expects to receive data that conforms to the RTP protocol. Is your expectation that libavformat will produce RTP packets immediately readable by GStreamer warranted? Maybe GStreamer expects an additional stream description that would enable it to accept and decode the streamed packets using the proper decoder? Maybe it requires an additional RTSP exchange or an SDP stream descriptor file?
The error message states pretty clearly that an RTP format has not been negotiated. caps is shorthand for capabilities. The receiver needs to know the transmitter's capabilities to set up the receiving/decoding machinery correctly.
I strongly suggest trying at least to create an SDP file for your RTP stream. libavformat should be able to do it for you.
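For reference, libavformat exposes av_sdp_create() for generating the description from an output context. To show roughly what a receiver needs, here is a hand-rolled sketch that builds a minimal SDP for a single H.264 stream. The helper name buildH264Sdp, the session name, the origin line and the default payload type 96 are all assumptions made for illustration; a complete description would normally also carry sprop-parameter-sets unless the SPS/PPS are sent in-band.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not a libavformat function): builds a minimal SDP
// description for one H.264 video stream sent over RTP.
// Assumptions: dynamic payload type 96, non-interleaved packetization
// (packetization-mode=1), SPS/PPS delivered in-band.
std::string buildH264Sdp(const std::string& destAddress, int port, int payloadType = 96)
{
    std::ostringstream sdp;
    sdp << "v=0\r\n"
        << "o=- 0 0 IN IP4 127.0.0.1\r\n"           // placeholder origin
        << "s=H.264 test stream\r\n"                // placeholder session name
        << "c=IN IP4 " << destAddress << "\r\n"     // where the RTP packets are sent
        << "t=0 0\r\n"                              // unbounded session
        << "m=video " << port << " RTP/AVP " << payloadType << "\r\n"
        << "a=rtpmap:" << payloadType << " H264/90000\r\n" // H.264 uses a 90 kHz RTP clock
        << "a=fmtp:" << payloadType << " packetization-mode=1\r\n";
    return sdp.str();
}
```

Saving the result of buildH264Sdp("10.89.6.3", 49990) to a .sdp file and opening that file in the player gives it the payload type, codec and clock rate that the bare RTP packets do not carry.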

Filling CMediaType and IMediaSample from AVPacket for h264 video

I have searched and have found almost nothing, so I would really appreciate some help with my question.
I am writing a DirectShow source filter which uses libav to read h264 packets from a YouTube FLV file and send them downstream. But I can't find the appropriate libav structure fields to implement the filter's GetMediaType() and FillBuffer() correctly. Some libav fields are null. As a consequence, the h264 decoder crashes when it attempts to process the received data.
Where am I wrong? In working with libav or with DirectShow interfaces? Maybe h264 requires additional processing when working with libav or I fill reference time incorrectly? Does someone have any links useful for writing DirectShow h264 source filter with libav?
Part of GetMediaType():
VIDEOINFOHEADER *pvi = (VIDEOINFOHEADER*) toMediaType->AllocFormatBuffer(sizeof(VIDEOINFOHEADER));
pvi->AvgTimePerFrame = UNITS_PER_SECOND / m_pFormatContext->streams[m_streamNo]->codec->sample_rate; //sample_rate is 0
pvi->dwBitRate = m_pFormatContext->bit_rate;
pvi->rcSource = videoRect;
pvi->rcTarget = videoRect;
//Bitmap
pvi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
pvi->bmiHeader.biWidth = videoRect.right;
pvi->bmiHeader.biHeight = videoRect.bottom;
pvi->bmiHeader.biPlanes = 1;
pvi->bmiHeader.biBitCount = m_pFormatContext->streams[m_streamNo]->codec->bits_per_raw_sample;//or should here be bits_per_coded_sample
pvi->bmiHeader.biCompression = FOURCC_H264;
pvi->bmiHeader.biSizeImage = GetBitmapSize(&pvi->bmiHeader);
Part of FillBuffer():
//Get buffer pointer
BYTE* pBuffer = NULL;
if (pSamp->GetPointer(&pBuffer) < 0)
return S_FALSE;
//Get next packet
AVPacket* pPacket = m_mediaFile.getNextPacket();
if (pPacket->data == NULL)
return S_FALSE;
//Check packet and buffer size
if (pSamp->GetSize() < pPacket->size)
return S_FALSE;
//Copy from packet to sample buffer
memcpy(pBuffer, pPacket->data, pPacket->size);
//Set media sample time
REFERENCE_TIME start = m_mediaFile.timeStampToReferenceTime(pPacket->pts);
REFERENCE_TIME duration = m_mediaFile.timeStampToReferenceTime(pPacket->duration);
REFERENCE_TIME end = start + duration;
pSamp->SetTime(&start, &end);
pSamp->SetMediaTime(&start, &end);
P.S. I've debugged my filter with the hax264 decoder and it crashes on a call to the deprecated libav function img_convert().
Here is the MSDN link you need to build a correct H.264 media type: H.264 Video Types
You have to fill the right fields with the right values.
The AM_MEDIA_TYPE should contain the right MEDIASUBTYPE for h264.
And these are plain wrong :
pvi->bmiHeader.biWidth = videoRect.right;
pvi->bmiHeader.biHeight = videoRect.bottom;
You should use a width/height that is independent of rcSource/rcTarget, because those rectangles are only indicators and may be completely zero if you take them from some other filter.
pvi->bmiHeader.biBitCount = m_pFormatContext->streams[m_streamNo]->codec->bits_per_raw_sample;//or should here be bits_per_coded_sample
This only makes sense if biWidth*biHeight*biBitCount/8 is the true size of the sample. I do not think so ...
pvi->bmiHeader.biCompression = FOURCC_H264;
This must also be passed in the AM_MEDIA_TYPE in the subtype parameter.
pvi->bmiHeader.biSizeImage = GetBitmapSize(&pvi->bmiHeader);
This fails because the FOURCC is unknown to the function and the bit count is plain wrong for this sample, since it is not a full frame.
You have to take a look at how the data stream is handled by the downstream h264 filter. This seems to be flawed.
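One more detail in the question's GetMediaType(): AvgTimePerFrame is derived from codec->sample_rate, which is an audio field and is 0 for a video stream, so the division cannot give a usable value. A frame rate (for example the stream's avg_frame_rate rational) is the more likely source. The conversion itself can be sketched as below; the helper names are made up, and both functions work in the 100-nanosecond units that REFERENCE_TIME uses.

```cpp
#include <cstdint>

// Number of 100 ns units in one second (the REFERENCE_TIME resolution).
constexpr std::int64_t kUnitsPerSecond = 10000000;

// Hypothetical helper: duration of one frame in 100 ns units for a frame
// rate of fpsNum/fpsDen. Returns 0 when the rate is unknown, instead of
// dividing by zero.
std::int64_t avgTimePerFrame(int fpsNum, int fpsDen)
{
    if (fpsNum <= 0 || fpsDen <= 0) {
        return 0;
    }
    return kUnitsPerSecond * fpsDen / fpsNum;
}

// Hypothetical helper: converts a timestamp expressed in a time base of
// tbNum/tbDen seconds into 100 ns units.
std::int64_t toReferenceTime(std::int64_t pts, int tbNum, int tbDen)
{
    if (tbDen <= 0) {
        return 0;
    }
    return pts * tbNum * kUnitsPerSecond / tbDen;
}
```

For a 25 fps stream avgTimePerFrame(25, 1) gives 400000, and with FLV's usual 1/1000 time base a pts of 1000 maps to one second.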