FFMPEG RTSP stream to MPEG4/H264 file using libx264 - c++

Heyo folks,
I'm attempting to transcode/remux an RTSP stream in H264 format into a MPEG4 container, containing just the H264 video stream. Basically, webcam output into a MP4 container.
I can get a poorly coded MP4 produced, using this code:
// Variables here for demo
AVFormatContext * video_file_output_format = nullptr;
AVFormatContext * rtsp_format_context = nullptr;
AVCodecContext * video_file_codec_context = nullptr;
AVCodecContext * rtsp_vidstream_codec_context = nullptr;
AVPacket packet = {0};
AVStream * video_file_stream = nullptr;
AVCodec * rtsp_decoder_codec = nullptr;
int errorNum = 0, video_stream_index = 0;
std::string outputMP4file = "D:\\somemp4file.mp4";
// begin
AVDictionary * opts = nullptr;
av_dict_set(&opts, "rtsp_transport", "tcp", 0);
if ((errorNum = avformat_open_input(&rtsp_format_context, uriANSI.c_str(), NULL, &opts)) < 0) {
errOut << "Connection failed: avformat_open_input failed with error " << errorNum << ":\r\n" << ErrorRead(errorNum);
TacticalAbort();
return;
}
rtsp_format_context->max_analyze_duration = 50000;
if ((errorNum = avformat_find_stream_info(rtsp_format_context, NULL)) < 0) {
errOut << "Connection failed: avformat_find_stream_info failed with error " << errorNum << ":\r\n" << ErrorRead(errorNum);
TacticalAbort();
return;
}
video_stream_index = errorNum = av_find_best_stream(rtsp_format_context, AVMEDIA_TYPE_VIDEO, -1, -1, NULL, 0);
if (video_stream_index < 0) {
errOut << "Connection in unexpected state; made a connection, but there was no video stream.\r\n"
"Attempts to find a video stream resulted in error " << errorNum << ": " << ErrorRead(errorNum);
TacticalAbort();
return;
}
rtsp_vidstream_codec_context = rtsp_format_context->streams[video_stream_index]->codec;
av_init_packet(&packet);
if (!(video_file_output_format = av_guess_format(NULL, outputMP4file.c_str(), NULL))) {
TacticalAbort();
throw std::exception("av_guess_format");
}
if (!(rtsp_decoder_codec = avcodec_find_decoder(rtsp_vidstream_codec_context->codec_id))) {
errOut << "Connection failed: connected, but avcodec_find_decoder returned null.\r\n"
"Couldn't find codec with an AV_CODEC_ID value of " << rtsp_vidstream_codec_context->codec_id << ".";
TacticalAbort();
return;
}
video_file_format_context = avformat_alloc_context();
video_file_format_context->oformat = video_file_output_format;
if (strcpy_s(video_file_format_context->filename, sizeof(video_file_format_context->filename), outputMP4file.c_str())) {
errOut << "Couldn't open video file: strcpy_s failed with error " << errno << ".";
std::string log = errOut.str();
TacticalAbort();
throw std::exception("strcpy_s");
}
if (!(video_file_encoder_codec = avcodec_find_encoder(video_file_output_format->video_codec))) {
TacticalAbort();
throw std::exception("avcodec_find_encoder");
}
// MARKER ONE
if (!outputMP4file.empty() &&
!(video_file_output_format->flags & AVFMT_NOFILE) &&
(errorNum = avio_open2(&video_file_format_context->pb, outputMP4file.c_str(), AVIO_FLAG_WRITE, nullptr, &opts)) < 0) {
errOut << "Couldn't open video file \"" << outputMP4file << "\" for writing : avio_open2 failed with error " << errorNum << ": " << ErrorRead(errorNum);
TacticalAbort();
return;
}
// Create stream in MP4 file
if (!(video_file_stream = avformat_new_stream(video_file_format_context, video_file_encoder_codec))) {
TacticalAbort();
return;
}
AVCodecContext * video_file_codec_context = video_file_stream->codec;
// MARKER TWO
// error -22/-21 in avio_open2 if this is skipped
if ((errorNum = avcodec_copy_context(video_file_codec_context, rtsp_vidstream_codec_context)) != 0) {
TacticalAbort();
throw std::exception("avcodec_copy_context");
}
//video_file_codec_context->codec_tag = 0;
/*
// MARKER 3 - is this not needed? Examples suggest not.
if ((errorNum = avcodec_open2(video_file_codec_context, video_file_encoder_codec, &opts)) < 0)
{
errOut << "Couldn't open video file codec context: avcodec_open2 failed with error " << errorNum << ": " << ErrorRead(errorNum);
std::string log = errOut.str();
TacticalAbort();
throw std::exception("avcodec_open2, video file");
}*/
//video_file_format_context->flags |= AVFMT_FLAG_GENPTS;
if (video_file_format_context->oformat->flags & AVFMT_GLOBALHEADER)
{
video_file_codec_context->flags |= CODEC_FLAG_GLOBAL_HEADER;
}
if ((errorNum = avformat_write_header(video_file_format_context, &opts)) < 0) {
errOut << "Couldn't open video file: avformat_write_header failed with error " << errorNum << ":\r\n" << ErrorRead(errorNum);
std::string log = errOut.str();
TacticalAbort();
return;
}
However, there are several issues:
I can't pass any x264 options to the output file. The output H264 matches the input H264's profile/level - switching cameras to a different model switches H264 level.
The timing of the output file is off, noticeably.
The duration of the output file is off, massively. A few seconds of footage becomes hours, although playtime doesn't match. (FWIW, I'm using VLC to play them.)
Passing x264 options
If I manually increment PTS per packet, and set DTS equal to PTS, it plays too fast, ~2-3 seconds' worth of footage in one second playtime, and duration is hours long. The footage also blurs past several seconds, about 10 seconds' footage in a second.
If I let FFMPEG decide (with or without GENPTS flag), the file has a variable frame rate (probably as expected), but it plays the whole file in an instant and has a long duration too (over forty hours for a few seconds). The duration isn't "real", as the file plays in an instant.
At Marker One, I try to set the profile by passing options to avio_open2. The options are simply ignored by libx264. I've tried:
av_dict_set(&opts, "vprofile", "main", 0);
av_dict_set(&opts, "profile", "main", 0); // error, missing '('
// FF_PROFILE_H264_MAIN equals 77, so I also tried
av_dict_set(&opts, "vprofile", "77", 0);
av_dict_set(&opts, "profile", "77", 0);
It does seem to read the profile setting, but it doesn't use them. At Marker Two, I tried to set it after the avio_open2, before avformat_write_header .
// I tried all 4 av_dict_set from earlier, passing it to avformat_write_header.
// None had any effect, they weren't consumed.
av_opt_set(video_file_codec_context, "profile", "77", 0);
av_opt_set(video_file_codec_context, "profile", "main", 0);
video_file_codec_context->profile = FF_PROFILE_H264_MAIN;
av_opt_set(video_file_codec_context->priv_data, "profile", "77", 0);
av_opt_set(video_file_codec_context->priv_data, "profile", "main", 0);
Messing with privdata made the program unstable, but I was trying anything at that point.
I'd like to solve issue 1 with passing settings, since I imagine it'd bottleneck any attempt to solve issues 2 or 3.
I've been fiddling with this for the better part of a month now. I've been through dozens of documentation, Q&As, examples. It doesn't help that quite a few are outdated.
Any help would be appreciated.
Cheers

Okay, firstly, I wasn't using ffmpeg, but a fork of ffmpeg called libav. Not to be confused, ffmpeg is more recent, and libav was used in some distributions of Linux.
Compiling for Visual Studio
Once I upgraded to the main branch, I had to compile it manually again, since I was using it in Visual Studio and the only static libraries are G++, so linking doesn't work nicely.
The official guide is https://trac.ffmpeg.org/wiki/CompilationGuide/MSVC.
First, compiling as such works fine:
Ensure VS is in PATH. Your PATH should read in this order:
C:\Program Files (x86)\Microsoft Visual Studio XX.0\VC\bin
D:\MinGW\msys64\mingw32\bin
D:\MinGW\msys64\usr\bin
D:\MinGW\bin
Then run Visual Studio x86 Native Tools prompt. Should be in your Start Menu.
In the CMD, run
(your path to MinGW)\msys64\msys2_shell.cmd -full-path
In the created MinGW window, run:
$ cd /your dev path/
$ git clone https://git.ffmpeg.org/ffmpeg.git ffmpeg
After about five minutes you'll have the FFMPEG source in the subfolder ffmpeg.
Access the source via:
$ cd ffmpeg
Then run:
$ which link
if it doesn't provide the VS path from PATH, but usr/link or usr/bin/link, rename similarly:
$ mv /usr/bin/link.exe /usr/bin/msys-link.exe
If it does skip the $ mv step.
Finally, run this command:
$ ./configure --toolchain=msvc and whatever other cmdlines you want
(you can see commandlines via ./configure --help)
It may appear inactive for a long time. Once it's done you'll get a couple pages of output.
Then run:
$ make
$ make install
Note for static builds (configure with --enable-static), although you get Windows static lib files, it'll produce them with extension *.a files. Just rename to .lib.
(you can use cmd: ren *.a *.lib)
Using FFMPEG
To just copy from FFMPEG RTSP to a file, using source profile, level etc, just use:
read network frame av_read_frame(rtsp_format_context)
pass to MP4 av_write_frame(video_file_format_context)
You don't need to open a AVCodecContext, a decoder or encoder; just avformat_open_input, and the video file AVFormatContext and AVIOContext.
If you want to re-encode, you have to:
read network frame
av_read_frame(rtsp_format_context)
pass packet to decoder
avcodec_send_packet(rtsp_decoder_context)
read frames from decoder (in loop)
avcodec_receive_frame(rtsp_decoder_context)
send each decoded frame to encoder
avcodec_send_frame(video_file_encoder_context)
read packets from encoder (in loop)
avcodec_receive_packet(video_file_encoder_context)
send each encoded packet to output video av_write_frame(video_file_format_context)
Some gotchas
Copy out the width, height, and pixel format manually. For H264 it's YUV420P.
As example, for level 3.1, profile high:
AVCodecParameters * video_file_codec_params = video_file_stream->codecpar;
video_file_codec_params->profile = FF_PROFILE_H264_HIGH;
video_file_codec_params->format = AV_PIX_FMT_YUV420P;
video_file_codec_params->level = 31;
video_file_codec_params->width = rtsp_vidstream->codecpar->width;
video_file_codec_params->height = rtsp_vidstream->codecpar->height;
libx264 accepts a H264 preset via the opts parameter in avcodec_open2. Example for "veryfast" preset:
AVDictionary * mydict;
av_dict_set(&mydict, "preset", "veryfast", 0);
avcodec_open2(video_file_encoder_context, video_file_encoder_codec, &opts)
// avcodec_open2 returns < 0 for errors.
// Recognised options will be removed from the mydict variable.
// If all are recognised, mydict will be NULL.
Output timing is a volatile thing. Use this before ``.
video_file_stream->avg_frame_rate = rtsp_vidstream->avg_frame_rate;
video_file_stream->r_frame_rate = rtsp_vidstream->r_frame_rate;
video_file_stream->time_base = rtsp_vidstream->time_base;
video_file_encoder_context->time_base = rtsp_vidstream_codec_context->time_base;
// Decreasing GOP size for more seek positions doesn't end well.
// libx264 forces the new GOP size.
video_file_encoder_context->gop_size = rtsp_vidstream_codec_context->gop_size;
if ((errorNum = avcodec_open2(video_file_encoder_context,...)) < 0) {
// an error...
}
H264 may write at double-speed to the file, so playback is doubly fast. To change this, go manual with encoded packets' timing:
packet->pts = packet->dts = frameNum++;
av_packet_rescale_ts(packet, video_file_encoder_context->time_base, video_file_stream->time_base);
packet->pts *= 2;
packet->dts *= 2;
av_interleaved_write_frame(video_file_format_context, packet)
// av_interleaved_write_frame returns < 0 for errors.
Note we switch av_write_frame to av_interleaved_write_frame, and set both PTS and DTS. frameNum should be a int64_t, and should start from 0 (although that's not required).
Also note the av_rescale_ts call's parameters are the video file encoder context, and the video file stream - RTSP isn't involved.
VLC media player won't play H.264 streams encoded with an FPS of 4 or lower.
So if your RTSP streams are showing the first decoded frame and never progressing, or showing pure green until the video ends, make sure your FPS is high enough. (that's VLC v2.2.4)

Related

Live555 truncates encoded data of FFMpeg

I am trying to stream H264 based data using Live555 over RTSP.
I am capturing data using V4L2, and then encodes it using FFMPEG and then passing data to Live555's DeviceSource file, in that I using H264VideoStreamFramer class,
Below is my codec settings to configure AVCodecContext of encoder,
codec = avcodec_find_encoder_by_name(CODEC_NAME);
if (!codec) {
cerr << "Codec " << codec_name << " not found\n";
exit(1);
}
c = avcodec_alloc_context3(codec);
if (!c) {
cerr << "Could not allocate video codec context\n";
exit(1);
}
pkt = av_packet_alloc();
if (!pkt)
exit(1);
/* put sample parameters */
c->bit_rate = 400000;
/* resolution must be a multiple of two */
c->width = PIC_HEIGHT;
c->height = PIC_WIDTH;
/* frames per second */
c->time_base = (AVRational){1, FPS};
c->framerate = (AVRational){FPS, 1};
c->gop_size = 10;
c->max_b_frames = 1;
c->pix_fmt = AV_PIX_FMT_YUV420P;
c->rtp_payload_size = 30000;
if (codec->id == AV_CODEC_ID_H264)
av_opt_set(c->priv_data, "preset", "fast", 0);
av_opt_set_int(c->priv_data, "slice-max-size", 30000, 0);
/* open it */
ret = avcodec_open2(c, codec, NULL);
if (ret < 0) {
cerr << "Could not open codec\n";
exit(1);
}
And I am getting encoded data using avcodec_receive_packet() function. which will return AVPacket.
And I am passing AVPacket's data into DeviceSource file below is code snippet of my Live555 code:
void DeviceSource::deliverFrame() {
if (!isCurrentlyAwaitingData()) return; // we're not ready for the data yet
u_int8_t* newFrameDataStart = (u_int8_t*) pkt->data;
unsigned newFrameSize = pkt->size; //%%% TO BE WRITTEN %%%
// Deliver the data here:
if (newFrameSize > fMaxSize) { // Condition becomes true many times
fFrameSize = fMaxSize;
fNumTruncatedBytes = newFrameSize - fMaxSize;
} else {
fFrameSize = newFrameSize;
}
gettimeofday(&fPresentationTime, NULL); // If you have a more accurate time - e.g., from an encoder - then use that instead.
// If the device is *not* a 'live source' (e.g., it comes instead from a file or buffer), then set "fDurationInMicroseconds" here.
memmove(fTo, newFrameDataStart, fFrameSize);
}
But here, sometimes my packet's size is getting more than fMaxSize value and as per LIVE555 logic it will truncate frame data, so that sometimes I am getting bad frames on my VLC,
From Live555 forum, I get to know that encoder should not send packet whose size is more than fMaxSize value, so my question is:
How to restrict encoder to limit size of packet?
Thanks in Advance,
Harshil
You can increase the maximum allowed sample size by changing "maxSize" in the OutPacketBuffer class in MediaSink.cpp. This worked for me. There are cases we may require high-quality video to be streamed, I don't think we will always be able to restrict the encoder to not to produce samples of size more than a particular value which would result in video quality issues. In fact, the samples are fragmented by the UDP sink live555 to match the default MTU (1500), so increasing the max sample size limit has no side effects.

Fragmented MP4 - problem playing in browser

I try to create fragmented MP4 from raw H264 video data so I could play it in internet browser's player. My goal is to create live streaming system, where media server would send fragmented MP4 pieces to browser. The server would buffer input data from RaspberryPi camera, which sends video as H264 frames. It would then mux that video data and make it available for client. The browser would play media data (that were muxed by server and sent i.e. through websocket) by using Media Source Extensions.
For test purpose I wrote the following pieces of code (using many examples I found in the intenet):
C++ application using avcodec which muxes raw H264 video to fragmented MP4 and saves it to a file:
#define READBUFSIZE 4096
#define IOBUFSIZE 4096
#define ERRMSGSIZE 128
#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
extern "C"
{
#include <libavformat/avformat.h>
#include <libavutil/error.h>
#include <libavutil/opt.h>
}
enum NalType : uint8_t
{
//NALs containing stream metadata
SEQ_PARAM_SET = 0x7,
PIC_PARAM_SET = 0x8
};
std::vector<uint8_t> outputData;
int mediaMuxCallback(void *opaque, uint8_t *buf, int bufSize)
{
outputData.insert(outputData.end(), buf, buf + bufSize);
return bufSize;
}
std::string getAvErrorString(int errNr)
{
char errMsg[ERRMSGSIZE];
av_strerror(errNr, errMsg, ERRMSGSIZE);
return std::string(errMsg);
}
int main(int argc, char **argv)
{
if(argc < 2)
{
std::cout << "Missing file name" << std::endl;
return 1;
}
std::fstream file(argv[1], std::ios::in | std::ios::binary);
if(!file.is_open())
{
std::cout << "Couldn't open file " << argv[1] << std::endl;
return 2;
}
std::vector<uint8_t> inputMediaData;
do
{
char buf[READBUFSIZE];
file.read(buf, READBUFSIZE);
int size = file.gcount();
if(size > 0)
inputMediaData.insert(inputMediaData.end(), buf, buf + size);
} while(!file.eof());
file.close();
//Initialize avcodec
av_register_all();
uint8_t *ioBuffer;
AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
AVCodecContext *codecCtxt = avcodec_alloc_context3(codec);
AVCodecParserContext *parserCtxt = av_parser_init(AV_CODEC_ID_H264);
AVOutputFormat *outputFormat = av_guess_format("mp4", nullptr, nullptr);
AVFormatContext *formatCtxt;
AVIOContext *ioCtxt;
AVStream *videoStream;
int res = avformat_alloc_output_context2(&formatCtxt, outputFormat, nullptr, nullptr);
if(res < 0)
{
std::cout << "Couldn't initialize format context; the error was: " << getAvErrorString(res) << std::endl;
return 3;
}
if((videoStream = avformat_new_stream( formatCtxt, avcodec_find_encoder(formatCtxt->oformat->video_codec) )) == nullptr)
{
std::cout << "Couldn't initialize video stream" << std::endl;
return 4;
}
else if(!codec)
{
std::cout << "Couldn't initialize codec" << std::endl;
return 5;
}
else if(codecCtxt == nullptr)
{
std::cout << "Couldn't initialize codec context" << std::endl;
return 6;
}
else if(parserCtxt == nullptr)
{
std::cout << "Couldn't initialize parser context" << std::endl;
return 7;
}
else if((ioBuffer = (uint8_t*)av_malloc(IOBUFSIZE)) == nullptr)
{
std::cout << "Couldn't allocate I/O buffer" << std::endl;
return 8;
}
else if((ioCtxt = avio_alloc_context(ioBuffer, IOBUFSIZE, 1, nullptr, nullptr, mediaMuxCallback, nullptr)) == nullptr)
{
std::cout << "Couldn't initialize I/O context" << std::endl;
return 9;
}
//Set video stream data
videoStream->id = formatCtxt->nb_streams - 1;
videoStream->codec->width = 1280;
videoStream->codec->height = 720;
videoStream->time_base.den = 60; //FPS
videoStream->time_base.num = 1;
videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
formatCtxt->pb = ioCtxt;
//Retrieve SPS and PPS for codec extdata
const uint32_t synchMarker = 0x01000000;
unsigned int i = 0;
int spsStart = -1, ppsStart = -1;
uint16_t spsSize = 0, ppsSize = 0;
while(spsSize == 0 || ppsSize == 0)
{
uint32_t *curr = (uint32_t*)(inputMediaData.data() + i);
if(*curr == synchMarker)
{
unsigned int currentNalStart = i;
i += sizeof(uint32_t);
uint8_t nalType = inputMediaData.data()[i] & 0x1F;
if(nalType == SEQ_PARAM_SET)
spsStart = currentNalStart;
else if(nalType == PIC_PARAM_SET)
ppsStart = currentNalStart;
if(spsStart >= 0 && spsSize == 0 && spsStart != i)
spsSize = currentNalStart - spsStart;
else if(ppsStart >= 0 && ppsSize == 0 && ppsStart != i)
ppsSize = currentNalStart - ppsStart;
}
++i;
}
videoStream->codec->extradata = inputMediaData.data() + spsStart;
videoStream->codec->extradata_size = ppsStart + ppsSize;
//Write main header
AVDictionary *options = nullptr;
av_dict_set(&options, "movflags", "frag_custom+empty_moov", 0);
res = avformat_write_header(formatCtxt, &options);
if(res < 0)
{
std::cout << "Couldn't write container main header; the error was: " << getAvErrorString(res) << std::endl;
return 10;
}
//Retrieve frames from input video and wrap them in container
int currentInputIndex = 0;
int framesInSecond = 0;
while(currentInputIndex < inputMediaData.size())
{
uint8_t *frameBuffer;
int frameSize;
res = av_parser_parse2(parserCtxt, codecCtxt, &frameBuffer, &frameSize, inputMediaData.data() + currentInputIndex,
inputMediaData.size() - currentInputIndex, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
if(frameSize == 0) //No more frames while some data still remains (is that even possible?)
{
std::cout << "Some data left unparsed: " << std::to_string(inputMediaData.size() - currentInputIndex) << std::endl;
break;
}
//Prepare packet with video frame to be dumped into container
AVPacket packet;
av_init_packet(&packet);
packet.data = frameBuffer;
packet.size = frameSize;
packet.stream_index = videoStream->index;
currentInputIndex += frameSize;
//Write packet to the video stream
res = av_write_frame(formatCtxt, &packet);
if(res < 0)
{
std::cout << "Couldn't write packet with video frame; the error was: " << getAvErrorString(res) << std::endl;
return 11;
}
if(++framesInSecond == 60) //We want 1 segment per second
{
framesInSecond = 0;
res = av_write_frame(formatCtxt, nullptr); //Flush segment
}
}
res = av_write_frame(formatCtxt, nullptr); //Flush if something has been left
//Write media data in container to file
file.open("my_mp4.mp4", std::ios::out | std::ios::binary);
if(!file.is_open())
{
std::cout << "Couldn't open output file " << std::endl;
return 12;
}
file.write((char*)outputData.data(), outputData.size());
if(file.fail())
{
std::cout << "Couldn't write to file" << std::endl;
return 13;
}
std::cout << "Media file muxed successfully" << std::endl;
return 0;
}
(I hardcoded a few values, such as video dimensions or framerate, but as I said this is just a test code.)
Simple HTML webpage using MSE to play my fragmented MP4
<!DOCTYPE html>
<html>
<head>
<title>Test strumienia</title>
</head>
<body>
<video width="1280" height="720" controls>
</video>
</body>
<script>
var vidElement = document.querySelector('video');
if (window.MediaSource) {
var mediaSource = new MediaSource();
vidElement.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener('sourceopen', sourceOpen);
} else {
console.log("The Media Source Extensions API is not supported.")
}
function sourceOpen(e) {
URL.revokeObjectURL(vidElement.src);
var mime = 'video/mp4; codecs="avc1.640028"';
var mediaSource = e.target;
var sourceBuffer = mediaSource.addSourceBuffer(mime);
var videoUrl = 'my_mp4.mp4';
fetch(videoUrl)
.then(function(response) {
return response.arrayBuffer();
})
.then(function(arrayBuffer) {
sourceBuffer.addEventListener('updateend', function(e) {
if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
mediaSource.endOfStream();
}
});
sourceBuffer.appendBuffer(arrayBuffer);
});
}
</script>
</html>
Output MP4 file generated by my C++ application can be played i.e. in MPC, but it doesn't play in any web browser I tested it with. It also doesn't have any duration (MPC keeps showing 00:00).
To compare output MP4 file I got from my C++ application described above, I also used FFMPEG to create fragmented MP4 file from the same source file with raw H264 stream. I used the following command:
ffmpeg -r 60 -i input.h264 -c:v copy -f mp4 -movflags empty_moov+default_base_moof+frag_keyframe test.mp4
This file generated by FFMPEG is played correctly by every web browser I used for tests. It also has correct duration (but also it has trailing atom, which wouldn't be present in my live stream anyway, and as I need a live stream, it won't have any fixed duration in the first place).
MP4 atoms for both files look very similiar (they have identical avcc section for sure). What's interesting (but not sure if it's of any importance), both files have different NALs format than input file (RPI camera produces video stream in Annex-B format, while output MP4 files contain NALs in AVCC format... or at least it looks like it's the case when I compare mdat atoms with input H264 data).
I assume there is some field (or a few fields) I need to set for avcodec to make it produce video stream that would be properly decoded and played by browsers players. But what field(s) do I need to set? Or maybe problem lies somewhere else? I ran out of ideas.
EDIT 1:
As suggested, I investigated binary content of both MP4 files (generated by my app and FFMPEG tool) with hex editor. What I can confirm:
both files have identical avcc section (they match perfectly and are in AVCC format, I analyzed it byte after byte and there's no mistake about it)
both files have NALs in AVCC format (I looked closely at mdat atoms and they don't differ between both MP4 files)
So I guess there's nothing wrong with the extradata creation in my code - avcodec takes care of it properly, even if I just feed it with SPS and PPS NALs. It converts them by itself, so no need for me to do it by hand. Still, my original problem remains.
EDIT 2: I achieved partial success - MP4 generated by my app now plays in Firefox. I added this line to the code (along with rest of stream initialization):
videoStream->codec->time_base = videoStream->time_base;
So now this section of my code looks like this:
//Set video stream data
videoStream->id = formatCtxt->nb_streams - 1;
videoStream->codec->width = 1280;
videoStream->codec->height = 720;
videoStream->time_base.den = 60; //FPS
videoStream->time_base.num = 1;
videoStream->codec->time_base = videoStream->time_base;
videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
formatCtxt->pb = ioCtxt;
I finally found the solution. My MP4 now plays in Chrome (while still playing in other tested browsers).
In Chrome chrome://media-internals/ shows MSE logs (of a sort). When I looked there, I found a few of following warnings for my test player:
ISO-BMFF container metadata for video frame indicates that the frame is not a keyframe, but the video frame contents indicate the opposite.
That made me think and encouraged to set AV_PKT_FLAG_KEY for packets with keyframes. I added following code to section with filling AVPacket structure:
//Check if keyframe field needs to be set
int allowedNalsCount = 3; //In one packet there would be at most three NALs: SPS, PPS and video frame
packet.flags = 0;
for(int i = 0; i < frameSize && allowedNalsCount > 0; ++i)
{
uint32_t *curr = (uint32_t*)(frameBuffer + i);
if(*curr == synchMarker)
{
uint8_t nalType = frameBuffer[i + sizeof(uint32_t)] & 0x1F;
if(nalType == KEYFRAME)
{
std::cout << "Keyframe detected at frame nr " << framesTotal << std::endl;
packet.flags = AV_PKT_FLAG_KEY;
break;
}
else
i += sizeof(uint32_t) + 1; //We parsed this already, no point in doing it again
--allowedNalsCount;
}
}
A KEYFRAME constant turns out to be 0x5 in my case (Slice IDR).
MP4 atoms for both files look very similiar (they have identical avcc
section for sure)
Double check that, The code supplied suggests otherwise to me.
What's interesting (but not sure if it's of any importance), both
files have different NALs format than input file (RPI camera produces
video stream in Annex-B format, while output MP4 files contain NALs in
AVCC format... or at least it looks like it's the case when I compare
mdat atoms with input H264 data).
It is very important, mp4 will not work with annex b.
You need to fill in extradata with AVC Decoder Configuration Record, not just SPS/PPS
Here's how the record should look like:
AVCDCR
We can find this explanation in [Chrome Source]
(https://chromium.googlesource.com/chromium/src/+/refs/heads/master/media/formats/mp4/mp4_stream_parser.cc#799) "chrome media source code":
// Copyright 2014 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
// Use |analysis.is_keyframe|, if it was actually determined, for logging
// if the analysis mismatches the container's keyframe metadata for
// |frame_buf|.
if (analysis.is_keyframe.has_value() &&
is_keyframe != analysis.is_keyframe.value()) {
LIMITED_MEDIA_LOG(DEBUG, media_log_, num_video_keyframe_mismatches_,
kMaxVideoKeyframeMismatchLogs)
<< "ISO-BMFF container metadata for video frame indicates that the "
"frame is "
<< (is_keyframe ? "" : "not ")
<< "a keyframe, but the video frame contents indicate the "
"opposite.";
// As of September 2018, it appears that all of Edge, Firefox, Safari
// work with content that marks non-avc-keyframes as a keyframe in the
// container. Encoders/muxers/old streams still exist that produce
// all-keyframe mp4 video tracks, though many of the coded frames are
// not keyframes (likely workaround due to the impact on low-latency
// live streams until https://crbug.com/229412 was fixed). We'll trust
// the AVC frame's keyframe-ness over the mp4 container's metadata if
// they mismatch. If other out-of-order codecs in mp4 (e.g. HEVC, DV)
// implement keyframe analysis in their frame_bitstream_converter, we'll
// similarly trust that analysis instead of the mp4.
is_keyframe = analysis.is_keyframe.value();
}
As the code comment show, chrome trust the AVC frame's keyframe-ness over the mp4 container's metadata. So nalu type in H264/HEVC should be more important than mp4 container box sdtp & trun description.

C++ ffmpeg video missing frames and won't play in Quicktime

I wrote some C++ code that uses ffmpeg to encode a video. I'm having two strange issues:
The final video is always missing 1 frame. That is, if I have it encode 10 frames the final video only has 9 (at least that's what ffprobe -show_frames -pretty $VIDEO | grep -F '[FRAME]' | wc -l tells me.
The final video plays fine in some players (mpv and vlc) but not in Quicktime. Quicktime just shows a completely black screen.
My code is roughly this (modified a bit to remove types that are unique to our code base):
First, I open the video file, write the headers and initialize things:
template <class PtrT>
using UniquePtrWithDeleteFunction = std::unique_ptr<PtrT, std::function<void (PtrT*)>>;
std::unique_ptr<FfmpegEncodingFrameSink> FfmpegEncodingFrameSink::Create(
const std::string& dest_url) {
AVFormatContext* tmp_format_ctxt;
auto alloc_format_res = avformat_alloc_output_context2(&tmp_format_ctxt, nullptr, "mp4", dest_url.c_str());
if (alloc_format_res < 0) {
throw FfmpegException("Error opening output file.");
}
auto format_ctxt = UniquePtrWithDeleteFunction<AVFormatContext>(
tmp_format_ctxt, CloseAvFormatContext);
AVStream* out_stream_video = avformat_new_stream(format_ctxt.get(), nullptr);
if (out_stream_video == nullptr) {
throw FfmpegException("Could not create outputstream");
}
auto codec_context = GetCodecContext(options);
out_stream_video->time_base = codec_context->time_base;
auto ret = avcodec_parameters_from_context(out_stream_video->codecpar, codec_context.get());
if (ret < 0) {
throw FfmpegException("Failed to copy encoder parameters to outputstream");
}
if (!(format_ctxt->oformat->flags & AVFMT_NOFILE)) {
ret = avio_open(&format_ctxt->pb, dest_url.c_str(), AVIO_FLAG_WRITE);
if (ret < 0) {
throw VideoDecodeException("Could not open output file: " + dest_url);
}
}
ret = avformat_init_output(format_ctxt.get(), nullptr);
if (ret < 0) {
throw FfmpegException("Unable to initialize the codec.");
}
ret = avformat_write_header(format_ctxt.get(), nullptr);
if (ret < 0) {
throw FfmpegException("Error occurred writing format header");
}
return std::unique_ptr<FfmpegEncodingFrameSink>(
new FfmpegEncodingFrameSink(std::move(format_ctxt), std::move(codec_context)));
}
Then, every time I get a new frame to encode I pass it to this function (the frames are being decoded via ffmpeg from another mp4 file which Quicktime plays just fine):
// If frame == nullptr then we're done and we're just flushing the encoder
// otherwise encode an actual frame
void FfmpegEncodingFrameSink::EncodeAndWriteFrame(
const AVFrame* frame) {
auto ret = avcodec_send_frame(codec_ctxt_.get(), frame);
if (ret < 0) {
throw FfmpegException("Error encoding the frame.");
}
AVPacket enc_packet;
enc_packet.data = nullptr;
enc_packet.size = 0;
av_init_packet(&enc_packet);
do {
ret = avcodec_receive_packet(codec_ctxt_.get(), &enc_packet);
if (ret == AVERROR(EAGAIN)) {
CHECK(frame != nullptr);
break;
} else if (ret == AVERROR_EOF) {
CHECK(frame == nullptr);
break;
} else if (ret < 0) {
throw FfmpegException("Error putting the encoded frame into the packet.");
}
assert(ret == 0);
enc_packet.stream_index = 0;
LOG(INFO) << "Writing packet to stream.";
av_interleaved_write_frame(format_ctxt_.get(), &enc_packet);
av_packet_unref(&enc_packet);
} while (ret == 0);
}
Finally, in my destructor I close everything up like so:
FfmpegEncodingFrameSink::~FfmpegEncodingFrameSink() {
// Pass a nullptr to EncodeAndWriteFrame so it flushes the encoder
EncodeAndWriteFrame(nullptr);
// write mp4 trailer
av_write_trailer(format_ctxt_.get());
}
If I run this passing n frames to EncodeAndWriteFrame line LOG(INFO) << "Writing packet to stream."; gets run n times indicating the n packets were written to the stream. But ffprobe always shows only n - 1 frames int he video. And the final video doesn't play on quicktime.
What am I doing wrong??
Sorry for the delay but as i just had the same problem and noticed that this question deserves an answer, here how i solved this.
Up in front, the Problem only occured for me when using mov, mp4, 3gp as format. It worked frame accurate when using e.g. avi format. When i wrote uncompressed video frames to the container, i saw that the avi and mov had the same count of frames stored but the mov obviously had some problem in it's header.
Counting the number of frames in the mov using header metadata showed one frame is missing:
ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of default=nokey=1:noprint_wrappers=1 c:\temp\myinput.mov
While ignoring the index showed the correct number of frames:
-ignore_editlist 1
The solution for me was, set the timebase to the AVStream->CodeContext of the video stream.
The code above attempts to do this in this line:
out_stream_video->time_base = codec_context->time_base;
But the problem is that the posted code above does not expose the function GetCodecContext so we do not know if the time_base is correctly set for "codec_context". So it is my believe that the author's problem was that his function GetCodecContext did not set the time_base correctly.

How to read YUV8 data from avi file?

I have avi file that contains uncompressed gray video data. I need to extract frames from it. The size of file is 22 Gb.
How do i do that?
I have already tried ffmpeg, but it gives me "could not find codec parameters for video stream" message - because there is no codec at work, just frames.
Since Opencv just uses ffmpeg to read video, that rules out opencv as well.
The only path that seems to be left is to try and dig into the raw data, but i do not know how.
Edit: this is the code i use to read from the file with opencv. The failure occurs inside the second if. Running ffmpeg binary on the file also fails with the message above (could not find codec aprameters etc)
/* register all formats and codecs */
av_register_all();
/* open input file, and allocate format context */
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
ret = 1;
goto end;
}
fmt_ctx->seek2any = true;
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
ret = 1;
goto end;
}
Edit:
Here is sample code i have tried to make the extraction: pastebin. The result i get is an unchanging buffer after every call to AVIStreamRead.
If you do not need cross platform functionality Video for Windows (VFW) API is a good alternative (http://msdn.microsoft.com/en-us/library/windows/desktop/dd756808(v=vs.85).aspx), i will not put an entire code block, since there's quite much to do, but you should be able to figure it out from the reference link. Basically, you do a AVIFileOpen, then get the video stream via AVIFileGetStream with streamtypeVIDEO, or alternatively do it at once with AVIStreamOpenFromFile and then read samples from the stream with AVIStreamRead. If you get to a point where you fail I can try to help, but it should be pretty straightforward.
Also, not sure why ffmpeg is failing, I have been doing raw AVI reading with ffmpeg without any codecs involved, can you post what call to ffpeg actually fails?
EDIT:
For the issue that you are seeing when the read data size is 0. The AVI file has N slots for frames in each second where N is the fps of the video. In real life the samples won't come exactly at that speed (e.g. IP surveillance cameras) so the actual data sample indexes can be non continuous like 1,5,11,... and VFW would insert empty samples between them (that is from where you read a sample with a zero size). What you have to do is call AVIStreamRead with NULL as buffer and 0 as size until the bRead is not 0 or you run past last sample. When you get an actual size, then you can again call AVIStreamRead on that sample index with the buffer pointer and size. I usually do compressed video so i don't use the suggested size, but at least according to your code snipplet I would do something like this:
...
bRead = 0;
do
{
aviOpRes = AVIStreamRead(ppavi,smpS,1,NULL,0,&bRead,&smpN);
} while (bRead == 0 && ++smpS < si.dwLength + si.dwStart);
if(smpS >= si.dwLength + si.dwStart)
break;
PUCHAR tempBuffer = new UCHAR[bRead];
aviOpRes = AVIStreamRead(ppavi,smpS,1,tempBuffer,bRead,&bRead,&smpN);
/* do whatever you need */
delete tempBuffer;
...
EDIT 2:
Since this may come in handy to someone or yourself to make a choice between VFW and FFMPEG I also updated your FFMPEG example so that it parsed the same file (sorry for the code quality since it lacks error checking but i guess you can see the logical flow):
/* register all formats and codecs */
av_register_all();
AVFormatContext* fmt_ctx = NULL;
/* open input file, and allocate format context */
const char *src_filename = "E:\\Output.avi";
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
abort();
}
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
abort();
}
int video_stream_index = 0; /* video stream is usualy 0 but still better to lookup in case it's not present */
for(; video_stream_index < fmt_ctx->nb_streams; ++video_stream_index)
{
if(fmt_ctx->streams[video_stream_index]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
break;
}
if(video_stream_index == fmt_ctx->nb_streams)
abort();
AVPacket *packet = new AVPacket;
while(av_read_frame(fmt_ctx, packet) == 0)
{
if (packet->stream_index == video_stream_index)
printf("Sample nr %d\n", packet->pts);
av_free_packet(packet);
}
Basically you open the context and read packets from it. You will get both audio and video packets so you should check if the packet belongs to the stream of interest. FFMPEG will save you the trouble with empty frames and give only those samples that have data in them.

GCC - how to compile FFMpeg with FLV1 (aka H.263), MP3 (format and codec) and FLV (container format) only?

So I want to compile FFMpeg C libraries (as static) not in its full default glory but for my purposes only - so that it would contain:
2 codec formats MP3 and FLV1 (H.263)
2 file (container) formats MP3 and FLV
Of course I only want to limit only range of formats and codecs. Not functions of FFmpeg.
How to do such thing?
What is my main point:
To have such ffmpeg build that when I call function like
multiplexer::multiplexer(std::string container)
{
av_register_all();
this->format_context = avformat_alloc_context();
if (this->format_context == NULL)
{
std::cout << "Multiplexer: format context is empty." << std::endl;
throw internal_exception();
}
this->format_context->oformat = av_guess_format(container.c_str(), NULL, NULL);
if (this->format_context->oformat == NULL)
{
std::cout << "Multiplexer: invalid container format (" << container << ")." << std::endl;
throw internal_exception();
}
}
I would be limited to FLV only
The configure script has a list of options. Run ./configure --help to see them. I'm not sure how options stack, but you could try --disable-everything followed by --enable-demuxer=flv etc.