Decoding YUYV422 raw images using FFmpeg - c++

I have a collection of sequential YUYV422 raw images that I want to turn into a video. The problem seems to arise when the frame is created in avcodec_receive_frame: the frame contains only one channel instead of the four in the YUYV format. This results in the error "Input picture width <640> is greater than stride (0)", since only the zeroth index of data and linesize is set in the frame. I don't know whether this is an FFmpeg bug or a misconfiguration on my part.
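For context, YUYV422 is a packed format: all samples live in a single plane as Y0 U Y1 V byte groups, so a frame that only fills index 0 of data and linesize is what one would expect for this pixel format. If a later stage needs planar input, the frame would have to be converted first (for example with libswscale). The sketch below uses plain C++ rather than FFmpeg to show the layout; the helper names are hypothetical, and it assumes rows without padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes per row of a packed YUYV422 image: every pair of pixels is stored
// as four bytes (Y0 U Y1 V), i.e. 2 bytes per pixel in one plane.
inline std::size_t yuyv422_row_bytes(int width) {
    assert(width > 0 && width % 2 == 0);
    return static_cast<std::size_t>(width) * 2;
}

// Hypothetical helper type: three separate planes, as in planar YUV 4:2:2.
struct PlanarYuv422 {
    std::vector<std::uint8_t> y, u, v;
};

// Split one packed YUYV422 buffer (no row padding assumed) into planes.
inline PlanarYuv422 unpack_yuyv422(const std::vector<std::uint8_t>& packed,
                                   int width, int height) {
    assert(height > 0);
    const std::size_t row = yuyv422_row_bytes(width);
    assert(packed.size() == row * static_cast<std::size_t>(height));
    PlanarYuv422 out;
    out.y.reserve(packed.size() / 2);
    out.u.reserve(packed.size() / 4);
    out.v.reserve(packed.size() / 4);
    for (std::size_t i = 0; i + 3 < packed.size(); i += 4) {
        out.y.push_back(packed[i]);      // Y0
        out.u.push_back(packed[i + 1]);  // U, shared by both pixels
        out.y.push_back(packed[i + 2]);  // Y1
        out.v.push_back(packed[i + 3]);  // V, shared by both pixels
    }
    return out;
}
```

For a 1280-pixel-wide image this gives a row size of 2560 bytes, which is the value one would expect to see in linesize[0] (possibly rounded up for alignment).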
#include "icsFfmpegImageDecoder.h"
#include <stdexcept>
ImageDecoder::ImageDecoder(std::string filename)
{
AVInputFormat* iformat;
if (!(iformat = av_find_input_format("image2")))
throw std::invalid_argument(std::string("input Codec not found\n"));
this->fctx = NULL;
if (avformat_open_input(&this->fctx, filename.c_str(), iformat, NULL) < 0)
{
std::string error = "Failed to open input file ";
error += filename;
error += "\n";
throw std::invalid_argument(error);
}
#ifdef LIB_AVFORMAT_STREAM_CODEC_DEPRECATED
if (!(this->codec = avcodec_find_decoder(this->fctx->streams[0]->codecpar->codec_id)))
throw std::invalid_argument(std::string("Failed to find codec\n"));
if (!(this->cctx = avcodec_alloc_context3(this->codec)))
throw std::invalid_argument(std::string("could not create image read context codec"));
if (avcodec_parameters_to_context(this->cctx, this->fctx->streams[0]->codecpar) < 0)
throw std::invalid_argument(std::string("could not copy codec parameters from stream to context"));
#else
this->cctx = this->fctx->streams[0]->codec;
if (!(this->codec = avcodec_find_decoder(this->cctx->codec_id)))
throw std::invalid_argument(std::string("Failed to find codec\n"));
#endif
if (this->cctx->codec_id == AV_CODEC_ID_RAWVIDEO) {
// TODO Make Dynamic
this->cctx->pix_fmt = AV_PIX_FMT_YUYV422 ;
this->cctx->height = 800;
this->cctx->width = 1280;
}
if (avcodec_open2(this->cctx, this->codec, NULL) < 0)
throw std::invalid_argument(std::string("Failed to open codec\n"));
#ifdef USING_NEW_AVPACKET_SETUP
if (!(this->pkt = av_packet_alloc()))
throw std::invalid_argument(std::string("Failed to alloc packet\n"));
#else
this->pkt = new AVPacket();
av_init_packet(this->pkt);
#endif
read_file();
}
ImageDecoder::~ImageDecoder()
{
avcodec_close(this->cctx);
avformat_close_input(&this->fctx);
#ifdef USING_NEW_AVPACKET_SETUP
av_packet_free(&this->pkt);
#else
av_free_packet(this->pkt);
delete this->pkt;
#endif
}
void ImageDecoder::read_file()
{
if (av_read_frame(this->fctx, this->pkt) < 0)
throw std::invalid_argument(std::string("Failed to read frame from file\n"));
if (this->pkt->size == 0)
this->ret = -1;
}
#ifdef LIB_AVCODEC_USE_SEND_RECEIVE_NOTATION
void ImageDecoder::send_next_packet() {
if ((this->ret = avcodec_send_packet(this->cctx, this->pkt)) < 0)
throw std::invalid_argument("Error sending a packet for decoding\n");
}
bool ImageDecoder::receive_next_frame(AVFrame* frame)
{
if (this->ret >= 0)
{
this->ret = avcodec_receive_frame(this->cctx, frame);
if (this->ret == AVERROR_EOF)
return false;
else if (this->ret == AVERROR(11)) // 11 == EAGAIN: the decoder needs more input
return false;
else if (this->ret < 0)
throw std::invalid_argument("Error during decoding\n");
return true;
}
return false;
}
#else
void ImageDecoder::decode_frame(AVFrame* frame)
{
int got_frame = 0;
if (avcodec_decode_video2(this->cctx, frame, &got_frame, this->pkt) < 0)
throw std::invalid_argument("Error while decoding frame\n");
}
#endif

Related

Wrap audio data of the pcm_alaw type into an MKA audio file using the ffmpeg API

Imagine that in my project I receive RTP packets with payload type 8 and want to save that payload later as the Nth part of an audio track. I extract the payload from each RTP packet and save it to a temporary buffer:
...
while ((rtp = receiveRtpPackets()).withoutErrors()) {
payloadData.push(rtp.getPayloadData());
}
audioGenerator.setPayloadData(payloadData);
audioGenerator.recordToFile();
...
Once the temporary buffer has reached a certain size, I process it: I extract the entire payload and encode it with ffmpeg so that it can be saved to an audio file in Matroska format. But I have a problem. Since the RTP payload is type 8, I have to save raw audio data in pcm_alaw format to the mka audio format. When I do so, I get these messages from the library:
...
[libopus # 0x18eff60] Queue input is backward in time
[libopus # 0x18eff60] Queue input is backward in time
[libopus # 0x18eff60] Queue input is backward in time
[libopus # 0x18eff60] Queue input is backward in time
...
When I open the resulting audio file in VLC, nothing is played (the audio track timestamp is missing).
The task of my project is simply to take pcm_alaw data and pack it into a container in mka format. As far as I can tell, the best way to determine the codec is the av_guess_codec() function, which automatically selects the desired codec ID. But I do not know how to pack the raw data into the container correctly.
It is important to note that the raw data can arrive in any audio format defined by the RTP payload type (all types of RTP packet payload). All I know is that, in any case, I have to pack the audio data into an mka container.
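RTP payload type 8 is PCMA, i.e. G.711 A-law with one byte per sample. If the bytes are handed to an encoder that expects 16-bit linear PCM (main.cpp below passes AV_SAMPLE_FMT_S16), they would presumably need to be expanded first. The following is a minimal sketch of the standard A-law expansion in plain C++; the function names are my own and not part of FFmpeg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand one G.711 A-law byte to a 16-bit linear PCM sample.
inline std::int16_t alaw_to_linear(std::uint8_t a) {
    a ^= 0x55;                        // undo the even-bit inversion
    int t = (a & 0x0F) << 4;          // 4-bit mantissa
    const int seg = (a & 0x70) >> 4;  // 3-bit segment (exponent)
    if (seg == 0) {
        t += 8;
    } else {
        t += 0x108;
        t <<= (seg - 1);
    }
    // In A-law a set sign bit means a positive sample.
    return static_cast<std::int16_t>((a & 0x80) ? t : -t);
}

// Expand a whole buffer of A-law bytes (hypothetical helper).
inline std::vector<std::int16_t> decode_alaw(const std::uint8_t* data,
                                             std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<std::int16_t> pcm;
    pcm.reserve(size);
    for (std::size_t i = 0; i < size; ++i)
        pcm.push_back(alaw_to_linear(data[i]));
    return pcm;
}
```

The decoded buffer is twice the size of the input in bytes, which matters when the data is later cut into frames of nb_samples samples.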
I also attach the code (borrowed from this resource) that I use:
audiogenerater.h
extern "C"
{
#include "libavformat/avformat.h"
#include "libavcodec/avcodec.h"
#include "libswresample/swresample.h"
}
class AudioGenerater
{
public:
AudioGenerater();
~AudioGenerater(); // defined in audiogenerater.cpp, so it must not be defaulted here
void generateAudioFileWithOptions(
QString fileName,
QByteArray pcmData,
int channel,
int bitRate,
int sampleRate,
AVSampleFormat format);
private:
// init Format
bool initFormat(QString audioFileName);
private:
AVCodec *m_AudioCodec = nullptr;
AVCodecContext *m_AudioCodecContext = nullptr;
AVFormatContext *m_FormatContext = nullptr;
AVOutputFormat *m_OutputFormat = nullptr;
};
audiogenerater.cpp
AudioGenerater::AudioGenerater()
{
av_register_all();
avcodec_register_all();
}
AudioGenerater::~AudioGenerater()
{
// ...
}
bool AudioGenerater::initFormat(QString audioFileName)
{
// Create an output Format context
int result = avformat_alloc_output_context2(&m_FormatContext, nullptr, nullptr, audioFileName.toLocal8Bit().data());
if (result < 0) {
return false;
}
m_OutputFormat = m_FormatContext->oformat;
// Create an audio stream
AVStream* audioStream = avformat_new_stream(m_FormatContext, m_AudioCodec);
if (audioStream == nullptr) {
avformat_free_context(m_FormatContext);
return false;
}
// Set the parameters in the stream
audioStream->id = m_FormatContext->nb_streams - 1;
audioStream->time_base = { 1, 8000 };
result = avcodec_parameters_from_context(audioStream->codecpar, m_AudioCodecContext);
if (result < 0) {
avformat_free_context(m_FormatContext);
return false;
}
// Print FormatContext information
av_dump_format(m_FormatContext, 0, audioFileName.toLocal8Bit().data(), 1);
// Open file IO
if (!(m_OutputFormat->flags & AVFMT_NOFILE)) {
result = avio_open(&m_FormatContext->pb, audioFileName.toLocal8Bit().data(), AVIO_FLAG_WRITE);
if (result < 0) {
avformat_free_context(m_FormatContext);
return false;
}
}
return true;
}
void AudioGenerater::generateAudioFileWithOptions(
QString _fileName,
QByteArray _pcmData,
int _channel,
int _bitRate,
int _sampleRate,
AVSampleFormat _format)
{
AVFormatContext* oc;
if (avformat_alloc_output_context2(
&oc, nullptr, nullptr, _fileName.toStdString().c_str())
< 0) {
qDebug() << "Error in line: " << __LINE__;
return;
}
if (!oc) {
printf("Could not deduce output format from file extension: using mka.\n");
avformat_alloc_output_context2(
&oc, nullptr, "mka", _fileName.toStdString().c_str());
}
if (!oc) {
qDebug() << "Error in line: " << __LINE__;
return;
}
AVOutputFormat* fmt = oc->oformat;
if (fmt->audio_codec == AV_CODEC_ID_NONE) {
qDebug() << "Error in line: " << __LINE__;
return;
}
AVCodecID codecID = av_guess_codec(
fmt, nullptr, _fileName.toStdString().c_str(), nullptr, AVMEDIA_TYPE_AUDIO);
// Find Codec
m_AudioCodec = avcodec_find_encoder(codecID);
if (m_AudioCodec == nullptr) {
qDebug() << "Error in line: " << __LINE__;
return;
}
// Create an encoder context
m_AudioCodecContext = avcodec_alloc_context3(m_AudioCodec);
if (m_AudioCodecContext == nullptr) {
qDebug() << "Error in line: " << __LINE__;
return;
}
// Setting parameters
m_AudioCodecContext->bit_rate = _bitRate;
m_AudioCodecContext->sample_rate = _sampleRate;
m_AudioCodecContext->sample_fmt = _format;
m_AudioCodecContext->channels = _channel;
m_AudioCodecContext->channel_layout = av_get_default_channel_layout(_channel);
m_AudioCodecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
// Turn on the encoder
int result = avcodec_open2(m_AudioCodecContext, m_AudioCodec, nullptr);
if (result < 0) {
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
return;
}
// Create a package
if (!initFormat(_fileName)) {
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
return;
}
// write to the file header
result = avformat_write_header(m_FormatContext, nullptr);
if (result < 0) {
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
return;
}
// Create Frame
AVFrame* frame = av_frame_alloc();
if (frame == nullptr) {
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
return;
}
int nb_samples = 0;
if (m_AudioCodecContext->codec->capabilities & AV_CODEC_CAP_VARIABLE_FRAME_SIZE) {
nb_samples = 10000;
}
else {
nb_samples = m_AudioCodecContext->frame_size;
}
// Set the parameters of the Frame
frame->nb_samples = nb_samples;
frame->format = m_AudioCodecContext->sample_fmt;
frame->channel_layout = m_AudioCodecContext->channel_layout;
// Apply for data memory
result = av_frame_get_buffer(frame, 0);
if (result < 0) {
av_frame_free(&frame);
{
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
return;
}
}
// Set the Frame to be writable
result = av_frame_make_writable(frame);
if (result < 0) {
av_frame_free(&frame);
{
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
return;
}
}
int perFrameDataSize = frame->linesize[0];
int count = _pcmData.size() / perFrameDataSize;
bool needAddOne = false;
if (_pcmData.size() % perFrameDataSize != 0) {
count++;
needAddOne = true;
}
int frameCount = 0;
for (int i = 0; i < count; ++i) {
// Create a Packet
AVPacket* pkt = av_packet_alloc();
if (pkt == nullptr) {
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
return;
}
av_init_packet(pkt);
// The last chunk may be shorter than a full frame
int chunkSize = perFrameDataSize;
if (needAddOne && i == count - 1)
chunkSize = _pcmData.size() % perFrameDataSize;
// Copy the next chunk of PCM data; zero the buffer first so a short final chunk is padded with silence
memset(frame->data[0], 0, perFrameDataSize);
memcpy(frame->data[0], _pcmData.constData() + perFrameDataSize * i, chunkSize);
frame->pts = frameCount++;
// send Frame
result = avcodec_send_frame(m_AudioCodecContext, frame);
if (result < 0) {
av_packet_free(&pkt);
continue;
}
// Receive the encoded Packet
result = avcodec_receive_packet(m_AudioCodecContext, pkt);
if (result < 0) {
av_packet_free(&pkt);
continue;
}
// write to file
av_packet_rescale_ts(pkt, m_AudioCodecContext->time_base, m_FormatContext->streams[0]->time_base);
pkt->stream_index = 0;
result = av_interleaved_write_frame(m_FormatContext, pkt);
// Free the packet whether or not the write succeeded
av_packet_free(&pkt);
}
// write to the end of the file
av_write_trailer(m_FormatContext);
// Close file IO
avio_closep(&m_FormatContext->pb);
// Release Frame memory
av_frame_free(&frame);
avcodec_free_context(&m_AudioCodecContext);
if (m_FormatContext != nullptr)
avformat_free_context(m_FormatContext);
}
main.cpp
int main(int argc, char **argv)
{
av_log_set_level(AV_LOG_TRACE);
QFile file("rawDataOfPcmAlawType.bin");
if (!file.open(QIODevice::ReadOnly)) {
return EXIT_FAILURE;
}
QByteArray rawData(file.readAll());
AudioGenerater generator;
generator.generateAudioFileWithOptions(
"test.mka",
rawData,
1,
64000,
8000,
AV_SAMPLE_FMT_S16);
return 0;
}
It is important that I find the most appropriate way to record pcm_alaw (or another data format) in an MKA audio file.
I would appreciate help from anyone who knows anything about this (there is very little time left to implement this project).
These useful links will help you:
A good overview of the data processing sequence in libav: ffmpeg-libav-tutorial
Examples from the ffmpeg developers themselves: avio_reading, resampling_audio, transcode_aac
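One possible contributor to the "Queue input is backward in time" messages is the timestamp bookkeeping: the listing sets frame->pts = frameCount++, so pts grows by 1 per frame, whereas with an encoder time base of 1/sample_rate (an assumption; the listing never sets the time base explicitly) it would be expected to grow by nb_samples. Below is a minimal sketch of counting timestamps in samples, in plain C++ with hypothetical names. The millisecond conversion is only there for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: running timestamp counter for fixed-rate audio.
class SamplePtsCounter {
public:
    explicit SamplePtsCounter(int sample_rate) : sample_rate_(sample_rate) {
        assert(sample_rate > 0);
    }

    // Returns the pts (in samples) of the next frame, then advances
    // the counter by that frame's length.
    std::int64_t next(int nb_samples) {
        assert(nb_samples > 0);
        const std::int64_t pts = total_samples_;
        total_samples_ += nb_samples;
        return pts;
    }

    // Converts a pts in samples to milliseconds, rounding to nearest.
    std::int64_t to_ms(std::int64_t pts) const {
        return (pts * 1000 + sample_rate_ / 2) / sample_rate_;
    }

private:
    int sample_rate_;
    std::int64_t total_samples_ = 0;
};
```

With 8000 Hz audio and 160-sample frames, successive calls to next(160) yield 0, 160, 320, and so on, so each frame starts exactly where the previous one ended.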

Trying to cancel execution and delete file using ffmpeg C API

The code below is a class that handles the conversion of multiple images, added through the add_frame() method, into a GIF with encode(). It also uses a filter to generate and apply the palette. It is used like this:
Code call example
std::unique_ptr<px::GIF::FFMPEG> gif_obj = nullptr;
try
{
gif_obj = std::make_unique<px::GIF::FFMPEG>(px::Point2D<int>{1000, 1000}, 12, "C:/out.gif",
"format=pix_fmts=rgb24,split [a][b];[a]palettegen[p];[b][p]paletteuse");
// Example: a simple vector of images (usually process internally)
for(auto img : image_vector)
gif_obj->add_frame(img);
// Once all frame were added, encode the final GIF with the filter applied.
gif_obj->encode();
}
catch(const std::exception& e)
{
// An error occurred! We must close FFMPEG properly and delete the created file.
// gif_obj is still null if the constructor itself threw.
if (gif_obj) gif_obj->cancel();
}
I have the following issue. If the code throws an exception for any reason, I call gif_obj->cancel(), which is supposed to delete the GIF file on disk. But this never works; I assume there is a lock on the file or something similar. So here is my question:
What is the proper way to close/free the ffmpeg objects so that the file can be removed afterward?
Full class code below
Header
// C++ Standard includes
#include <memory>
#include <string>
#include <vector>
// 3rd Party includes
#ifdef __cplusplus
extern "C" {
#include "libavformat/avformat.h"
#include "libavfilter/avfilter.h"
#include "libavutil/opt.h"
#include "libavfilter/buffersrc.h"
#include "libavfilter/buffersink.h"
#include "libswscale/swscale.h"
#include "libavutil/imgutils.h"
}
#endif
#define FFMPEG_MSG_LEN 2000
namespace px
{
namespace GIF
{
class FFMPEG
{
public:
FFMPEG(const px::Point2D<int>& dim,
const int framerate,
const std::string& filename,
const std::string& filter_cmd);
~FFMPEG();
void add_frame(pxImage * const img);
void encode();
void cancel();
private:
void init_filters(); // Init everything needed to filter the input frames.
void init_muxer(); // The muxer that creates the output file.
void muxing_one_frame(AVFrame* frame);
void release();
int _ret = 0; // status code from FFMPEG.
char _err_msg[FFMPEG_MSG_LEN]; // Error message buffer.
int m_width = 0; // The width that all future images must have to be accepted.
int m_height = 0; // The height that all future images must have to be accepted.
int m_framerate = 0; // GIF Framerate.
std::string m_filename = ""; // The GIF filename (on cache?)
std::string m_filter_desc = ""; // The FFMPEG filter to apply over the frames.
bool as_frame = false;
AVFrame* picture_rgb24 = nullptr; // Temporary frame that will hold the pxImage in an RGB24 format (NOTE: TOP-LEFT origin)
AVFormatContext* ofmt_ctx = nullptr; // output format context associated to the
AVCodecContext* o_codec_ctx = nullptr; // output codec for the GIF
AVFilterGraph* filter_graph = nullptr; // filter graph associate with the string we want to execute
AVFilterContext* buffersrc_ctx = nullptr; // The buffer that will store all the frames in one place for the palette generation.
AVFilterContext* buffersink_ctx = nullptr; // The buffer that will store the result afterward (once the palette are used).
int64_t m_pts_increment = 0;
};
};
};
ctor
px::GIF::FFMPEG::FFMPEG(const px::Point2D<int>& dim,
const int framerate,
const std::string& filename,
const std::string& filter_cmd) :
m_width(dim.x()),
m_height(dim.y()),
m_framerate(framerate),
m_filename(filename),
m_filter_desc(filter_cmd)
{
#if !_DEBUG
av_log_set_level(AV_LOG_QUIET); // Set the FFMPEG log to quiet to avoid too much logs.
#endif
// Allocate the temporary buffer that hold the ffmpeg image (pxImage to AVFrame conversion).
picture_rgb24 = av_frame_alloc();
picture_rgb24->pts = 0;
picture_rgb24->data[0] = NULL;
picture_rgb24->linesize[0] = -1;
picture_rgb24->format = AV_PIX_FMT_RGB24;
picture_rgb24->height = m_height;
picture_rgb24->width = m_width;
if ((_ret = av_image_alloc(picture_rgb24->data, picture_rgb24->linesize, m_width, m_height, (AVPixelFormat)picture_rgb24->format, 24)) < 0)
throw px::GIF::Error("Failed to allocate the AVFrame for pxImage conversion with error: " +
std::string(av_make_error_string(_err_msg, FFMPEG_MSG_LEN, _ret)),
"GIF::FFMPEG CTOR");
//printf("allocated picture of size %d, linesize %d %d %d %d\n", _ret, picture_rgb24->linesize[0], picture_rgb24->linesize[1], picture_rgb24->linesize[2], picture_rgb24->linesize[3]);
init_muxer(); // Prepare the GIF encoder (open it on disk).
init_filters(); // Prepare the filter that will be applied over the frame.
// Instead of hardcoding {1,100}, which is the GIF time base, we read it from the stream.
// This avoids future problems if the codec changes in ffmpeg.
if (ofmt_ctx && ofmt_ctx->nb_streams > 0)
m_pts_increment = av_rescale_q(1, { 1, m_framerate }, ofmt_ctx->streams[0]->time_base);
else
m_pts_increment = av_rescale_q(1, { 1, m_framerate }, { 1, 100 });
}
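A note on the per-frame increment computed in the constructor above: rescaling a single frame duration into the stream time base yields an integer, so at frame rates that do not divide the time base evenly the rounding error accumulates frame after frame. The sketch below reproduces the arithmetic in plain C++ with round-to-nearest, which I assume matches the default rounding of av_rescale_q; the Rational type and rescale function are stand-ins, not FFmpeg API.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a rational time base such as {1, 100}.
struct Rational {
    std::int64_t num;
    std::int64_t den;
};

// value * from / to, rounded to nearest, for non-negative values.
inline std::int64_t rescale(std::int64_t value, Rational from, Rational to) {
    assert(value >= 0);
    assert(from.num > 0 && from.den > 0 && to.num > 0 && to.den > 0);
    const std::int64_t n = value * from.num * to.den;
    const std::int64_t d = from.den * to.num;
    return (n + d / 2) / d;
}

// Timestamp of frame number `index`, rescaled as a whole so that the
// rounding error never exceeds half a tick.
inline std::int64_t frame_pts(std::int64_t index, int framerate, Rational tb) {
    return rescale(index, Rational{1, framerate}, tb);
}
```

At 12 fps and a {1, 100} time base, one frame rescales to 8 ticks, so twelve increments cover 96 ticks instead of 100, while frame_pts(12, 12, {1, 100}) gives exactly 100. Rescaling the running frame index is one way to keep the GIF duration from drifting.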
FFMPEG Initialization (Filter and muxer)
void px::GIF::FFMPEG::init_filters()
{
const AVFilter* buffersrc = avfilter_get_by_name("buffer");
const AVFilter* buffersink = avfilter_get_by_name("buffersink");
AVRational time_base = { 1, m_framerate };
AVRational aspect_pixel = { 1, 1 };
AVFilterInOut* inputs = avfilter_inout_alloc();
AVFilterInOut* outputs = avfilter_inout_alloc();
filter_graph = avfilter_graph_alloc();
try
{
if (!outputs || !inputs || !filter_graph)
throw px::GIF::Error("Failed to 'init_filters': could not allocate the graph/filters.", "GIF::FFMPEG init_filters");
char args[512];
snprintf(args, sizeof(args),
"video_size=%dx%d:pix_fmt=%d:time_base=%d/%d:pixel_aspect=%d/%d",
m_width, m_height,
picture_rgb24->format,
time_base.num, time_base.den,
aspect_pixel.num, aspect_pixel.den);
if (avfilter_graph_create_filter(&buffersrc_ctx, buffersrc, "in", args, nullptr, filter_graph) < 0)
throw px::GIF::Error("Failed to create the 'source buffer' in init_filters method.", "GIF::FFMPEG init_filters");
if (avfilter_graph_create_filter(&buffersink_ctx, buffersink, "out", nullptr, nullptr, filter_graph) < 0)
throw px::GIF::Error("Failed to create the 'sink buffer' in init_filters method.", "GIF::FFMPEG init_filters");
// GIF has possible output of PAL8.
enum AVPixelFormat pix_fmts[] = { AV_PIX_FMT_PAL8, AV_PIX_FMT_NONE };
if (av_opt_set_int_list(buffersink_ctx, "pix_fmts", pix_fmts, AV_PIX_FMT_NONE, AV_OPT_SEARCH_CHILDREN) < 0)
throw px::GIF::Error("Failed to set the output pixel format.", "GIF::FFMPEG init_filters");
outputs->name = av_strdup("in");
outputs->filter_ctx = buffersrc_ctx;
outputs->pad_idx = 0;
outputs->next = nullptr;
inputs->name = av_strdup("out");
inputs->filter_ctx = buffersink_ctx;
inputs->pad_idx = 0;
inputs->next = nullptr;
// Parse the filter description and connect it between the source and the sink.
if (avfilter_graph_parse_ptr(filter_graph, m_filter_desc.c_str(), &inputs, &outputs, nullptr) < 0)
throw px::GIF::Error("Failed to parse the filter graph (bad string!).", "GIF::FFMPEG init_filters");
if (avfilter_graph_config(filter_graph, nullptr) < 0)
throw px::GIF::Error("Failed to configure the filter graph (bad string!).", "GIF::FFMPEG init_filters");
avfilter_inout_free(&inputs);
avfilter_inout_free(&outputs);
}
catch (const std::exception& e)
{
// Catch exception to delete element.
avfilter_inout_free(&inputs);
avfilter_inout_free(&outputs);
throw; // re-throw the original exception without slicing it
}
}
void px::GIF::FFMPEG::init_muxer()
{
AVOutputFormat* o_fmt = av_guess_format("gif", m_filename.c_str(), "video/gif");
if ((_ret = avformat_alloc_output_context2(&ofmt_ctx, o_fmt, "gif", m_filename.c_str())) < 0)
throw px::GIF::Error(std::string(av_make_error_string(_err_msg, FFMPEG_MSG_LEN, _ret)) + " allocate output format.", "GIF::FFMPEG init_muxer");
AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_GIF);
if (!codec) throw px::GIF::Error("Could not find the 'GIF' codec.", "GIF::FFMPEG init_muxer");
#if 0
const AVPixelFormat* p = codec->pix_fmts;
while (p != NULL && *p != AV_PIX_FMT_NONE) {
printf("supported pix fmt: %s\n", av_get_pix_fmt_name(*p));
++p;
}
#endif
AVStream* stream = avformat_new_stream(ofmt_ctx, codec);
AVCodecParameters* codec_paramters = stream->codecpar;
codec_paramters->codec_tag = 0;
codec_paramters->codec_id = codec->id;
codec_paramters->codec_type = AVMEDIA_TYPE_VIDEO;
codec_paramters->width = m_width;
codec_paramters->height = m_height;
codec_paramters->format = AV_PIX_FMT_PAL8;
o_codec_ctx = avcodec_alloc_context3(codec);
avcodec_parameters_to_context(o_codec_ctx, codec_paramters);
o_codec_ctx->time_base = { 1, m_framerate };
if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)
o_codec_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
if ((_ret = avcodec_open2(o_codec_ctx, codec, NULL)) < 0)
throw px::GIF::Error(std::string(av_make_error_string(_err_msg, FFMPEG_MSG_LEN, _ret)) + " open output codec.", "GIF::FFMPEG init_muxer");
if ((_ret = avio_open(&ofmt_ctx->pb, m_filename.c_str(), AVIO_FLAG_WRITE)) < 0)
throw px::GIF::Error(std::string(av_make_error_string(_err_msg, FFMPEG_MSG_LEN, _ret)) + " avio open error.", "GIF::FFMPEG init_muxer");
if ((_ret = avformat_write_header(ofmt_ctx, NULL)) < 0)
throw px::GIF::Error(std::string(av_make_error_string(_err_msg, FFMPEG_MSG_LEN, _ret)) + " write GIF header", "GIF::FFMPEG init_muxer");
#if _DEBUG
// This print the stream/output format.
av_dump_format(ofmt_ctx, -1, m_filename.c_str(), 1);
#endif
}
Add frame (usually in a loop)
void px::GIF::FFMPEG::add_frame(pxImage * const img)
{
if (img->getImageType() != PXT_BYTE || img->getNChannels() != 4)
throw px::GIF::Error("Failed to 'add_frame' since image is not PXT_BYTE and 4-channels.", "GIF::FFMPEG add_frame");
if (img->getWidth() != m_width || img->getHeight() != m_height)
throw px::GIF::Error("Failed to 'add_frame' since the size is not same to other inputs.", "GIF::FFMPEG add_frame");
const int pitch = picture_rgb24->linesize[0];
auto px_ptr = getImageAccessor<pxUChar_C4>(img);
for (int y = 0; y < m_height; y++)
{
const int px_row = img->getOrigin() == ORIGIN_BOT_LEFT ? m_height - y - 1 : y;
for (int x = 0; x < m_width; x++)
{
const int idx = y * pitch + 3 * x;
picture_rgb24->data[0][idx] = px_ptr[px_row][x].ch[PX_RE];
picture_rgb24->data[0][idx + 1] = px_ptr[px_row][x].ch[PX_GR];
picture_rgb24->data[0][idx + 2] = px_ptr[px_row][x].ch[PX_BL];
}
}
// palettegen need a whole stream, just add frame to buffer.
if ((_ret = av_buffersrc_add_frame_flags(buffersrc_ctx, picture_rgb24, AV_BUFFERSRC_FLAG_KEEP_REF)) < 0)
throw px::GIF::Error("Failed to 'add_frame' to global buffer with error: " +
std::string(av_make_error_string(_err_msg, FFMPEG_MSG_LEN, _ret)),
"GIF::FFMPEG add_frame");
// Advance the PTS of the picture for the next frame added to the buffer.
picture_rgb24->pts += m_pts_increment;
as_frame = true;
}
Encoder (final step)
void px::GIF::FFMPEG::encode()
{
if (!as_frame)
throw px::GIF::Error("Please 'add_frame' before running the Encoding().", "GIF::FFMPEG encode");
// end of buffer
if ((_ret = av_buffersrc_add_frame_flags(buffersrc_ctx, nullptr, AV_BUFFERSRC_FLAG_KEEP_REF)) < 0)
throw px::GIF::Error("error add frame to buffer source: " + std::string(av_make_error_string(_err_msg, FFMPEG_MSG_LEN, _ret)), "GIF::FFMPEG encode");
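do {
AVFrame* filter_frame = av_frame_alloc();
if (!filter_frame)
throw px::GIF::Error("Failed to allocate the filtered frame.", "GIF::FFMPEG encode");
_ret = av_buffersink_get_frame(buffersink_ctx, filter_frame);
// EAGAIN, EOF, or a real error: there is no frame to write either way.
if (_ret < 0) {
av_frame_free(&filter_frame);
break;
}
// write the filter frame to output file
muxing_one_frame(filter_frame);
// av_frame_free() drops the data reference and frees the frame itself
av_frame_free(&filter_frame);
} while (_ret >= 0);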
av_write_trailer(ofmt_ctx);
}
void px::GIF::FFMPEG::muxing_one_frame(AVFrame* frame)
{
int ret = avcodec_send_frame(o_codec_ctx, frame);
AVPacket *pkt = av_packet_alloc();
if (!pkt)
return;
while (ret >= 0) {
ret = avcodec_receive_packet(o_codec_ctx, pkt);
// EAGAIN, EOF, or a real error: there is no packet to write either way.
if (ret < 0)
break;
av_write_frame(ofmt_ctx, pkt);
// av_write_frame() does not take ownership, so drop our reference.
av_packet_unref(pkt);
}
// Free the packet itself, not only its payload.
av_packet_free(&pkt);
}
DTOR, Release and Cancel
px::GIF::FFMPEG::~FFMPEG()
{
release();
}
void px::GIF::FFMPEG::release()
{
// Muxer stuffs
if (ofmt_ctx != nullptr) avformat_free_context(ofmt_ctx);
if (o_codec_ctx != nullptr) avcodec_close(o_codec_ctx);
if (o_codec_ctx != nullptr) avcodec_free_context(&o_codec_ctx);
ofmt_ctx = nullptr;
o_codec_ctx = nullptr;
// Filter stuffs
if (buffersrc_ctx != nullptr) avfilter_free(buffersrc_ctx);
if (buffersink_ctx != nullptr) avfilter_free(buffersink_ctx);
if (filter_graph != nullptr) avfilter_graph_free(&filter_graph);
buffersrc_ctx = nullptr;
buffersink_ctx = nullptr;
filter_graph = nullptr;
// Conversion image.
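// The pixel buffer came from av_image_alloc(), so it has to be freed separately.
if (picture_rgb24 != nullptr) av_freep(&picture_rgb24->data[0]);
if (picture_rgb24 != nullptr) av_frame_free(&picture_rgb24);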
picture_rgb24 = nullptr;
}
void px::GIF::FFMPEG::cancel()
{
// In-case of failure we must close ffmpeg and exit.
av_write_trailer(ofmt_ctx);
// Release and close all elements.
release();
// Delete the file on disk.
if (remove(m_filename.c_str()) != 0)
PX_LOG0(PX_LOGLEVEL_ERROR, "GIF::FFMPEG - On 'cancel' failed to remove the file.");
}
It took me a while, but I finally got it!
I was missing an avio_close(ofmt_ctx->pb); in my cancel method.
Once the file is released by ffmpeg, std::remove() works like a charm.
Note that av_write_trailer and avio_close should only be called if init_muxer executed successfully, so I added a member variable to flag whether it did. Then I make the appropriate calls in cancel.
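The same ordering (close the handle first, delete the file second, and only if setup actually got that far) can be sketched without FFmpeg. The class below is a hypothetical stand-in that uses the C standard library's file API in place of avio_open/avio_close; it only illustrates the pattern, not the real muxer teardown.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical stand-in for an output object that owns a file on disk.
class OutputFile {
public:
    explicit OutputFile(std::string path) : path_(std::move(path)) {
        assert(!path_.empty());
        handle_ = std::fopen(path_.c_str(), "wb");
        created_ = (handle_ != nullptr);  // remember whether setup succeeded
    }
    ~OutputFile() { close(); }
    OutputFile(const OutputFile&) = delete;
    OutputFile& operator=(const OutputFile&) = delete;

    bool is_open() const { return handle_ != nullptr; }

    // Release the handle; safe to call more than once.
    void close() {
        if (handle_ != nullptr) {
            std::fclose(handle_);
            handle_ = nullptr;
        }
    }

    // Abort: close first, then delete. Returns true if no file is left behind.
    bool cancel() {
        close();
        if (!created_) return true;  // nothing was ever created
        created_ = false;
        return std::remove(path_.c_str()) == 0;
    }

private:
    std::string path_;
    std::FILE* handle_ = nullptr;
    bool created_ = false;
};
```

The created_ flag plays the role of the member variable mentioned above: cancel() skips the delete when the file was never opened, and calling it twice is harmless.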

What to pass to avcodec_decode_video2 for H.264 Transport Stream?

I want to decode H.264 video from a collection of MPEG-2 Transport Stream packets but I am not clear what to pass to avcodec_decode_video2
The documentation says to pass "the input AVPacket containing the input buffer."
But what should be in the input buffer?
A PES packet will be spread across the payload portion of several TS packets, with NALU(s) inside the PES. So pass a TS fragment? The entire PES? PES payload only?
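To make the layering concrete: each 188-byte TS packet starts with a 4-byte header (sync byte 0x47, a 13-bit PID, a flag marking the start of a new PES packet, and a continuity counter), optionally followed by an adaptation field, and only then a fragment of the PES packet. A decoder presumably wants reassembled elementary-stream data, not these fragments. Here is a sketch of the header parsing in plain C++; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTsPacketSize = 188;

struct TsHeader {
    bool payload_unit_start;           // a new PES packet begins in this payload
    std::uint16_t pid;                 // 13-bit packet identifier
    bool has_payload;
    std::uint8_t continuity_counter;   // 4-bit counter per PID
    std::size_t payload_offset;        // where the payload starts in the packet
};

// Parses the fixed header of one 188-byte TS packet.
// Returns false if the sync byte is wrong or the lengths are inconsistent.
inline bool parse_ts_header(const std::uint8_t* p, TsHeader& out) {
    assert(p != nullptr);
    if (p[0] != 0x47) return false;
    out.payload_unit_start = (p[1] & 0x40) != 0;
    out.pid = static_cast<std::uint16_t>(((p[1] & 0x1F) << 8) | p[2]);
    const bool has_adaptation = (p[3] & 0x20) != 0;
    out.has_payload = (p[3] & 0x10) != 0;
    out.continuity_counter = static_cast<std::uint8_t>(p[3] & 0x0F);
    out.payload_offset = 4;
    if (has_adaptation)
        out.payload_offset += 1 + p[4];  // length byte plus the field itself
    return out.payload_offset <= kTsPacketSize;
}
```

A PES packet would be rebuilt by concatenating the payloads of one PID from a packet with payload_unit_start set up to the next such packet, which is the job a demuxer normally does.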
This Sample Code mentions:
BUT some other codecs (msmpeg4, mpeg4) are inherently frame based, so
you must call them with all the data for one frame exactly. You must
also initialize 'width' and 'height' before initializing them.
But I can find no info on what "all the data" means...
Passing a fragment of a TS packet payload is not working:
AVPacket avDecPkt;
av_init_packet(&avDecPkt);
avDecPkt.data = inbuf_ptr;
avDecPkt.size = esBufSize;
len = avcodec_decode_video2(mpDecoderContext, mpFrameDec, &got_picture, &avDecPkt);
if (len < 0)
{
printf(" TS PKT #%.0f. Error decoding frame #%04d [rc=%d '%s']\n",
tsPacket.pktNum, mDecodedFrameNum, len, av_make_error_string(errMsg, 128, len));
return;
}
output
[h264 # 0x81cd2a0] no frame!
TS PKT #2973. Error decoding frame #0001 [rc=-1094995529 'Invalid data found when processing input']
EDIT
Using the excellent hints from WLGfx, I made this simple program to try decoding TS packets. As input, I prepared a file containing only TS packets from the Video PID.
It feels close but I don't know how to set up the FormatContext. The code below segfaults at av_read_frame() (and internally at ret = s->iformat->read_packet(s, pkt)). s->iformat is zero.
Suggestions?
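A null s->iformat suggests the format context was never opened with avformat_open_input(). One route for TS data that lives in memory is a custom I/O context created with avio_alloc_context(), which takes a read callback of the shape int (*)(void *opaque, uint8_t *buf, int buf_size). The sketch below shows only such a callback, written in plain C++ so that it builds without FFmpeg; TsQueue is hypothetical, and kEndOfStream stands in for AVERROR_EOF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <vector>

// Stand-in for FFmpeg's AVERROR_EOF so the sketch builds without its headers.
constexpr int kEndOfStream = -1;

// Hypothetical queue of received TS packets.
struct TsQueue {
    std::deque<std::vector<std::uint8_t>> packets;
    std::size_t offset = 0;  // bytes already consumed from the front packet
};

// Copies up to buf_size queued bytes into buf.
// Returns the number of bytes written, or kEndOfStream if the queue is empty.
int read_ts_bytes(void* opaque, std::uint8_t* buf, int buf_size) {
    assert(opaque != nullptr && buf != nullptr && buf_size > 0);
    TsQueue* q = static_cast<TsQueue*>(opaque);
    int written = 0;
    while (written < buf_size && !q->packets.empty()) {
        const std::vector<std::uint8_t>& front = q->packets.front();
        const std::size_t avail = front.size() - q->offset;
        const std::size_t want = static_cast<std::size_t>(buf_size - written);
        const std::size_t n = avail < want ? avail : want;
        std::memcpy(buf + written, front.data() + q->offset, n);
        written += static_cast<int>(n);
        q->offset += n;
        if (q->offset == front.size()) {  // front packet fully consumed
            q->packets.pop_front();
            q->offset = 0;
        }
    }
    return written > 0 ? written : kEndOfStream;
}
```

With a context built around a callback like this, av_read_frame() would return demuxed packets that can be passed to the decoder, instead of the caller filling an AVPacket with raw TS bytes.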
EDIT II - Sorry, I forgot to post the source code.
EDIT III - Sample code updated to simulate reading the TS packet queue.
/*
* Test program for video decoder
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
extern "C" {
#ifdef __cplusplus
#define __STDC_CONSTANT_MACROS
#ifdef _STDINT_H
#undef _STDINT_H
#endif
#include <stdint.h>
#endif
}
extern "C" {
#include "libavcodec/avcodec.h"
#include "libavformat/avformat.h"
#include "libswscale/swscale.h"
#include "libavutil/imgutils.h"
#include "libavutil/opt.h"
}
class VideoDecoder
{
public:
VideoDecoder();
bool rcvTsPacket(AVPacket &inTsPacket);
private:
AVCodec *mpDecoder;
AVCodecContext *mpDecoderContext;
AVFrame *mpDecodedFrame;
AVFormatContext *mpFmtContext;
};
VideoDecoder::VideoDecoder()
{
av_register_all();
// FORMAT CONTEXT SETUP
mpFmtContext = avformat_alloc_context();
mpFmtContext->flags = AVFMT_NOFILE;
// ????? WHAT ELSE ???? //
// DECODER SETUP
mpDecoder = avcodec_find_decoder(AV_CODEC_ID_H264);
if (!mpDecoder)
{
printf("Could not load decoder\n");
exit(11);
}
mpDecoderContext = avcodec_alloc_context3(NULL);
if (avcodec_open2(mpDecoderContext, mpDecoder, NULL) < 0)
{
printf("Cannot open decoder context\n");
exit(1);
}
mpDecodedFrame = av_frame_alloc();
}
bool
VideoDecoder::rcvTsPacket(AVPacket &inTsPkt)
{
bool ret = true;
if ((av_read_frame(mpFmtContext, &inTsPkt)) < 0)
{
printf("Error in av_read_frame()\n");
ret = false;
}
else
{
// success. Decode the TS packet
int got;
int len = avcodec_decode_video2(mpDecoderContext, mpDecodedFrame, &got, &inTsPkt);
if (len < 0)
ret = false;
if (got)
printf("GOT A DECODED FRAME\n");
}
return ret;
}
int
main(int argc, char **argv)
{
if (argc != 2)
{
printf("Usage: %s tsInFile\n", argv[0]);
exit(1);
}
FILE *tsInFile = fopen(argv[1], "rb"); // binary mode: TS data is not text
if (!tsInFile)
{
perror("Could not open TS input file");
exit(2);
}
unsigned int tsPktNum = 0;
uint8_t tsBuffer[256];
AVPacket tsPkt;
av_init_packet(&tsPkt);
VideoDecoder vDecoder;
// Read one 188-byte TS packet per iteration; stop on a short read or EOF.
while (fread(tsBuffer, 1, 188, tsInFile) == 188)
{
tsPktNum++;
tsPkt.size = 188;
tsPkt.data = tsBuffer;
vDecoder.rcvTsPacket(tsPkt);
}
fclose(tsInFile);
}
I've got some code snippets that might help you out, as I've also been working with MPEG-TS.
Starting with my packet thread, which checks each packet against the stream IDs that I've already found and for which I've already got the codec contexts:
void *FFMPEG::thread_packet_function(void *arg) {
FFMPEG *ffmpeg = (FFMPEG*)arg;
for (int c = 0; c < MAX_PACKETS; c++)
ffmpeg->free_packets[c] = &ffmpeg->packet_list[c];
ffmpeg->packet_pos = MAX_PACKETS;
Audio.start_decoding();
Video.start_decoding();
Subtitle.start_decoding();
while (!ffmpeg->thread_quit) {
if (ffmpeg->packet_pos != 0 &&
Audio.okay_add_packet() &&
Video.okay_add_packet() &&
Subtitle.okay_add_packet()) {
pthread_mutex_lock(&ffmpeg->packet_mutex); // get free packet
AVPacket *pkt = ffmpeg->free_packets[--ffmpeg->packet_pos]; // pre decrement
pthread_mutex_unlock(&ffmpeg->packet_mutex);
if ((av_read_frame(ffmpeg->fContext, pkt)) >= 0) { // success
int id = pkt->stream_index;
if (id == ffmpeg->aud_stream.stream_id) Audio.add_packet(pkt);
else if (id == ffmpeg->vid_stream.stream_id) Video.add_packet(pkt);
else if (id == ffmpeg->sub_stream.stream_id) Subtitle.add_packet(pkt);
else { // unknown packet
av_packet_unref(pkt);
pthread_mutex_lock(&ffmpeg->packet_mutex); // put packet back
ffmpeg->free_packets[ffmpeg->packet_pos++] = pkt;
pthread_mutex_unlock(&ffmpeg->packet_mutex);
//LOGI("Dumping unknown packet, id %d", id);
}
} else {
av_packet_unref(pkt);
pthread_mutex_lock(&ffmpeg->packet_mutex); // put packet back
ffmpeg->free_packets[ffmpeg->packet_pos++] = pkt;
pthread_mutex_unlock(&ffmpeg->packet_mutex);
//LOGI("No packet read");
}
} else { // buffers full so yield
//LOGI("Packet reader on hold: Audio-%d, Video-%d, Subtitle-%d",
// Audio.packet_pos, Video.packet_pos, Subtitle.packet_pos);
usleep(1000);
//sched_yield();
}
}
return 0;
}
Each decoder for audio, video and subtitles has its own thread, which receives the packets from the above thread in ring buffers. I had to separate the decoders into their own threads because CPU usage was increasing when I started using the deinterlace filter.
My video decoder reads the packets from the buffers and, when it has finished with a packet, sends it back to be unref'd so that it can be used again. Balancing the packet buffers doesn't take much time once everything is running.
Here's the snippet from my video decoder:
void *VideoManager::decoder(void *arg) {
LOGI("Video decoder started");
VideoManager *mgr = (VideoManager *)arg;
while (!ffmpeg.thread_quit) {
pthread_mutex_lock(&mgr->packet_mutex);
if (mgr->packet_pos != 0) {
// fetch first packet to decode
AVPacket *pkt = mgr->packets[0];
// shift list down one
for (int c = 1; c < mgr->packet_pos; c++) {
mgr->packets[c-1] = mgr->packets[c];
}
mgr->packet_pos--;
pthread_mutex_unlock(&mgr->packet_mutex); // finished with packets array
int got;
AVFrame *frame = ffmpeg.vid_stream.frame;
avcodec_decode_video2(ffmpeg.vid_stream.context, frame, &got, pkt);
ffmpeg.finished_with_packet(pkt);
if (got) {
#ifdef INTERLACE_ALL
if (!frame->interlaced_frame) mgr->add_av_frame(frame, 0);
else {
if (!mgr->filter_initialised) mgr->init_filter_graph(frame);
av_buffersrc_add_frame_flags(mgr->filter_src_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
int c = 0;
while (true) {
AVFrame *filter_frame = ffmpeg.vid_stream.filter_frame;
int result = av_buffersink_get_frame(mgr->filter_sink_ctx, filter_frame);
if (result == AVERROR(EAGAIN) ||
result == AVERROR_EOF || // AVERROR_EOF is already an error code, no AVERROR() wrapper
result < 0)
break;
mgr->add_av_frame(filter_frame, c++);
av_frame_unref(filter_frame);
}
//LOGI("Interlaced %d frames, decode %d, playback %d", c, mgr->decode_pos, mgr->playback_pos);
}
#elif defined(INTERLACE_HALF)
if (!frame->interlaced_frame) mgr->add_av_frame(frame, 0);
else {
if (!mgr->filter_initialised) mgr->init_filter_graph(frame);
av_buffersrc_add_frame_flags(mgr->filter_src_ctx, frame, AV_BUFFERSRC_FLAG_KEEP_REF);
int c = 0;
while (true) {
AVFrame *filter_frame = ffmpeg.vid_stream.filter_frame;
int result = av_buffersink_get_frame(mgr->filter_sink_ctx, filter_frame);
if (result == AVERROR(EAGAIN) ||
result == AVERROR_EOF || // AVERROR_EOF is already an error code, no AVERROR() wrapper
result < 0)
break;
mgr->add_av_frame(filter_frame, c++);
av_frame_unref(filter_frame);
}
//LOGI("Interlaced %d frames, decode %d, playback %d", c, mgr->decode_pos, mgr->playback_pos);
}
#else
mgr->add_av_frame(frame, 0);
#endif
}
//LOGI("decoded video packet");
} else {
pthread_mutex_unlock(&mgr->packet_mutex);
}
}
LOGI("Video decoder ended");
return NULL; // a pthread entry point must return a void *
}
As you can see, I'm using a mutex when passing packets back and forth.
Once a frame has been decoded, I just copy the YUV buffers out of the frame into another buffer list for later use. I don't convert the YUV on the CPU; a shader converts the YUV to RGB on the GPU.
The next snippet adds my decoded frame to my buffer list. This may help you understand how to deal with the data.
void VideoManager::add_av_frame(AVFrame *frame, int field_num) {
int y_linesize = frame->linesize[0];
int u_linesize = frame->linesize[1];
int hgt = frame->height;
int y_buffsize = y_linesize * hgt;
int u_buffsize = u_linesize * hgt / 2;
int buffsize = y_buffsize + u_buffsize + u_buffsize;
VideoBuffer *buffer = &buffers[decode_pos];
if (ffmpeg.is_network && playback_pos == decode_pos) { // patched 25/10/16 wlgfx
buffer->used = false;
if (!buffer->data) buffer->data = (char*)mem.alloc(buffsize);
if (!buffer->data) {
LOGI("Dropped frame, allocation error");
return;
}
} else if (playback_pos == decode_pos) {
LOGI("Dropped frame, ran out of decoder frame buffers");
return;
} else if (!buffer->data) {
buffer->data = (char*)mem.alloc(buffsize);
if (!buffer->data) {
LOGI("Dropped frame, allocation error.");
return;
}
}
buffer->y_frame = buffer->data;
buffer->u_frame = buffer->y_frame + y_buffsize;
buffer->v_frame = buffer->y_frame + y_buffsize + u_buffsize;
buffer->wid = frame->width;
buffer->hgt = hgt;
buffer->y_linesize = y_linesize;
buffer->u_linesize = u_linesize;
int64_t pts = av_frame_get_best_effort_timestamp(frame);
buffer->pts = pts;
buffer->buffer_size = buffsize;
double field_add = av_q2d(ffmpeg.vid_stream.context->time_base) * field_num;
buffer->frame_time = av_q2d(ts_stream) * pts + field_add;
memcpy(buffer->y_frame, frame->data[0], (size_t) (buffer->y_linesize * buffer->hgt));
memcpy(buffer->u_frame, frame->data[1], (size_t) (buffer->u_linesize * buffer->hgt / 2));
memcpy(buffer->v_frame, frame->data[2], (size_t) (buffer->u_linesize * buffer->hgt / 2));
buffer->used = true;
decode_pos = (decode_pos + 1) % MAX_VID_BUFFERS; // advance the ring index without modifying it twice in one expression
//if (field_num == 0) LOGI("Video %.2f, %d - %d",
// buffer->frame_time - Audio.pts_start_time, decode_pos, playback_pos);
}
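The buffer arithmetic in add_av_frame can be looked at on its own. The sketch below assumes a planar YUV 4:2:0 frame with an even height (three planes, chroma at half height); a packed format such as YUYV422 keeps everything in a single plane, so only data[0] and linesize[0] are set. The helper names `pack_plane` and `yuv420_padded_size` are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies `rows` rows of `width` visible bytes from a padded source plane
// (row pitch `linesize`) into a tightly packed destination vector.
std::vector<std::uint8_t> pack_plane(const std::uint8_t *src, int linesize,
                                     int width, int rows) {
    assert(src != nullptr && width > 0 && rows > 0 && linesize >= width);
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(width) * rows);
    for (int y = 0; y < rows; ++y) {
        std::memcpy(dst.data() + static_cast<std::size_t>(y) * width,
                    src + static_cast<std::size_t>(y) * linesize,
                    static_cast<std::size_t>(width));
    }
    return dst;
}

// Total bytes needed to store a YUV 4:2:0 frame together with its row
// padding, using the same arithmetic as add_av_frame (even height assumed).
std::size_t yuv420_padded_size(int y_linesize, int u_linesize, int height) {
    assert(y_linesize > 0 && u_linesize > 0 && height > 0);
    std::size_t y = static_cast<std::size_t>(y_linesize) * height;
    std::size_t u = static_cast<std::size_t>(u_linesize) * height / 2;
    return y + 2 * u; // one Y plane plus U and V planes of equal size
}
```

Copying linesize * height bytes in one memcpy, as add_av_frame does, keeps the padding and leaves the shader to skip it; `pack_plane` is the alternative when the consumer expects tightly packed rows.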
If there's anything else that I may be able to help with just give me a shout. :-)
EDIT:
Here is the snippet showing how I open my video stream context, which automatically determines the codec, whether it is h264, mpeg2, or another:
void FFMPEG::open_video_stream() {
vid_stream.stream_id = av_find_best_stream(fContext, AVMEDIA_TYPE_VIDEO,
-1, -1, &vid_stream.codec, 0);
if (vid_stream.stream_id == -1) return;
vid_stream.context = fContext->streams[vid_stream.stream_id]->codec;
if (!vid_stream.codec || avcodec_open2(vid_stream.context,
vid_stream.codec, NULL) < 0) {
vid_stream.stream_id = -1;
return;
}
vid_stream.frame = av_frame_alloc();
vid_stream.filter_frame = av_frame_alloc();
}
EDIT2:
This is how I've opened the input stream, whether it be file or URL. The AVFormatContext is the main context for the stream.
bool FFMPEG::start_stream(char *url_, float xtrim, float ytrim, int gain) {
aud_stream.stream_id = -1;
vid_stream.stream_id = -1;
sub_stream.stream_id = -1;
this->url = url_;
this->xtrim = xtrim;
this->ytrim = ytrim;
Audio.volume = gain;
Audio.init();
Video.init();
fContext = avformat_alloc_context();
if ((avformat_open_input(&fContext, url_, NULL, NULL)) != 0) {
stop_stream();
return false;
}
if ((avformat_find_stream_info(fContext, NULL)) < 0) {
stop_stream();
return false;
}
// network stream will overwrite packets if buffer is full
is_network = url.substr(0, 4) == "udp:" ||
url.substr(0, 4) == "rtp:" ||
url.substr(0, 5) == "rtsp:" ||
url.substr(0, 5) == "http:"; // added for wifi broadcasting ability
// determine if stream is audio only
is_mp3 = url.size() >= 4 && url.compare(url.size() - 4, 4, ".mp3") == 0; // length check avoids an out-of-range position on short URLs
LOGI("Stream: %s", url_);
if (!open_audio_stream()) {
stop_stream();
return false;
}
if (is_mp3) {
vid_stream.stream_id = -1;
sub_stream.stream_id = -1;
} else {
open_video_stream();
open_subtitle_stream();
if (vid_stream.stream_id == -1) { // switch to audio only
close_subtitle_stream();
is_mp3 = true;
}
}
LOGI("Audio: %d, Video: %d, Subtitle: %d",
aud_stream.stream_id,
vid_stream.stream_id,
sub_stream.stream_id);
if (aud_stream.stream_id != -1) {
LOGD("Audio stream time_base {%d, %d}",
aud_stream.context->time_base.num,
aud_stream.context->time_base.den);
}
if (vid_stream.stream_id != -1) {
LOGD("Video stream time_base {%d, %d}",
vid_stream.context->time_base.num,
vid_stream.context->time_base.den);
}
LOGI("Starting packet and decode threads");
thread_quit = false;
pthread_create(&thread_packet, NULL, &FFMPEG::thread_packet_function, this);
Display.set_overlay_timout(3.0);
return true;
}
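The URL classification in start_stream can be pulled out into small helpers. This is only a sketch assuming the URL is held in a std::string; the function names are hypothetical. Checking the length before looking at the suffix keeps very short URLs from producing an out-of-range position.

```cpp
#include <cassert>
#include <string>

// True when `s` begins with `prefix`. A string shorter than the prefix
// simply fails the comparison.
bool starts_with(const std::string &s, const std::string &prefix) {
    assert(!prefix.empty()); // an empty prefix would match everything
    return s.compare(0, prefix.size(), prefix) == 0;
}

// True when `s` ends with `suffix`; the size check guards short strings.
bool ends_with(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Same protocol list as start_stream: these sources may overwrite packets.
bool is_network_url(const std::string &url) {
    return starts_with(url, "udp:") || starts_with(url, "rtp:") ||
           starts_with(url, "rtsp:") || starts_with(url, "http:");
}

// Audio-only detection by file extension, as in start_stream.
bool is_audio_only_url(const std::string &url) {
    return ends_with(url, ".mp3");
}
```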
EDIT: (constructing an AVPacket)
Construct an AVPacket to send to the decoder...
AVPacket packet;
av_init_packet(&packet);
packet.data = myTSpacketdata; // pointer to the TS packet
packet.size = 188;
You should be able to reuse the packet, although it might need unref'ing first.
You must first use FFmpeg's demuxing API (av_read_frame in libavformat) to get the compressed frames out of the file. Then you can decode them using avcodec_decode_video2. Look at this tutorial: http://dranger.com/ffmpeg/

How to seek by msec with ffmpeg?

I am trying to seek in a video by milliseconds with ffmpeg. I have been trying to use code from this question, which uses avformat_seek_file (I use it with -1 as the stream number and the AVSEEK_FLAG_ANY flag).
After that call, I try to read the next frame, that is:
if (av_read_frame(fmt_ctx, &pkt) >= 0)
{
int ret = 0;
if (pkt.stream_index == video_stream_idx) {
/* decode video frame */
ret = avcodec_decode_video2(video_dec_ctx, frame, got_frame, &pkt);
if (ret < 0) {
fprintf(stderr, "Error decoding video frame\n");
return ret;
}
//do something with frame
}
However, the frame->pts of the retrieved frame always holds the time of the frame that came immediately after the last frame read before seeking.
Edit: Although frame->pts forms an unbroken sequence, seeking does occur. For some bizarre reason the next frame I read is the first one. In fact, after I run:
int got_frame = 0;
do
if (av_read_frame(fmt_ctx, &pkt) >= 0) {
decode_packet_ro(&got_frame, 0);
av_free_packet(&pkt);
}
else
{
read_cache = true;
pkt.data = NULL;
pkt.size = 0;
break;
}
while(!got_frame || this->frame->pts*av_q2d(video_dec_ctx->time_base) * 1000 < tsms);
the next frame I read is always the first one.
In the end, I was able to seek with the following code:
/*!
* \brief ffmpeg_reader::seekFrame seek to a frame
* \param s_frame seek target, passed to av_seek_frame with AVSEEK_FLAG_FRAME
* \return success of seeking
*/
bool ffmpeg_reader::seekFrame(int s_frame)
{
if (!isOk())
return false;
printf("\t avformat_seek_file to %d\n",s_frame);
int flags = AVSEEK_FLAG_FRAME;
if (s_frame < this->frame->pkt_dts)
{
flags |= AVSEEK_FLAG_BACKWARD;
}
if(av_seek_frame(fmt_ctx,video_stream_idx,s_frame,flags) < 0) // av_seek_frame returns >= 0 on success
{
printf("\nFailed to seek for time %d",s_frame);
return false;
}
avcodec_flush_buffers(video_dec_ctx);
/*read frame without converting it*/
int got_frame = 0;
do
if (av_read_frame(fmt_ctx, &pkt) == 0) {
decode_packet(&got_frame, 0, false);
av_free_packet(&pkt);
}
else
{
read_cache = true;
pkt.data = NULL;
pkt.size = 0;
break;
}
while(!(got_frame && this->frame->pkt_dts >= s_frame));
return true;
}
I did not come up with it myself, but I (sadly) can't remember where the credit is due.
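Since the question is about seeking by milliseconds, it may help to see the conversion between milliseconds and timestamps in a time base. The sketch below is plain integer arithmetic with a hypothetical `Rational` struct standing in for AVRational; in real code FFmpeg's av_rescale_q does the same job with rounding and overflow protection. It assumes the timestamps being compared are expressed in the time base you pass in (for packets read from a stream that is normally the stream's time_base).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AVRational: one tick lasts num/den seconds.
struct Rational {
    int num;
    int den;
};

// Converts milliseconds to ticks of the given time base (truncating).
std::int64_t ms_to_pts(std::int64_t ms, Rational tb) {
    assert(tb.num > 0 && tb.den > 0);
    return ms * tb.den / (1000LL * tb.num);
}

// Converts ticks of the given time base back to milliseconds (truncating).
std::int64_t pts_to_ms(std::int64_t pts, Rational tb) {
    assert(tb.num > 0 && tb.den > 0);
    return pts * 1000LL * tb.num / tb.den;
}
```

With a 1/90000 time base, for example, 1500 ms corresponds to 135000 ticks, and that value is what would be handed to the seek call and compared against the decoded frame's timestamp.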

Audio output with video processing with opencv

I am processing video with OpenCV, but at the same time I need to play the audio and have simple control over it, such as the volume or the current frame number.
I think I should create a parallel process with ffmpeg, but I don't know how to do so. Can you explain what to do?
Or do you know another solution?
I think ffmpeg should be used to play audio and SDL for video in this case.
After opening the file with OpenCV and processing the frame, you can use OpenCV -> SDL to display it while retrieving the audio frames through ffmpeg and playing them with SDL.
Here is a nice collection of ffmpeg/SDL tutorials!
I also found a nice post that shows how to capture frames from a video file using ffmpeg, store them in an OpenCV cv::Mat and display the result in an OpenCV window. But this way you can't play audio, since OpenCV doesn't deal with that.
You might be interested in reading this post as well: How to avoid a growing delay with ffmpeg between sound and raw video data ?
EDIT:
I spent the last 4 hours coding a prototype to demonstrate how it's done. This demo reads video frames through OpenCV (so you can process them) and audio through ffmpeg, and SDL is used to play both! There are 2 limitations in this demo you must be aware of: 1 - it assumes you are working with an OpenCV image packed as BGR (24 bits), and 2 - audio and video are not synchronized! Yes, I have left some work for you to do (yeeeey). But don't panic, page 6 has some ideas!
It's important to sync audio and video because you will be doing some processing on the frames, and that will certainly make the video and audio go out of sync real fast since they are being played independently of each other.
The ffmpeg tutorials I suggested above are very important for understanding the code; a lot of the code in this demo came from there. They show how to deal with SDL, and how to read packets of audio/video streams.
#include <highgui.h>
#include <cv.h>
extern "C"
{
#include <SDL.h>
#include <SDL_thread.h>
#include <avcodec.h>
#include <avformat.h>
}
#include <iostream>
#include <stdio.h>
//#include <malloc.h>
using namespace cv;
#define SDL_AUDIO_BUFFER_SIZE 1024
typedef struct PacketQueue
{
AVPacketList *first_pkt, *last_pkt;
int nb_packets;
int size;
SDL_mutex *mutex;
SDL_cond *cond;
} PacketQueue;
PacketQueue audioq;
int audioStream = -1;
int videoStream = -1;
int quit = 0;
SDL_Surface* screen = NULL;
SDL_Surface* surface = NULL;
AVFormatContext* pFormatCtx = NULL;
AVCodecContext* aCodecCtx = NULL;
AVCodecContext* pCodecCtx = NULL;
void show_frame(IplImage* img)
{
if (!screen)
{
screen = SDL_SetVideoMode(img->width, img->height, 0, 0);
if (!screen)
{
fprintf(stderr, "SDL: could not set video mode - exiting\n");
exit(1);
}
}
// Assuming IplImage packed as BGR 24bits
SDL_Surface* surface = SDL_CreateRGBSurfaceFrom((void*)img->imageData,
img->width,
img->height,
img->depth * img->nChannels,
img->widthStep,
0xff0000, 0x00ff00, 0x0000ff, 0
);
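SDL_BlitSurface(surface, 0, screen, 0);
SDL_Flip(screen);
// The surface only wraps img->imageData, so release it to avoid leaking one surface per frame
SDL_FreeSurface(surface);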
}
void packet_queue_init(PacketQueue *q)
{
memset(q, 0, sizeof(PacketQueue));
q->mutex = SDL_CreateMutex();
q->cond = SDL_CreateCond();
}
int packet_queue_put(PacketQueue *q, AVPacket *pkt)
{
AVPacketList *pkt1;
if (av_dup_packet(pkt) < 0)
{
return -1;
}
//pkt1 = (AVPacketList*) av_malloc(sizeof(AVPacketList));
pkt1 = (AVPacketList*) malloc(sizeof(AVPacketList));
if (!pkt1) return -1;
pkt1->pkt = *pkt;
pkt1->next = NULL;
SDL_LockMutex(q->mutex);
if (!q->last_pkt)
q->first_pkt = pkt1;
else
q->last_pkt->next = pkt1;
q->last_pkt = pkt1;
q->nb_packets++;
q->size += pkt1->pkt.size;
SDL_CondSignal(q->cond);
SDL_UnlockMutex(q->mutex);
return 0;
}
static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block)
{
AVPacketList *pkt1;
int ret;
SDL_LockMutex(q->mutex);
for (;;)
{
if( quit)
{
ret = -1;
break;
}
pkt1 = q->first_pkt;
if (pkt1)
{
q->first_pkt = pkt1->next;
if (!q->first_pkt)
q->last_pkt = NULL;
q->nb_packets--;
q->size -= pkt1->pkt.size;
*pkt = pkt1->pkt;
//av_free(pkt1);
free(pkt1);
ret = 1;
break;
}
else if (!block)
{
ret = 0;
break;
}
else
{
SDL_CondWait(q->cond, q->mutex);
}
}
SDL_UnlockMutex(q->mutex);
return ret;
}
int audio_decode_frame(AVCodecContext *aCodecCtx, uint8_t *audio_buf, int buf_size)
{
static AVPacket pkt;
static uint8_t *audio_pkt_data = NULL;
static int audio_pkt_size = 0;
int len1, data_size;
for (;;)
{
while (audio_pkt_size > 0)
{
data_size = buf_size;
len1 = avcodec_decode_audio2(aCodecCtx, (int16_t*)audio_buf, &data_size,
audio_pkt_data, audio_pkt_size);
if (len1 < 0)
{
/* if error, skip frame */
audio_pkt_size = 0;
break;
}
audio_pkt_data += len1;
audio_pkt_size -= len1;
if (data_size <= 0)
{
/* No data yet, get more frames */
continue;
}
/* We have data, return it and come back for more later */
return data_size;
}
if (pkt.data)
av_free_packet(&pkt);
if (quit) return -1;
if (packet_queue_get(&audioq, &pkt, 1) < 0) return -1;
audio_pkt_data = pkt.data;
audio_pkt_size = pkt.size;
}
}
void audio_callback(void *userdata, Uint8 *stream, int len)
{
AVCodecContext *aCodecCtx = (AVCodecContext *)userdata;
int len1, audio_size;
static uint8_t audio_buf[(AVCODEC_MAX_AUDIO_FRAME_SIZE * 3) / 2];
static unsigned int audio_buf_size = 0;
static unsigned int audio_buf_index = 0;
while (len > 0)
{
if (audio_buf_index >= audio_buf_size)
{
/* We have already sent all our data; get more */
audio_size = audio_decode_frame(aCodecCtx, audio_buf, sizeof(audio_buf));
if(audio_size < 0)
{
/* If error, output silence */
audio_buf_size = 1024; // arbitrary?
memset(audio_buf, 0, audio_buf_size);
}
else
{
audio_buf_size = audio_size;
}
audio_buf_index = 0;
}
len1 = audio_buf_size - audio_buf_index;
if (len1 > len)
len1 = len;
memcpy(stream, (uint8_t *)audio_buf + audio_buf_index, len1);
len -= len1;
stream += len1;
audio_buf_index += len1;
}
}
void setup_ffmpeg(char* filename)
{
if (av_open_input_file(&pFormatCtx, filename, NULL, 0, NULL) != 0)
{
fprintf(stderr, "FFmpeg failed to open file %s!\n", filename);
exit(-1);
}
if (av_find_stream_info(pFormatCtx) < 0)
{
fprintf(stderr, "FFmpeg failed to retrieve stream info!\n");
exit(-1);
}
// Dump information about file onto standard error
dump_format(pFormatCtx, 0, filename, 0);
// Find the first video stream
int i = 0;
for (i; i < pFormatCtx->nb_streams; i++)
{
if (pFormatCtx->streams[i]->codec->codec_type == CODEC_TYPE_VIDEO && videoStream < 0)
{
videoStream = i;
}
if (pFormatCtx->streams[i]->codec->codec_type == CODEC_TYPE_AUDIO && audioStream < 0)
{
audioStream = i;
}
}
if (videoStream == -1)
{
fprintf(stderr, "No video stream found in %s!\n", filename);
exit(-1);
}
if (audioStream == -1)
{
fprintf(stderr, "No audio stream found in %s!\n", filename);
exit(-1);
}
// Get a pointer to the codec context for the audio stream
aCodecCtx = pFormatCtx->streams[audioStream]->codec;
// Set audio settings from codec info
SDL_AudioSpec wanted_spec;
wanted_spec.freq = aCodecCtx->sample_rate;
wanted_spec.format = AUDIO_S16SYS;
wanted_spec.channels = aCodecCtx->channels;
wanted_spec.silence = 0;
wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
wanted_spec.callback = audio_callback;
wanted_spec.userdata = aCodecCtx;
SDL_AudioSpec spec;
if (SDL_OpenAudio(&wanted_spec, &spec) < 0)
{
fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
exit(-1);
}
AVCodec* aCodec = avcodec_find_decoder(aCodecCtx->codec_id);
if (!aCodec)
{
fprintf(stderr, "Unsupported codec!\n");
exit(-1);
}
avcodec_open(aCodecCtx, aCodec);
// audio_st = pFormatCtx->streams[index]
packet_queue_init(&audioq);
SDL_PauseAudio(0);
// Get a pointer to the codec context for the video stream
pCodecCtx = pFormatCtx->streams[videoStream]->codec;
// Find the decoder for the video stream
AVCodec* pCodec = avcodec_find_decoder(pCodecCtx->codec_id);
if (pCodec == NULL)
{
fprintf(stderr, "Unsupported codec!\n");
exit(-1); // Codec not found
}
// Open codec
if (avcodec_open(pCodecCtx, pCodec) < 0)
{
fprintf(stderr, "Unsupported codec!\n");
exit(-1); // Could not open codec
}
}
int main(int argc, char* argv[])
{
if (argc < 2)
{
std::cout << "Usage: " << argv[0] << " <video>" << std::endl;
return -1;
}
av_register_all();
// Init SDL
if (SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER))
{
fprintf(stderr, "Could not initialize SDL - %s\n", SDL_GetError());
return -1;
}
// Init ffmpeg and setup some SDL stuff related to Audio
setup_ffmpeg(argv[1]);
VideoCapture cap(argv[1]); // open the video file
if (!cap.isOpened()) // check if we succeeded
{
std::cout << "Failed to load file!" << std::endl;
return -1;
}
AVPacket packet;
while (av_read_frame(pFormatCtx, &packet) >= 0)
{
if (packet.stream_index == videoStream)
{
// Actually, this is where SYNC between audio/video would happen.
// Right now I assume that every VIDEO packet contains an entire video frame, and that's not true. A video frame can be made up of multiple packets!
// But for the time being, assume 1 video frame == 1 video packet,
// so instead of reading the frame through ffmpeg, I read it through OpenCV.
Mat frame;
cap >> frame; // get the next frame from the video file
// do some processing on the frame, either as a Mat or as IplImage.
// For educational purposes, applying a lame grayscale conversion
IplImage ipl_frame = frame;
unsigned char* px = (unsigned char*)ipl_frame.imageData; // treat pixel bytes as unsigned
for (int i = 0; i < ipl_frame.width * ipl_frame.height * ipl_frame.nChannels; i += ipl_frame.nChannels)
{
// Compute the average once; overwriting B first would skew the G and R results
unsigned char gray = (px[i] + px[i+1] + px[i+2]) / 3;
px[i] = gray; //B
px[i+1] = gray; //G
px[i+2] = gray; //R
}
// Display it on SDL window
show_frame(&ipl_frame);
av_free_packet(&packet);
}
else if (packet.stream_index == audioStream)
{
packet_queue_put(&audioq, &packet);
}
else
{
av_free_packet(&packet);
}
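SDL_Event event;
// Only inspect the event when SDL_PollEvent actually returned one
if (SDL_PollEvent(&event) && event.type == SDL_QUIT)
{
quit = 1; // tell audio_decode_frame to stop
SDL_CondSignal(audioq.cond); // wake the audio thread if it is waiting for packets
SDL_Quit();
break; // leave the packet loop instead of reading more frames
}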
}
// the capture will be released automatically in the VideoCapture destructor
// Close the codec
avcodec_close(pCodecCtx);
// Close the video file
av_close_input_file(pFormatCtx);
return 0;
}
On my Mac I compiled it with:
g++ ffmpeg_snd.cpp -o ffmpeg_snd -D_GNU_SOURCE=1 -D_THREAD_SAFE -I/usr/local/include/opencv -I/usr/local/include -I/usr/local/include/SDL -Wl,-framework,Cocoa -L/usr/local/lib -lopencv_core -lopencv_imgproc -lopencv_highgui -lopencv_ml -lopencv_video -lopencv_features2d -lopencv_calib3d -lopencv_objdetect -lopencv_contrib -lopencv_legacy -lopencv_flann -lSDLmain -lSDL -L/usr/local/lib -lavfilter -lavcodec -lavformat -I/usr/local/Cellar/ffmpeg/HEAD/include/libavcodec -I/usr/local/Cellar/ffmpeg/HEAD/include/libavformat
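The demo leaves audio/video synchronization as an exercise. One common approach is to treat the audio clock as the master and adjust how long each video frame waits before being shown. The sketch below is only an illustration of that idea: the function name, the 0.04 s fallback and the 0.01 s minimum threshold are assumptions, and all times are in seconds.

```cpp
#include <algorithm>
#include <cassert>

// Returns how long (in seconds) to wait before showing the next video frame,
// given its timestamp, the previous frame's timestamp and the audio clock.
double next_frame_delay(double frame_pts, double last_frame_pts,
                        double audio_clock) {
    assert(frame_pts >= 0.0 && last_frame_pts >= 0.0 && audio_clock >= 0.0);
    double delay = frame_pts - last_frame_pts;
    if (delay <= 0.0 || delay >= 1.0) {
        delay = 0.04; // implausible gap: fall back to roughly 25 fps
    }
    const double diff = frame_pts - audio_clock; // > 0 means video is ahead
    const double threshold = std::max(delay, 0.01);
    if (diff <= -threshold) return 0.0;          // video is late: show now
    if (diff >= threshold) return 2.0 * delay;   // video is early: wait longer
    return delay;                                // close enough: normal pace
}
```

In the demo's main loop, the returned value would decide how long to sleep (or whether to skip the wait) before calling show_frame, with the audio clock derived from how much audio the callback has already handed to SDL.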