AVCodecContext::channel_layout 0 for WAV files

I have been successfully loading compressed audio files using FFmpeg and querying their channel_layouts using some code I've written:
AVFormatContext* fmtCxt = nullptr;
avformat_open_input( &fmtCxt, "###/440_sine.wav", nullptr, nullptr );
avformat_find_stream_info( fmtCxt, nullptr );
int ret = av_find_best_stream( fmtCxt, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0 );
AVCodecContext* codecCxt = fmtCxt->streams[ret]->codec;
AVCodec* codec = avcodec_find_decoder( codecCxt->codec_id );
avcodec_open2( codecCxt, codec, nullptr );
std::cout << "Channel Layout: " << codecCxt->channel_layout << std::endl;
av_dump_format( fmtCxt, 0, "###/440_sine.wav", 0 );
I've removed all error checking for brevity. However, for Microsoft WAV files (mono or stereo) the AVCodecContext::channel_layout member is always 0, despite ffprobe and av_dump_format(..) both reporting valid information:
Input #0, wav, from '###/440_sine.wav':
Duration: 00:00:00.01, bitrate: 740 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 1 channels, s16, 705 kb/s
Also codecCxt->channels returns the correct value. Using a flac file (with exactly the same audio data generated from the same application), gives a channel_layout of 0x4 (AV_CH_FRONT_CENTER).

Your WAV file uses FFmpeg's pcm_s16le codec, which carries no information about the channel layout; you only get the number of channels. A lot of explanation can be found here.
You get the correct channel_layout with the FLAC file because FFmpeg's flac codec fills this field. You can find the correspondence table in the libavcodec/flac.c file, in the flac_channel_layouts array.
If you need to fill channel_layout manually, you can call:
codecCxt->channel_layout = av_get_default_channel_layout( codecCxt->channels );
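If you only want to backfill the layout when the decoder left it unset, a minimal guarded variant (my sketch, reusing codecCxt from the snippet above) would be:
// Only pick a default layout when the decoder reported none,
// so codecs that do fill it (e.g. FLAC) are left untouched.
if ( codecCxt->channel_layout == 0 && codecCxt->channels > 0 )
    codecCxt->channel_layout = av_get_default_channel_layout( codecCxt->channels );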

Related

Obtaining decoder MFT for H.264 video

I am trying to get a hardware decoder from Media Foundation. I know for sure my GPU supports NVDEC hardware decoding. I found an example on GitHub which gets the encoder, NVENC, without any problem, but when I switch the parameters to the decoder, I either get a bad HRESULT or a crash. I even tried getting a software decoder by changing the hardware flag, and still got a bad HRESULT. Does anyone have an idea what is wrong? I can't think of anything else left for me to try or change.
HRESULT get_decoder(CComPtr<IMFTransform>& out_transform, CComPtr<IMFActivate>& out_activate,
                    CComPtr<IMFAttributes>& out_attributes)
{
    HRESULT hr = S_OK;
    // Find the decoder
    CComHeapPtr<IMFActivate*> activate_raw;
    uint32_t activateCount = 0;
    // Input & output types
    const MFT_REGISTER_TYPE_INFO in_info = { MFMediaType_Video, MFVideoFormat_H264 };
    const MFT_REGISTER_TYPE_INFO out_info = { MFMediaType_Video, MFVideoFormat_NV12 };
    // Get decoders matching the specified attributes
    if (FAILED(hr = MFTEnum2(MFT_CATEGORY_VIDEO_DECODER, MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_SORTANDFILTER,
                             &in_info, &out_info, nullptr, &activate_raw, &activateCount)))
        return hr;
    // Choose the first returned decoder
    out_activate = activate_raw[0];
    // Memory management
    for (int i = 1; i < activateCount; i++)
        activate_raw[i]->Release();
    // Activate
    if (FAILED(hr = out_activate->ActivateObject(IID_PPV_ARGS(&out_transform))))
        return hr;
    // Get attributes
    if (FAILED(hr = out_transform->GetAttributes(&out_attributes)))
        return hr;
    std::cout << "- get_decoder() Found " << activateCount << " decoders" << std::endl;
    return hr;
}
There might be no dedicated decoder MFT for hardware decoding (even though some vendors supply one). Hardware video decoding, in contrast to encoding, is available via the DXVA 2 API and is, in turn, covered by the Microsoft H264 Video Decoder MFT.
This stock MFT is capable of decoding using hardware and is also compatible with both D3D9- and D3D11-enabled pipelines.
Microsoft H264 Video Decoder MFT
6 Attributes:
MFT_TRANSFORM_CLSID_Attribute: {62CE7E72-4C71-4D20-B15D-452831A87D9D} (Type VT_CLSID, CLSID_CMSH264DecoderMFT)
MF_TRANSFORM_FLAGS_Attribute: MFT_ENUM_FLAG_SYNCMFT
MFT_INPUT_TYPES_Attributes: MFVideoFormat_H264, MFVideoFormat_H264_ES
MFT_OUTPUT_TYPES_Attributes: MFVideoFormat_NV12, MFVideoFormat_YV12, MFVideoFormat_IYUV, MFVideoFormat_I420, MFVideoFormat_YUY2
Attributes
MF_SA_D3D_AWARE: 1 (Type VT_UI4)
MF_SA_D3D11_AWARE: 1 (Type VT_UI4)
CODECAPI_AVDecVideoThumbnailGenerationMode: 0 (Type VT_UI4)
CODECAPI_AVDecVideoMaxCodedWidth: 7680 (Type VT_UI4)
CODECAPI_AVDecVideoMaxCodedHeight: 4320 (Type VT_UI4)
CODECAPI_AVDecNumWorkerThreads: 4294967295 (Type VT_UI4, -1)
CODECAPI_AVDecVideoAcceleration_H264: 1 (Type VT_UI4)
...
From MSDN:
CODECAPI_AVDecVideoAcceleration_H264 Enables or disables hardware acceleration.
...
Maximum Resolution 4096 × 2304 pixels
The maximum guaranteed resolution for DXVA acceleration is 1920 × 1088 pixels; at higher resolutions, decoding is done with DXVA, if it is supported by the underlying hardware, otherwise, decoding is done with software.
...
DXVA The decoder supports DXVA version 2, but not DXVA version 1. DXVA decoding is supported only for Main-compatible Baseline, Main, and High profile bitstreams. (Main-compatible Baseline bitstreams are defined as profile_idc=66 and constrained_set1_flag=1.)
To decode with hardware acceleration, just use the Microsoft H264 Video Decoder MFT.
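For illustration, here is a rough sketch (my code, not from the answer) of instantiating that stock decoder directly by its CLSID and checking whether it is D3D11-aware; the headers, libraries and function name are my assumptions from the standard Media Foundation SDK:
#include <atlbase.h>
#include <mfapi.h>
#include <mftransform.h>
#include <wmcodecdsp.h>   // CLSID_CMSH264DecoderMFT
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "wmcodecdspuuid.lib")

// Sketch only: assumes COM and Media Foundation (MFStartup) are already initialized.
HRESULT create_stock_h264_decoder(CComPtr<IMFTransform>& transform)
{
    // Create the Microsoft H264 Video Decoder MFT directly by its CLSID.
    HRESULT hr = transform.CoCreateInstance(CLSID_CMSH264DecoderMFT);
    if (FAILED(hr))
        return hr;
    // Check whether the decoder can run DXVA through a D3D11 device manager.
    CComPtr<IMFAttributes> attributes;
    if (SUCCEEDED(transform->GetAttributes(&attributes)))
    {
        UINT32 d3d11Aware = 0;
        attributes->GetUINT32(MF_SA_D3D11_AWARE, &d3d11Aware);
        // If d3d11Aware is 1, hand the MFT a D3D11 device manager via
        // MFT_MESSAGE_SET_D3D_MANAGER to enable hardware-accelerated decoding.
    }
    return hr;
}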

FFMPEG error when finding stream information with custom AVIOContext

I am writing software that takes in a file as a stream and decodes it. I have the following custom AVIO code for stream input:
/* Allocate a 4kb buffer for copying. */
std::uint32_t bufSize = 4096;
struct vidBuf
{
    std::byte* ptr;
    int size;
};
vidBuf tmpVidBuf = { const_cast<std::byte*>(videoBuffer.data()),
                     static_cast<int>(videoBuffer.size()) };
AVIOContext *avioContext =
    avio_alloc_context(reinterpret_cast<std::uint8_t*>(av_malloc(bufSize)),
                       bufSize, 0,
                       reinterpret_cast<void*>(&tmpVidBuf),
                       [](void *opaque, std::uint8_t *buf, int bufSize) -> int
                       {
                           auto &me = *reinterpret_cast<vidBuf*>(opaque);
                           bufSize = std::min(bufSize, me.size);
                           std::copy_n(me.ptr, bufSize, reinterpret_cast<std::byte*>(buf));
                           me.ptr += bufSize;
                           me.size -= bufSize;
                           return bufSize;
                       }, nullptr, nullptr);
auto avFormatPtr = avformat_alloc_context();
avFormatPtr->pb = avioContext;
avFormatPtr->flags |= AVFMT_FLAG_CUSTOM_IO;
//avFormatPtr->probesize = tmpVidBuf.size;
//avFormatPtr->max_analyze_duration = 5000000;
avformat_open_input(&avFormatPtr, nullptr, nullptr, nullptr);
if(auto ret = avformat_find_stream_info(avFormatPtr, nullptr);
   ret < 0)
    logerror << "Could not open the video file: " << makeAVError(ret) << '\n';
However, when I run this code I get the error:
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55d10736d580] stream 0, offset 0x30: partial file
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55d10736d580] Could not find codec parameters for stream 0 (Video: h264 (avc1 / 0x31637661), none(tv, bt709), 540x360, 649 kb/s): unspecified pixel format
Consider increasing the value for the 'analyzeduration' (0) and 'probesize' (5000000) options
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Duration: 00:04:08.41, start: 0.000000, bitrate: N/A
Stream #0:0(und): Video: h264 (avc1 / 0x31637661), none(tv, bt709), 540x360, 649 kb/s, SAR 1:1 DAR 3:2, 29.97 fps, 29.97 tbr, 30k tbn, 60k tbc (default)
Metadata:
handler_name : ISO Media file produced by Google Inc. Created on: 01/10/2021.
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: aac (mp4a / 0x6134706D), 22050 Hz, mono, fltp, 69 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc. Created on: 01/10/2021.
vendor_id : [0][0][0][0]
Assertion desc failed at libswscale/swscale_internal.h:677
Note the absence of the YUV420p part in the video stream data.
This is strange, since if I run my program with a different mp4 file it works perfectly fine; the error only occurs with this specific mp4 file. I know that the mp4 file is valid, since mpv can play it and ffprobe is able to get its metadata:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'heard.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.76.100
Duration: 00:04:08.41, start: 0.000000, bitrate: 724 kb/s
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 540x360 [SAR 1:1 DAR 3:2], 649 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
handler_name : ISO Media file produced by Google Inc. Created on: 01/10/2021.
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 22050 Hz, mono, fltp, 69 kb/s (default)
Metadata:
handler_name : ISO Media file produced by Google Inc. Created on: 01/10/2021.
vendor_id : [0][0][0][0]
As you can see from my code, I also tried setting analyzeduration and probesize, but that did not fix the issue.
I also know that this error is caused by my custom IO, because when I let avformat_open_input open the file directly it decodes just fine. I am new to FFmpeg, so I might have missed something simple.
As SuRGeoNix pointed out, I had not implemented a seek function for the AVIO context; I think this messed up FFMPEG since it could not figure out the size of the buffer. This is my now working code:
std::uint32_t bufSize = 4096;
struct vidBuf
{
    std::byte* ptr;
    std::byte* origPtr;
    int size;
    int fullSize;
};
vidBuf tmpVidBuf = { const_cast<std::byte*>(videoBuffer.data()),
                     const_cast<std::byte*>(videoBuffer.data()),
                     static_cast<int>(videoBuffer.size()),
                     static_cast<int>(videoBuffer.size()), };
AVIOContext *avioContext =
    avio_alloc_context(reinterpret_cast<std::uint8_t*>(av_malloc(bufSize)),
                       bufSize, 0,
                       reinterpret_cast<void*>(&tmpVidBuf),
                       [](void *opaque, std::uint8_t *buf, int bufSize) -> int
                       {
                           auto &me = *reinterpret_cast<vidBuf*>(opaque);
                           bufSize = std::min(bufSize, me.size);
                           std::copy_n(me.ptr, bufSize, reinterpret_cast<std::byte*>(buf));
                           me.ptr += bufSize;
                           me.size -= bufSize;
                           return bufSize;
                       },
                       nullptr,
                       [](void *opaque, std::int64_t where, int whence) -> std::int64_t
                       {
                           auto me = reinterpret_cast<vidBuf*>(opaque);
                           switch(whence)
                           {
                           case AVSEEK_SIZE:
                               /* Maybe size left? */
                               return me->fullSize;
                               break;
                           case SEEK_SET:
                               if(me->fullSize > where)
                               {
                                   me->ptr = me->origPtr + where;
                                   me->size = me->fullSize - where;
                               }
                               else
                                   return EOF;
                               break;
                           case SEEK_CUR:
                               if(me->size > where)
                               {
                                   me->ptr += where;
                                   me->size -= where;
                               }
                               else
                                   return EOF;
                               break;
                           case SEEK_END:
                               if(me->fullSize > where)
                               {
                                   me->ptr = (me->origPtr + me->fullSize) - where;
                                   int curPos = me->ptr - me->origPtr;
                                   me->size = me->fullSize - curPos;
                               }
                               else
                                   return EOF;
                               break;
                           default:
                               /* On error, do nothing, return current position of file. */
                               logerror << "Could not process buffer seek: "
                                        << whence << ".\n";
                               break;
                           }
                           return me->ptr - me->origPtr;
                       });
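One small refinement worth noting (my addition, not part of the original answer): newer FFmpeg versions deprecate returning 0 from the read callback and expect AVERROR_EOF once the buffer is exhausted, so the read lambda could be hedged like this:
[](void *opaque, std::uint8_t *buf, int bufSize) -> int
{
    auto &me = *reinterpret_cast<vidBuf*>(opaque);
    bufSize = std::min(bufSize, me.size);
    if (bufSize <= 0)
        return AVERROR_EOF;   // signal end of stream explicitly instead of returning 0
    std::copy_n(me.ptr, bufSize, reinterpret_cast<std::byte*>(buf));
    me.ptr += bufSize;
    me.size -= bufSize;
    return bufSize;
}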

FFMPEG library - transcode raw images to h264 stream, and the output file does not contain pts and dts info

I am trying to use the FFmpeg C++ library to convert several raw yuyv images to an h264 stream. The images come from memory and are passed as strings at about 24 fps. I do the conversion in the following steps:
1. Init AVFormatContext, AVCodec and AVCodecContext, and create a new AVStream. For this step I mainly refer to ffmpeg-libav-tutorial, and the AVFormatContext uses a customized write_buffer() function (refer to simplest_ffmpeg_mem_handler).
2. Receive the raw frame data, set width and height (1920x1080), and set pts and dts. Here I manually set the output fps to 24 and use a global counter to count the number of frames; the pts is calculated from this counter. Code snippet (video_avs is the AVStream, output_fps is 24 and time_base is 1/24):
input_frame->width = w; // 1920
input_frame->height = h; // 1080
input_frame->pkt_dts = input_frame->pts = global_pts;
global_pts += video_avs->time_base.den/video_avs->time_base.num / output_fps.num * output_fps.den;
3. Convert it from yuyv to yuv422 (because h264 does not support yuyv) and resize it from 1920x1080 to 640x480 (because I need this output resolution), using sws_scale().
4. Use avcodec_send_frame() and avcodec_receive_packet() to get the output packet. Set the output_packet duration and stream_index, then use av_write_frame() to write the frame data.
AVPacket *output_packet = av_packet_alloc();
int response = avcodec_send_frame(encoder->video_avcc, frame);
while (response >= 0) {
    response = avcodec_receive_packet(encoder->video_avcc, output_packet); // !! here output_packet.size is calculated
    if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
        break;
    }
    else if (response < 0) {
        printf("Error while receiving packet from encoder"); // ?? av_err2str(response) causes an error here
        return response;
    }
    // duration = next_pts - this_pts = timescale / fps = 1 / timebase / fps
    output_packet->duration = (encoder->video_avs->time_base.den / encoder->video_avs->time_base.num) / (output_fps.num / output_fps.den);
    output_packet->stream_index = 0;
    int write_response = av_write_frame(encoder->avfc, output_packet); // packet order is not ensured
    if (write_response != 0) { printf("Error %d while writing packet", write_response); return -1; }
}
av_packet_unref(output_packet);
av_packet_free(&output_packet);
In the write_buffer() function, the video stream output is stored in a string variable, and then I write this string to a file with ostream, with an .mp4 suffix.
After all the above steps, the output.mp4 cannot be played. The output of ffprobe output.mp4 -show_frames is:
Input #0, h264, from '/Users/ming/code/dev/haomo/output.mp4':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: h264 (High 4:2:2), yuv422p(progressive), 640x480, 24.92 fps, 24 tbr, 1200k tbn, 48 tbc
[FRAME]
media_type=video
stream_index=0
key_frame=1
pkt_pts=N/A
pkt_pts_time=N/A
pkt_dts=N/A
pkt_dts_time=N/A
best_effort_timestamp=N/A
best_effort_timestamp_time=N/A
Note that before and after calling av_write_frame() in step 4, the passed output_packet contains correct pts and dts info; I cannot figure out why the output stream loses this info.
I figured it out: the output stream is a raw h264 stream, and I directly stored this stream in a file with a ".mp4" suffix, so it is actually not a correct mp4 file.
I then stored the stream in an output.h264 file and used ffmpeg to convert it to an mp4 file: ffmpeg -framerate 24 -i output.h264 -c copy output.mp4. This output.mp4 contains the right pts data and can be played.
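For anyone who would rather skip the external remux, here is a rough sketch of asking the API for a real MP4 muxer up front (my code, not the poster's; "output.mp4" and the encoder time base are placeholders, error checks omitted):
extern "C" {
#include <libavformat/avformat.h>
}
// Hypothetical sketch: with an actual mp4 muxer, av_interleaved_write_frame()
// writes pts/dts into the container instead of emitting a raw Annex B stream.
AVFormatContext* avfc = nullptr;
avformat_alloc_output_context2(&avfc, nullptr, "mp4", "output.mp4");
AVStream* st = avformat_new_stream(avfc, nullptr);
// ... fill st->codecpar from the encoder context and set st->time_base ...
avio_open(&avfc->pb, "output.mp4", AVIO_FLAG_WRITE);
avformat_write_header(avfc, nullptr);
// for every packet returned by avcodec_receive_packet():
//     av_packet_rescale_ts(output_packet, encoder_time_base, st->time_base);
//     output_packet->stream_index = st->index;
//     av_interleaved_write_frame(avfc, output_packet);
av_write_trailer(avfc);
avio_closep(&avfc->pb);
avformat_free_context(avfc);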

ALSA - samplerate conversion

I have a text-to-speech application that generates an audio stream (raw data) with a sample rate of 22 kHz.
I have a USB sound card that supports only 44 kHz.
With my asound.conf I can play wav files that contain 22 kHz and 44 kHz audio streams without problems in aplay.
My application uses the ALSA libraries and sets the sample rate of the device.
In this case only 44 kHz will succeed, because the hardware supports only this sample rate. But now, when I write the generated audio stream to ALSA, it sounds wrong, because the sample rates don't match. The audio stream (raw data) doesn't contain any header information, so I think ALSA doesn't use any plugin to convert the sample rate; ALSA doesn't know that the stream has a different sample rate.
My question is now: what is the right way to tell ALSA that the generated audio stream has a different sample rate, so that the ALSA plugin converts it?
The following code works on the USB sound card only with sampleRate = 44100; otherwise an error occurs (-22, invalid parameters).
void initAlsa()
{
    const char* name = "default";
    alsaAudio = true;
    writeRiffAtClose = false;
    int err = snd_pcm_open (&alsaPlaybackHandle, name, SND_PCM_STREAM_PLAYBACK, 0);
    if (err < 0)
        throw TtsException({"Alsa: cannot open playback audio device ", name, " (" , snd_strerror (err), ")"}, 0);
    sampleRate = 44100;
    err = snd_pcm_set_params(alsaPlaybackHandle,            // pcm            PCM handle
                             SND_PCM_FORMAT_S16_LE,         // format         required PCM format
                             SND_PCM_ACCESS_RW_INTERLEAVED, // access         required PCM access
                             2,                             // channels       required PCM channels (Stereo)
                             sampleRate,                    // rate           required sample rate in Hz
                             1,                             // soft_resample  0 = disallow alsa-lib resample stream, 1 = allow resampling
                             250000); /* 0.25sec */         // latency        required overall latency in us
    if (err < 0)
        throw TtsException({"Alsa: cannot set parameters (" , err, " = " , snd_strerror(err), ") on ", name}, 0);
    LOG_DEBUG("Alsa audio initialized");
}
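For context, the stream is then pushed to the device roughly like this (a hedged caller-side sketch, not code from my application; the sample buffer and frame count are placeholders):
// Hypothetical write path: interleaved S16LE stereo frames to the handle opened in initAlsa().
void writeAudio(const int16_t* samples, snd_pcm_uframes_t frameCount)
{
    snd_pcm_sframes_t written = snd_pcm_writei(alsaPlaybackHandle, samples, frameCount);
    if (written < 0)
        written = snd_pcm_recover(alsaPlaybackHandle, written, 0);  // try to recover from an underrun
    if (written < 0)
        LOG_DEBUG("Alsa write failed");  // error handling kept minimal in this sketch
}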
The other way is that I manually convert the sample rate before I pass it to ALSA, but I think: why not use the ALSA plugin?
I don't have the option of getting a 44 kHz audio stream from the TTS engine (it's separate software).
Or is there another way that I don't see?
Best regards.

C++ FFMPEG not writing AVCC box information

I'm trying to encode raw H264 into an mp4 container using the FFMPEG API in C++. It all works fine; however, the AVCC box is empty, and it returns the error:
[iso file] Box "avcC" size 8 invalid
If I then use the command line tool on the output file:
ffmpeg -i output.mp4 -vcodec copy fixed.mp4
The output file works and AVCC is populated with the required information. I'm at a loss as to why this command line argument works but I'm unable to produce the same result using the API.
What I do in the C++ code (other things also happen in between these function calls):
outputFormat_ = av_guess_format( "mp4", NULL, NULL ); //AV_CODEC_H264
formatContext_ = avformat_alloc_context();
formatContext_->oformat = outputFormat_;
...
AVDictionary *opts = NULL;
char tmpstr[50]; sprintf(tmpstr, "%i", muxRate * KILOBYTESTOBYTES);
av_dict_set(&opts, "muxrate", tmpstr, 0);
avformat_write_header( formatContext_, &opts);
av_write_trailer(formatContext_);
The output of this is correct, except that it's missing the AVCC information. Adding this manually (and fixing the box lengths accordingly) lets me play back the video fine. Any idea why the API calls are not generating the AVCC info?
For reference, here's the chars from the mp4 before the fix:
.avc1.........................€.8.H...H..........................................ÿÿ....avcC....stts
and after:
avc1.........................€.8.H...H..........................................ÿÿ...!avcC.B€(ÿá..gB€(Ú.à.—•...hÎ<€....stts
I had the problem with empty AVCC boxes in my MP4 files too. It turned out I was setting the CODEC_FLAG_GLOBAL_HEADER flag on the AVCodecContext instance after calling avcodec_open2. The flag should be set before calling avcodec_open2.
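A hedged sketch of that ordering (variable names follow the question's snippet; the AVFMT_GLOBALHEADER check and the encoder handle are my assumptions):
// Set the global-header flag before opening the encoder, so the SPS/PPS end up in
// extradata, which the mp4 muxer then writes into the avcC box.
if (formatContext_->oformat->flags & AVFMT_GLOBALHEADER)
    codecContext->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;  // CODEC_FLAG_GLOBAL_HEADER on older FFmpeg
avcodec_open2(codecContext, codec, nullptr);  // 'codec' is the chosen H.264 encoder (assumed)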
Solved it. The data that was required was the SPS and PPS components of the AVCC codec. As the raw H264 stream was in annex b format, this was present at the start of every I-frame, in the NAL units starting 0x00 0x00 0x00 0x01 0x67 and 0x00 0x00 0x00 0x01 0x68. So what was needed was to copy that information into the AVStream codec's extradata field:
codecContext = stream->codec;
...
// videoSeqHeader contains the PPS and SPS NAL unit data
// (allocate extradata with av_malloc so FFmpeg can later release it with av_free)
codecContext->extradata = (uint8_t*)av_malloc( sizeof(uint8_t) * videoSeqHeader_.size() );
for( unsigned int index = 0; index < videoSeqHeader_.size(); index++ )
{
    codecContext->extradata[index] = videoSeqHeader_[index];
}
codecContext->extradata_size = (int)videoSeqHeader_.size();
This resulted in the AVCC box being correctly populated.