FFmpeg + OpenAL - playback of streaming sound from video won't work - C++

I am decoding an OGG video (theora & vorbis as codecs) and want to show it on the screen (using Ogre 3D) while playing its sound. I can decode the image stream just fine and the video plays perfectly with the correct frame rate, etc.
However, I cannot get the sound to play at all with OpenAL.
Edit: I managed to make the playing sound resemble the actual audio in the video at least somewhat. Updated sample code.
Edit 2: I was able to get "almost" correct sound now. I had to set OpenAL to use AL_FORMAT_STEREO_FLOAT32 (after initializing the extension) instead of just AL_FORMAT_STEREO16. Now the sound is "only" extremely high pitched and stuttering, but at the correct speed.
Here is how I decode audio packets (in a background thread, the equivalent works just fine for the image stream of the video file):
//------------------------------------------------------------------------------
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int bufferSize = av_samples_get_buffer_size(NULL, p_audioCodecContext->channels, p_frame->nb_samples,
p_audioCodecContext->sample_fmt, 0);
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
if (p_frame->channels == 2)
{
memcpy(frame->data, p_frame->data[0], bufferSize >> 1);
memcpy(frame->data + (bufferSize >> 1), p_frame->data[1], bufferSize >> 1);
}
else
{
memcpy(frame->data, p_frame->data, bufferSize);
}
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
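(The AudioFrame struct itself is not shown here; judging by how it is used in decodeAudioPacket, it presumably looks something like this - a presumed layout, not the original definition:)
// Presumed layout of AudioFrame, reconstructed from its usage above.
struct AudioFrame
{
    uint8_t* data;     // raw sample bytes that will be handed to alBufferData later
    int      dataSize; // size of data in bytes
    double   lifeTime; // playback duration of this frame in seconds
};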
So, as you can see, I decode the frame and memcpy it into my own struct, AudioFrame. Now, when the sound is played, I use these audio frames like this:
int numBuffers = 4;
ALuint buffers[4];
alGenBuffers(numBuffers, buffers);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alGenBuffers : " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numReturned = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffers, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
for (unsigned int i = 0; i < numReturned; ++i)
{
alBufferData(buffers[i], _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alBufferData : " + Ogre::StringConverter::toString(success) + alGetString(success)
+ " size: " + Ogre::StringConverter::toString(audioBufferSizes[i]));
return;
}
}
// Queue the buffers into OpenAL
alSourceQueueBuffers(_source, numReturned, buffers);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error queuing streaming buffers: " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
}
alSourcePlay(_source);
The format and frequency I give to OpenAL are AL_FORMAT_STEREO_FLOAT32 (it is a stereo sound stream, and I did initialize the FLOAT32 extension) and 48000 (which is the sample rate of the AVCodecContext of the audio stream).
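(A minimal sketch of such an extension check - not necessarily the exact code used here, and streamingFormat is just an illustrative local name - could look like this:)
// AL_FORMAT_STEREO_FLOAT32 is only available through the AL_EXT_FLOAT32
// extension, so query it and fall back to 16-bit stereo if it is missing.
ALenum streamingFormat = AL_FORMAT_STEREO16;
if (alIsExtensionPresent("AL_EXT_FLOAT32"))
{
    streamingFormat = alGetEnumValue("AL_FORMAT_STEREO_FLOAT32");
}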
And during playback, I do the following to refill OpenAL's buffers:
ALint numBuffersProcessed;
// Check if OpenAL is done with any of the queued buffers
alGetSourcei(_source, AL_BUFFERS_PROCESSED, &numBuffersProcessed);
if(numBuffersProcessed <= 0)
return;
// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numFilled = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffersProcessed, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
ALuint buffer;
for (unsigned int i = 0; i < numFilled; ++i)
{
// Pop the oldest queued buffer from the source,
// fill it with the new data, then re-queue it
alSourceUnqueueBuffers(_source, 1, &buffer);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error Unqueuing streaming buffers: " + Ogre::StringConverter::toString(success));
return;
}
alBufferData(buffer, _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on re- alBufferData: " + Ogre::StringConverter::toString(success));
return;
}
alSourceQueueBuffers(_source, 1, &buffer);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error re-queuing streaming buffers: " + Ogre::StringConverter::toString(success) + " "
+ alGetString(success));
return;
}
}
// Make sure the source is still playing,
// and restart it if needed.
ALint playStatus;
alGetSourcei(_source, AL_SOURCE_STATE, &playStatus);
if(playStatus != AL_PLAYING)
alSourcePlay(_source);
As you can see, I do quite heavy error checking. But I do not get any errors, neither from OpenAL nor from FFmpeg.
Edit: What I hear somewhat resembles the actual audio from the video, but VERY high pitched and stuttering VERY much. Also, it seems to be playing on top of TV noise. Very strange. Plus, it is playing much slower than the correct audio would.
Edit 2: After using AL_FORMAT_STEREO_FLOAT32, the sound plays at the correct speed, but it is still very high pitched and stuttering (though less than before).
The video itself is not broken; it plays fine in any player. OpenAL can also play *.wav files just fine in the same application, so it is working as well.
Any ideas what could be wrong here or how to do this correctly?
My only guess is that somehow, FFmpeg's decode function does not produce data OpenAL can read. But this is as far as the FFmpeg decoding example goes, so I don't know what's missing. As I understand it, avcodec_decode_audio4 decodes the frame to raw data, and OpenAL should be able to work with raw data (or rather, it doesn't work with anything else).

So, I finally figured out how to do it. Gee, what a mess. It was a hint from a user on the libav-users mailing list that put me on the correct path.
Here are my mistakes:
Using the wrong format in the alBufferData function. I used AL_FORMAT_STEREO16 (as that is what every single streaming example with OpenAL uses). I should have used AL_FORMAT_STEREO_FLOAT32, as the video I stream is Ogg and the Vorbis audio decodes to floating-point samples. And using swr_convert to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 just crashes. No idea why.
Not using swr_convert to convert the decoded audio frame to the target format. After swr_convert kept crashing without any reason given when I tried converting from FLTP to S16, I assumed it was broken. But after finding my first mistake, I tried again, converting from FLTP to FLT (non-planar), and then it worked! So OpenAL uses an interleaved format, not planar. Good to know.
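(As an aside: libavutil can report whether a decoder's sample format is planar and what the matching packed format is - a small illustrative sketch, not part of the original code:)
extern "C" {
#include <libavutil/samplefmt.h>
}

// Derive the packed (interleaved) equivalent of the decoder's sample format,
// e.g. AV_SAMPLE_FMT_FLTP -> AV_SAMPLE_FMT_FLT, which is what OpenAL expects.
AVSampleFormat decoderFormat = audioCodecContext->sample_fmt;
AVSampleFormat targetFormat  = decoderFormat;
if (av_sample_fmt_is_planar(decoderFormat))
{
    targetFormat = av_get_packed_sample_fmt(decoderFormat);
}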
So here is the decodeAudioPacket function that is working for me with Ogg video, vorbis audio stream:
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
SwrContext* p_swrContext, uint8_t** p_destBuffer, int p_destLinesize,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
if(decoded <= p_packet.size)
{
/* Move the unread data to the front and clear the end bits */
int remaining = p_packet.size - decoded;
memmove(p_packet.data, &p_packet.data[decoded], remaining);
av_shrink_packet(&p_packet, remaining);
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int outputSamples = swr_convert(p_swrContext,
p_destBuffer, p_destLinesize,
(const uint8_t**)p_frame->extended_data, p_frame->nb_samples);
int bufferSize = av_get_bytes_per_sample(AV_SAMPLE_FMT_FLT) * p_videoInfo.audioNumChannels
* outputSamples;
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
memcpy(frame->data, p_destBuffer[0], bufferSize);
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
And here is how I initialize the context and the destination buffer:
// Initialize SWR context
SwrContext* swrContext = swr_alloc_set_opts(NULL,
audioCodecContext->channel_layout, AV_SAMPLE_FMT_FLT, audioCodecContext->sample_rate,
audioCodecContext->channel_layout, audioCodecContext->sample_fmt, audioCodecContext->sample_rate,
0, NULL);
int result = swr_init(swrContext);
// Create destination sample buffer
uint8_t** destBuffer = NULL;
int destBufferLinesize;
av_samples_alloc_array_and_samples( &destBuffer,
&destBufferLinesize,
videoInfo.audioNumChannels,
2048,
AV_SAMPLE_FMT_FLT,
0);
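And for completeness, the converter context and the destination buffer need to be released again on shutdown - roughly like this (a sketch, assuming destBuffer was allocated as shown above):
// Tear-down, mirroring the allocation above.
if (destBuffer)
{
    av_freep(&destBuffer[0]); // frees the actual sample data
    av_freep(&destBuffer);    // frees the array of plane pointers
}
swr_free(&swrContext);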

Related

FFmpeg Opus choppy sound UPDATED DESCRIPTION

I'm using FFmpeg and trying to encode and decode raw PCM sound to Opus using the built-in FFmpeg "opus" codec. My input samples are raw PCM, 8000 Hz, 16 bit, mono, in AV_SAMPLE_FMT_S16 format. Since Opus only accepts the AV_SAMPLE_FMT_FLTP sample format and a 48000 Hz sample rate, I resample my samples before encoding them.
I have two instances of a ResamplerAudio class that does the work of resampling audio samples and owns an SwrContext member. I use the first instance of ResamplerAudio to resample the raw PCM input audio before encoding, and the second to resample the decoded audio so that its format and sample rate match those of the original raw input.
The ResamplerAudio class has a function that initializes its SwrContext member like this:
void ResamplerAudio::init(AVCodecContext *codecContext, int inSampleRate, int outSampleRate, AVSampleFormat inSampleFmt, AVSampleFormat outSampleFmt)
{
swrContext = swr_alloc();
if (!swrContext)
{
LOGE(TAG, "[init] Couldn't allocate swr context");
return;
}
av_opt_set_int(swrContext, "in_channel_layout", (int64_t) codecContext->channel_layout, 0);
av_opt_set_int(swrContext, "out_channel_layout", (int64_t) codecContext->channel_layout, 0);
av_opt_set_int(swrContext, "in_channel_count", codecContext->channels, 0);
av_opt_set_int(swrContext, "out_channel_count", codecContext->channels, 0);
av_opt_set_int(swrContext, "in_sample_rate", inSampleRate, 0);
av_opt_set_int(swrContext, "out_sample_rate", outSampleRate, 0);
av_opt_set_sample_fmt(swrContext, "in_sample_fmt", inSampleFmt, 0);
av_opt_set_sample_fmt(swrContext, "out_sample_fmt", outSampleFmt, 0);
int ret = swr_init(swrContext);
if (ret < 0)
{
LOGE(TAG, "[init] swr_init error: %s", av_err2str(ret));
return;
}
LOGD(TAG, "[init] success codecContext->channel_layout: %d; inSampleRate: %d; outSampleRate: %d; inSampleFmt: %d; outSampleFmt: %d", (int) codecContext->channel_layout, inSampleRate, outSampleRate, inSampleFmt, outSampleFmt);
}
I call the ResamplerAudio::init function for the first instance of ResamplerAudio (this instance resamples the raw PCM input audio before encoding; I call it resamplerEncoder) with the following args:
resamplerEncoder->init(contextEncoder, 8000, 48000, AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_FLTP);
The second instance of ResamplerAudio (this instance resamples the audio after decoding it from Opus; I call it resamplerDecoder) is initialized with the following args:
resamplerDecoder->init(contextDecoder, 48000, 8000, AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_S16);
The function of ResamplerAudio that does resampling looks like this:
std::vector<uint8_t> ResamplerAudio::convert(uint8_t **inData, int inSamplesCount, int outChannels, int outFormat)
{
std::vector<uint8_t> result;
uint8_t *dstData = NULL;
const int dstNbSamples = swr_get_out_samples(swrContext, inSamplesCount);
av_samples_alloc(&dstData, NULL, outChannels, dstNbSamples, AVSampleFormat(outFormat), 1);
int resampledSize = swr_convert(swrContext, &dstData, dstNbSamples, (const uint8_t **)inData, inSamplesCount);
int dstBufSize = av_samples_get_buffer_size(NULL, outChannels, resampledSize, AVSampleFormat(outFormat), 1);
if (dstBufSize <= 0) return result;
std::copy(&dstData[0], &dstData[dstBufSize], std::back_inserter(result));
return result;
}
And I call ResamplerAudio::convert function before encoding with the following args:
// data - an array of raw pcm audio
// dataLength - the length of data array
// getSamplesCount() - function that calculates samples count
// frameEncode - AVFrame that using for encode audio
std::vector<uint8_t> resampledData = resamplerEncoder->convert(&data, getSamplesCount(dataLength, frameEncode->channels, AV_SAMPLE_FMT_S16), frameEncode->channels, frameEncode->format);
getSamplesCount() function looks like this:
getSamplesCount(int bytesCount, int channels, AVSampleFormat format)
{
return bytesCount / av_get_bytes_per_sample(format) / channels;
}
After that I fill my frameEncode with resampled samples:
memcpy(&frame->data[0][0], &resampledData[0], sizeof(uint8_t) * resampledDataLength);
And pass frameEncode to encoding like this encodeFrame(resampledDataLength):
void encodeFrame(int dataLength)
{
/* send the frame for encoding */
int ret = avcodec_send_frame(contextEncoder, frameEncode);
if (ret < 0)
{
LOGE(TAG, "[encodeFrame] avcodec_send_frame error: %s", av_err2str(ret));
return;
}
/* read all the available output packets (in general there may be any number of them) */
while (ret >= 0)
{
ret = avcodec_receive_packet(contextEncoder, packetEncode);
if (ret < 0 && ret != AVERROR(EAGAIN)) LOGE(TAG, "[encodeFrame] error in avcodec_receive_packet: %s", av_err2str(ret));
if (ret < 0) break;
// encodedData - std::vector<uint8_t> that stores encoded data
std::copy(&packetEncode->data[0], &packetEncode->data[dataLength], std::back_inserter(encodedData));
av_packet_unref(packetEncode);
}
}
Then I decode my encoded samples and resample them to get them back in the source sample format and sample rate, so I call the ResamplerAudio::convert function for resamplerDecoder with the following args:
// frameDecode - AVFrame that holds decoded audio
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
The resulting sound is choppy, and I also noticed that the decoded array size is bigger than the size of the source array of raw PCM audio.
Any ideas what I'm doing wrong?
UPD 18.05.2020
I tested my resampling logic by resampling raw PCM sound without any encoding and decoding routines. First I converted the sample rate of the input sound from 8000 Hz to 48000 Hz, then I took the resampled samples from that step and converted their sample rate from 48000 Hz back to 8000 Hz, and the resulting sound is perfect and clean. I did the same steps converting not the sample rate but the sample format, from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP and back, and again the result is perfect and clean. I also got the same result when I converted both the sample rate and the sample format.
So I assume that the distorted and choppy sound comes from my encoding or decoding routine, most likely the decoding routine, because after decoding I ALWAYS get an AVFrame with 960 nb_samples regardless of the size of the input sound.
My decoding routine looks like this:
std::vector<uint8_t> decode(uint8_t *data, unsigned int dataLength)
{
decodedData.clear();
int dataSize = dataLength;
while (dataSize > 0)
{
if (!frameDecode)
{
frameDecode = av_frame_alloc();
if (!frameDecode)
{
LOGE(TAG, "[decode] Couldn't allocate the frame");
return EMPTY_DATA;
}
}
ret = av_parser_parse2(parser, contextDecoder, &packetDecode->data, &packetDecode->size, &data[0], dataSize, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
if (ret < 0) {
LOGE(TAG, "[decode] av_parser_parse2 error: %s", av_err2str(ret));
return EMPTY_DATA;
}
data += ret;
dataSize -= ret;
doDecode();
}
return decodedData;
}
void doDecode()
{
if (packetDecode->size) {
/* send the packet with the compressed data to the decoder */
int ret = avcodec_send_packet(contextDecoder, packetDecode);
if (ret < 0) LOGE(TAG, "[decode] avcodec_send_packet error: %s", av_err2str(ret));
/* read all the output frames (in general there may be any number of them) */
while (ret >= 0)
{
ret = avcodec_receive_frame(contextDecoder, frameDecode);
if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) LOGE(TAG, "[decode] avcodec_receive_frame error: %s", av_err2str(ret));
if (ret < 0) break;
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
if (!resampledData.size()) continue;
std::copy(&resampledData.data()[0], &resampledData.data()[resampledData.size()], std::back_inserter(decodedData));
}
}
}
UPD 30.05.2020
I decided to stop using FFmpeg in my project and use libopus 1.3.1 directly instead, so I made a wrapper around it, and it works fine.
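(For anyone going the same route: a minimal libopus encode/decode round trip - an illustrative sketch with placeholder buffer sizes, not the actual wrapper - looks roughly like this:)
#include <opus/opus.h>
#include <vector>

// Encode and decode one 20 ms mono frame at 48 kHz with libopus.
int err = 0;
OpusEncoder* enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_AUDIO, &err);
OpusDecoder* dec = opus_decoder_create(48000, 1, &err);

const int frameSize = 960; // 20 ms at 48 kHz, a valid Opus frame duration
std::vector<opus_int16> pcmIn(frameSize), pcmOut(frameSize);
unsigned char packet[4000]; // plenty of room for one encoded frame

opus_int32 packetLen   = opus_encode(enc, pcmIn.data(), frameSize, packet, sizeof(packet));
int decodedSamples     = opus_decode(dec, packet, packetLen, pcmOut.data(), frameSize, 0);

opus_encoder_destroy(enc);
opus_decoder_destroy(dec);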

Live555 truncates encoded data of FFmpeg

I am trying to stream H264 based data using Live555 over RTSP.
I am capturing data using V4L2, encoding it with FFmpeg, and then passing the data to Live555's DeviceSource file, where I use the H264VideoStreamFramer class.
Below are the codec settings I use to configure the encoder's AVCodecContext:
codec = avcodec_find_encoder_by_name(CODEC_NAME);
if (!codec) {
cerr << "Codec " << codec_name << " not found\n";
exit(1);
}
c = avcodec_alloc_context3(codec);
if (!c) {
cerr << "Could not allocate video codec context\n";
exit(1);
}
pkt = av_packet_alloc();
if (!pkt)
exit(1);
/* put sample parameters */
c->bit_rate = 400000;
/* resolution must be a multiple of two */
c->width = PIC_HEIGHT;
c->height = PIC_WIDTH;
/* frames per second */
c->time_base = (AVRational){1, FPS};
c->framerate = (AVRational){FPS, 1};
c->gop_size = 10;
c->max_b_frames = 1;
c->pix_fmt = AV_PIX_FMT_YUV420P;
c->rtp_payload_size = 30000;
if (codec->id == AV_CODEC_ID_H264)
av_opt_set(c->priv_data, "preset", "fast", 0);
av_opt_set_int(c->priv_data, "slice-max-size", 30000, 0);
/* open it */
ret = avcodec_open2(c, codec, NULL);
if (ret < 0) {
cerr << "Could not open codec\n";
exit(1);
}
I am getting the encoded data using the avcodec_receive_packet() function, which returns an AVPacket.
I am passing the AVPacket's data into the DeviceSource file; below is a snippet of my Live555 code:
void DeviceSource::deliverFrame() {
if (!isCurrentlyAwaitingData()) return; // we're not ready for the data yet
u_int8_t* newFrameDataStart = (u_int8_t*) pkt->data;
unsigned newFrameSize = pkt->size; //%%% TO BE WRITTEN %%%
// Deliver the data here:
if (newFrameSize > fMaxSize) { // Condition becomes true many times
fFrameSize = fMaxSize;
fNumTruncatedBytes = newFrameSize - fMaxSize;
} else {
fFrameSize = newFrameSize;
}
gettimeofday(&fPresentationTime, NULL); // If you have a more accurate time - e.g., from an encoder - then use that instead.
// If the device is *not* a 'live source' (e.g., it comes instead from a file or buffer), then set "fDurationInMicroseconds" here.
memmove(fTo, newFrameDataStart, fFrameSize);
}
But sometimes my packet's size is larger than the fMaxSize value, and as per Live555's logic the frame data gets truncated, so I sometimes get bad frames in VLC.
From the Live555 forum I learned that the encoder should not send packets larger than fMaxSize, so my question is:
How can I restrict the encoder to limit the size of its packets?
Thanks in Advance,
Harshil
You can increase the maximum allowed sample size by changing "maxSize" in the OutPacketBuffer class in MediaSink.cpp. This worked for me. There are cases where high-quality video needs to be streamed, and I don't think we can always restrict the encoder to never produce samples larger than a particular value, as that would cause video quality issues. In fact, the samples are fragmented by Live555's UDP sink to match the default MTU (1500), so increasing the maximum sample size limit has no side effects.
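(In code, that is a single public static member; assigning it at startup, before any sinks are created, has the same effect as editing the default in MediaSink.cpp - the value below is just an example:)
// Somewhere in your server setup, before creating RTP sinks / media sessions:
// raise live555's global output packet buffer limit so large encoded frames
// are no longer truncated.
OutPacketBuffer::maxSize = 2000000; // bytes, example value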

How do I create a stereo MP3 file with the latest version of FFmpeg?

I'm updating my code from the older version of FFmpeg (53) to the newer one (54/55). Code that used to work has now been deprecated or removed, so I'm having problems updating it.
Previously I could create a stereo MP3 file using a sample format called:
SAMPLE_FMT_S16
That matched up perfectly with my source stream. This has now been replaced with
AV_SAMPLE_FMT_S16
Which works fine for mono recordings but when I try to create a stereo MP3 file it bugs out at avcodec_open2 with:
"Specified sample_fmt is not supported."
Through trial and error I've found that using
AV_SAMPLE_FMT_S16P
...is accepted by avcodec_open2 but when I get through and create the MP3 file the sound is very distorted - it sounds about 2 octaves lower than usual with a massive hum in the background - here's an example recording:
http://hosting.ispyconnect.com/example.mp3
I've been told by the ffmpeg guys that this is because I now need to manually deinterleave my byte stream before calling:
avcodec_fill_audio_frame
How do I do that? I've tried using the swresample library without success, and I've tried manually feeding L/R data into avcodec_fill_audio_frame, but the results sound exactly the same as without deinterleaving.
Here is my code for encoding:
void add_audio_sample( AudioWriterPrivateData^ data, BYTE* soundBuffer, int soundBufferSize)
{
libffmpeg::AVCodecContext* c = data->AudioStream->codec;
memcpy(data->AudioBuffer + data->AudioBufferSizeCurrent, soundBuffer, soundBufferSize);
data->AudioBufferSizeCurrent += soundBufferSize;
uint8_t* pSoundBuffer = (uint8_t *)data->AudioBuffer;
DWORD nCurrentSize = data->AudioBufferSizeCurrent;
libffmpeg::AVFrame *frame;
int got_packet;
int ret;
int size = libffmpeg::av_samples_get_buffer_size(NULL, c->channels,
data->AudioInputSampleSize,
c->sample_fmt, 1);
while( nCurrentSize >= size) {
frame=libffmpeg::avcodec_alloc_frame();
libffmpeg::avcodec_get_frame_defaults(frame);
frame->nb_samples = data->AudioInputSampleSize;
ret = libffmpeg::avcodec_fill_audio_frame(frame, c->channels, c->sample_fmt, pSoundBuffer, size, 1);
if (ret<0)
{
throw gcnew System::IO::IOException("error filling audio");
}
//audio_pts = (double)audio_st->pts.val * audio_st->time_base.num / audio_st->time_base.den;
libffmpeg::AVPacket pkt = { 0 };
libffmpeg::av_init_packet(&pkt);
ret = libffmpeg::avcodec_encode_audio2(c, &pkt, frame, &got_packet);
if (ret<0)
throw gcnew System::IO::IOException("error encoding audio");
if (got_packet) {
pkt.stream_index = data->AudioStream->index;
if (pkt.pts != AV_NOPTS_VALUE)
pkt.pts = libffmpeg::av_rescale_q(pkt.pts, c->time_base, c->time_base);
if (pkt.duration > 0)
pkt.duration = av_rescale_q(pkt.duration, c->time_base, c->time_base);
pkt.flags |= AV_PKT_FLAG_KEY;
if (libffmpeg::av_interleaved_write_frame(data->FormatContext, &pkt) != 0)
throw gcnew System::IO::IOException("unable to write audio frame.");
}
nCurrentSize -= size;
pSoundBuffer += size;
}
memcpy(data->AudioBuffer, data->AudioBuffer + data->AudioBufferSizeCurrent - nCurrentSize, nCurrentSize);
data->AudioBufferSizeCurrent = nCurrentSize;
}
Would love to hear any ideas - I've been trying to get this working for 3 days now :(
You don't want to increase pSoundBuffer if a frame hasn't been fully encoded (e.g. got_packet isn't set to true), as no memory has been written yet. Also, you are allocating a frame in each loop iteration: there's no need for that, you can re-use the same AVFrame over and over. Your code is also leaking, as you never free the AVFrame.
I wrote code as part of MythTV that encodes audio to AC3.
It also does what you are looking for: it deinterleaves the content.
https://github.com/MythTV/mythtv/blob/476b2a826d43fca5e658ebe787c3cb1ec2334f98/mythtv/libs/libmyth/audio/audiooutputdigitalencoder.cpp#L178
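(For reference, manually deinterleaving 16-bit stereo into the two planes that AV_SAMPLE_FMT_S16P expects is just a loop - a small illustrative sketch with a made-up helper name, not taken from the MythTV code:)
#include <cstdint>
#include <vector>

// Split interleaved 16-bit stereo (L R L R ...) into two separate planes,
// which is the per-channel layout that AV_SAMPLE_FMT_S16P expects.
void deinterleaveS16Stereo(const int16_t* interleaved, int samplesPerChannel,
                           std::vector<int16_t>& left, std::vector<int16_t>& right)
{
    left.resize(samplesPerChannel);
    right.resize(samplesPerChannel);
    for (int i = 0; i < samplesPerChannel; ++i)
    {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}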
I know this question is old, but for posterity: I'm working on some audio resampling code, and after arriving at audio that sounded very similar to the MP3 the author linked, I identified the cause as a mismatch between the sample rate the resampler expects for its input and the actual data.

WaveOutWrite callback creates choppy audio

I have four buffers that I am using for audio playback in a synthesizer. I submit two buffers initially, and then in the callback routine I write data into the next buffer and then submit that buffer.
When I generate each buffer I'm just putting a sine wave into it whose period is a multiple of the buffer length.
When I execute I hear brief pauses between each buffer. I've increased the buffer size to 16K samples at 44100 Hz so I can clearly hear that the whole buffer is playing, but there is an interruption between each.
What I think is happening is that the callback function is only called when ALL buffers that have been written are complete. I need the synthesis to stay ahead of the playback so I need a callback when each buffer is completed.
How do people usually solve this problem?
Update: I've been asked to add code. Here's what I have:
First I connect to the WaveOut device:
// Always grab the mapped wav device.
UINT deviceId = WAVE_MAPPER;
// This is an excellent tutorial:
// http://planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=4422&lngWId=3
WAVEFORMATEX wfx;
wfx.nSamplesPerSec = 44100;
wfx.wBitsPerSample = 16;
wfx.nChannels = 1;
wfx.cbSize = 0;
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nBlockAlign = (wfx.wBitsPerSample >> 3) * wfx.nChannels;
wfx.nAvgBytesPerSec = wfx.nBlockAlign * wfx.nSamplesPerSec;
_waveChangeEventHandle = CreateMutex(NULL,false,NULL);
MMRESULT res;
res = waveOutOpen(&_wo, deviceId, &wfx, (DWORD_PTR)WavCallback,
(DWORD_PTR)this, CALLBACK_FUNCTION);
I initialize the four frames I'll be using:
for (int i=0; i<_numFrames; ++i)
{
WAVEHDR *header = _outputFrames+i;
ZeroMemory(header, sizeof(WAVEHDR));
// Block size is in bytes. We have 2 bytes per sample.
header->dwBufferLength = _codeSpec->OutputNumSamples*2;
header->lpData = (LPSTR)malloc(2 * _codeSpec->OutputNumSamples);
ZeroMemory(header->lpData, 2*_codeSpec->OutputNumSamples);
res = waveOutPrepareHeader(_wo, header, sizeof(WAVEHDR));
if (res != MMSYSERR_NOERROR)
{
printf("Error preparing header: %d\n", res - MMSYSERR_BASE);
}
}
SubmitBuffer();
SubmitBuffer();
Here is the SubmitBuffer code:
void Vodec::SubmitBuffer()
{
WAVEHDR *header = _outputFrames+_curFrame;
MMRESULT res;
res = waveOutWrite(_wo, header, sizeof(WAVEHDR));
if (res != MMSYSERR_NOERROR)
{
if (res == WAVERR_STILLPLAYING)
{
printf("Cannot write when still playing.");
}
else
{
printf("Error calling waveOutWrite: %d\n", res-WAVERR_BASE);
}
}
_curFrame = (_curFrame+1)&0x3;
if (_pointQueue != NULL)
{
RenderQueue();
_nextFrame = (_nextFrame + 1) & 0x3;
}
}
And here is my callback code:
void CALLBACK Vodec::WavCallback(HWAVEOUT hWaveOut,
UINT uMsg,
DWORD dwInstance,
DWORD dwParam1,
DWORD dwParam2 )
{
// Only listen for end of block messages.
if(uMsg != WOM_DONE) return;
Vodec *instance = (Vodec *)dwInstance;
instance->SubmitBuffer();
}
The RenderQueue code is pretty simple - just copies a piece of a template buffer into the output buffer:
void Vodec::RenderQueue()
{
double white = _pointQueue->White;
white = 10.0; // For now just override with a constant value
int numSamples = _codeSpec->OutputNumSamples;
signed short int *data = (signed short int *)_outputFrames[_nextFrame].lpData;
for (int i=0; i<numSamples; ++i)
{
Sample x = white * _noise->Samples[i];
data[i] = (signed short int)(x);
}
_sampleOffset += numSamples;
if (_sampleOffset >= _pointQueue->DurationInSamples)
{
_sampleOffset = 0;
_pointQueue = _pointQueue->next;
}
}
UPDATE: Mostly solved the issue. I need to increment _nextFrame along with _curFrame (not conditionally). The playback buffer was getting ahead of the writing buffer.
However, when I decrease the playback buffer to 1024 samples, it gets choppy again. At 2048 samples it is clear. This happens for both Debug and Release builds.
1024 samples is only about 23 ms of audio data. The waveOut API is a pretty high-level API from Windows Vista onwards; if you want low-latency audio playback, you should use Core Audio instead. You can get latencies down to 10 ms in shared mode and 3 ms in exclusive mode. The audio also depends on the processes currently running on your system, in other words on how frequently your audio thread can run to supply data. You should also look at the Multimedia Class Scheduler Service and the AvSetMmThreadCharacteristics function.
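(Registering the audio thread with MMCSS takes just a couple of calls - a minimal sketch; "Pro Audio" is one of the standard task names:)
#include <windows.h>
#include <avrt.h>   // link against Avrt.lib

// Ask the Multimedia Class Scheduler Service to boost the calling
// (audio-generating) thread's scheduling priority.
DWORD taskIndex = 0;
HANDLE mmcssHandle = AvSetMmThreadCharacteristics(TEXT("Pro Audio"), &taskIndex);

// ... fill and submit your waveOut buffers on this thread ...

if (mmcssHandle)
    AvRevertMmThreadCharacteristics(mmcssHandle);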

How to write bitmaps as frames to Ogg Theora in C/C++?

How to write bitmaps as frames to Ogg Theora in C/C++?
Some examples with source would be great!
The entire solution is a little lengthy to post on here as a code sample, but if you download libtheora from Xiph.org, there is an example png2theora. All of the library functions I am about to mention can be found in the documentation on Xiph.org for theora and ogg.
Call th_info_init() to initialise a th_info structure, then set up your output parameters by assigning the appropriate members of that structure.
Use that structure in a call to th_encode_alloc() to get an encoder context
Initialise an ogg stream, with ogg_stream_init()
Initialise a blank th_comment structure using th_comment_init
Iterate through the following:
Call th_encode_flushheader with the encoder context, the blank comment structure and an ogg_packet.
Send the resulting packet to the ogg stream with ogg_stream_packetin()
Until th_encode_flushheader returns 0 (or an error code)
Now repeatedly call ogg_stream_pageout(), each time writing page.header and then page.body to the output file, until it returns 0. Then call ogg_stream_flush() and write the resulting page to the file (a short sketch of these header steps follows below).
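(Put together, the header-writing steps above might look roughly like this - an untested sketch using the same libtheora/libogg calls and the variable names from the frame-writing code below:)
// Write the Theora header packets into the Ogg stream, then push the header
// pages out to the file before any video data is submitted.
th_comment comment;
th_comment_init(&comment);

ogg_packet headerPacket;
while (th_encode_flushheader(encoderContext, &comment, &headerPacket) > 0)
    ogg_stream_packetin(&theoraStreamState, &headerPacket);

ogg_page page;
while (ogg_stream_pageout(&theoraStreamState, &page) > 0)
{
    write(outputFd, page.header, page.header_len);
    write(outputFd, page.body, page.body_len);
}
// Force any remaining header data out so the first data page starts cleanly.
while (ogg_stream_flush(&theoraStreamState, &page) > 0)
{
    write(outputFd, page.header, page.header_len);
    write(outputFd, page.body, page.body_len);
}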
You can now write frames to the encoder. Here is how I did it:
int theora_write_frame(int outputFd, unsigned long w, unsigned long h, unsigned char *yuv_y, unsigned char *yuv_u, unsigned char *yuv_v, int last)
{
th_ycbcr_buffer ycbcr;
ogg_packet op;
ogg_page og;
unsigned long yuv_w;
unsigned long yuv_h;
/* Must hold: yuv_w >= w */
yuv_w = (w + 15) & ~15;
/* Must hold: yuv_h >= h */
yuv_h = (h + 15) & ~15;
//Fill out the ycbcr buffer
ycbcr[0].width = yuv_w;
ycbcr[0].height = yuv_h;
ycbcr[0].stride = yuv_w;
ycbcr[1].width = yuv_w;
ycbcr[1].stride = ycbcr[1].width;
ycbcr[1].height = yuv_h;
ycbcr[2].width = ycbcr[1].width;
ycbcr[2].stride = ycbcr[1].stride;
ycbcr[2].height = ycbcr[1].height;
if(encoderInfo->pixel_fmt == TH_PF_420)
{
//Chroma is decimated by 2 in both directions
ycbcr[1].width = yuv_w >> 1;
ycbcr[2].width = yuv_w >> 1;
ycbcr[1].height = yuv_h >> 1;
ycbcr[2].height = yuv_h >> 1;
}else if(encoderInfo->pixel_fmt == TH_PF_422)
{
ycbcr[1].width = yuv_w >> 1;
ycbcr[2].width = yuv_w >> 1;
}else if(encoderInfo->pixel_fmt != TH_PF_444)
{
//Then we have an unknown pixel format
//We don't know how long the arrays are!
fprintf(stderr, "[theora_write_frame] Unknown pixel format in writeFrame!\n");
return -1;
}
ycbcr[0].data = yuv_y;
ycbcr[1].data = yuv_u;
ycbcr[2].data = yuv_v;
/* Theora is a one-frame-in,one-frame-out system; submit a frame
for compression and pull out the packet */
if(th_encode_ycbcr_in(encoderContext, ycbcr)) {
fprintf(stderr, "[theora_write_frame] Error: could not encode frame\n");
return -1;
}
if(!th_encode_packetout(encoderContext, last, &op)) {
fprintf(stderr, "[theora_write_frame] Error: could not read packets\n");
return -1;
}
ogg_stream_packetin(&theoraStreamState, &op);
ssize_t bytesWritten = 0;
int pagesOut = 0;
while(ogg_stream_pageout(&theoraStreamState, &og)) {
pagesOut ++;
bytesWritten = write(outputFd, og.header, og.header_len);
if(bytesWritten != og.header_len)
{
fprintf(stderr, "[theora_write_frame] Error: Could not write to file\n");
return -1;
}
bytesWritten = write(outputFd, og.body, og.body_len);
if(bytesWritten != og.body_len)
{
fprintf(stderr, "[theora_write_frame] Error: Could not write to file\n");
return -1;
}
}
return pagesOut;
}
Where encoderInfo is the th_info structure used to initialise the encoder (static in the data section for me).
On your last frame, setting the last parameter in the th_encode_packetout() call will make sure the stream terminates properly.
Once you're done, just make sure to clean up (closing fds, mainly). th_info_clear() will clear the th_info structure, and th_encode_free() will free your encoder context.
Obviously, you'll need to convert your bitmap into YUV planes before you can pass them to theora_write_frame().
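(A per-pixel RGB to Y'CbCr conversion using the usual full-range BT.601 formulas might look like this - a sketch with a made-up helper name, not part of the original answer; for TH_PF_420 you still have to subsample the Cb/Cr planes afterwards:)
#include <cstdint>

// Convert one 8-bit RGB pixel to full-range Y'CbCr (BT.601 coefficients).
static inline void rgbToYCbCr(uint8_t r, uint8_t g, uint8_t b,
                              uint8_t& y, uint8_t& cb, uint8_t& cr)
{
    y  = static_cast<uint8_t>( 0.299  * r + 0.587  * g + 0.114  * b);
    cb = static_cast<uint8_t>(-0.1687 * r - 0.3313 * g + 0.5    * b + 128);
    cr = static_cast<uint8_t>( 0.5    * r - 0.4187 * g - 0.0813 * b + 128);
}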
Hope this is of some help. Good luck!
Here's the libtheora API and example code.
Here's a micro howto that shows how to use the theora binaries. As the encoder reads raw, uncompressed 'yuv4mpeg' data for video, you could use it from your app too, by piping the video frames to the encoder.