FFmpeg - resampled audio with much noise - c++

I'm not familiar with audio resampling. I tried to resample the audio streams of two videos. The output of the first one was close to the original but noisy; the output of the second one was almost nothing but noise.
Information for the first one:
128 kb/s, 48.0 kHz, 2 channels, AAC LC
Information for the second one:
384 kb/s, 48.0 kHz, 6 channels, AAC LC
When I set the sample size to 16, the first one worked quite well but was still noisy. The second one sounded very bad but still had audible sound. How should I determine the output sample size? I tried using channels * av_get_bytes_per_sample((AVSampleFormat)output_fmt) as the output sample size, because I wanted it to match the original, but then there was no sound at all.
MyResampling.cpp
bool MyResample::open(AVCodecParameters* par) {
if (!par) {
std::cout << "par is null" << std::endl;
return false;
}
// Output: stereo in output_fmt; input: the stream's layout, format and rate (no rate change)
audio_context = swr_alloc_set_opts(
audio_context, av_get_default_channel_layout(2), (AVSampleFormat)output_fmt,
par->sample_rate, av_get_default_channel_layout(par->channels), (AVSampleFormat)par->format, par->sample_rate,
0, 0);
// Only valid if the caller handed over ownership of par (e.g. a copy)
avcodec_parameters_free(&par);
int ret = swr_init(audio_context);
if (ret < 0) {
std::cout << "failed to initialize the resampler" << std::endl;
return false;
}
return true;
}
int MyResample::resample(AVFrame* frame, unsigned char* output)
{
if (!frame)
return 0;
if (!output) {
av_frame_free(&frame);
return 0; // do not touch frame after freeing it
}
uint8_t* data[2] = { 0 };
data[0] = output; // packed (interleaved) output uses a single plane
int ret = swr_convert(audio_context, data, frame->nb_samples, (const uint8_t**)frame->data, frame->nb_samples);
if (ret <= 0)
return ret; // error, or nothing produced yet
// Size of the converted data: OUTPUT channel count (stereo, see open())
// and the number of samples actually produced (ret), not the input frame's values
int size = av_samples_get_buffer_size(nullptr, 2, ret, (AVSampleFormat)output_fmt, 1);
return size;
}
MyAudioPlayer.cpp
bool open()
{
close();
QAudioFormat fmt;
fmt.setSampleRate(sample_rate); // from audioStream->codecpar->sample_rate
fmt.setSampleSize(16); // assumes output_fmt is AV_SAMPLE_FMT_S16 (2 bytes per sample)
fmt.setChannelCount(2); // must match the resampler's output layout (stereo), not codecpar->channels
fmt.setCodec("audio/pcm");
fmt.setByteOrder(QAudioFormat::LittleEndian);
fmt.setSampleType(QAudioFormat::SignedInt); // AV_SAMPLE_FMT_S16 samples are signed
output = new QAudioOutput(fmt);
io = output->start();
if (io)
return true;
return false;
}
bool write(const unsigned char* data, int data_size)
{
if (!data || data_size <= 0)
return false;
if (!output || !io)
{
return false;
}
int size = io->write((char*)data, data_size);
if (data_size != size)
return false;
return true;
}
main.cpp
MyAudioPlayer::open();
unsigned char* pcm = new unsigned char[1024 * 1024];
if (demux.get_media_type() == 1) { // audio
audio_decode.sendPacket(pkt);
AVFrame* frame = audio_decode.receiveFrame();
int len = resample.resample(frame, pcm);
while (len > 0) {
if (MyAudioPlayer::check_space() >= len) {
MyAudioPlayer::write(pcm, len);
break;
}
msleep(1);
}
}

If you have trouble with the final quality and noise, you are probably misunderstanding the proper way to perform resampling, or there is a bug in your configuration.
Take a look at this example: libswresample-example.
I am not familiar with the FFmpeg API, because for resampling I tend to use libsamplerate.
Based on that example, these are the steps to perform a basic resample with FFmpeg:
Start by configuring your resampling context:
//Set up resampling context
SwrContext *swr = swr_alloc();
av_opt_set_channel_layout(swr, "in_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_channel_layout(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO, 0);
av_opt_set_int(swr, "in_sample_rate", 44100, 0);
av_opt_set_int(swr, "out_sample_rate", 22050, 0);
av_opt_set_sample_fmt(swr, "in_sample_fmt", AV_SAMPLE_FMT_FLT, 0);
av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_FLT, 0);
swr_init(swr);
Depending on your input data types and the format you expect as output, you need to specify the right sample format. These are the equivalent standard C++ types:
| Sample format | C++ type |
|---|---|
| *AV_SAMPLE_FMT_S16* | `std::int16_t` |
| *AV_SAMPLE_FMT_S32* | `std::int32_t` |
| *AV_SAMPLE_FMT_FLT* | `float` |
| *AV_SAMPLE_FMT_DBL* | `double` |
| *AV_SAMPLE_FMT_U8P* | `std::uint8_t` (planar) |
| ... | |
Get your input data in the right format and determine the number of input samples.
After that, you can perform the resampling in a few steps:
Estimate the number of output samples
uint8_t* out_samples;
int out_num_samples = av_rescale_rnd(swr_get_delay(swr, in_samplerate) + in_num_samples, out_samplerate, in_samplerate, AV_ROUND_UP);
Allocate the memory for the output buffer
av_samples_alloc(&out_samples, NULL, out_num_channels, out_num_samples, AV_SAMPLE_FMT_FLT, 0);
Convert the input data into the expected output format
out_num_samples = swr_convert(swr, &out_samples, out_num_samples, &in_samples, in_num_samples);
Do not forget to free your memory
av_freep(&out_samples);
swr_free(&swr);
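The output-count estimate in the first step is a round-up rescale of (delay + input samples) from the input rate to the output rate. The following is a plain C++ sketch of the same arithmetic for sanity-checking buffer sizes. The function name `estimateOutSamples` is hypothetical; in real code, keep using `av_rescale_rnd` as shown above.

```cpp
#include <cstdint>

// Hypothetical helper mirroring
// av_rescale_rnd(delay + in, out_rate, in_rate, AV_ROUND_UP):
// the number of output samples to allocate for one swr_convert() call.
// delaySamples is swr_get_delay(swr, in_samplerate), i.e. in input-rate samples.
std::int64_t estimateOutSamples(std::int64_t delaySamples, std::int64_t inSamples,
                                std::int64_t inRate, std::int64_t outRate)
{
    const std::int64_t num = (delaySamples + inSamples) * outRate;
    // Integer ceiling division (all values assumed non-negative)
    return (num + inRate - 1) / inRate;
}
```

For example, 1024 samples going from 44100 Hz to 22050 Hz need 512 output samples. Going from 48000 Hz to 44100 Hz, the exact value is 940.8, which rounds up to 941.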
If you get noise, the input and output formats are probably not the right ones, or the resampling quality is low.
Also, do not panic if you get fewer samples than you expected. This is normal because of the way the filtering works: the resampler keeps some samples buffered internally. To get the remaining trailing samples, call swr_convert again with NULL as input (and 0 input samples). This flushes the internal data.
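In the question above, the byte count handed to the player has to describe the *output* layout. It should be the number of samples swr_convert actually produced (its return value) times the *output* channel count times the bytes per sample.

The sketch below assumes packed (interleaved) AV_SAMPLE_FMT_S16 output, and the helper name `packedOutputBytes` is hypothetical. For a 6-channel input resampled to stereo, using the input channel count overstates the size threefold, so the player reads past the valid data.

```cpp
#include <cstddef>

// Hypothetical helper: size in bytes of a packed (interleaved) PCM buffer.
// For packed formats with alignment 1 this matches
// av_samples_get_buffer_size(nullptr, outChannels, samplesConverted, fmt, 1).
std::size_t packedOutputBytes(int samplesConverted, int outChannels, int bytesPerSample)
{
    if (samplesConverted <= 0)
        return 0; // swr_convert returned an error or produced nothing
    return static_cast<std::size_t>(samplesConverted) * outChannels * bytesPerSample;
}
```

The QAudioFormat needs the same layout:
- channel count equal to the resampler's output layout (2 here)
- sample size 16
- QAudioFormat::SignedInt for AV_SAMPLE_FMT_S16

Signed 16-bit data read as unsigned is heavily distorted.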

Related

FFmpeg Opus choppy sound UPDATED DESCRIPTION

I'm using FFmpeg to encode raw PCM sound to Opus and decode it back, using FFmpeg's built-in "opus" codec. My input samples are raw PCM, 8000 Hz, 16-bit mono, in AV_SAMPLE_FMT_S16 format. Since Opus requires the AV_SAMPLE_FMT_FLTP sample format and a sample rate of 48000 Hz, I resample my samples before encoding them.
I have two instances of a ResamplerAudio class, which resamples audio samples and has a SwrContext member. I use the first instance to resample the raw PCM input audio before encoding. I use the second to resample the decoded audio back to the sample format and sample rate of the raw input audio.
The ResamplerAudio class has a function that initializes its SwrContext member like this:
void ResamplerAudio::init(AVCodecContext *codecContext, int inSampleRate, int outSampleRate, AVSampleFormat inSampleFmt, AVSampleFormat outSampleFmt)
{
swrContext = swr_alloc();
if (!swrContext)
{
LOGE(TAG, "[init] Couldn't allocate swr context");
return;
}
av_opt_set_int(swrContext, "in_channel_layout", (int64_t) codecContext->channel_layout, 0);
av_opt_set_int(swrContext, "out_channel_layout", (int64_t) codecContext->channel_layout, 0);
av_opt_set_int(swrContext, "in_channel_count", codecContext->channels, 0);
av_opt_set_int(swrContext, "out_channel_count", codecContext->channels, 0);
av_opt_set_int(swrContext, "in_sample_rate", inSampleRate, 0);
av_opt_set_int(swrContext, "out_sample_rate", outSampleRate, 0);
av_opt_set_sample_fmt(swrContext, "in_sample_fmt", inSampleFmt, 0);
av_opt_set_sample_fmt(swrContext, "out_sample_fmt", outSampleFmt, 0);
int ret = swr_init(swrContext);
if (ret < 0)
{
LOGE(TAG, "[init] swr_init error: %s", av_err2str(ret));
return;
}
LOGD(TAG, "[init] success codecContext->channel_layout: %d; inSampleRate: %d; outSampleRate: %d; inSampleFmt: %d; outSampleFmt: %d", (int) codecContext->channel_layout, inSampleRate, outSampleRate, inSampleFmt, outSampleFmt);
}
I call ResamplerAudio::init for the first instance (which resamples the raw PCM input audio before encoding; I call it resamplerEncoder) with the following arguments:
resamplerEncoder->init(contextEncoder, 8000, 48000, AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_FLTP);
I initialize the second instance (which resamples the audio after decoding it from Opus; I call it resamplerDecoder) with the following arguments:
resamplerDecoder->init(contextDecoder, 48000, 8000, AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_S16);
The function of ResamplerAudio that does resampling looks like this:
std::vector<uint8_t> ResamplerAudio::convert(uint8_t **inData, int inSamplesCount, int outChannels, int outFormat)
{
std::vector<uint8_t> result;
uint8_t *dstData = NULL;
const int dstNbSamples = swr_get_out_samples(swrContext, inSamplesCount);
if (av_samples_alloc(&dstData, NULL, outChannels, dstNbSamples, AVSampleFormat(outFormat), 1) < 0)
return result;
// &dstData is a single plane: fine for packed formats and for mono planar (FLTP, 1 channel)
int resampledSize = swr_convert(swrContext, &dstData, dstNbSamples, (const uint8_t **)inData, inSamplesCount);
int dstBufSize = av_samples_get_buffer_size(NULL, outChannels, resampledSize, AVSampleFormat(outFormat), 1);
if (dstBufSize > 0)
result.assign(dstData, dstData + dstBufSize);
av_freep(&dstData); // the buffer from av_samples_alloc must be freed on every call
return result;
}
And I call ResamplerAudio::convert function before encoding with the following args:
// data - an array of raw pcm audio
// dataLength - the length of data array
// getSamplesCount() - function that calculates samples count
// frameEncode - AVFrame that using for encode audio
std::vector<uint8_t> resampledData = resamplerEncoder->convert(&data, getSamplesCount(dataLength, frameEncode->channels, AV_SAMPLE_FMT_S16), frameEncode->channels, frameEncode->format);
getSamplesCount() function looks like this:
int getSamplesCount(int bytesCount, int channels, AVSampleFormat format)
{
return bytesCount / av_get_bytes_per_sample(format) / channels;
}
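Here is a worked example of getSamplesCount() and of why the sample count grows after resampling. 20 ms of 8000 Hz mono S16 audio is 320 bytes, i.e. 160 samples. At 48000 Hz, the same 20 ms is 960 samples, which is also the count the question later observes in every decoded frame.

In this self-contained sketch, the bytes-per-sample value is passed explicitly instead of via av_get_bytes_per_sample, so it compiles without FFmpeg. `samplesForDuration` is a hypothetical helper.

```cpp
// getSamplesCount() from the question, with an explicit return type and the
// bytes-per-sample value passed in directly (2 for AV_SAMPLE_FMT_S16).
int getSamplesCount(int bytesCount, int channels, int bytesPerSample)
{
    return bytesCount / bytesPerSample / channels;
}

// Hypothetical helper: samples per channel covering durationMs at sampleRate.
int samplesForDuration(int durationMs, int sampleRate)
{
    return durationMs * sampleRate / 1000;
}
```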
After that, I fill my frameEncode with the resampled samples:
memcpy(&frameEncode->data[0][0], &resampledData[0], sizeof(uint8_t) * resampledDataLength);
And pass frameEncode to the encoder by calling encodeFrame(resampledDataLength):
void encodeFrame(int dataLength)
{
/* send the frame for encoding */
int ret = avcodec_send_frame(contextEncoder, frameEncode);
if (ret < 0)
{
LOGE(TAG, "[encodeFrame] avcodec_send_frame error: %s", av_err2str(ret));
return;
}
/* read all the available output packets (in general there may be any number of them */
while (ret >= 0)
{
ret = avcodec_receive_packet(contextEncoder, packetEncode);
if (ret < 0 && ret != AVERROR(EAGAIN)) LOGE(TAG, "[encodeFrame] error in avcodec_receive_packet: %s", av_err2str(ret));
if (ret < 0) break;
// encodedData - std::vector<uint8_t> that stores encoded data; copy exactly packetEncode->size bytes
std::copy(&packetEncode->data[0], &packetEncode->data[packetEncode->size], std::back_inserter(encodedData));
av_packet_unref(packetEncode);
}
}
Then I decode the encoded samples and resample them back to the source sample format and sample rate by calling ResamplerAudio::convert on resamplerDecoder with the following arguments:
// frameDecode - AVFrame that holds decoded audio
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
The resulting sound is choppy. I also noticed that the decoded array is bigger than the source array of raw PCM audio.
Any ideas what I'm doing wrong?
UPD 18.05.2020
I tested my resampling logic by resampling raw PCM sound without any encoding or decoding. First, I converted the sample rate of the input sound from 8000 Hz to 48000 Hz. Then I converted the resampled samples from 48000 Hz back to 8000 Hz, and the result was perfect and clean. I did the same with the sample format instead of the sample rate (AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP and back), and again the result was perfect and clean. Converting both the sample rate and the sample format also gave a clean result.
So I assume the distorted, choppy sound comes from my encoding or decoding routine. I think decoding is the most likely culprit, because after decoding I ALWAYS get an AVFrame with 960 nb_samples, regardless of the size of the input sound.
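One possible cause is offered here as an assumption to check, not a diagnosis. An FFmpeg audio encoder without variable-frame-size support expects every frame except the last to carry exactly frame_size samples. For Opus at 48 kHz, this is typically 960 samples (20 ms); check contextEncoder->frame_size after avcodec_open2. The resampler, however, emits whatever count each input chunk produces.

FFmpeg offers AVAudioFifo for this. The sketch below shows the same idea in plain C++ for mono float samples. The class name `FrameChunker` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: collects resampled mono float samples and hands them out
// in chunks of exactly frameSize samples, as a fixed-frame-size encoder expects.
class FrameChunker {
public:
    explicit FrameChunker(std::size_t frameSize) : frameSize_(frameSize) {}

    // Append newly resampled samples (any count).
    void push(const float* samples, std::size_t count)
    {
        pending_.insert(pending_.end(), samples, samples + count);
    }

    // Move one full frame into 'out'. Returns false if fewer than frameSize
    // samples are buffered; they stay queued until more input arrives.
    bool pop(std::vector<float>& out)
    {
        if (pending_.size() < frameSize_)
            return false;
        const auto n = static_cast<std::ptrdiff_t>(frameSize_);
        out.assign(pending_.begin(), pending_.begin() + n);
        pending_.erase(pending_.begin(), pending_.begin() + n);
        return true;
    }

    std::size_t buffered() const { return pending_.size(); }

private:
    std::size_t frameSize_;
    std::vector<float> pending_;
};
```

For each chunk returned by pop():
1. Copy it into frameEncode->data[0].
2. Set frameEncode->nb_samples to the frame size.
3. Call avcodec_send_frame.

At the end of the stream, handle the remainder separately, for example by padding it with silence.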
My decoding routine looks like this:
std::vector<uint8_t> decode(uint8_t *data, unsigned int dataLength)
{
decodedData.clear();
int dataSize = dataLength;
while (dataSize > 0)
{
if (!frameDecode)
{
frameDecode = av_frame_alloc();
if (!frameDecode)
{
LOGE(TAG, "[decode] Couldn't allocate the frame");
return EMPTY_DATA;
}
}
ret = av_parser_parse2(parser, contextDecoder, &packetDecode->data, &packetDecode->size, &data[0], dataSize, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
if (ret < 0) {
LOGE(TAG, "[decode] av_parser_parse2 error: %s", av_err2str(ret));
return EMPTY_DATA;
}
data += ret;
dataSize -= ret;
doDecode();
}
return decodedData;
}
void doDecode()
{
if (packetDecode->size) {
/* send the packet with the compressed data to the decoder */
int ret = avcodec_send_packet(contextDecoder, packetDecode);
if (ret < 0) LOGE(TAG, "[decode] avcodec_send_packet error: %s", av_err2str(ret));
/* read all the output frames (in general there may be any number of them */
while (ret >= 0)
{
ret = avcodec_receive_frame(contextDecoder, frameDecode);
if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) LOGE(TAG, "[decode] avcodec_receive_frame error: %s", av_err2str(ret));
if (ret < 0) break;
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
if (!resampledData.size()) continue;
std::copy(&resampledData.data()[0], &resampledData.data()[resampledData.size()], std::back_inserter(decodedData));
}
}
}
UPD 30.05.2020
I decided to stop using FFmpeg in my project and use libopus 1.3.1 directly instead. I wrote a wrapper around it, and it works fine.

Create CMSampleBufferRef from an AudioInputIOProc

I have an AudioInputIOProc that I'm getting an AudioBufferList from. I need to convert this AudioBufferList to a CMSampleBufferRef.
Here's the code I've written so far:
- (void)handleAudioSamples:(const AudioBufferList*)samples numSamples:(UInt32)numSamples hostTime:(UInt64)hostTime {
// Create a CMSampleBufferRef from the list of samples, which we'll own
AudioStreamBasicDescription monoStreamFormat;
memset(&monoStreamFormat, 0, sizeof(monoStreamFormat));
monoStreamFormat.mSampleRate = 44100;
monoStreamFormat.mFormatID = kAudioFormatMPEG4AAC;
monoStreamFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved;
monoStreamFormat.mBytesPerPacket = 4;
monoStreamFormat.mFramesPerPacket = 1;
monoStreamFormat.mBytesPerFrame = 4;
monoStreamFormat.mChannelsPerFrame = 2;
monoStreamFormat.mBitsPerChannel = 16;
CMFormatDescriptionRef format = NULL;
OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &monoStreamFormat, 0, NULL, 0, NULL, NULL, &format);
if (status != noErr) {
// really shouldn't happen
return;
}
mach_timebase_info_data_t tinfo;
mach_timebase_info(&tinfo);
UInt64 _hostTimeToNSFactor = (double)tinfo.numer / tinfo.denom;
uint64_t timeNS = (uint64_t)(hostTime * _hostTimeToNSFactor);
CMTime presentationTime = CMTimeMake(timeNS, 1000000000);
CMSampleTimingInfo timing = { CMTimeMake(1, 44100), kCMTimeZero, kCMTimeInvalid };
CMSampleBufferRef sampleBuffer = NULL;
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numSamples, 1, &timing, 0, NULL, &sampleBuffer);
if (status != noErr) {
// couldn't create the sample buffer
NSLog(@"Failed to create sample buffer");
CFRelease(format);
return;
}
// add the samples to the buffer
status = CMSampleBufferSetDataBufferFromAudioBufferList(sampleBuffer,
kCFAllocatorDefault,
kCFAllocatorDefault,
0,
samples);
if (status != noErr) {
NSLog(@"Failed to add samples to sample buffer");
CFRelease(sampleBuffer);
CFRelease(format);
NSLog(@"Error status code: %d", status);
return;
}
[self addAudioFrame:sampleBuffer];
NSLog(@"Original sample buf size: %ld for %d samples from %d buffers, first buffer has size %d", CMSampleBufferGetTotalSampleSize(sampleBuffer), numSamples, samples->mNumberBuffers, samples->mBuffers[0].mDataByteSize);
NSLog(@"Original sample buf has %ld samples", CMSampleBufferGetNumSamples(sampleBuffer));
}
Now, I'm unsure how to calculate the numSamples given this function definition of an AudioInputIOProc:
OSStatus AudioTee::InputIOProc(AudioDeviceID inDevice, const AudioTimeStamp *inNow, const AudioBufferList *inInputData, const AudioTimeStamp *inInputTime, AudioBufferList *outOutputData, const AudioTimeStamp *inOutputTime, void *inClientData)
This definition exists in the AudioTee.cpp file in WavTap.
The error I'm getting is a CMSampleBufferError_RequiredParameterMissing error with the error code -12731 when I try to call CMSampleBufferSetDataBufferFromAudioBufferList.
Update:
To clarify on the problem a bit, the following is the format of the audio data I'm getting from the AudioDeviceIOProc:
Channels: 2, Sample Rate: 44100, Precision: 32-bit, Sample Encoding: 32-bit Signed Integer PCM, Endian Type: little, Reverse Nibbles: no, Reverse Bits: no
I'm getting an AudioBufferList* that has all the audio data (30 seconds of video) that I need to convert to a CMSampleBufferRef* and add those sample buffers to a video (that is 30 seconds long) that is being written to disk via an AVAssetWriterInput.
Three things look wrong:
You declare that the format ID is kAudioFormatMPEG4AAC, but configure it as LPCM. So try
monoStreamFormat.mFormatID = kAudioFormatLinearPCM;
You also call the format "mono" when it's configured as stereo.
Why use mach_timebase_info which could leave gaps in your audio presentation timestamps? Use sample count instead:
CMTime presentationTime = CMTimeMake(numSamplesProcessed, 44100);
Your CMSampleTimingInfo looks wrong, and you're not using presentationTime. You set the buffer's duration as 1 sample long when it can be numSamples and its presentation time to zero which can't be right. Something like this would make more sense:
CMSampleTimingInfo timing = { CMTimeMake(numSamples, 44100), presentationTime, kCMTimeInvalid };
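For the numSamples question, the frame count can be derived from the AudioBufferList itself:
- Interleaved LPCM: frames = mBuffers[0].mDataByteSize / mBytesPerFrame, where mBytesPerFrame is 8 bytes for 2 channels of 32-bit samples.
- Non-interleaved data: each buffer holds one channel, so mBytesPerFrame is 4.

A running frame counter then provides the presentationTime numerator suggested above. This plain C++ sketch avoids the CoreAudio types so it stands alone. The names `framesInBuffer` and `SampleClock` are hypothetical.

```cpp
#include <cstdint>

// Frames contained in one AudioBuffer: byte size divided by bytes per frame.
// Interleaved 2-channel 32-bit PCM: bytesPerFrame = 2 * 4 = 8.
// Non-interleaved: each buffer holds one channel, so bytesPerFrame = 4.
std::uint32_t framesInBuffer(std::uint32_t dataByteSize, std::uint32_t bytesPerFrame)
{
    return bytesPerFrame ? dataByteSize / bytesPerFrame : 0;
}

// Running sample clock: advance() returns the start time of the current buffer
// in sample units, i.e. the numerator for CMTimeMake(numSamplesProcessed, 44100).
struct SampleClock {
    std::int64_t numSamplesProcessed = 0;

    std::int64_t advance(std::uint32_t numSamples)
    {
        const std::int64_t start = numSamplesProcessed;
        numSamplesProcessed += numSamples;
        return start;
    }
};
```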
And some questions:
Does your AudioBufferList have the expected 2 AudioBuffers?
Do you have a runnable version of this?
p.s. I'm guilty of it myself, but allocating memory on the audio thread is considered harmful in audio dev.

RtAudio - Playing samples from wav file

I am currently trying to learn audio programming. My goal is to open a wav file, extract everything and play the samples with RtAudio.
I made a WaveLoader class that lets me extract the samples and metadata. I used this guide to do that, and I checked with 010 Editor that everything is correct. Here is a snapshot of 010 Editor showing the structure and data.
And this is how I store the raw samples inside the WaveLoader class:
// payloadSize is in bytes; each 16-bit sample takes sizeof(short) bytes
data = new short[wave_data.payloadSize / sizeof(short)];
if (fread(data, 1, wave_data.payloadSize, sound_file) != wave_data.payloadSize)
{
throw ("Could not read wav data");
}
If I print out each sample, I get 1, -3, 4, -5, ..., which seems OK.
The problem is that I am not sure how I can play them. This is what I've done:
/*
* Using RtAudio to play samples
*/
bool Player::Play()
{
ShowDevices();
rt.showWarnings(true);
RtAudio::StreamParameters oParameters; //, iParameters;
oParameters.deviceId = rt.getDefaultOutputDevice();
oParameters.firstChannel = 0;
oParameters.nChannels = mAudio.channels;
//iParameters.deviceId = rt.getDefaultInputDevice();
//iParameters.nChannels = 2;
unsigned int sampleRate = mAudio.sampleRate;
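// Use a buffer of 512 frames: each callback must supply 512 frames (512 * channels samples)
unsigned int nBufferFrames = 512;
RtAudio::StreamOptions options;
// Assigning flags twice keeps only the last value; combine flags with | if several are needed.
// RTAUDIO_NONINTERLEAVED is omitted because the callback writes interleaved samples.
options.flags = RTAUDIO_SCHEDULE_REALTIME;
//&parameters, NULL, RTAUDIO_FLOAT64,sampleRate, &bufferFrames, &mCallback, (void *)&rawData
try {
// options is passed as the 8th argument so the flags actually take effect
rt.openStream(&oParameters, NULL, RTAUDIO_SINT16, sampleRate, &nBufferFrames, &mCallback, (void*) &mAudio, &options);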
rt.startStream();
}
catch (RtAudioError& e) {
std::cout << e.getMessage() << std::endl;
return false;
}
return true;
}
/*
* RtAudio Callback
*
*/
int mCallback(void * outputBuffer, void * inputBuffer, unsigned int nBufferFrames, double streamTime, RtAudioStreamStatus status, void * userData)
{
short *out = static_cast<short*>(outputBuffer);
auto *data = static_cast<Player::AUDIO_DATA*>(userData);
// Total number of 16-bit samples (assumes dataSize is the data chunk size in bytes)
unsigned int totalSamples = data->dataSize / sizeof(short);
// nBufferFrames counts frames, not bytes; for this mono file one frame is one sample.
// First call: data->ptr is 0, so offset = 0; second call: offset = 512, and so on.
unsigned int offset = nBufferFrames * data->ptr++;
// Past the end of our data: returning 1 stops the stream after draining the output
if (offset >= totalSamples) return 1;
for (unsigned int i = offset; i < offset + nBufferFrames; ++i)
{
// Copy the raw sample, or silence once the end of the file is reached
*out++ = (i < totalSamples) ? data->rawData[i] : 0;
}
// No printf here: console output for every sample inside the audio callback
// can be slow enough to cause buffer underruns
return 0;
}
Inside the callback, when I print out the sample values, I get exactly what 010 Editor shows. So why isn't RtAudio playing them? What is wrong here? Do I need to normalize the sample values to between -1 and 1?
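On the normalization question: with RTAUDIO_SINT16, the callback should write the 16-bit values unchanged. Scaling to [-1, 1] is only needed when the stream format is RTAUDIO_FLOAT32 or RTAUDIO_FLOAT64.

What the callback must get right is the position bookkeeping:
- nBufferFrames counts frames, so an interleaved buffer needs nBufferFrames × channels samples.
- The tail past the end of the data should be silence.

Below is a minimal sketch of that copy step with no RtAudio dependency. The function `fillInterleaved` is hypothetical; its return value follows RtAudio's convention that returning 1 from the callback stops the stream after draining.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies the next nBufferFrames frames of interleaved 16-bit samples into 'out',
// padding with silence once 'src' is exhausted. 'pos' is the index of the next
// unplayed sample and is advanced by the number of samples copied.
// Returns 0 while data remains and 1 once everything has been written.
int fillInterleaved(const std::vector<std::int16_t>& src, std::size_t& pos,
                    std::int16_t* out, unsigned int nBufferFrames, unsigned int channels)
{
    const std::size_t wanted = static_cast<std::size_t>(nBufferFrames) * channels;
    const std::size_t start = std::min(pos, src.size());
    const std::size_t n = std::min(wanted, src.size() - start);

    std::copy(src.data() + start, src.data() + start + n, out); // real samples
    std::fill(out + n, out + wanted, std::int16_t{0});          // silence for the tail
    pos = start + n;
    return pos >= src.size() ? 1 : 0;
}
```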
Edit:
The wav file I am trying to play:
Chunksize: 16
Format: 1
Channel: 1
SampleRate: 48000
ByteRate: 96000
BlockAlign: 2
BitPerSample: 16
Size of raw samples total: 2217044 bytes
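The header values allow a quick consistency check:
- BlockAlign = channels × bitsPerSample / 8 = 2 bytes.
- ByteRate = 48000 × 2 = 96000.
- The 2217044-byte data chunk therefore holds 1108522 sample frames, about 23.1 s at 48000 Hz.

Because the samples are stored as short, a callback should index rawData in samples (bytes / 2), not bytes. A small sketch of that arithmetic follows; the function names are hypothetical.

```cpp
#include <cstdint>

// Bytes per sample frame for PCM: channels * bitsPerSample / 8 (the WAV BlockAlign).
std::uint32_t blockAlign(std::uint16_t channels, std::uint16_t bitsPerSample)
{
    return static_cast<std::uint32_t>(channels) * bitsPerSample / 8;
}

// Number of sample frames in a PCM data chunk of dataBytes bytes.
std::uint32_t frameCount(std::uint32_t dataBytes, std::uint16_t channels, std::uint16_t bitsPerSample)
{
    const std::uint32_t align = blockAlign(channels, bitsPerSample);
    return align ? dataBytes / align : 0;
}

// Playback duration in seconds.
double durationSeconds(std::uint32_t frames, std::uint32_t sampleRate)
{
    return sampleRate ? static_cast<double>(frames) / sampleRate : 0.0;
}
```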
For some reason it works when I pass input parameters to the openStream()
RtAudio::StreamParameters oParameters, iParameters;
oParameters.deviceId = rt.getDefaultOutputDevice();
oParameters.firstChannel = 0;
oParameters.nChannels = mAudio.channels;
iParameters.deviceId = rt.getDefaultInputDevice();
iParameters.nChannels = 1;
unsigned int sampleRate = mAudio.sampleRate;
// Use a buffer of 512 frames (not bytes)
unsigned int nBufferFrames = 512;
RtAudio::StreamOptions options;
// Assigning flags twice keeps only the last value; combine flags with | if several are needed
options.flags = RTAUDIO_SCHEDULE_REALTIME | RTAUDIO_NONINTERLEAVED;
//&parameters, NULL, RTAUDIO_FLOAT64,sampleRate, &bufferFrames, &mCallback, (void *)&rawData
try {
rt.openStream(&oParameters, &iParameters, RTAUDIO_SINT16, sampleRate, &nBufferFrames, &mCallback, (void*) &mAudio);
rt.startStream();
}
catch (RtAudioError& e) {
std::cout << e.getMessage() << std::endl;
return false;
}
return true;
I found this by accident while trying to play back my mic: I left the input parameters in, and my wav file suddenly played. Is this a bug?

Is it possible to get good FPS using Raspberry Pi camera v4l2 in c++?

I'm trying to stream video on a Raspberry Pi using the official V4L2 driver with the Raspberry Pi camera, from C++ on Raspbian (2015-02 release), and I'm getting low FPS.
Currently I'm just creating a window and copying the buffer to the screen (which takes about 30ms) whereas the select() takes about 140ms (for a total of 5-6 fps). I also tried sleeping for 100ms and it decreases the select() time by a similar amount (resulting in the same fps). CPU load is about 5-15%.
I also tried changing the driver fps from console (or system()) but it only works downwards (for example, if I set the driver fps to 1fps, I'll get 1fps but if I set it to 90fps I still get 5-6fps, even though the driver confirms setting it to 90fps).
Also, when querying FPS modes for the used resolution I get 90fps.
I included the parts of the code related to V4L2 (the code between these parts is omitted):
//////////////////
// Open device
//////////////////
mFD = open(mDevName, O_RDWR | O_NONBLOCK, 0);
if (mFD == -1) ErrnoExit("Open device failed");
//////////////////
// Setup format
//////////////////
struct v4l2_format fmt;
memset(&fmt, 0, sizeof(fmt));
fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
Xioctl(VIDIOC_G_FMT, &fmt);
mImgWidth = fmt.fmt.pix.width;
mImgHeight = fmt.fmt.pix.height;
cout << "width=" << mImgWidth << " height=" << mImgHeight << "\nbytesperline=" << fmt.fmt.pix.bytesperline << " sizeimage=" << fmt.fmt.pix.sizeimage << "\n";
// For some reason querying the format always sets pixelformat to JPEG
// no matter the input, so set it back to YUYV
fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
if (Xioctl(VIDIOC_S_FMT, &fmt) == -1)
{
cout << "Set video format failed : " << strerror(errno) << "\n";
}
//////////////////
// Setup streaming
//////////////////
struct v4l2_requestbuffers req;
memset(&req, 0, sizeof(req));
req.count = 20;
req.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
req.memory = V4L2_MEMORY_MMAP;
if (-1 == Xioctl(VIDIOC_REQBUFS, &req))
{
ErrnoExit("Reqbufs");
}
if (req.count < 2)
throw "Not enough buffer memory !";
mNBuffers = req.count;
mBuffers = new CBuffer[mNBuffers];
if (!mBuffers) throw "Out of memory !";
for (unsigned int i = 0; i < mNBuffers; i++)
{
struct v4l2_buffer buf;
memset(&buf, 0, sizeof(buf));
buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
buf.index = i;
if (-1 == Xioctl(VIDIOC_QUERYBUF, &buf))
ErrnoExit("Querybuf");
mBuffers[i].mLength = buf.length;
mBuffers[i].pStart = mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED, mFD, buf.m.offset);
if (mBuffers[i].pStart == MAP_FAILED)
ErrnoExit("mmap");
}
//////////////////
// Start streaming
//////////////////
unsigned int i;
enum v4l2_buf_type type;
struct v4l2_buffer buf;
for (i = 0; i < mNBuffers; i++)
{
memset(&buf, 0, sizeof(buf));
buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
buf.memory = V4L2_MEMORY_MMAP;
buf.index = i;
if (-1 == Xioctl(VIDIOC_QBUF, &buf))
ErrnoExit("QBUF");
}
type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
if (-1==Xioctl(VIDIOC_STREAMON, &type))
ErrnoExit("STREAMON");
And the last two parts, in the main loop:
//////////////////
// Get frame
//////////////////
FD_ZERO(&fds);
FD_SET(mFD, &fds);
tv.tv_sec = 3;
tv.tv_usec = 0;
struct timespec t0, t1;
clock_gettime(CLOCK_REALTIME, &t0);
// This line takes about 140ms which I don't get
r = select(mFD + 1, &fds, NULL, NULL, &tv);
clock_gettime(CLOCK_REALTIME, &t1);
cout << "select time : " << ((float)(t1.tv_sec - t0.tv_sec))*1000.0f + ((float)(t1.tv_nsec - t0.tv_nsec))/1000000.0f << "\n";
if (-1 == r)
{
if (EINTR == errno)
continue;
ErrnoExit("select");
}
if (r == 0)
throw "Select timeout\n";
// Read the frame
//~ struct v4l2_buffer buf;
memset(&mCurBuf, 0, sizeof(mCurBuf));
mCurBuf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
mCurBuf.memory = V4L2_MEMORY_MMAP;
// DQBUF about 2ms
if (-1 == Xioctl(VIDIOC_DQBUF, &mCurBuf))
{
if (errno == EAGAIN) continue;
ErrnoExit("DQBUF");
}
clock_gettime(CLOCK_REALTIME, &mCaptureTime);
// Manage frame in mBuffers[buf.index]
mCurBufIndex = mCurBuf.index;
break;
}
//////////////////
// Release frame
//////////////////
if (-1 == Xioctl(VIDIOC_QBUF, &mCurBuf))
ErrnoExit("VIDIOC_QBUF during mainloop");
I have been looking at the various ways of using the Pi camera, and I'm hardly an expert, but it seems the default camera settings are what's holding you back. There are many modes and switches. I don't know yet whether, or how, they are exposed through ioctls; I just started. But I had to use a program called v4l2-ctl to prepare the camera for the mode I wanted. A close look at its source, plus some code lifting, should get you there. Also, I doubt the select call is the issue: it is simply waiting on a descriptor that is slow to become readable. Depending on the mode, there can be a mandatory wait for auto-exposure and similar processing.
Edit: I meant to say "a default setting" as you've changed some. There are also rules not codified in the driver.
The pixel format matters. I ran into a similar low-FPS problem and spent some time testing with my own programs in Go and C++ using the V4L2 API. I found that the Pi camera module has good acceleration for the H.264 and MJPG pixel formats. I can easily get 60 fps at 640x480 with them, the same as with uncompressed formats like YUYV and RGB. JPEG, however, runs very slowly: I only get 4 fps, even at 320x240. I also measured a higher current draw with JPEG (>700 mA) than with H.264/MJPG (500 mA).
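Following this answer, here is a sketch that requests a compressed pixel format and a frame interval through the V4L2 API instead of the console. It assumes:
- a Linux system with `<linux/videodev2.h>`
- an already opened capture descriptor
- a driver that accepts V4L2_PIX_FMT_H264 (or V4L2_PIX_FMT_MJPEG) at the chosen size

Drivers may adjust the requested values, so read back what they return. In the question's code, the queried default format is JPEG, which this answer found slowest. The helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <linux/videodev2.h>
#include <sys/ioctl.h>

// Builds a capture format request (width x height in the given FOURCC).
v4l2_format makeCaptureFormat(std::uint32_t width, std::uint32_t height, std::uint32_t pixelFormat)
{
    v4l2_format fmt;
    std::memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.fmt.pix.width = width;
    fmt.fmt.pix.height = height;
    fmt.fmt.pix.pixelformat = pixelFormat;
    fmt.fmt.pix.field = V4L2_FIELD_NONE;
    return fmt;
}

// Builds a frame-interval request: timeperframe = 1/fps seconds.
v4l2_streamparm makeFrameRate(std::uint32_t fps)
{
    v4l2_streamparm parm;
    std::memset(&parm, 0, sizeof(parm));
    parm.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    parm.parm.capture.timeperframe.numerator = 1;
    parm.parm.capture.timeperframe.denominator = fps;
    return parm;
}

// Apply both before VIDIOC_REQBUFS / VIDIOC_STREAMON. Returns false if either
// ioctl fails; on success the driver may have adjusted fmt and parm in place.
bool configureCapture(int fd, v4l2_format& fmt, v4l2_streamparm& parm)
{
    if (ioctl(fd, VIDIOC_S_FMT, &fmt) == -1)
        return false;
    if (ioctl(fd, VIDIOC_S_PARM, &parm) == -1)
        return false;
    return true;
}
```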

FFmpeg + OpenAL - playback streaming sound from video won't work

I am decoding an OGG video (theora & vorbis as codecs) and want to show it on the screen (using Ogre 3D) while playing its sound. I can decode the image stream just fine and the video plays perfectly with the correct frame rate, etc.
However, I cannot get the sound to play at all with OpenAL.
Edit: I managed to make the playing sound resemble the actual audio in the video at least somewhat. Updated sample code.
Edit 2: I now get "almost" correct sound. I had to set OpenAL to use AL_FORMAT_STEREO_FLOAT32 (after initializing the extension) instead of just STEREO16. Now the sound is "only" extremely high-pitched and stuttering, but at the correct speed.
Here is how I decode audio packets (in a background thread, the equivalent works just fine for the image stream of the video file):
//------------------------------------------------------------------------------
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int bufferSize = av_samples_get_buffer_size(NULL, p_audioCodecContext->channels, p_frame->nb_samples,
p_audioCodecContext->sample_fmt, 0);
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
if (p_frame->channels == 2)
{
memcpy(frame->data, p_frame->data[0], bufferSize >> 1);
memcpy(frame->data + (bufferSize >> 1), p_frame->data[1], bufferSize >> 1);
}
else
{
// copy the sample data (plane 0), not the array of plane pointers
memcpy(frame->data, p_frame->data[0], bufferSize);
}
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
So, as you can see, I decode the frame and memcpy it into my own struct, AudioFrame. When the sound is played, I use these audio frames like this:
int numBuffers = 4;
ALuint buffers[4];
alGenBuffers(numBuffers, buffers);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alGenBuffers : " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numReturned = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffers, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
for (unsigned int i = 0; i < numReturned; ++i)
{
alBufferData(buffers[i], _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alBufferData : " + Ogre::StringConverter::toString(success) + alGetString(success)
+ " size: " + Ogre::StringConverter::toString(audioBufferSizes[i]));
return;
}
}
// Queue the buffers into OpenAL
alSourceQueueBuffers(_source, numReturned, buffers);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error queuing streaming buffers: " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
}
alSourcePlay(_source);
The format and frequency I give to OpenAL are AL_FORMAT_STEREO_FLOAT32 (it is a stereo sound stream, and I did initialize the FLOAT32 extension) and 48000 (which is the sample rate of the AVCodecContext of the audio stream).
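A likely contributor to the high pitch and stutter, stated as an assumption to verify, is a layout mismatch. FFmpeg's Vorbis decoder delivers planar float (AV_SAMPLE_FMT_FLTP; check p_audioCodecContext->sample_fmt). As a result, decodeAudioPacket() stores all left samples followed by all right samples, while AL_FORMAT_STEREO_FLOAT32 expects interleaved L R L R pairs.

Here is a minimal plain C++ sketch of the interleaving step. `interleaveStereo` is a hypothetical helper; swr_convert to AV_SAMPLE_FMT_FLT would do the same job.

```cpp
#include <cstddef>
#include <vector>

// Interleaves two planar float channels (frame->data[0] and frame->data[1]
// for AV_SAMPLE_FMT_FLTP) into L R L R ... order for AL_FORMAT_STEREO_FLOAT32.
std::vector<float> interleaveStereo(const float* left, const float* right, std::size_t nbSamples)
{
    std::vector<float> out(nbSamples * 2);
    for (std::size_t i = 0; i < nbSamples; ++i) {
        out[2 * i] = left[i];      // left channel sample
        out[2 * i + 1] = right[i]; // right channel sample
    }
    return out;
}
```

In decodeAudioPacket(), this would be called with `reinterpret_cast<const float*>(p_frame->data[0])`, `reinterpret_cast<const float*>(p_frame->data[1])` and `p_frame->nb_samples`, giving a data size of `out.size() * sizeof(float)`. Computing the size this way also avoids the per-plane alignment padding that av_samples_get_buffer_size(..., 0) can include.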
And during playback, I do the following to refill OpenAL's buffers:
ALint numBuffersProcessed;
// Check if OpenAL is done with any of the queued buffers
alGetSourcei(_source, AL_BUFFERS_PROCESSED, &numBuffersProcessed);
if(numBuffersProcessed <= 0)
return;
// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numFilled = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffersProcessed, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
ALuint buffer;
for (unsigned int i = 0; i < numFilled; ++i)
{
// Pop the oldest queued buffer from the source,
// fill it with the new data, then re-queue it
alSourceUnqueueBuffers(_source, 1, &buffer);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error Unqueuing streaming buffers: " + Ogre::StringConverter::toString(success));
return;
}
alBufferData(buffer, _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on re- alBufferData: " + Ogre::StringConverter::toString(success));
return;
}
alSourceQueueBuffers(_source, 1, &buffer);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error re-queuing streaming buffers: " + Ogre::StringConverter::toString(success) + " "
+ alGetString(success));
return;
}
}
// Make sure the source is still playing,
// and restart it if needed.
ALint playStatus;
alGetSourcei(_source, AL_SOURCE_STATE, &playStatus);
if(playStatus != AL_PLAYING)
alSourcePlay(_source);
As you can see, I do quite heavy error checking, but I get no errors from either OpenAL or FFmpeg.
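For readers new to OpenAL streaming, the refill code above recycles a fixed ring of buffers: buffers that have finished playing are reported through AL_BUFFERS_PROCESSED, unqueued oldest first, refilled, and queued again at the back. The following is a toy model in plain C++ that makes no OpenAL calls; ToySource and refill are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Toy model of an OpenAL streaming source; no OpenAL calls are made.
// Buffers play front to back; played buffers stay in the queue
// (counted by 'processed', like AL_BUFFERS_PROCESSED) until unqueued.
struct ToySource {
    std::deque<unsigned> queued;   // front = oldest queued buffer ID
    std::vector<unsigned> played;  // buffer IDs in the order they were played
    int processed = 0;

    // Play the next not-yet-played buffer, if there is one.
    void playOne() {
        if (processed < static_cast<int>(queued.size())) {
            played.push_back(queued[processed]);
            ++processed;
        }
    }
    // Like alSourceUnqueueBuffers(source, 1, &id): removes the oldest processed buffer.
    unsigned unqueueOne() {
        assert(processed > 0);
        unsigned id = queued.front();
        queued.pop_front();
        --processed;
        return id;
    }
    // Like alSourceQueueBuffers(source, 1, &id): appends to the back.
    void queueOne(unsigned id) { queued.push_back(id); }
};

// Mirrors the refill loop: recycle at most min(processed, framesAvailable)
// buffers. In the real code, alBufferData would refill each one in between.
int refill(ToySource& src, int framesAvailable) {
    int n = std::min(src.processed, framesAvailable);
    for (int i = 0; i < n; ++i) {
        unsigned id = src.unqueueOne();
        src.queueOne(id);
    }
    return n;
}
```

With four buffers queued and two already played, refill(s, 5) recycles two buffers, and playback continues in the order 3, 4, 1, 2.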
Edit: What I hear somewhat resembles the actual audio from the video, but it is VERY high-pitched and stutters VERY much. It also seems to be playing on top of TV static. Very strange. Plus, it plays much slower than the correct audio would.
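One plausible explanation for the slow playback (an interpretation, not something the author confirmed): if the decoded data is 32-bit float but OpenAL is told AL_FORMAT_STEREO16, each 4-byte float is read as two 16-bit samples, so every buffer holds twice as many frames and lasts twice as long. The hypothetical helper below only does that arithmetic; the 1024-sample frame size is an assumed example value.

```cpp
#include <cassert>
#include <cstddef>

// Seconds of audio a player produces from 'bytes' of data, given the
// sample layout it has been told the data uses (hypothetical helper).
double playbackSeconds(std::size_t bytes, int channels, int bytesPerSample, int sampleRate) {
    assert(channels > 0 && bytesPerSample > 0 && sampleRate > 0);
    return static_cast<double>(bytes) /
           (static_cast<double>(channels) * bytesPerSample * sampleRate);
}
```

For 8192 bytes (1024 stereo float samples) at 48000 Hz, reading them as float gives about 21.3 ms, while reading them as 16-bit samples gives about 42.7 ms, exactly twice as long.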
Edit 2: After switching to AL_FORMAT_STEREO_FLOAT32, the sound plays at the correct speed, but it is still very high-pitched and stuttering (though less than before).
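A likely cause of the remaining high pitch and stutter (again a hedged guess, though consistent with the fix in the answer): FFmpeg's Vorbis decoder produces planar samples (AV_SAMPLE_FMT_FLTP), while OpenAL reads interleaved frames. Assuming the two planes end up back to back in one buffer and are read as interleaved, the first half of the buffer plays the left channel at twice its rate and the second half plays the right channel, which would sound an octave higher and choppy. The sketch below (hypothetical function name, plain C++) reproduces that misreading on a tiny buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// ASSUMPTION: both planes are stored back to back in one buffer
// (all left samples, then all right samples). The buffer is then read
// the way an interleaved consumer would: frame k = (buf[2k], buf[2k + 1]).
std::vector<std::pair<float, float>> readPlanarAsInterleaved(const std::vector<float>& left,
                                                             const std::vector<float>& right) {
    assert(left.size() == right.size());
    std::vector<float> buf(left);                       // plane 0: left channel
    buf.insert(buf.end(), right.begin(), right.end());  // plane 1: right channel
    std::vector<std::pair<float, float>> frames;
    for (std::size_t k = 0; 2 * k + 1 < buf.size(); ++k)
        frames.emplace_back(buf[2 * k], buf[2 * k + 1]);
    return frames;
}
```

With left = 1 2 3 4 and right = -1 -2 -3 -4, the frames come out as (1, 2), (3, 4), (-1, -2), (-3, -4): consecutive samples of the same channel land in one frame instead of left and right being paired.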
The video itself is not broken; it plays fine in any player. OpenAL can also play *.wav files just fine in the same application, so it is working too.
Any ideas what could be wrong here or how to do this correctly?
My only guess is that FFmpeg's decode function somehow does not produce data OpenAL can read. But this is as far as the FFmpeg decode example goes, so I don't know what's missing. As I understand it, avcodec_decode_audio4 decodes the frame to raw data, and OpenAL should be able to work with raw data (or rather, it doesn't work with anything else).
So, I finally figured out how to do it. Gee, what a mess. It was a hint from a user on the libav-users mailing list that put me on the correct path.
Here are my mistakes:
Using the wrong format in the alBufferData function. I used AL_FORMAT_STEREO16 (as that is what every single streaming example with OpenAL uses). I should have used AL_FORMAT_STEREO_FLOAT32, as the video I stream is Ogg, and FFmpeg's Vorbis decoder outputs floating-point samples (AV_SAMPLE_FMT_FLTP). Also, using swr_convert to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 just crashed. No idea why.
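If the AL_EXT_float32 extension is not available, the float samples have to become 16-bit integers before AL_FORMAT_STEREO16 can be used; a correctly configured swr_convert call (FLTP to S16) does that. Purely to illustrate what the conversion means, here is a hand-rolled sketch. The clamping and the 32767 scale factor are a common convention assumed here, not FFmpeg's exact rounding rules, and floatToS16 is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Convert normalised float samples (nominally in [-1.0, 1.0]) to signed
// 16-bit PCM. Interleaved input stays interleaved. Out-of-range values
// are clamped instead of wrapping around (ASSUMED convention).
std::vector<int16_t> floatToS16(const std::vector<float>& in) {
    std::vector<int16_t> out;
    out.reserve(in.size());
    for (float s : in) {
        float clamped = std::max(-1.0f, std::min(1.0f, s));
        out.push_back(static_cast<int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}
```

For example, 0.5 maps to 16384, while 2.0 and -2.0 are clamped to 32767 and -32767.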
Not using swr_convert to convert the decoded audio frame to the target format. When my attempt to convert from FLTP to S16 with swr_convert crashed without giving any reason, I assumed it was broken. But after figuring out my first mistake, I tried again, this time converting from FLTP to FLT (non-planar), and it worked! So OpenAL expects interleaved samples, not planar ones. Good to know.
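In this setup, the sample rate and channel layout are unchanged and only the sample format goes from FLTP to FLT, so what swr_convert does is essentially interleaving. Here is a minimal plain-C++ sketch of just that step. The name interleave is hypothetical, and the real swr_convert also handles resampling and remixing, which this sketch does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interleave planar float audio (one vector per channel, like the planes
// of an AV_SAMPLE_FMT_FLTP frame) into packed order, like AV_SAMPLE_FMT_FLT:
// L0 R0 L1 R1 ... which is the layout OpenAL's multi-channel formats read.
std::vector<float> interleave(const std::vector<std::vector<float>>& planes) {
    if (planes.empty()) return {};
    const std::size_t nbSamples = planes[0].size();
    for (const auto& p : planes) assert(p.size() == nbSamples);
    std::vector<float> out(nbSamples * planes.size());
    for (std::size_t i = 0; i < nbSamples; ++i)
        for (std::size_t ch = 0; ch < planes.size(); ++ch)
            out[i * planes.size() + ch] = planes[ch][i];
    return out;
}
```

Planes {1, 2, 3} and {-1, -2, -3} become 1 -1 2 -2 3 -3.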
So here is the decodeAudioPacket function that works for me with an Ogg video with a Vorbis audio stream:
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
SwrContext* p_swrContext, uint8_t** p_destBuffer, int p_destLinesize,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
if(decoded <= p_packet.size)
{
/* Move the unread data to the front and clear the end bits */
int remaining = p_packet.size - decoded;
memmove(p_packet.data, &p_packet.data[decoded], remaining);
av_shrink_packet(&p_packet, remaining);
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
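// swr_convert's out_count is the space available in samples per channel, not bytes,
// so derive it from the linesize of the packed (interleaved) FLT buffer
int destCapacity = p_destLinesize
/ (av_get_bytes_per_sample(AV_SAMPLE_FMT_FLT) * p_videoInfo.audioNumChannels);
int outputSamples = swr_convert(p_swrContext,
p_destBuffer, destCapacity,
(const uint8_t**)p_frame->extended_data, p_frame->nb_samples);
if (outputSamples < 0)
{
p_videoInfo.error = "Error converting audio frame.";
return outputSamples;
}
int bufferSize = av_get_bytes_per_sample(AV_SAMPLE_FMT_FLT) * p_videoInfo.audioNumChannels
* outputSamples;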
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
memcpy(frame->data, p_destBuffer[0], bufferSize);
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
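The bookkeeping after swr_convert is plain arithmetic: the packed buffer size is bytes per sample times channels times the number of samples swr_convert actually returned, and the frame's lifetime is its duration in time-base ticks times num/den. Below is a small standalone sketch with hypothetical helper names. A stereo stream, 1024 output samples, and a 1/48000 time base are assumed example values, not numbers from the author's logs.

```cpp
#include <cassert>
#include <cstdint>

// Byte size of packed (interleaved) audio: every output sample of every
// channel takes bytesPerSample bytes (4 for AV_SAMPLE_FMT_FLT).
int packedBufferSize(int bytesPerSample, int channels, int outputSamples) {
    assert(bytesPerSample > 0 && channels > 0 && outputSamples >= 0);
    return bytesPerSample * channels * outputSamples;
}

// Convert a duration in time-base ticks (num/den seconds per tick) to seconds.
double ticksToSeconds(int64_t ticks, int num, int den) {
    assert(den != 0);
    return static_cast<double>(ticks) * num / den;
}
```

With those values, the buffer is 4 * 2 * 1024 = 8192 bytes and the lifetime is about 21.3 ms.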
And here is how I initialize the context and the destination buffer:
// Initialize SWR context
SwrContext* swrContext = swr_alloc_set_opts(NULL,
audioCodecContext->channel_layout, AV_SAMPLE_FMT_FLT, audioCodecContext->sample_rate,
audioCodecContext->channel_layout, audioCodecContext->sample_fmt, audioCodecContext->sample_rate,
0, NULL);
int result = swr_init(swrContext); // negative on failure; check it before calling swr_convert
// Create destination sample buffer
uint8_t** destBuffer = NULL;
int destBufferLinesize;
av_samples_alloc_array_and_samples( &destBuffer,
&destBufferLinesize,
videoInfo.audioNumChannels,
2048,
AV_SAMPLE_FMT_FLT,
0);
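One detail worth checking against this buffer: swr_convert's out_count argument is the available space in samples per channel, not in bytes. For a packed format like AV_SAMPLE_FMT_FLT, the linesize filled in by av_samples_alloc_array_and_samples covers all channels, so the sample capacity is linesize / (bytes per sample * channels). Passing the linesize itself overstates the space by that factor. With 2048 samples and align = 0, the stereo linesize should come out to 2048 * 2 * 4 = 16384 bytes. Here is a hypothetical helper for the division:

```cpp
#include <cassert>

// For a packed sample format, one line holds all channels interleaved,
// so the capacity in samples per channel is linesize / (bytesPerSample * channels).
// That is the unit swr_convert's out_count parameter expects.
int packedCapacityInSamples(int linesize, int bytesPerSample, int channels) {
    assert(bytesPerSample > 0 && channels > 0);
    return linesize / (bytesPerSample * channels);
}
```

For the stereo FLT buffer above, packedCapacityInSamples(16384, 4, 2) gives back the 2048 samples that were requested.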