How to use a self-defined inputSamples value when transforming PCM to AAC with faac

I'm trying to transform a live stream with G.726 audio and H.264 video to MP4. I decode G.726 to PCM, then use faac to encode the PCM to AAC. Every G.726 audio packet I receive is 320 bytes. After decoding, the PCM size is 1280 bytes, so the sample count is 640 (16-bit samples). But the inputSamples value that faacEncOpen gives me is 1024, and my inputFormat is FAAC_INPUT_16BIT. When I pass 640 to faacEncEncode, the sound is not good at all. Does anyone know how to fix this? Thanks in advance!
// (1) Open FAAC engine
hEncoder = faacEncOpen(nSampleRate, nChannels, &nInputSamples, &nMaxOutputBytes); // nInputSamples the function returns is 1024
if(hEncoder == NULL)
{
printf("[ERROR] Failed to call faacEncOpen()\n");
return -1;
}
nInputSamples = 640;// here overwrites the input samples returned from faacEncOpen
nPCMBufferSize = nInputSamples * nPCMBitSize / 8; // nPCMBitSize is 16
pbPCMBuffer = new BYTE [nPCMBufferSize];
pbAACBuffer = new BYTE [nMaxOutputBytes];
// (2.1) Get current encoding configuration
pConfiguration = faacEncGetCurrentConfiguration(hEncoder);
pConfiguration->inputFormat = FAAC_INPUT_16BIT;
// (2.2) Set encoding configuration
nRet = faacEncSetConfiguration(hEncoder, pConfiguration);
for(int i = 0; 1; i++)
{
nBytesRead = fread(pbPCMBuffer, 1, nPCMBufferSize, fpIn);
nInputSamples = nBytesRead * 8 / nPCMBitSize;
// (3) Encode
nRet = faacEncEncode(
hEncoder, (int*) pbPCMBuffer, nInputSamples, pbAACBuffer, nMaxOutputBytes);
fwrite(pbAACBuffer, 1, nRet, fpOut);
printf("%d: faacEncEncode returns %d\n", i, nRet);
if(nBytesRead <= 0)
{
break;
}
}
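One common way to bridge the mismatch described above (640 decoded samples per packet versus the 1024 samples the encoder asks for) is to buffer decoded PCM and only hand the encoder full frames. The following is a minimal sketch of that idea, not FAAC code: PcmFrameAccumulator and its methods are hypothetical names, and it assumes 16-bit mono samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FAAC): collects decoded PCM samples and
// releases them only in blocks of exactly frameSamples samples.
class PcmFrameAccumulator {
public:
    explicit PcmFrameAccumulator(std::size_t frameSamples)
        : frameSamples_(frameSamples) {
        assert(frameSamples > 0);
    }

    // Append one decoded packet (for example 640 samples from a G.726 packet).
    void push(const int16_t* samples, std::size_t count) {
        pending_.insert(pending_.end(), samples, samples + count);
    }

    // If a full frame is buffered, move it into `frame` and return true.
    // Otherwise leave `frame` untouched and return false.
    bool popFrame(std::vector<int16_t>& frame) {
        if (pending_.size() < frameSamples_) return false;
        const auto end = pending_.begin() + static_cast<std::ptrdiff_t>(frameSamples_);
        frame.assign(pending_.begin(), end);
        pending_.erase(pending_.begin(), end);
        return true;
    }

    // Samples waiting for the next frame to fill up.
    std::size_t pendingSamples() const { return pending_.size(); }

private:
    std::size_t frameSamples_;
    std::vector<int16_t> pending_;
};
```

With this in place, nInputSamples would stay at the value faacEncOpen returned, and each popped frame would be passed to faacEncEncode with that count. Two 640-sample packets yield one 1024-sample frame with 256 samples carried over to the next one.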

Related

ESP32 i2s_read returns empty buffer after calling this function

I am trying to record audio from an INMP441 connected to an ESP32, but when I return the buffer containing the bytes read from the microphone, it always appears to be NULL.
The code for setting up i2s and the microphone is this:
// i2s config
const i2s_config_t i2s_config = {
.mode = i2s_mode_t(I2S_MODE_MASTER | I2S_MODE_RX), // receive
.sample_rate = SAMPLE_RATE, // 44100 (44.1 kHz)
.bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT, // 32 bits per sample
.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT, // use left channel only
.communication_format = i2s_comm_format_t(I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB),
.intr_alloc_flags = ESP_INTR_FLAG_LEVEL1, // interrupt level 1
.dma_buf_count = 64, // number of buffers
.dma_buf_len = SAMPLES_PER_BUFFER}; // 512
// pin config
const i2s_pin_config_t pin_config = {
.bck_io_num = gpio_sck, // serial clock, sck (gpio 33)
.ws_io_num = gpio_ws, // word select, ws (gpio 32)
.data_out_num = I2S_PIN_NO_CHANGE, // only used for speakers
.data_in_num = gpio_sd // serial data, sd (gpio 34)
};
// config i2s driver and pins
// fct must be called before any read/write
esp_err_t err = i2s_driver_install(I2S_PORT, &i2s_config, 0, NULL);
if (err != ESP_OK)
{
Serial.printf("Failed installing the driver: %d\n", err);
}
err = i2s_set_pin(I2S_PORT, &pin_config);
if (err != ESP_OK)
{
Serial.printf("Failed setting pin: %d\n", err);
}
Serial.println("I2S driver installed! :-)");
Setting up the i2s stuff is no problem at all. The tricky part for me is reading from the i2s:
// 44.1 kHz * bytes per sample * time in seconds = total size in bytes
const size_t recordSize = (SAMPLE_RATE * I2S_BITS_PER_SAMPLE_32BIT / 8) * recordTime; //recordTime = 5s
// size in bytes
size_t totalReadSize = 0;
// 32 bits per sample set in config * 1024 samples per buffers = total bits per buffer
char *samples = (char *)calloc(totalBitsPerBuffer, sizeof(char));
// number of bytes read
size_t bytesRead;
Serial.println("Start recording...");
// read until wanted size is reached
while (totalReadSize < recordSize)
{
// read to buffer
esp_err_t err = i2s_read(I2S_PORT, (void *)samples, totalBitsPerBuffer, &bytesRead, portMAX_DELAY);
// check if an error occurred, if so stop recording
if (err != ESP_OK)
{
Serial.println("Error while recording!");
break;
}
// check if bytes read works → yes
/*
for (int i = 0; i < bytesRead; i++)
{
uint8_t sample = (uint8_t) samples[i];
Serial.print(sample);
} */
// add read size to total read size
totalReadSize += bytesRead;
// Serial.printf("Currently recorded %d%% \n", totalReadSize * 100 / recordSize);
}
// convert bytes to mb
double_t totalReadSizeMB = (double_t)totalReadSize / 1e+6;
Serial.printf("Total read size: %fMb\n", totalReadSizeMB);
Serial.println("Samples deref");
Serial.println(*samples);
Serial.println("Samples");
Serial.println(samples);
return samples;
Using this code leads to the following output:
I2S driver installed! :-)
Start recording...
Total read size: 0.884736Mb
Samples deref
␀
Samples
When I uncomment the part where I iterate over the bytes read part I get something like this:
200224231255255224210022418725525522493000902552550238002241392542552241520020425225508050021624525501286700194120022461104022421711102242271030018010402242510000188970224141930022291022410185022487830021679001127500967200666902241776600246610224895902244757022418353002224802242274302249741022419339009435001223102242432602243322022412120001241402245911022418580084402248325525522461252255044249255224312452552242212372552241272352550342302552241212262552242112212550252216255014621325501682092550112205255224161202255224237198255224235194255224231922552248518725501141832550421812552241951762550144172255018168255034164255224173157255018215525522455152255028148255021014425505214025522487137255014613225522412112825502361252550180120255018011725522451172550252113255224133111255061082550248105255224891042552249910125522439972550138942552242279225503287255224101832552242478125522410178255224231732552244970255224336525501766225501426125502325625522424553255224109492550186[...]
This shows that the microphone is able to record, but I can't return the actual contents of the buffer.
While writing this code I consulted the official documentation and some code that seems to work elsewhere.
I am also new to C++ and am not used to working with pointers.
Does anyone know what the problem could be?
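A possible reading of the output above, offered as an assumption rather than a confirmed diagnosis: printing a char* as text stops at the first zero byte, so a buffer of raw samples whose first byte happens to be 0 prints as empty even though it holds data. The sketch below shows the effect on a plain byte array; hexDump and lengthAsCString are hypothetical helper names, and the sample bytes are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: render raw bytes as hex text, so zero bytes are shown
// instead of terminating the output.
std::string hexDump(const uint8_t* data, std::size_t length) {
    assert(data != nullptr || length == 0);
    std::string out;
    char cell[4];
    for (std::size_t i = 0; i < length; ++i) {
        std::snprintf(cell, sizeof(cell), "%02x", data[i]);
        if (i > 0) out += ' ';
        out += cell;
    }
    return out;
}

// Length the same bytes appear to have when they are treated as a C string.
// The caller must make sure the buffer contains at least one zero byte.
std::size_t lengthAsCString(const uint8_t* data) {
    return std::strlen(reinterpret_cast<const char*>(data));
}
```

For a buffer such as {0x00, 0xC8, 0xE0, 0xFF}, lengthAsCString reports 0 while hexDump still shows all four bytes. On the ESP32 the same idea would be a loop over bytesRead that prints each byte numerically, as the commented-out loop in the question already does.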

FFmpeg Opus choppy sound UPDATED DESCRIPTION

I'm using FFmpeg to encode raw PCM sound to Opus and decode it back, using FFmpeg's built-in "opus" codec. My input samples are raw PCM, 8000 Hz, 16-bit mono, in AV_SAMPLE_FMT_S16 format. Since the codec requires the AV_SAMPLE_FMT_FLTP sample format and a 48000 Hz sample rate, I resample my samples before encoding them.
I have two instances of a ResamplerAudio class, which does the resampling and holds a SwrContext member. I use the first instance to resample the raw PCM input before encoding, and the second to resample the decoded audio so that its format and sample rate match those of the raw input.
The ResamplerAudio class has a function that initializes its SwrContext member like this:
void ResamplerAudio::init(AVCodecContext *codecContext, int inSampleRate, int outSampleRate, AVSampleFormat inSampleFmt, AVSampleFormat outSampleFmt)
{
swrContext = swr_alloc();
if (!swrContext)
{
LOGE(TAG, "[init] Couldn't allocate swr context");
return;
}
av_opt_set_int(swrContext, "in_channel_layout", (int64_t) codecContext->channel_layout, 0);
av_opt_set_int(swrContext, "out_channel_layout", (int64_t) codecContext->channel_layout, 0);
av_opt_set_int(swrContext, "in_channel_count", codecContext->channels, 0);
av_opt_set_int(swrContext, "out_channel_count", codecContext->channels, 0);
av_opt_set_int(swrContext, "in_sample_rate", inSampleRate, 0);
av_opt_set_int(swrContext, "out_sample_rate", outSampleRate, 0);
av_opt_set_sample_fmt(swrContext, "in_sample_fmt", inSampleFmt, 0);
av_opt_set_sample_fmt(swrContext, "out_sample_fmt", outSampleFmt, 0);
int ret = swr_init(swrContext);
if (ret < 0)
{
LOGE(TAG, "[init] swr_init error: %s", av_err2str(ret));
return;
}
LOGD(TAG, "[init] success codecContext->channel_layout: %d; inSampleRate: %d; outSampleRate: %d; inSampleFmt: %d; outSampleFmt: %d", (int) codecContext->channel_layout, inSampleRate, outSampleRate, inSampleFmt, outSampleFmt);
}
I call ResamplerAudio::init for the first instance of ResamplerAudio (this instance resamples the raw PCM input before encoding; I call it resamplerEncoder) with the following args:
resamplerEncoder->init(contextEncoder, 8000, 48000, AV_SAMPLE_FMT_S16, AV_SAMPLE_FMT_FLTP);
I initialize the second instance of ResamplerAudio (this instance resamples the audio decoded from Opus; I call it resamplerDecoder) with the following args:
resamplerDecoder->init(contextDecoder, 48000, 8000, AV_SAMPLE_FMT_FLTP, AV_SAMPLE_FMT_S16);
The function of ResamplerAudio that does resampling looks like this:
std::vector<uint8_t> ResamplerAudio::convert(uint8_t **inData, int inSamplesCount, int outChannels, int outFormat)
{
std::vector<uint8_t> result;
uint8_t *dstData = NULL;
const int dstNbSamples = swr_get_out_samples(swrContext, inSamplesCount);
av_samples_alloc(&dstData, NULL, outChannels, dstNbSamples, AVSampleFormat(outFormat), 1);
int resampledSize = swr_convert(swrContext, &dstData, dstNbSamples, (const uint8_t **)inData, inSamplesCount);
int dstBufSize = av_samples_get_buffer_size(NULL, outChannels, resampledSize, AVSampleFormat(outFormat), 1);
if (dstBufSize <= 0) return result;
std::copy(&dstData[0], &dstData[dstBufSize], std::back_inserter(result));
return result;
}
And I call ResamplerAudio::convert function before encoding with the following args:
// data - an array of raw pcm audio
// dataLength - the length of data array
// getSamplesCount() - function that calculates samples count
// frameEncode - AVFrame that using for encode audio
std::vector<uint8_t> resampledData = resamplerEncoder->convert(&data, getSamplesCount(dataLength, frameEncode->channels, AV_SAMPLE_FMT_S16), frameEncode->channels, frameEncode->format);
getSamplesCount() function looks like this:
getSamplesCount(int bytesCount, int channels, AVSampleFormat format)
{
return bytesCount / av_get_bytes_per_sample(format) / channels;
}
After that I fill my frameEncode with the resampled samples:
memcpy(&frame->data[0][0], &resampledData[0], sizeof(uint8_t) * resampledDataLength);
Then I pass frameEncode to the encoder by calling encodeFrame(resampledDataLength):
void encodeFrame(int dataLength)
{
/* send the frame for encoding */
int ret = avcodec_send_frame(contextEncoder, frameEncode);
if (ret < 0)
{
LOGE(TAG, "[encodeFrame] avcodec_send_frame error: %s", av_err2str(ret));
return;
}
/* read all the available output packets (in general there may be any number of them) */
while (ret >= 0)
{
ret = avcodec_receive_packet(contextEncoder, packetEncode);
if (ret < 0 && ret != AVERROR(EAGAIN)) LOGE(TAG, "[encodeFrame] error in avcodec_receive_packet: %s", av_err2str(ret));
if (ret < 0) break;
// encodedData - std::vector<uint8_t> that stores encoded data
std::copy(&packetEncode->data[0], &packetEncode->data[dataLength], std::back_inserter(encodedData));
av_packet_unref(packetEncode);
}
}
Then I decode my encoded samples and resample them back to the source sample format and sample rate, so I call ResamplerAudio::convert on resamplerDecoder with the following args:
// frameDecode - AVFrame that holds decoded audio
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
The resulting sound is choppy, and I also noticed that the decoded array is bigger than the source array of raw PCM audio.
Any ideas what I'm doing wrong?
UPD 18.05.2020
I tested my resampling logic by resampling the raw PCM sound without any encoding or decoding. First I converted the sample rate of the input sound from 8000 Hz to 48000 Hz, then I took the resampled samples and converted their sample rate from 48000 Hz back to 8000 Hz; the resulting sound is perfect and clean. I repeated the same steps converting not the sample rate but the sample format, from AV_SAMPLE_FMT_S16 to AV_SAMPLE_FMT_FLTP and back, and again the resulting sound is perfect and clean. I got the same result when I converted both the sample rate and the sample format.
So I assume that the cause of the distorted and choppy sound is in my encoding or decoding routine, most likely in the decoding routine, because after decoding I ALWAYS get an AVFrame with nb_samples equal to 960, regardless of the size of the input sound.
My decoding routine looks like this:
std::vector<uint8_t> decode(uint8_t *data, unsigned int dataLength)
{
decodedData.clear();
int dataSize = dataLength;
while (dataSize > 0)
{
if (!frameDecode)
{
frameDecode = av_frame_alloc();
if (!frameDecode)
{
LOGE(TAG, "[decode] Couldn't allocate the frame");
return EMPTY_DATA;
}
}
ret = av_parser_parse2(parser, contextDecoder, &packetDecode->data, &packetDecode->size, &data[0], dataSize, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
if (ret < 0) {
LOGE(TAG, "[decode] av_parser_parse2 error: %s", av_err2str(ret));
return EMPTY_DATA;
}
data += ret;
dataSize -= ret;
doDecode();
}
return decodedData;
}
void doDecode()
{
if (packetDecode->size) {
/* send the packet with the compressed data to the decoder */
int ret = avcodec_send_packet(contextDecoder, packetDecode);
if (ret < 0) LOGE(TAG, "[decode] avcodec_send_packet error: %s", av_err2str(ret));
/* read all the output frames (in general there may be any number of them) */
while (ret >= 0)
{
ret = avcodec_receive_frame(contextDecoder, frameDecode);
if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) LOGE(TAG, "[decode] avcodec_receive_frame error: %s", av_err2str(ret));
if (ret < 0) break;
std::vector<uint8_t> resampledData = resamplerDecoder->convert(frameDecode->data, frameDecode->nb_samples, frameDecode->channels, AV_SAMPLE_FMT_S16);
if (!resampledData.size()) continue;
std::copy(&resampledData.data()[0], &resampledData.data()[resampledData.size()], std::back_inserter(decodedData));
}
}
}
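As a side note on the fixed 960-sample frames mentioned above: 960 samples at 48000 Hz is exactly 20 ms of audio, which at 8000 Hz corresponds to 160 samples, or 320 bytes of 16-bit mono PCM. The helpers below are only a worked version of that arithmetic under the assumption of 20 ms frames; the function names are hypothetical and nothing here calls FFmpeg.

```cpp
#include <cassert>
#include <cstdint>

// Samples per channel contained in a frame of the given duration.
constexpr int64_t samplesForDuration(int sampleRate, int durationMs) {
    return static_cast<int64_t>(sampleRate) * durationMs / 1000;
}

// Sample count after converting inSamples from inRate to outRate, rounding
// the exact ratio down. Real resamplers also add some delay, ignored here.
constexpr int64_t resampledCount(int64_t inSamples, int inRate, int outRate) {
    return inSamples * outRate / inRate;
}

// Size in bytes of packed PCM with the given layout.
constexpr int64_t pcmBytes(int64_t samples, int channels, int bytesPerSample) {
    return samples * channels * bytesPerSample;
}

// Checked at compile time: 20 ms at 48 kHz is 960 samples.
static_assert(samplesForDuration(48000, 20) == 960, "20 ms at 48 kHz");
```

Under that assumption, an input chunk whose length is not a multiple of 320 bytes cannot map onto a whole number of 960-sample frames, which would be one reason for the decoded array to differ in size from the source array.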
UPD 30.05.2020
I decided to drop FFmpeg from my project and use libopus 1.3.1 instead, so I wrote a wrapper around it and it works fine.

Live555 truncates encoded data of FFMpeg

I am trying to stream H264 based data using Live555 over RTSP.
I am capturing data using V4L2, encoding it using FFmpeg, and then passing the data to Live555's DeviceSource, where I use the H264VideoStreamFramer class.
Below are the codec settings I use to configure the encoder's AVCodecContext:
codec = avcodec_find_encoder_by_name(CODEC_NAME);
if (!codec) {
cerr << "Codec " << CODEC_NAME << " not found\n";
exit(1);
}
c = avcodec_alloc_context3(codec);
if (!c) {
cerr << "Could not allocate video codec context\n";
exit(1);
}
pkt = av_packet_alloc();
if (!pkt)
exit(1);
/* put sample parameters */
c->bit_rate = 400000;
/* resolution must be a multiple of two */
c->width = PIC_HEIGHT;
c->height = PIC_WIDTH;
/* frames per second */
c->time_base = (AVRational){1, FPS};
c->framerate = (AVRational){FPS, 1};
c->gop_size = 10;
c->max_b_frames = 1;
c->pix_fmt = AV_PIX_FMT_YUV420P;
c->rtp_payload_size = 30000;
if (codec->id == AV_CODEC_ID_H264)
av_opt_set(c->priv_data, "preset", "fast", 0);
av_opt_set_int(c->priv_data, "slice-max-size", 30000, 0);
/* open it */
ret = avcodec_open2(c, codec, NULL);
if (ret < 0) {
cerr << "Could not open codec\n";
exit(1);
}
I get the encoded data using avcodec_receive_packet(), which returns an AVPacket.
I pass the AVPacket's data into DeviceSource; below is the relevant snippet of my Live555 code:
void DeviceSource::deliverFrame() {
if (!isCurrentlyAwaitingData()) return; // we're not ready for the data yet
u_int8_t* newFrameDataStart = (u_int8_t*) pkt->data;
unsigned newFrameSize = pkt->size; //%%% TO BE WRITTEN %%%
// Deliver the data here:
if (newFrameSize > fMaxSize) { // Condition becomes true many times
fFrameSize = fMaxSize;
fNumTruncatedBytes = newFrameSize - fMaxSize;
} else {
fFrameSize = newFrameSize;
}
gettimeofday(&fPresentationTime, NULL); // If you have a more accurate time - e.g., from an encoder - then use that instead.
// If the device is *not* a 'live source' (e.g., it comes instead from a file or buffer), then set "fDurationInMicroseconds" here.
memmove(fTo, newFrameDataStart, fFrameSize);
}
But sometimes my packet's size is larger than fMaxSize, and Live555 then truncates the frame data, so I sometimes get bad frames in VLC.
From the Live555 forum, I learned that the encoder should not send packets larger than fMaxSize, so my question is:
How can I restrict the encoder to limit the size of a packet?
Thanks in Advance,
Harshil
You can increase the maximum allowed sample size by changing "maxSize" in the OutPacketBuffer class in MediaSink.cpp. This worked for me. There are cases where high-quality video needs to be streamed, and I don't think we will always be able to restrict the encoder to samples below a particular size without causing video quality issues. In fact, live555 fragments the samples in its UDP sink to match the default MTU (1500), so increasing the max sample size limit has no side effects.
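To make the truncation concrete, here is a minimal standalone sketch of the size bookkeeping that deliverFrame() performs. It is not live555 code: DeliveryResult and deliver are hypothetical names, and the buffer sizes are tiny on purpose. In the live555 versions I am aware of, OutPacketBuffer::maxSize is a static member that application code can assign before creating the sink instead of editing MediaSink.cpp; treat that as an assumption to check against your version.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the fields deliverFrame() fills in.
struct DeliveryResult {
    unsigned frameSize;          // bytes actually copied to the sink buffer
    unsigned numTruncatedBytes;  // bytes dropped because the buffer was too small
};

// Copy at most maxSize bytes of the encoded frame and report what was lost.
DeliveryResult deliver(uint8_t* to, unsigned maxSize,
                       const uint8_t* frame, unsigned frameLen) {
    assert(to != nullptr && frame != nullptr);
    DeliveryResult r{};
    if (frameLen > maxSize) {
        r.frameSize = maxSize;
        r.numTruncatedBytes = frameLen - maxSize;
    } else {
        r.frameSize = frameLen;
        r.numTruncatedBytes = 0;
    }
    std::memmove(to, frame, r.frameSize);
    return r;
}
```

Any non-zero numTruncatedBytes means the tail of an encoded frame was discarded, which is what shows up as bad frames in the player. Raising the sink's maximum size above the largest packet the encoder produces keeps that value at zero.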

FFmpeg + OpenAL - playback streaming sound from video won't work

I am decoding an OGG video (theora & vorbis as codecs) and want to show it on the screen (using Ogre 3D) while playing its sound. I can decode the image stream just fine and the video plays perfectly with the correct frame rate, etc.
However, I cannot get the sound to play at all with OpenAL.
Edit: I managed to make the playing sound resemble the actual audio in the video at least somewhat. Updated sample code.
Edit 2: I was able to get "almost" correct sound now. I had to set OpenAL to use AL_FORMAT_STEREO_FLOAT32 (after initializing the extension) instead of just STEREO16. Now the sound is "only" extremely high pitched and stuttering, but at the correct speed.
Here is how I decode audio packets (in a background thread, the equivalent works just fine for the image stream of the video file):
//------------------------------------------------------------------------------
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int bufferSize = av_samples_get_buffer_size(NULL, p_audioCodecContext->channels, p_frame->nb_samples,
p_audioCodecContext->sample_fmt, 0);
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
if (p_frame->channels == 2)
{
memcpy(frame->data, p_frame->data[0], bufferSize >> 1);
memcpy(frame->data + (bufferSize >> 1), p_frame->data[1], bufferSize >> 1);
}
else
{
memcpy(frame->data, p_frame->data, bufferSize);
}
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
So, as you can see, I decode the frame and memcpy it to my own struct, AudioFrame. Now, when the sound is played, I use these audio frames like this:
int numBuffers = 4;
ALuint buffers[4];
alGenBuffers(numBuffers, buffers);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alGenBuffers : " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numReturned = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffers, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
for (unsigned int i = 0; i < numReturned; ++i)
{
alBufferData(buffers[i], _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alBufferData : " + Ogre::StringConverter::toString(success) + alGetString(success)
+ " size: " + Ogre::StringConverter::toString(audioBufferSizes[i]));
return;
}
}
// Queue the buffers into OpenAL
alSourceQueueBuffers(_source, numReturned, buffers);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error queuing streaming buffers: " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
}
alSourcePlay(_source);
The format and frequency I give to OpenAL are AL_FORMAT_STEREO_FLOAT32 (it is a stereo sound stream, and I did initialize the FLOAT32 extension) and 48000 (which is the sample rate of the AVCodecContext of the audio stream).
And during playback, I do the following to refill OpenAL's buffers:
ALint numBuffersProcessed;
// Check if OpenAL is done with any of the queued buffers
alGetSourcei(_source, AL_BUFFERS_PROCESSED, &numBuffersProcessed);
if(numBuffersProcessed <= 0)
return;
// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numFilled = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffersProcessed, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
ALuint buffer;
for (unsigned int i = 0; i < numFilled; ++i)
{
// Pop the oldest queued buffer from the source,
// fill it with the new data, then re-queue it
alSourceUnqueueBuffers(_source, 1, &buffer);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error Unqueuing streaming buffers: " + Ogre::StringConverter::toString(success));
return;
}
alBufferData(buffer, _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on re- alBufferData: " + Ogre::StringConverter::toString(success));
return;
}
alSourceQueueBuffers(_source, 1, &buffer);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error re-queuing streaming buffers: " + Ogre::StringConverter::toString(success) + " "
+ alGetString(success));
return;
}
}
// Make sure the source is still playing,
// and restart it if needed.
ALint playStatus;
alGetSourcei(_source, AL_SOURCE_STATE, &playStatus);
if(playStatus != AL_PLAYING)
alSourcePlay(_source);
As you can see, I do quite heavy error checking. But I do not get any errors, neither from OpenAL nor from FFmpeg.
Edit: What I hear somewhat resembles the actual audio from the video, but VERY high pitched and stuttering VERY much. Also, it seems to be playing on top of TV noise. Very strange. Plus, it is playing much slower than the correct audio would.
Edit 2: After using AL_FORMAT_STEREO_FLOAT32, the sound plays at the correct speed, but is still very high pitched and stuttering (though less than before).
The video itself is not broken; it plays fine in any player. OpenAL can also play *.wav files just fine in the same application, so it is working too.
Any ideas what could be wrong here or how to do this correctly?
My only guess is that somehow FFmpeg's decode function does not produce data OpenAL can read. But this is as far as the FFmpeg decode example goes, so I don't know what's missing. As I understand it, the decode_audio4 function decodes the frame to raw data. And OpenAL should be able to work with raw data (or rather, doesn't work with anything else).
So, I finally figured out how to do it. Gee, what a mess. It was a hint from a user on the libav-users mailing list that put me on the correct path.
Here are my mistakes:
1. Using the wrong format in the alBufferData function. I used AL_FORMAT_STEREO16 (as that is what every single streaming example with OpenAL uses). I should have used AL_FORMAT_STEREO_FLOAT32, as the video I stream is Ogg and Vorbis is decoded to floating-point samples. And using swr_convert to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 just crashes. No idea why.
2. Not using swr_convert to convert the decoded audio frame to the target format. After I tried to use swr_convert to convert from FLTP to S16 and it simply crashed without giving a reason, I assumed it was broken. But after figuring out my first mistake, I tried again, converting from FLTP to FLT (non-planar), and then it worked! So OpenAL uses an interleaved format, not a planar one. Good to know.
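The difference between the planar and interleaved layouts can be shown without FFmpeg. The sketch below does by hand what the FLTP to FLT conversion amounts to for the sample layout; interleave is a hypothetical helper for illustration only, and swr_convert remains the tool to use in real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: turn planar audio (one array per channel, as in
// AV_SAMPLE_FMT_FLTP) into interleaved audio (L R L R ..., as in
// AV_SAMPLE_FMT_FLT).
std::vector<float> interleave(const std::vector<const float*>& planes,
                              std::size_t samplesPerChannel) {
    assert(!planes.empty());
    const std::size_t channels = planes.size();
    std::vector<float> out(channels * samplesPerChannel);
    for (std::size_t s = 0; s < samplesPerChannel; ++s) {
        for (std::size_t c = 0; c < channels; ++c) {
            out[s * channels + c] = planes[c][s];
        }
    }
    return out;
}
```

Copying the left plane followed by the right plane into one buffer, as the original memcpy code did, produces L L L ... R R R ... instead, which a player expecting interleaved stereo reads as garbled audio.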
So here is the decodeAudioPacket function that is working for me with Ogg video, vorbis audio stream:
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
SwrContext* p_swrContext, uint8_t** p_destBuffer, int p_destLinesize,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
if(decoded <= p_packet.size)
{
/* Move the unread data to the front and clear the end bits */
int remaining = p_packet.size - decoded;
memmove(p_packet.data, &p_packet.data[decoded], remaining);
av_shrink_packet(&p_packet, remaining);
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int outputSamples = swr_convert(p_swrContext,
p_destBuffer, p_destLinesize,
(const uint8_t**)p_frame->extended_data, p_frame->nb_samples);
int bufferSize = av_get_bytes_per_sample(AV_SAMPLE_FMT_FLT) * p_videoInfo.audioNumChannels
* outputSamples;
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
memcpy(frame->data, p_destBuffer[0], bufferSize);
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
And here is how I initialize the context and the destination buffer:
// Initialize SWR context
SwrContext* swrContext = swr_alloc_set_opts(NULL,
audioCodecContext->channel_layout, AV_SAMPLE_FMT_FLT, audioCodecContext->sample_rate,
audioCodecContext->channel_layout, audioCodecContext->sample_fmt, audioCodecContext->sample_rate,
0, NULL);
int result = swr_init(swrContext);
// Create destination sample buffer
uint8_t** destBuffer = NULL;
int destBufferLinesize;
av_samples_alloc_array_and_samples( &destBuffer,
&destBufferLinesize,
videoInfo.audioNumChannels,
2048,
AV_SAMPLE_FMT_FLT,
0);

How to write bitmaps as frames to Ogg Theora in C\C++?

How to write bitmaps as frames to Ogg Theora in C\C++?
Some examples with source would be great!
The entire solution is a little lengthy to post on here as a code sample, but if you download libtheora from Xiph.org, there is an example png2theora. All of the library functions I am about to mention can be found in the documentation on Xiph.org for theora and ogg.
1. Call th_info_init() to initialise a th_info structure, then set up your output parameters by assigning the appropriate members in it.
2. Use that structure in a call to th_encode_alloc() to get an encoder context.
3. Initialise an ogg stream with ogg_stream_init().
4. Initialise a blank th_comment structure using th_comment_init().
5. Iterate through the following until th_encode_flushheader() returns 0 (or an error code):
   - Call th_encode_flushheader() with the encoder context, the blank comment structure and an ogg_packet.
   - Send the resulting packet to the ogg stream with ogg_stream_packetin().
6. Now, repeatedly call ogg_stream_pageout(), every time writing the page.header and then page.body to an output file, until it returns 0. Then call ogg_stream_flush() and write the resulting page to the file.
You can now write frames to the encoder. Here is how I did it:
int theora_write_frame(int outputFd, unsigned long w, unsigned long h, unsigned char *yuv_y, unsigned char *yuv_u, unsigned char *yuv_v, int last)
{
th_ycbcr_buffer ycbcr;
ogg_packet op;
ogg_page og;
unsigned long yuv_w;
unsigned long yuv_h;
/* Must hold: yuv_w >= w */
yuv_w = (w + 15) & ~15;
/* Must hold: yuv_h >= h */
yuv_h = (h + 15) & ~15;
//Fill out the ycbcr buffer
ycbcr[0].width = yuv_w;
ycbcr[0].height = yuv_h;
ycbcr[0].stride = yuv_w;
ycbcr[1].width = yuv_w;
ycbcr[1].stride = ycbcr[1].width;
ycbcr[1].height = yuv_h;
ycbcr[2].width = ycbcr[1].width;
ycbcr[2].stride = ycbcr[1].stride;
ycbcr[2].height = ycbcr[1].height;
if(encoderInfo->pixel_fmt == TH_PF_420)
{
//Chroma is decimated by 2 in both directions
ycbcr[1].width = yuv_w >> 1;
ycbcr[2].width = yuv_w >> 1;
ycbcr[1].height = yuv_h >> 1;
ycbcr[2].height = yuv_h >> 1;
}else if(encoderInfo->pixel_fmt == TH_PF_422)
{
ycbcr[1].width = yuv_w >> 1;
ycbcr[2].width = yuv_w >> 1;
}else if(encoderInfo->pixel_fmt != TH_PF_444)
{
//Then we have an unknown pixel format
//We don't know how long the arrays are!
fprintf(stderr, "[theora_write_frame] Unknown pixel format in writeFrame!\n");
return -1;
}
ycbcr[0].data = yuv_y;
ycbcr[1].data = yuv_u;
ycbcr[2].data = yuv_v;
/* Theora is a one-frame-in,one-frame-out system; submit a frame
for compression and pull out the packet */
if(th_encode_ycbcr_in(encoderContext, ycbcr)) {
fprintf(stderr, "[theora_write_frame] Error: could not encode frame\n");
return -1;
}
if(!th_encode_packetout(encoderContext, last, &op)) {
fprintf(stderr, "[theora_write_frame] Error: could not read packets\n");
return -1;
}
ogg_stream_packetin(&theoraStreamState, &op);
ssize_t bytesWritten = 0;
int pagesOut = 0;
while(ogg_stream_pageout(&theoraStreamState, &og)) {
pagesOut ++;
bytesWritten = write(outputFd, og.header, og.header_len);
if(bytesWritten != og.header_len)
{
fprintf(stderr, "[theora_write_frame] Error: Could not write to file\n");
return -1;
}
bytesWritten = write(outputFd, og.body, og.body_len);
if(bytesWritten != og.body_len)
{
fprintf(stderr, "[theora_write_frame] Error: Could not write to file\n");
return -1;
}
}
return pagesOut;
}
Where encoderInfo is the th_info structure used to initialise the encoder (static in the data section for me).
On your last frame, setting the last argument of th_encode_packetout() to a non-zero value will make sure the stream terminates properly.
Once you're done, just make sure to clean up (closing fds mainly). th_info_clear() will clear the th_info structure, and th_encode_free() will free your encoder context.
Obviously, you'll need to convert your bitmap into YUV planes before you can pass them to theora_write_frame().
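For the bitmap to YUV step, a per-pixel conversion could look like the sketch below. It assumes 8-bit RGB input and uses the common video-range BT.601 coefficients (Y in 16..235, chroma centred on 128); whether that matches the colourspace you declare in th_info is something to verify, and the names YCbCr, clampToByte and rgbToYCbCr are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct YCbCr {
    uint8_t y;
    uint8_t cb;
    uint8_t cr;
};

// Round to the nearest integer and clamp into the 0..255 byte range.
static uint8_t clampToByte(double v) {
    long r = std::lround(v);
    if (r < 0) r = 0;
    if (r > 255) r = 255;
    return static_cast<uint8_t>(r);
}

// Convert one 8-bit RGB pixel to video-range BT.601 Y'CbCr (assumed matrix).
YCbCr rgbToYCbCr(uint8_t r, uint8_t g, uint8_t b) {
    const double y  = 16.0  + ( 65.481 * r + 128.553 * g +  24.966 * b) / 255.0;
    const double cb = 128.0 + (-37.797 * r -  74.203 * g + 112.000 * b) / 255.0;
    const double cr = 128.0 + (112.000 * r -  93.786 * g -  18.214 * b) / 255.0;
    return { clampToByte(y), clampToByte(cb), clampToByte(cr) };
}
```

The Y values fill the yuv_y plane directly. For TH_PF_420 the Cb and Cr values would additionally need to be subsampled, for example by averaging each 2x2 block, before they go into yuv_u and yuv_v.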
Hope this is of some help. Good luck!
Here's the libtheora API and example code.
Here's a micro howto that shows how to use the theora binaries. As the encoder reads raw, uncompressed 'yuv4mpeg' data for video, you could use that from your app too, by piping the video frames to the encoder.