SDL2 & SMPEG2 - Empty sound buffer trying to read a MP3 - c++

I'm trying to load an MP3 into a buffer using the SMPEG2 library, which comes with SDL2. Every SMPEG function call returns without error, but when I'm done, the sound buffer is full of zeros.
Here's the code:
bool LoadMP3(char* filename)
{
bool success = false;
const Uint32 Mp3ChunkLen = 4096;
SMPEG* mp3;
SMPEG_Info infoMP3;
Uint8 * ChunkBuffer;
Uint32 MP3Length = 0;
// Allocate a chunk buffer
ChunkBuffer = (Uint8*)malloc(Mp3ChunkLen);
SDL_RWops *mp3File = SDL_RWFromFile(filename, "rb");
if (mp3File != NULL)
{
mp3 = SMPEG_new_rwops(mp3File, &infoMP3, 1, 0);
if(mp3 != NULL)
{
if(infoMP3.has_audio)
{
Uint32 readLen;
// Inform the MP3 of the output audio specifications
SMPEG_actualSpec(mp3, &asDeviceSpecs); // static SDL_AudioSpec asDeviceSpecs; containing valid values after a call to SDL_OpenAudioDevice
// Enable the audio and disable the video.
SMPEG_enableaudio(mp3, 1);
SMPEG_enablevideo(mp3, 0);
// Play the MP3 once to get the size of the needed final buffer
SMPEG_play(mp3);
while ((readLen = SMPEG_playAudio(mp3, ChunkBuffer, Mp3ChunkLen)) > 0)
{
MP3Length += readLen;
}
SMPEG_stop(mp3);
if(MP3Length > 0)
{
// Reallocate the buffer with the new length (if needed)
if (MP3Length != Mp3ChunkLen)
{
ChunkBuffer = (Uint8*)realloc(ChunkBuffer, MP3Length);
}
// Replay the entire MP3 into the new ChunkBuffer.
SMPEG_rewind(mp3);
SMPEG_play(mp3);
bool readBackSuccess = (MP3Length == SMPEG_playAudio(mp3, ChunkBuffer, MP3Length));
SMPEG_stop(mp3);
if(readBackSuccess)
{
// !!! Here, ChunkBuffer contains only zeros !!!
success = true;
}
}
}
SMPEG_delete(mp3);
mp3 = NULL;
}
SDL_RWclose(mp3File);
mp3File = NULL;
}
free(ChunkBuffer);
return success;
}
The code is largely based on SDL_mixer, which I cannot use for my project because of its limitations.
I know Ogg Vorbis would be a better choice of file format, but I'm porting a very old project, and it worked entirely with MP3s.
I'm sure the sound system is initialized correctly because I can play WAV files just fine. It's initialized with a frequency of 44100 Hz, 2 channels, 1024 samples, and the AUDIO_S16SYS format (which, as I understood from the SMPEG source, is mandatory).
I've calculated the anticipated buffer size, based on the bitrate, the amount of data in the MP3 and the OpenAudioDevice audio specs, and everything is consistent.
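For reference, the arithmetic behind that kind of estimate can be sketched as follows. This is only a rough sketch: it assumes a constant-bitrate file with no ID3 tags or padding, and the function name and parameters are made up for illustration, so the real decoded length can differ by a few frames.

```cpp
#include <cassert>
#include <cstdint>

// Rough estimate of the decoded PCM size of a constant-bitrate MP3.
// Hypothetical helper: ignores ID3 tags, VBR headers and encoder padding.
std::uint64_t estimatePcmBytes(std::uint64_t mp3Bytes,        // size of the compressed audio data
                               std::uint32_t bitrateBps,      // e.g. 128000 for 128 kbit/s
                               std::uint32_t outputFreq,      // e.g. 44100
                               std::uint32_t channels,        // e.g. 2
                               std::uint32_t bytesPerSample)  // 2 for AUDIO_S16SYS
{
    assert(bitrateBps > 0);
    // Bytes of PCM produced per second of audio at the output specs.
    const std::uint64_t bytesPerSecond =
        static_cast<std::uint64_t>(outputFreq) * channels * bytesPerSample;
    // duration = compressed bits / bitrate; multiply before dividing to
    // stay in integer arithmetic.
    return mp3Bytes * 8 * bytesPerSecond / bitrateBps;
}
```

For example, 160000 bytes at 128 kbit/s is 10 seconds of audio, which at 44100 Hz, stereo, 16-bit comes to 1764000 bytes of PCM.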
I cannot figure out why everything but the buffer data seems to be working.
UPDATE #1
Still trying to figure out what's wrong, I thought the support for MP3 might not be working, so I created the following function :
SMPEG *mpeg;
SMPEG_Info info;
mpeg = SMPEG_new(filename,&info, 1);
SMPEG_play(mpeg);
do { SDL_Delay(50); } while(SMPEG_status(mpeg) == SMPEG_PLAYING);
SMPEG_delete(mpeg);
The MP3 played, so the decoding itself should be working. But that's not what I need; I really need the sound buffer data so I can send it to my mixer.

After much tinkering, research and digging through the SMPEG source code, I realized that I had to pass 1 as the sdl_audio parameter to the SMPEG_new_rwops function.
The comment found in smpeg.h is misleading:
The sdl_audio parameter indicates if SMPEG should initialize the SDL audio subsystem. If not, you will have to use the SMPEG_playaudio() function below to extract the decoded data.
Since the audio subsystem was already initialized and I was using the SMPEG_playAudio() function, I had no reason to think I needed this parameter to be non-zero. In SMPEG, this parameter triggers the audio decompression at opening time, and even though I called SMPEG_enableaudio(mp3, 1); afterwards, the data is never reparsed. This might be a bug or an undocumented quirk.
I had another problem with the freesrc parameter, which needed to be 0 since I free the SDL_RWops object myself.
For future reference, once ChunkBuffer has the MP3 data, it needs to pass through SDL_BuildAudioCVT/SDL_ConvertAudio if it's to be played through an already opened audio device.
The final working code is:
// bool ReadMP3ToBuffer(char* filename)
bool success = false;
const Uint32 Mp3ChunkLen = 4096;
SDL_AudioSpec mp3Specs;
SMPEG* mp3;
SMPEG_Info infoMP3;
Uint8 * ChunkBuffer;
Uint32 MP3Length = 0;
// Allocate a chunk buffer
ChunkBuffer = (Uint8*)malloc(Mp3ChunkLen);
memset(ChunkBuffer, 0, Mp3ChunkLen);
SDL_RWops *mp3File = SDL_RWFromFile(filename, "rb"); // filename is a char* passed to the function.
if (mp3File != NULL)
{
mp3 = SMPEG_new_rwops(mp3File, &infoMP3, 0, 1);
if(mp3 != NULL)
{
if(infoMP3.has_audio)
{
Uint32 readLen;
// Get the MP3 audio specs for later conversion
SMPEG_wantedSpec(mp3, &mp3Specs);
SMPEG_enablevideo(mp3, 0);
// Play the MP3 once to get the size of the needed buffer in relation with the audio specs
SMPEG_play(mp3);
while ((readLen = SMPEG_playAudio(mp3, ChunkBuffer, Mp3ChunkLen)) > 0)
{
MP3Length += readLen;
}
SMPEG_stop(mp3);
if(MP3Length > 0)
{
// Reallocate the buffer with the new length (if needed)
if (MP3Length != Mp3ChunkLen)
{
ChunkBuffer = (Uint8*)realloc(ChunkBuffer, MP3Length);
memset(ChunkBuffer, 0, MP3Length);
}
// Replay the entire MP3 into the new ChunkBuffer.
SMPEG_rewind(mp3);
SMPEG_play(mp3);
bool readBackSuccess = (MP3Length == SMPEG_playAudio(mp3, ChunkBuffer, MP3Length));
SMPEG_stop(mp3);
if(readBackSuccess)
{
SDL_AudioCVT convertedSound;
// NOTE : static SDL_AudioSpec asDeviceSpecs; containing valid values after a call to SDL_OpenAudioDevice
if(SDL_BuildAudioCVT(&convertedSound, mp3Specs.format, mp3Specs.channels, mp3Specs.freq, asDeviceSpecs.format, asDeviceSpecs.channels, asDeviceSpecs.freq) >= 0)
{
Uint32 newBufferLen = MP3Length*convertedSound.len_mult;
// Make sure the audio length is a multiple of a sample size to avoid sound clicking
int sampleSize = ((asDeviceSpecs.format & 0xFF)/8)*asDeviceSpecs.channels;
newBufferLen &= ~(sampleSize-1);
// Allocate the new buffer and proceed with the actual conversion.
convertedSound.buf = (Uint8*)malloc(newBufferLen);
memcpy(convertedSound.buf, ChunkBuffer, MP3Length);
convertedSound.len = MP3Length;
if(SDL_ConvertAudio(&convertedSound) == 0)
{
// Save convertedSound.buf and convertedSound.len_cvt for future use in your mixer code.
// Don't forget to free convertedSound.buf once it's no longer used.
success = true;
}
}
}
}
}
SMPEG_delete(mp3);
mp3 = NULL;
}
SDL_RWclose(mp3File);
mp3File = NULL;
}
free(ChunkBuffer);
return success;
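One detail in the code above: the newBufferLen &= ~(sampleSize-1) trick only rounds correctly when sampleSize is a power of two, which holds for 8-, 16- and 32-bit samples in mono or stereo. A more general form, sketched here with a made-up helper name, uses the remainder instead of a bit mask:

```cpp
#include <cassert>
#include <cstdint>

// Round a byte length down to a whole number of audio frames.
// frameSize = bytes per sample * channel count (4 for 16-bit stereo).
// Unlike the bit-mask form, this does not require frameSize to be a
// power of two (e.g. 12 bytes for 16-bit 5.1 audio).
std::uint32_t alignToFrame(std::uint32_t lengthBytes, std::uint32_t frameSize)
{
    assert(frameSize > 0);
    return lengthBytes - (lengthBytes % frameSize);
}
```

For 16-bit stereo both forms give the same result, so this only matters if the device is ever opened with a channel count that is not a power of two.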
NOTE: Some MP3 files I tried lost a few milliseconds and cut off too early during playback when I resampled them with this code. Some others didn't. I could reproduce the same behaviour in Audacity, so I'm not sure what's going on. There may still be a bug in my code or a bug in SMPEG, or it may be a known issue with the MP3 format itself. If someone can provide an explanation in the comments, that would be great!

Related

Decoding with OGG/Vorbis gives no sound

I'd like to play an Ogg/Vorbis audio/video file, but right now I can't manage to read audio from the file.
My algorithm to read audio is:
Initialize required structures:
vorbis_info info;
vorbis_comment comment;
vorbis_dsp_state dsp;
vorbis_block block;
vorbis_info_init(&info);
vorbis_comment_init(&comment);
Read headers:
Call vorbis_synthesis_headerin(&info, &comment, packet); until it returns OV_ENOTVORBIS
vorbis_synthesis_init(&dsp, &info);
vorbis_block_init(&dsp, &block);
Pass the first non-header packet to function below
Parse packets, do it until audioReady == READY
void putPacket(ogg_packet *packet) {
int ret;
ret = vorbis_synthesis(&block, packet);
if( ret == 0 ) {
ret = vorbis_synthesis_blockin(&dsp, &block);
audioReady = (ret == 0) ? READY : NOT_READY;
} else {
audioReady = NOT_READY;
}
}
Read audio data:
float** rawData = nullptr;
readSamples = vorbis_synthesis_pcmout(&dsp, &rawData);
if( readSamples == 0 ) {
audioReady = NOT_READY;
return;
}
int16_t* newData = new int16_t[readSamples * getChannels()];
int16_t* dst = newData;
for(unsigned int i=0; i<readSamples; ++i) {
for(unsigned char ch=0; ch<getChannels(); ++ch) {
*(dst++) = math::clamp<int16_t>(rawData[ch][i]*32767 + 0.5f, -32767, 32767);
}
}
audioData.push_back({readSamples * getChannels() , newData});
vorbis_synthesis_read(&dsp, static_cast<int>(readSamples));
audioReady = NOT_READY;
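The float-to-int16 step above depends on a math::clamp helper that isn't shown. As a point of comparison, here is a minimal sketch of the same conversion using only the standard library; the function name is made up, and it assumes the decoder hands back one float array per channel with values nominally in [-1, 1], as vorbis_synthesis_pcmout does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleave planar float PCM (one array per channel, values nominally in
// [-1, 1]) into signed 16-bit samples. Out-of-range input is clamped.
std::vector<std::int16_t> interleaveToS16(const float* const* planar,
                                          int channels, int frames)
{
    assert(planar != nullptr && channels > 0 && frames >= 0);
    std::vector<std::int16_t> out;
    out.reserve(static_cast<std::size_t>(channels) * frames);
    for (int i = 0; i < frames; ++i) {
        for (int ch = 0; ch < channels; ++ch) {
            // Clamp first, then scale and round to the nearest integer.
            const float clamped = std::max(-1.0f, std::min(1.0f, planar[ch][i]));
            out.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
        }
    }
    return out;
}
```

Rounding with std::lround treats positive and negative samples symmetrically, whereas adding 0.5f before truncating shifts negative values toward zero by one step.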
This is where it goes wrong: examining the contents of newData reveals a very quiet sound. I doubt it is the right data, which means I did something wrong somewhere in my algorithm.
I tried to find some examples of similar programs, but all I found are sources with very spaghetti-like code that seem to follow the same algorithm as mine, yet they do their job. (Here is one such library: https://github.com/icculus/theoraplay )
Is there any reason why I'm getting (almost) silence in my application?
PS: If you are wondering whether I might be getting Ogg packets wrong, I assure you this part of my code works, as I'm also reading video data from the same file using the same code and it shows the video correctly.
I've found it: while reading packets I assumed that one Ogg page = one Ogg packet. That's wrong: for audio, one page can contain many packets. To read it properly, one has to write code like:
do{
putPacket(&packet);
}while( ogg_stream_packetout(&state, &packet) == 1 );
I made this mistake because for video packets (which I implemented first) a page contains only one packet.

Sound playback using FFmpeg and libsoundio in c++

I am trying to make a video player desktop application in C++ using primarily FFmpeg and Qt6. For now, I can decode and play video frames correctly at the right speed; that is not a problem. I am now trying to play back audio, which is much harder than I expected. I am using libsoundio as my audio library, but the documentation is really poor and there are not many examples or tutorials for it. I am also a beginner when it comes to audio programming, although I understand the basics. First off, if anyone can recommend an audio library for this type of job, let me know, but I would like to use open source libraries. Anyway, here is how I decode my audio data with FFmpeg. I'm not sure I am doing it correctly, as I could barely find documentation on that either.
I have a struct that contains all the information and is initialized by a function:
struct VideoReader
{
bool valid;
int width, height;
int video_stream_index;
int audio_stream_index;
AVRational time_base;
AVFormatContext* av_format_ctx;
AVCodecContext* av_vi_codec_ctx;
AVCodecContext* av_au_codec_ctx;
AVPacket* packet;
AVFrame* frame;
SwsContext* sws_ctx;
SwrContext* swr_ctx;
};
The function that initializes it is quite long and not necessary to share, but it populates all those values except for sws_ctx and swr_ctx.
Here is how I decode packets. This function is simplified: I left the video decoding out of it, and I'll take care of syncing once I can properly play back audio:
bool video_reader_read_au_frame(VideoReader *video_reader, unsigned char **frame_buffer)
{
// Unpack video_reader
auto& av_format_ctx = video_reader->av_format_ctx;
auto& av_codec_ctx = video_reader->av_au_codec_ctx;
auto& av_packet = video_reader->packet;
auto& av_frame = video_reader->frame;
auto& swr_ctx = video_reader->swr_ctx;
int& audio_stream_index = video_reader->audio_stream_index;
// Decode the audio frame data
int response;
while (av_read_frame(av_format_ctx, av_packet) >= 0)
{
last_frame = false;
if (av_packet->stream_index != audio_stream_index)
{
av_packet_unref(av_packet);
continue;
}
response = avcodec_send_packet(av_codec_ctx, av_packet);
if (response < 0)
{
Logger::error("Could not decode packet.");
return false;
}
response = avcodec_receive_frame(av_codec_ctx, av_frame);
if (response == AVERROR(EAGAIN) || response == AVERROR_EOF)
{
av_packet_unref(av_packet);
continue;
}
else if (response < 0)
{
Logger::error("Could not decode packet.");
return false;
}
av_packet_unref(av_packet);
break;
}
// Initialize SwrContext
if (!swr_ctx) {
swr_ctx = swr_alloc_set_opts(nullptr,
av_codec_ctx->channel_layout, AV_SAMPLE_FMT_FLT,
av_codec_ctx->sample_rate, av_codec_ctx->channel_layout,
av_codec_ctx->sample_fmt, av_codec_ctx->sample_rate,
0, nullptr);
if (!swr_ctx)
{
Logger::error("Could not create SwrContext.");
return false;
}
if (swr_init(swr_ctx) < 0)
{
Logger::error("Could not initialize SwrContext.");
return false;
}
}
const int MAX_BUFFER_SIZE = av_samples_get_buffer_size(nullptr, av_frame->channels, av_frame->nb_samples, AV_SAMPLE_FMT_FLT, 1);
*frame_buffer = (unsigned char*)av_malloc(MAX_BUFFER_SIZE);
swr_convert(swr_ctx, frame_buffer, av_frame->nb_samples,
(const unsigned char**)av_frame->data, av_frame->nb_samples);
av_frame_unref(av_frame);
return true;
}
Here is how I would normally call this function:
VideoReader vr{};
if(!video_reader_open(&vr, "C:/Path/to/file.mp4"))
{
Logger::error("Could not initialize VideoReader.");
return 1;
}
unsigned char* buffer;
if(!video_reader_read_au_frame(&vr, &buffer))
{
Logger::error("Could not read audio data.");
return 1;
}
play_audio(&buffer); // TODO: find a way to play audio once buffer has data in it
video_reader_close(&vr);
return 0;
Obviously I will loop over video_reader_read_au_frame(&vr, &buffer) to playback the whole video.
I believe my code puts the samples from the decoded frame in buffer, but I am really not sure. I am also unsure whether I need to convert to the AV_SAMPLE_FMT_FLT audio format, to something else, or just leave the data as it is. For libsoundio, I more or less understand this example: http://libsound.io/ but I'm not sure I fully understand how the library works, especially the callback function. I know I have to pass buffer in outstream->userdata as a void pointer, but I don't know how to use it in the callback function. Any help or guidance would be greatly appreciated. Note that later on in this project I might want to send this data over a network to play the video on another computer in sync.
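To make the userdata part of the question concrete, here is a minimal, library-independent sketch of what the body of a pull-style audio callback usually does with such a pointer. PlaybackState and fillOutput are hypothetical names, not libsoundio API; how the library hands you the output memory and the requested frame count is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// State shared with the audio callback through a void* user-data pointer.
struct PlaybackState {
    std::vector<float> samples;  // interleaved float PCM, already decoded
    std::size_t cursor = 0;      // index of the next sample to play
};

// Body of a pull-style write callback: copy up to `frames` frames into
// `out` and pad with silence once the decoded data runs out.
// Returns the number of complete, non-silent frames written.
std::size_t fillOutput(void* userdata, float* out,
                       std::size_t frames, std::size_t channels)
{
    assert(userdata != nullptr && out != nullptr && channels > 0);
    auto* state = static_cast<PlaybackState*>(userdata);
    const std::size_t wanted = frames * channels;
    const std::size_t available = state->samples.size() - state->cursor;
    const std::size_t toCopy = std::min(wanted, available);
    std::copy_n(state->samples.data() + state->cursor, toCopy, out);
    std::fill(out + toCopy, out + wanted, 0.0f);  // silence on underrun
    state->cursor += toCopy;
    return toCopy / channels;
}
```

In a real player the decoder would keep appending to the shared state from another thread, so the plain vector here would need to be replaced by a thread-safe FIFO.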

Oboe Async Audio Extraction

I am trying to build an NDK-based C++ low-latency audio player that has to handle three cases for multiple audio sources:
Play from assets.
Stream from an online source.
Play from local device storage.
Starting from one of the Oboe samples provided by Google, I added another function to the class in NDKExtractor.cpp to extract URL-based audio and render it to the audio device while reading from the source at the same time.
int32_t NDKExtractor::decode(char *file, uint8_t *targetData, AudioProperties targetProperties) {
LOGD("Using NDK decoder: %s",file);
// Extract the audio frames
AMediaExtractor *extractor = AMediaExtractor_new();
// Using this method instead of AMediaExtractor_setDataSourceFd(), which the rhythm game example uses for asset files
media_status_t amresult = AMediaExtractor_setDataSource(extractor, file);
if (amresult != AMEDIA_OK) {
LOGE("Error setting extractor data source, err %d", amresult);
return 0;
}
// Specify our desired output format by creating it from our source
AMediaFormat *format = AMediaExtractor_getTrackFormat(extractor, 0);
int32_t sampleRate;
if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_SAMPLE_RATE, &sampleRate)) {
LOGD("Source sample rate %d", sampleRate);
if (sampleRate != targetProperties.sampleRate) {
LOGE("Input (%d) and output (%d) sample rates do not match. "
"NDK decoder does not support resampling.",
sampleRate,
targetProperties.sampleRate);
return 0;
}
} else {
LOGE("Failed to get sample rate");
return 0;
};
int32_t channelCount;
if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_CHANNEL_COUNT, &channelCount)) {
LOGD("Got channel count %d", channelCount);
if (channelCount != targetProperties.channelCount) {
LOGE("NDK decoder does not support different "
"input (%d) and output (%d) channel counts",
channelCount,
targetProperties.channelCount);
}
} else {
LOGE("Failed to get channel count");
return 0;
}
const char *formatStr = AMediaFormat_toString(format);
LOGD("Output format %s", formatStr);
const char *mimeType;
if (AMediaFormat_getString(format, AMEDIAFORMAT_KEY_MIME, &mimeType)) {
LOGD("Got mime type %s", mimeType);
} else {
LOGE("Failed to get mime type");
return 0;
}
// Obtain the correct decoder
AMediaCodec *codec = nullptr;
AMediaExtractor_selectTrack(extractor, 0);
codec = AMediaCodec_createDecoderByType(mimeType);
AMediaCodec_configure(codec, format, nullptr, nullptr, 0);
AMediaCodec_start(codec);
// DECODE
bool isExtracting = true;
bool isDecoding = true;
int32_t bytesWritten = 0;
while (isExtracting || isDecoding) {
if (isExtracting) {
// Obtain the index of the next available input buffer
ssize_t inputIndex = AMediaCodec_dequeueInputBuffer(codec, 2000);
//LOGV("Got input buffer %d", inputIndex);
// The input index acts as a status if it's negative
if (inputIndex < 0) {
if (inputIndex == AMEDIACODEC_INFO_TRY_AGAIN_LATER) {
// LOGV("Codec.dequeueInputBuffer try again later");
} else {
LOGE("Codec.dequeueInputBuffer unknown error status");
}
} else {
// Obtain the actual buffer and read the encoded data into it
size_t inputSize;
uint8_t *inputBuffer = AMediaCodec_getInputBuffer(codec, inputIndex,
&inputSize);
//LOGV("Sample size is: %d", inputSize);
ssize_t sampleSize = AMediaExtractor_readSampleData(extractor, inputBuffer,
inputSize);
auto presentationTimeUs = AMediaExtractor_getSampleTime(extractor);
if (sampleSize > 0) {
// Enqueue the encoded data
AMediaCodec_queueInputBuffer(codec, inputIndex, 0, sampleSize,
presentationTimeUs,
0);
AMediaExtractor_advance(extractor);
} else {
LOGD("End of extractor data stream");
isExtracting = false;
// We need to tell the codec that we've reached the end of the stream
AMediaCodec_queueInputBuffer(codec, inputIndex, 0, 0,
presentationTimeUs,
AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM);
}
}
}
if (isDecoding) {
// Dequeue the decoded data
AMediaCodecBufferInfo info;
ssize_t outputIndex = AMediaCodec_dequeueOutputBuffer(codec, &info, 0);
if (outputIndex >= 0) {
// Check whether this is set earlier
if (info.flags & AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM) {
LOGD("Reached end of decoding stream");
isDecoding = false;
} else {
// Valid index, acquire buffer
size_t outputSize;
uint8_t *outputBuffer = AMediaCodec_getOutputBuffer(codec, outputIndex,
&outputSize);
/*LOGV("Got output buffer index %d, buffer size: %d, info size: %d writing to pcm index %d",
outputIndex,
outputSize,
info.size,
m_writeIndex);*/
// copy the data out of the buffer
memcpy(targetData + bytesWritten, outputBuffer, info.size);
bytesWritten += info.size;
AMediaCodec_releaseOutputBuffer(codec, outputIndex, false);
}
} else {
// The outputIndex doubles as a status return if its value is < 0
switch (outputIndex) {
case AMEDIACODEC_INFO_TRY_AGAIN_LATER:
LOGD("dequeueOutputBuffer: try again later");
break;
case AMEDIACODEC_INFO_OUTPUT_BUFFERS_CHANGED:
LOGD("dequeueOutputBuffer: output buffers changed");
break;
case AMEDIACODEC_INFO_OUTPUT_FORMAT_CHANGED:
LOGD("dequeueOutputBuffer: output outputFormat changed");
format = AMediaCodec_getOutputFormat(codec);
LOGD("outputFormat changed to: %s", AMediaFormat_toString(format));
break;
}
}
}
}
// Clean up
AMediaFormat_delete(format);
AMediaCodec_delete(codec);
AMediaExtractor_delete(extractor);
return bytesWritten;
}
The problem I am facing is that this code first extracts all the audio data and saves it into a buffer, which then becomes part of an AFileDataSource that I derived from the DataSource class in the same sample.
Only after it has finished extracting the whole file does it start playing, through the onAudioReady() callback of the Oboe AudioStreamBuilder.
What I need is to play each chunk of the audio buffer as it is streamed.
Optional query: aside from the main question, this code also blocks the UI even though I created a foreground service to communicate with the NDK functions that execute it. Any thoughts on this?
You probably solved this already, but for future readers...
You need a FIFO buffer to store the decoded audio. You can use Oboe's own FIFO buffer, e.g. oboe::FifoBuffer.
You can have a low/high watermark for the buffer and a state machine, so you start decoding when the buffer is almost empty and you stop decoding when it's full (you'll figure out the other states that you need).
As a side note, I implemented such a player only to find out some time later that the AAC codec is broken on some devices (Xiaomi and Amazon come to mind), so I had to throw away the AMediaCodec/AMediaExtractor parts and use an AAC library instead.
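The low/high watermark idea can be sketched as a small hysteresis controller. This is only an illustration with made-up names (DecodeGate, update); a real player would need extra states for end of stream, seeking and errors.

```cpp
#include <cassert>
#include <cstddef>

// Hysteresis controller for a decoder feeding a FIFO.
class DecodeGate {
public:
    DecodeGate(std::size_t lowMark, std::size_t highMark)
        : low_(lowMark), high_(highMark)
    {
        assert(lowMark < highMark);
    }

    // Call with the current FIFO fill level; returns true while the
    // decoder should keep producing data.
    bool update(std::size_t fillLevel)
    {
        if (fillLevel <= low_) {
            decoding_ = true;    // almost empty: start decoding
        } else if (fillLevel >= high_) {
            decoding_ = false;   // full enough: stop decoding
        }
        // Between the two marks the previous state is kept, which avoids
        // toggling the decoder on every small change in fill level.
        return decoding_;
    }

private:
    std::size_t low_;
    std::size_t high_;
    bool decoding_ = true;  // start by filling the FIFO
};
```

The decoder thread would call update() with the FIFO's current fill level on each iteration and sleep or wait while it returns false.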
You have to implement a ring buffer (or use the one implemented in the Oboe example, LockFreeQueue.h) and have the extracting thread copy the decoded data into buffers that it pushes onto the ring buffer. On the other end of the ring buffer, the audio thread takes that data from the queue and copies it to the audio buffer. This happens in the onAudioReady(oboe::AudioStream *oboeStream, void *audioData, int32_t numFrames) callback that you have to implement in your class (see the Oboe docs). Be sure to follow all the good practices for the audio thread (don't allocate or deallocate memory there, no mutexes, no file I/O, etc.).
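For readers who want to see the shape of such a ring buffer, here is a minimal single-producer/single-consumer sketch using only the standard library. It is not the code from LockFreeQueue.h; the class and method names are made up, and it stores individual float samples rather than buffers to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer / single-consumer ring buffer of float samples.
// Exactly one thread calls write() and one other thread calls read().
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1)
    {
        assert(capacity > 0);
    }

    // Producer side (decoder thread). Returns the samples actually stored.
    std::size_t write(const float* src, std::size_t count)
    {
        std::size_t pos = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        std::size_t written = 0;
        while (written < count) {
            const std::size_t next = (pos + 1) % buf_.size();
            if (next == tail) break;  // buffer is full
            buf_[pos] = src[written++];
            pos = next;
        }
        head_.store(pos, std::memory_order_release);
        return written;
    }

    // Consumer side (audio callback). Returns the samples actually read.
    std::size_t read(float* dst, std::size_t count)
    {
        std::size_t pos = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        std::size_t got = 0;
        while (got < count && pos != head) {
            dst[got++] = buf_[pos];
            pos = (pos + 1) % buf_.size();
        }
        tail_.store(pos, std::memory_order_release);
        return got;
    }

private:
    std::vector<float> buf_;            // one slot stays empty to tell full from empty
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

All memory is allocated in the constructor, so read() does no allocation, locking or I/O and can be called from the audio callback; if it returns fewer samples than requested, the callback should fill the rest with silence.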
Optional query: a service doesn't run in a separate thread, so if you call it from the UI thread it blocks the UI. Look at the other types of services: you can use an IntentService or a service with a Messenger that launches a separate thread on the Java side, or you can create threads on the C++ side using std::thread.

How to read YUV8 data from avi file?

I have an AVI file that contains uncompressed gray video data. I need to extract frames from it. The file size is 22 GB.
How do I do that?
I have already tried ffmpeg, but it gives me a "could not find codec parameters for video stream" message, because there is no codec at work, just frames.
Since OpenCV just uses ffmpeg to read video, that rules out OpenCV as well.
The only path that seems to be left is to dig into the raw data, but I do not know how.
Edit: this is the code I use to read from the file with OpenCV. The failure occurs inside the second if. Running the ffmpeg binary on the file also fails with the message above (could not find codec parameters etc.).
/* register all formats and codecs */
av_register_all();
/* open input file, and allocate format context */
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
ret = 1;
goto end;
}
fmt_ctx->seek2any = true;
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
ret = 1;
goto end;
}
Edit:
Here is the sample code I tried for the extraction: pastebin. The result I get is an unchanging buffer after every call to AVIStreamRead.
If you do not need cross-platform functionality, the Video for Windows (VFW) API is a good alternative (http://msdn.microsoft.com/en-us/library/windows/desktop/dd756808(v=vs.85).aspx). I will not post an entire code block, since there's quite a lot to do, but you should be able to figure it out from the reference link. Basically, you call AVIFileOpen, then get the video stream via AVIFileGetStream with streamtypeVIDEO (or do both at once with AVIStreamOpenFromFile), and then read samples from the stream with AVIStreamRead. If you get to a point where you fail I can try to help, but it should be pretty straightforward.
Also, I'm not sure why ffmpeg is failing; I have been doing raw AVI reading with ffmpeg without any codecs involved. Can you post which ffmpeg call actually fails?
EDIT:
For the issue you are seeing, where the read data size is 0: the AVI file has N slots for frames in each second, where N is the fps of the video. In real life the samples won't come exactly at that rate (e.g. IP surveillance cameras), so the actual data sample indexes can be non-contiguous, like 1, 5, 11, ..., and VFW inserts empty samples between them (that is where you read a sample with a zero size). What you have to do is call AVIStreamRead with NULL as the buffer and 0 as the size until bRead is not 0 or you run past the last sample. Once you get an actual size, you can call AVIStreamRead again on that sample index with the buffer pointer and size. I usually work with compressed video so I don't use the suggested size, but based on your code snippet I would do something like this:
...
bRead = 0;
do
{
aviOpRes = AVIStreamRead(ppavi,smpS,1,NULL,0,&bRead,&smpN);
} while (bRead == 0 && ++smpS < si.dwLength + si.dwStart);
if(smpS >= si.dwLength + si.dwStart)
break;
PUCHAR tempBuffer = new UCHAR[bRead];
aviOpRes = AVIStreamRead(ppavi,smpS,1,tempBuffer,bRead,&bRead,&smpN);
/* do whatever you need */
delete[] tempBuffer;
...
EDIT 2:
Since this may come in handy for someone (or yourself) choosing between VFW and FFmpeg, I also updated your FFmpeg example so that it parses the same file (sorry for the code quality, since it lacks error checking, but I guess you can see the logical flow):
/* register all formats and codecs */
av_register_all();
AVFormatContext* fmt_ctx = NULL;
/* open input file, and allocate format context */
const char *src_filename = "E:\\Output.avi";
if (avformat_open_input(&fmt_ctx, src_filename, NULL, NULL) < 0) {
fprintf(stderr, "Could not open source file %s\n", src_filename);
abort();
}
/* retrieve stream information */
int res = avformat_find_stream_info(fmt_ctx, NULL);
if (res < 0) {
fprintf(stderr, "Could not find stream information\n");
abort();
}
int video_stream_index = 0; /* video stream is usually 0, but it is still better to look it up in case it's not present */
for(; video_stream_index < fmt_ctx->nb_streams; ++video_stream_index)
{
if(fmt_ctx->streams[video_stream_index]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
break;
}
if(video_stream_index == fmt_ctx->nb_streams)
abort();
AVPacket *packet = new AVPacket;
while(av_read_frame(fmt_ctx, packet) == 0)
{
if (packet->stream_index == video_stream_index)
printf("Sample nr %lld\n", (long long)packet->pts); /* pts is a 64-bit value */
av_free_packet(packet);
}
Basically you open the context and read packets from it. You will get both audio and video packets so you should check if the packet belongs to the stream of interest. FFMPEG will save you the trouble with empty frames and give only those samples that have data in them.

how do i create a stereo mp3 file with latest version of ffmpeg?

I'm updating my code from an older version of ffmpeg (53) to a newer one (54/55). Code that used to work has now been deprecated or removed, so I'm having problems updating it.
Previously I could create a stereo MP3 file using a sample format called:
SAMPLE_FMT_S16
That matched up perfectly with my source stream. This has now been replaced with
AV_SAMPLE_FMT_S16
Which works fine for mono recordings but when I try to create a stereo MP3 file it bugs out at avcodec_open2 with:
"Specified sample_fmt is not supported."
Through trial and error I've found that using
AV_SAMPLE_FMT_S16P
...is accepted by avcodec_open2 but when I get through and create the MP3 file the sound is very distorted - it sounds about 2 octaves lower than usual with a massive hum in the background - here's an example recording:
http://hosting.ispyconnect.com/example.mp3
I've been told by the ffmpeg guys that this is because I now need to manually deinterleave my byte stream before calling:
avcodec_fill_audio_frame
How do I do that? I've tried using the swrescale library without success, and I've tried manually feeding L/R data into avcodec_fill_audio_frame, but the results sound exactly the same as without deinterleaving.
Here is my code for encoding:
void add_audio_sample( AudioWriterPrivateData^ data, BYTE* soundBuffer, int soundBufferSize)
{
libffmpeg::AVCodecContext* c = data->AudioStream->codec;
memcpy(data->AudioBuffer + data->AudioBufferSizeCurrent, soundBuffer, soundBufferSize);
data->AudioBufferSizeCurrent += soundBufferSize;
uint8_t* pSoundBuffer = (uint8_t *)data->AudioBuffer;
DWORD nCurrentSize = data->AudioBufferSizeCurrent;
libffmpeg::AVFrame *frame;
int got_packet;
int ret;
int size = libffmpeg::av_samples_get_buffer_size(NULL, c->channels,
data->AudioInputSampleSize,
c->sample_fmt, 1);
while( nCurrentSize >= size) {
frame=libffmpeg::avcodec_alloc_frame();
libffmpeg::avcodec_get_frame_defaults(frame);
frame->nb_samples = data->AudioInputSampleSize;
ret = libffmpeg::avcodec_fill_audio_frame(frame, c->channels, c->sample_fmt, pSoundBuffer, size, 1);
if (ret<0)
{
throw gcnew System::IO::IOException("error filling audio");
}
//audio_pts = (double)audio_st->pts.val * audio_st->time_base.num / audio_st->time_base.den;
libffmpeg::AVPacket pkt = { 0 };
libffmpeg::av_init_packet(&pkt);
ret = libffmpeg::avcodec_encode_audio2(c, &pkt, frame, &got_packet);
if (ret<0)
throw gcnew System::IO::IOException("error encoding audio");
if (got_packet) {
pkt.stream_index = data->AudioStream->index;
if (pkt.pts != AV_NOPTS_VALUE)
pkt.pts = libffmpeg::av_rescale_q(pkt.pts, c->time_base, c->time_base);
if (pkt.duration > 0)
pkt.duration = libffmpeg::av_rescale_q(pkt.duration, c->time_base, c->time_base);
pkt.flags |= AV_PKT_FLAG_KEY;
if (libffmpeg::av_interleaved_write_frame(data->FormatContext, &pkt) != 0)
throw gcnew System::IO::IOException("unable to write audio frame.");
}
nCurrentSize -= size;
pSoundBuffer += size;
}
memcpy(data->AudioBuffer, data->AudioBuffer + data->AudioBufferSizeCurrent - nCurrentSize, nCurrentSize);
data->AudioBufferSizeCurrent = nCurrentSize;
}
Would love to hear any ideas - I've been trying to get this working for 3 days now :(
You don't want to advance pSoundBuffer if a frame hasn't been fully encoded (i.e. got_packet isn't set), as nothing has been written yet. Also, you are allocating a frame on each loop iteration: there's no need for that, you can reuse the same AVFrame over and over. Your code also leaks, as you never free the AVFrame.
I wrote some code as part of MythTV that encodes audio to AC3.
It also does what you were looking for: it deinterleaves the content.
https://github.com/MythTV/mythtv/blob/476b2a826d43fca5e658ebe787c3cb1ec2334f98/mythtv/libs/libmyth/audio/audiooutputdigitalencoder.cpp#L178
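The deinterleaving step itself is independent of FFmpeg and can be sketched with the standard library alone. The function below is an illustration with a made-up name, not the MythTV code; it assumes 16-bit samples stored as L R L R ... and produces one contiguous plane per channel, which is the layout planar sample formats such as AV_SAMPLE_FMT_S16P expect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split interleaved 16-bit PCM (L R L R ...) into one contiguous plane per
// channel (L L L ... / R R R ...).
std::vector<std::vector<std::int16_t>> deinterleave(const std::int16_t* interleaved,
                                                    std::size_t frames,
                                                    std::size_t channels)
{
    assert(interleaved != nullptr && channels > 0);
    std::vector<std::vector<std::int16_t>> planes(
        channels, std::vector<std::int16_t>(frames));
    for (std::size_t i = 0; i < frames; ++i) {
        for (std::size_t ch = 0; ch < channels; ++ch) {
            // Sample `ch` of frame `i` sits at index i * channels + ch.
            planes[ch][i] = interleaved[i * channels + ch];
        }
    }
    return planes;
}
```

Each plane would then be handed to the encoder as a separate per-channel buffer. How exactly that is wired into avcodec_fill_audio_frame depends on the FFmpeg version in use, so treat that part as something to check against the FFmpeg documentation.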
I know this question is old, but for posterity: I'm working on some audio resampling code, and after I arrived at an audio sounding very similar to the mp3 the author linked, I identified the cause as being a mismatch in audio sampling rate between the input the resampler expects and the actual data.
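As a quick sanity check for this kind of mismatch, the pitch change can be computed directly: audio sampled at one rate but consumed as if it had another rate is shifted by log2 of the ratio, in octaves. The helper below is a small sketch with a made-up name.

```cpp
#include <cassert>
#include <cmath>

// Pitch shift, in octaves, heard when audio that was sampled at
// `actualRate` is treated as if it had been sampled at `assumedRate`.
// A negative result means the audio sounds lower (and plays slower).
double octaveShift(double assumedRate, double actualRate)
{
    assert(assumedRate > 0.0 && actualRate > 0.0);
    return std::log2(assumedRate / actualRate);
}
```

A drop of about two octaves, as described in the question, corresponds to the data being consumed at roughly a quarter of its real rate, for example 44100 Hz data treated as 11025 Hz.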