How can I perform a realtime FFT on SDL2 audio stream data - c++

I am trying to create a music visualiser in C++ using SDL2 and FFTW3.
My aim is to load a .wav audio file and then simultaneously play the audio and perform a realtime Fast Fourier Transform using an SDL2 Callback function.
I want to get the frequency spectrum data so that I can implement the graphical visualiser at a later date.
I followed an SDL YouTube guide on loading the .wav and playing the audio using the callback function, but I don't understand how to perform the FFT on this data. I followed yet another guide on using FFTW and SDL with C to produce a similar effect but I'm still unsure how to actually implement it.
Uint8* sampData;
SDL_AudioSpec wavSpec;
Uint8* wavStart;
Uint32 wavLength;
SDL_AudioDeviceID aDevice;
struct AudioData {
Uint8* filePosition;
Uint32 fileLength;
};
void PlayAudioCallback(void* userData, Uint8* stream, int streamLength) {
AudioData* audio = (AudioData*)userData;
sampData = new Uint8;
if (audio->fileLength == 0) {
return;
}
Uint32 length = (Uint32)streamLength;
length = (length > audio->fileLength ? audio->fileLength : length);
SDL_memcpy(stream, audio->filePosition, length);
// HERE is where i'd like to implement the FFT on 'stream' data
// but i don't know how to implement this using FFTW
audio->filePosition += length;
audio->fileLength -= length;
}
int main() {
SDL_Init(SDL_INIT_AUDIO);
// Load .wav file
if (SDL_LoadWAV(FILE_PATH, &wavSpec, &wavStart, &wavLength) == NULL) {
cerr << "Couldnt load file: " << FILE_PATH << endl;
getchar();
}
cout << "Loaded " << FILE_PATH << endl;
AudioData audio;
audio.filePosition = wavStart;
audio.fileLength = wavLength;
wavSpec.callback = PlayAudioCallback;
wavSpec.userdata = &audio;
// Open audio playback endpoint
aDevice = SDL_OpenAudioDevice(NULL, 0, &wavSpec, NULL, SDL_AUDIO_ALLOW_ANY_CHANGE);
if (aDevice == 0) {
cerr << "Audio Device connection failed: " << SDL_GetError() << endl;
getchar();
}
// Play audio on playback endpoint
SDL_PauseAudioDevice(aDevice, 0);
// Do nothing while there's still data to be played
while (audio.fileLength > 0) {
SDL_Delay(100);
}
}
From previous experience I used NumPy to unpack .wav data into a NumPy array, before sending it the built-in NumPy-FFT function, but I'm clueless on what to do with the SDL stream data that I have here.

What you want is the short term FFT. You collect a buffer of samples from your stream and apply a window function to the samples before performing the FFT. You then collect another buffer, keeping some samples from the first buffer and appending new samples. Repeat until all the data has been processed.
Since your input data is real, the FFT is symmetric, hence you only want the first N/2+1 complex output bins. These represent frequencies from d.c. to Fs/2. Take their magnitudes and plot. Repeat for each FFT.

Related

SDL only making beeping noises instead of the actual audio file

I recently managed to get past the errors of using SDL for sound.
Now that it's running and I'm not running into errors, my program is only playing beeping noises instead of the file I've provided.
I want the program to play the .wav file I'm passing to the SDL_LoadWAV.
I've tried with two different .wav files of different length and size, and checked the header files to find comments and tips on what format is required for the SDL to play the .wav file, haven't gotten anywhere with either of it.
The myAudioCallback function is responsible for handling the SDL callback.
void myAudioCallback(void* userdata, Uint8* stream, int len)
{
AudioData* audio = (AudioData*)userdata;
if (audio->length == 0)
return;
Uint32 length = (Uint32)len;
length = (length > audio->length ? audio->length : length); // if length is more than the audio length, then set length to be the audio.length, if not, set it to be the length passed to the function
std::cout << "Audio Length " << audio->length << std::endl;
std::cout << "Audio Position " << audio->position << std::endl;
SDL_memcpy(stream, audio->position, length); // audio callback is called by SDL, this ensures that the stream and data that is sent, is copied over to our struct, so we can use it and manipulate it
audio->position += length;
audio->length -= length;
}
My loadAudio function is responsible for loading the audio file and saving information about the audio file to the various variables I've declared in the .h (see further down for my .h)
void mainEngineCW4::loadAudio() // this function is for the sole purpose of loading the .wav file
{
SDL_Init(SDL_INIT_AUDIO); // loads the SDL to initialise audio
char* audioFile = "backgroundmusic.wav"; // a char pointer for the file path
// LoadWAV loads the wav file, and by putting it in an if statement like this, we can simutaneously check if the result is null, meaning an error occured while loading it.
if (SDL_LoadWAV(audioFile, &wavSpec, &wavStart, &wavLength) == NULL)
std::cerr << "Error: file could not be loaded as an audio file." << std::endl;
else
std::cout << audioFile << " loaded" << std::endl;
}
The playAudio function is responsible for loading the audio device and playing the audio through the Audio device
void mainEngineCW4::playAudio() // this function is for loading an audio device, and playing the audio through that device
{
audio.position = wavStart; // define where we start in the audio file
audio.length = wavLength; // define the length of the audio file
wavSpec.callback = myAudioCallback; // the callback variable needs a function that its going to run to be able to call back the audio that is played. assigning the function name to the variable allows it to call that function when needed
wavSpec.userdata = &audio; // the userdata is the audio itself
audioDevice = SDL_OpenAudioDevice(NULL, 0, &wavSpec, NULL, SDL_AUDIO_ALLOW_ANY_CHANGE); // opens the audio device, also having it play the audio information found at memory address for wavspec
if (audioDevice == 0) {
std::cerr << SDL_GetError() << std::endl;
return; }
SDL_PauseAudioDevice(audioDevice, 0); // mildly confused by why they decided to call the function for starting to play audio for "PauseAudioDevice" but yeah. this plays audio.
}
Here's my .h. I've defined myAudioCallback outside of the class, since SDL doesn't like the additional hidden parameter of a member function
struct AudioData
{
Uint8* position;
Uint32 length;
};
void myAudioCallback(void* userdata, Uint8* stream, int len);
class mainEngineCW4 :
public BaseEngine
{
public:
void loadAudio();
void playAudio();
void endAudio();
private:
// variables and pointers for audio information
AudioData audio;
SDL_AudioSpec wavSpec;
SDL_AudioDeviceID audioDevice;
Uint8* wavStart;
Uint32 wavLength;
};
I've removed the other functions and variables that are irrelevant to the issue I'm having
My problem is that I want my program to play the audio file I pass in, not just beeping noises.
Any help is greatly appreciated
EDIT: I realised I'm crap at providing info and explanation to things, so I edited in more information, explanation and the header file. If there is anything else I can provide, please let me know
With the help of a friend, I managed to fix the issue.
It seems like SDL didn't like it when I passed
SDL_OpenAudioDevice(NULL, 0, &wavSpec, NULL, SDL_AUDIO_ALLOW_ANY_CHANGE);
so instead, I passed
SDL_OpenAudioDevice(NULL, 0, &wavSpec, NULL, 0);
instead. This made the file play nicely, as well as having additional variables for the length and position of the audio file.
Another issue I ran into even after I got the file playing, was that the beeping was still playing with the audio file. I wasn't able to fix this directly myself, instead, when I cleaned the solution the day after, the beeping was gone, and the only thing playing was the audio file.
I've attached the code that works below.
In addition to the struct I created in the .h
struct AudioData
{
Uint8* position;
Uint32 length;
};
Defining the audio_position and audio_length as a global variable also helped in copying over information in the audio callback function.
static Uint8* audio_position;
static Uint32 audio_length;
void myAudioCallback(void* userdata, Uint8* stream, int len)
{
if (audio_length == 0)
return;
len = (len > audio_length ? audio_length : len); // if length is more than the audio length, then set length to be the audio.length, if not, set it to be the length passed to the function
SDL_memcpy(stream, audio_position, len); // audio callback is called by SDL, this ensures that the stream and data that is sent, is copied over to our struct, so we can use it and manipulate it
audio_position += len;
audio_length -= len;
}
For the load audio, I made sure that I actually load all the information that would be considered "loading", including storing the AudioSpec callback function, and setting the length and position of the audio file.
void mainEngineCW4::loadAudio() // this function is for the sole purpose of loading the .wav file
{
if (SDL_Init(SDL_INIT_AUDIO) < 0 || audioPlaying == true) // loads the SDL to initialise audio
return;
char* filePath = "backgroundmusic.wav"; // a char pointer for the file path
// LoadWAV loads the wav file, and by putting it in an if statement like this, we can simutaneously check if the result is null, meaning an error occured while loading it.
if (SDL_LoadWAV(filePath, &desiredSpec, &wavStart, &wavLength) == NULL)
std::cerr << "Error: file could not be loaded as an audio file." << std::endl;
else
std::cout << filePath << " loaded" << std::endl;
desiredSpec.callback = myAudioCallback; // the callback variable needs a function that its going to run to be able to call back the audio that is played. assigning the function name to the variable allows it to call that function when needed
desiredSpec.userdata = &audioInfo; // the userdata is the audio itself
audio_position = wavStart; // define where we start in the audio file
audio_length = wavLength; // define the length of the audio file
}
I also added a boolean to the class, so that when this returns true, it means that the audio has already been playing or has already been loaded, as to ensure SDL won't play the same thing simultaneously.
void mainEngineCW4::playAudio() // this function is for loading an audio device, and playing the audio through that device
{
if (audioPlaying == true)
return;
audioDevice = SDL_OpenAudioDevice(NULL, 0, &desiredSpec, NULL, 0); // opens the audio device, also having it play the audio information found at memory address for wavspec
if (audioDevice == 0)
{
std::cerr << SDL_GetError() << std::endl;
return;
}
SDL_PauseAudioDevice(audioDevice, 0); // mildly confused by why they decided to call the function for starting to play audio for "PauseAudioDevice" but yeah. this plays audio.
audioPlaying = true;
}
void mainEngineCW4::endAudio()
{
SDL_CloseAudioDevice(audioDevice);
SDL_FreeWAV(wavStart);
audioPlaying = false;
}

Oboe Async Audio Extraction

I am trying to build a NDK based c++ low latancy audio player which will encounter three operations for multiple audios.
Play from assets.
Stream from an online source.
Play from local device storage.
From one of the Oboe samples provided by Google, I added another function to the class NDKExtractor.cpp to extract a URL based audio and render it to audio device while reading from source at the same time.
int32_t NDKExtractor::decode(char *file, uint8_t *targetData, AudioProperties targetProperties) {
LOGD("Using NDK decoder: %s",file);
// Extract the audio frames
AMediaExtractor *extractor = AMediaExtractor_new();
//using this method instead of AMediaExtractor_setDataSourceFd() as used for asset files in the rythem game example
media_status_t amresult = AMediaExtractor_setDataSource(extractor, file);
if (amresult != AMEDIA_OK) {
LOGE("Error setting extractor data source, err %d", amresult);
return 0;
}
// Specify our desired output format by creating it from our source
AMediaFormat *format = AMediaExtractor_getTrackFormat(extractor, 0);
int32_t sampleRate;
if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_SAMPLE_RATE, &sampleRate)) {
LOGD("Source sample rate %d", sampleRate);
if (sampleRate != targetProperties.sampleRate) {
LOGE("Input (%d) and output (%d) sample rates do not match. "
"NDK decoder does not support resampling.",
sampleRate,
targetProperties.sampleRate);
return 0;
}
} else {
LOGE("Failed to get sample rate");
return 0;
};
int32_t channelCount;
if (AMediaFormat_getInt32(format, AMEDIAFORMAT_KEY_CHANNEL_COUNT, &channelCount)) {
LOGD("Got channel count %d", channelCount);
if (channelCount != targetProperties.channelCount) {
LOGE("NDK decoder does not support different "
"input (%d) and output (%d) channel counts",
channelCount,
targetProperties.channelCount);
}
} else {
LOGE("Failed to get channel count");
return 0;
}
const char *formatStr = AMediaFormat_toString(format);
LOGD("Output format %s", formatStr);
const char *mimeType;
if (AMediaFormat_getString(format, AMEDIAFORMAT_KEY_MIME, &mimeType)) {
LOGD("Got mime type %s", mimeType);
} else {
LOGE("Failed to get mime type");
return 0;
}
// Obtain the correct decoder
AMediaCodec *codec = nullptr;
AMediaExtractor_selectTrack(extractor, 0);
codec = AMediaCodec_createDecoderByType(mimeType);
AMediaCodec_configure(codec, format, nullptr, nullptr, 0);
AMediaCodec_start(codec);
// DECODE
bool isExtracting = true;
bool isDecoding = true;
int32_t bytesWritten = 0;
while (isExtracting || isDecoding) {
if (isExtracting) {
// Obtain the index of the next available input buffer
ssize_t inputIndex = AMediaCodec_dequeueInputBuffer(codec, 2000);
//LOGV("Got input buffer %d", inputIndex);
// The input index acts as a status if its negative
if (inputIndex < 0) {
if (inputIndex == AMEDIACODEC_INFO_TRY_AGAIN_LATER) {
// LOGV("Codec.dequeueInputBuffer try again later");
} else {
LOGE("Codec.dequeueInputBuffer unknown error status");
}
} else {
// Obtain the actual buffer and read the encoded data into it
size_t inputSize;
uint8_t *inputBuffer = AMediaCodec_getInputBuffer(codec, inputIndex,
&inputSize);
//LOGV("Sample size is: %d", inputSize);
ssize_t sampleSize = AMediaExtractor_readSampleData(extractor, inputBuffer,
inputSize);
auto presentationTimeUs = AMediaExtractor_getSampleTime(extractor);
if (sampleSize > 0) {
// Enqueue the encoded data
AMediaCodec_queueInputBuffer(codec, inputIndex, 0, sampleSize,
presentationTimeUs,
0);
AMediaExtractor_advance(extractor);
} else {
LOGD("End of extractor data stream");
isExtracting = false;
// We need to tell the codec that we've reached the end of the stream
AMediaCodec_queueInputBuffer(codec, inputIndex, 0, 0,
presentationTimeUs,
AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM);
}
}
}
if (isDecoding) {
// Dequeue the decoded data
AMediaCodecBufferInfo info;
ssize_t outputIndex = AMediaCodec_dequeueOutputBuffer(codec, &info, 0);
if (outputIndex >= 0) {
// Check whether this is set earlier
if (info.flags & AMEDIACODEC_BUFFER_FLAG_END_OF_STREAM) {
LOGD("Reached end of decoding stream");
isDecoding = false;
} else {
// Valid index, acquire buffer
size_t outputSize;
uint8_t *outputBuffer = AMediaCodec_getOutputBuffer(codec, outputIndex,
&outputSize);
/*LOGV("Got output buffer index %d, buffer size: %d, info size: %d writing to pcm index %d",
outputIndex,
outputSize,
info.size,
m_writeIndex);*/
// copy the data out of the buffer
memcpy(targetData + bytesWritten, outputBuffer, info.size);
bytesWritten += info.size;
AMediaCodec_releaseOutputBuffer(codec, outputIndex, false);
}
} else {
// The outputIndex doubles as a status return if its value is < 0
switch (outputIndex) {
case AMEDIACODEC_INFO_TRY_AGAIN_LATER:
LOGD("dequeueOutputBuffer: try again later");
break;
case AMEDIACODEC_INFO_OUTPUT_BUFFERS_CHANGED:
LOGD("dequeueOutputBuffer: output buffers changed");
break;
case AMEDIACODEC_INFO_OUTPUT_FORMAT_CHANGED:
LOGD("dequeueOutputBuffer: output outputFormat changed");
format = AMediaCodec_getOutputFormat(codec);
LOGD("outputFormat changed to: %s", AMediaFormat_toString(format));
break;
}
}
}
}
// Clean up
AMediaFormat_delete(format);
AMediaCodec_delete(codec);
AMediaExtractor_delete(extractor);
return bytesWritten;
}
Now the problem i am facing is that this code it first extracts all the audio data saves it into a buffer which then becomes part of AFileDataSource which i derived from DataSource class in the same sample.
And after its done extracting the whole file it plays by calling the onAudioReady() for Oboe AudioStreamBuilder.
What I need is to play as it streams the chunk of audio buffer.
Optional Query: Also aside from the question it blocks the UI even though i created a foreground service to communicate with the NDK functions to execute this code. Any thoughts on this?
You probably solved this already, but for future readers...
You need a FIFO buffer to store the decoded audio. You can use the Oboe's FIFO buffer e.g. oboe::FifoBuffer.
You can have a low/high watermark for the buffer and a state machine, so you start decoding when the buffer is almost empty and you stop decoding when it's full (you'll figure out the other states that you need).
As a side note, I implemented such player only to find at some later time, that the AAC codec is broken on some devices (Xiaomi and Amazon come to mind), so I had to throw away the AMediaCodec/AMediaExtractor parts and use an AAC library instead.
You have to implement a ringBuffer (or use the one implemented in the oboe example LockFreeQueue.h) and copy the data on buffers that you send on the ringbuffer from the extracting thread. On the other end of the RingBuffer, the audio thread will get that data from the queue and copy it to the audio buffer. This will happen on onAudioReady(oboe::AudioStream *oboeStream, void *audioData, int32_t numFrames) callback that you have to implement in your class (look oboe docs). Be sure to follow all the good practices on the Audio thread (don't allocate/deallocate memory there, no mutexes and no file I/O etc.)
Optional query: A service doesn't run in a separate thread, so obviously if you call it from UI thread it blocks the UI. Look at other types of services, there you can have IntentService or a service with a Messenger that will launch a separate thread on Java, or you can create threads in C++ side using std::thread

Fragmented MP4 - problem playing in browser

I try to create fragmented MP4 from raw H264 video data so I could play it in internet browser's player. My goal is to create live streaming system, where media server would send fragmented MP4 pieces to browser. The server would buffer input data from RaspberryPi camera, which sends video as H264 frames. It would then mux that video data and make it available for client. The browser would play media data (that were muxed by server and sent i.e. through websocket) by using Media Source Extensions.
For test purpose I wrote the following pieces of code (using many examples I found in the intenet):
C++ application using avcodec which muxes raw H264 video to fragmented MP4 and saves it to a file:
#define READBUFSIZE 4096
#define IOBUFSIZE 4096
#define ERRMSGSIZE 128
#include <cstdint>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
extern "C"
{
#include <libavformat/avformat.h>
#include <libavutil/error.h>
#include <libavutil/opt.h>
}
enum NalType : uint8_t
{
//NALs containing stream metadata
SEQ_PARAM_SET = 0x7,
PIC_PARAM_SET = 0x8
};
std::vector<uint8_t> outputData;
int mediaMuxCallback(void *opaque, uint8_t *buf, int bufSize)
{
outputData.insert(outputData.end(), buf, buf + bufSize);
return bufSize;
}
std::string getAvErrorString(int errNr)
{
char errMsg[ERRMSGSIZE];
av_strerror(errNr, errMsg, ERRMSGSIZE);
return std::string(errMsg);
}
int main(int argc, char **argv)
{
if(argc < 2)
{
std::cout << "Missing file name" << std::endl;
return 1;
}
std::fstream file(argv[1], std::ios::in | std::ios::binary);
if(!file.is_open())
{
std::cout << "Couldn't open file " << argv[1] << std::endl;
return 2;
}
std::vector<uint8_t> inputMediaData;
do
{
char buf[READBUFSIZE];
file.read(buf, READBUFSIZE);
int size = file.gcount();
if(size > 0)
inputMediaData.insert(inputMediaData.end(), buf, buf + size);
} while(!file.eof());
file.close();
//Initialize avcodec
av_register_all();
uint8_t *ioBuffer;
AVCodec *codec = avcodec_find_decoder(AV_CODEC_ID_H264);
AVCodecContext *codecCtxt = avcodec_alloc_context3(codec);
AVCodecParserContext *parserCtxt = av_parser_init(AV_CODEC_ID_H264);
AVOutputFormat *outputFormat = av_guess_format("mp4", nullptr, nullptr);
AVFormatContext *formatCtxt;
AVIOContext *ioCtxt;
AVStream *videoStream;
int res = avformat_alloc_output_context2(&formatCtxt, outputFormat, nullptr, nullptr);
if(res < 0)
{
std::cout << "Couldn't initialize format context; the error was: " << getAvErrorString(res) << std::endl;
return 3;
}
if((videoStream = avformat_new_stream( formatCtxt, avcodec_find_encoder(formatCtxt->oformat->video_codec) )) == nullptr)
{
std::cout << "Couldn't initialize video stream" << std::endl;
return 4;
}
else if(!codec)
{
std::cout << "Couldn't initialize codec" << std::endl;
return 5;
}
else if(codecCtxt == nullptr)
{
std::cout << "Couldn't initialize codec context" << std::endl;
return 6;
}
else if(parserCtxt == nullptr)
{
std::cout << "Couldn't initialize parser context" << std::endl;
return 7;
}
else if((ioBuffer = (uint8_t*)av_malloc(IOBUFSIZE)) == nullptr)
{
std::cout << "Couldn't allocate I/O buffer" << std::endl;
return 8;
}
else if((ioCtxt = avio_alloc_context(ioBuffer, IOBUFSIZE, 1, nullptr, nullptr, mediaMuxCallback, nullptr)) == nullptr)
{
std::cout << "Couldn't initialize I/O context" << std::endl;
return 9;
}
//Set video stream data
videoStream->id = formatCtxt->nb_streams - 1;
videoStream->codec->width = 1280;
videoStream->codec->height = 720;
videoStream->time_base.den = 60; //FPS
videoStream->time_base.num = 1;
videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
formatCtxt->pb = ioCtxt;
//Retrieve SPS and PPS for codec extdata
const uint32_t synchMarker = 0x01000000;
unsigned int i = 0;
int spsStart = -1, ppsStart = -1;
uint16_t spsSize = 0, ppsSize = 0;
while(spsSize == 0 || ppsSize == 0)
{
uint32_t *curr = (uint32_t*)(inputMediaData.data() + i);
if(*curr == synchMarker)
{
unsigned int currentNalStart = i;
i += sizeof(uint32_t);
uint8_t nalType = inputMediaData.data()[i] & 0x1F;
if(nalType == SEQ_PARAM_SET)
spsStart = currentNalStart;
else if(nalType == PIC_PARAM_SET)
ppsStart = currentNalStart;
if(spsStart >= 0 && spsSize == 0 && spsStart != i)
spsSize = currentNalStart - spsStart;
else if(ppsStart >= 0 && ppsSize == 0 && ppsStart != i)
ppsSize = currentNalStart - ppsStart;
}
++i;
}
videoStream->codec->extradata = inputMediaData.data() + spsStart;
videoStream->codec->extradata_size = ppsStart + ppsSize;
//Write main header
AVDictionary *options = nullptr;
av_dict_set(&options, "movflags", "frag_custom+empty_moov", 0);
res = avformat_write_header(formatCtxt, &options);
if(res < 0)
{
std::cout << "Couldn't write container main header; the error was: " << getAvErrorString(res) << std::endl;
return 10;
}
//Retrieve frames from input video and wrap them in container
int currentInputIndex = 0;
int framesInSecond = 0;
while(currentInputIndex < inputMediaData.size())
{
uint8_t *frameBuffer;
int frameSize;
res = av_parser_parse2(parserCtxt, codecCtxt, &frameBuffer, &frameSize, inputMediaData.data() + currentInputIndex,
inputMediaData.size() - currentInputIndex, AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
if(frameSize == 0) //No more frames while some data still remains (is that even possible?)
{
std::cout << "Some data left unparsed: " << std::to_string(inputMediaData.size() - currentInputIndex) << std::endl;
break;
}
//Prepare packet with video frame to be dumped into container
AVPacket packet;
av_init_packet(&packet);
packet.data = frameBuffer;
packet.size = frameSize;
packet.stream_index = videoStream->index;
currentInputIndex += frameSize;
//Write packet to the video stream
res = av_write_frame(formatCtxt, &packet);
if(res < 0)
{
std::cout << "Couldn't write packet with video frame; the error was: " << getAvErrorString(res) << std::endl;
return 11;
}
if(++framesInSecond == 60) //We want 1 segment per second
{
framesInSecond = 0;
res = av_write_frame(formatCtxt, nullptr); //Flush segment
}
}
res = av_write_frame(formatCtxt, nullptr); //Flush if something has been left
//Write media data in container to file
file.open("my_mp4.mp4", std::ios::out | std::ios::binary);
if(!file.is_open())
{
std::cout << "Couldn't open output file " << std::endl;
return 12;
}
file.write((char*)outputData.data(), outputData.size());
if(file.fail())
{
std::cout << "Couldn't write to file" << std::endl;
return 13;
}
std::cout << "Media file muxed successfully" << std::endl;
return 0;
}
(I hardcoded a few values, such as video dimensions or framerate, but as I said this is just a test code.)
Simple HTML webpage using MSE to play my fragmented MP4
<!DOCTYPE html>
<html>
<head>
<title>Test strumienia</title>
</head>
<body>
<video width="1280" height="720" controls>
</video>
</body>
<script>
var vidElement = document.querySelector('video');
if (window.MediaSource) {
var mediaSource = new MediaSource();
vidElement.src = URL.createObjectURL(mediaSource);
mediaSource.addEventListener('sourceopen', sourceOpen);
} else {
console.log("The Media Source Extensions API is not supported.")
}
function sourceOpen(e) {
URL.revokeObjectURL(vidElement.src);
var mime = 'video/mp4; codecs="avc1.640028"';
var mediaSource = e.target;
var sourceBuffer = mediaSource.addSourceBuffer(mime);
var videoUrl = 'my_mp4.mp4';
fetch(videoUrl)
.then(function(response) {
return response.arrayBuffer();
})
.then(function(arrayBuffer) {
sourceBuffer.addEventListener('updateend', function(e) {
if (!sourceBuffer.updating && mediaSource.readyState === 'open') {
mediaSource.endOfStream();
}
});
sourceBuffer.appendBuffer(arrayBuffer);
});
}
</script>
</html>
Output MP4 file generated by my C++ application can be played i.e. in MPC, but it doesn't play in any web browser I tested it with. It also doesn't have any duration (MPC keeps showing 00:00).
To compare output MP4 file I got from my C++ application described above, I also used FFMPEG to create fragmented MP4 file from the same source file with raw H264 stream. I used the following command:
ffmpeg -r 60 -i input.h264 -c:v copy -f mp4 -movflags empty_moov+default_base_moof+frag_keyframe test.mp4
This file generated by FFMPEG is played correctly by every web browser I used for tests. It also has correct duration (but also it has trailing atom, which wouldn't be present in my live stream anyway, and as I need a live stream, it won't have any fixed duration in the first place).
MP4 atoms for both files look very similiar (they have identical avcc section for sure). What's interesting (but not sure if it's of any importance), both files have different NALs format than input file (RPI camera produces video stream in Annex-B format, while output MP4 files contain NALs in AVCC format... or at least it looks like it's the case when I compare mdat atoms with input H264 data).
I assume there is some field (or a few fields) I need to set for avcodec to make it produce video stream that would be properly decoded and played by browsers players. But what field(s) do I need to set? Or maybe problem lies somewhere else? I ran out of ideas.
EDIT 1:
As suggested, I investigated binary content of both MP4 files (generated by my app and FFMPEG tool) with hex editor. What I can confirm:
both files have identical avcc section (they match perfectly and are in AVCC format, I analyzed it byte after byte and there's no mistake about it)
both files have NALs in AVCC format (I looked closely at mdat atoms and they don't differ between both MP4 files)
So I guess there's nothing wrong with the extradata creation in my code - avcodec takes care of it properly, even if I just feed it with SPS and PPS NALs. It converts them by itself, so no need for me to do it by hand. Still, my original problem remains.
EDIT 2: I achieved partial success - MP4 generated by my app now plays in Firefox. I added this line to the code (along with rest of stream initialization):
videoStream->codec->time_base = videoStream->time_base;
So now this section of my code looks like this:
//Set video stream data
videoStream->id = formatCtxt->nb_streams - 1;
videoStream->codec->width = 1280;
videoStream->codec->height = 720;
videoStream->time_base.den = 60; //FPS
videoStream->time_base.num = 1;
videoStream->codec->time_base = videoStream->time_base;
videoStream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;
formatCtxt->pb = ioCtxt;
I finally found the solution. My MP4 now plays in Chrome (while still playing in other tested browsers).
In Chrome chrome://media-internals/ shows MSE logs (of a sort). When I looked there, I found a few of following warnings for my test player:
ISO-BMFF container metadata for video frame indicates that the frame is not a keyframe, but the video frame contents indicate the opposite.
That made me think and encouraged to set AV_PKT_FLAG_KEY for packets with keyframes. I added following code to section with filling AVPacket structure:
//Check if keyframe field needs to be set
int allowedNalsCount = 3; //In one packet there would be at most three NALs: SPS, PPS and video frame
packet.flags = 0;
for(int i = 0; i < frameSize && allowedNalsCount > 0; ++i)
{
uint32_t *curr = (uint32_t*)(frameBuffer + i);
if(*curr == synchMarker)
{
uint8_t nalType = frameBuffer[i + sizeof(uint32_t)] & 0x1F;
if(nalType == KEYFRAME)
{
std::cout << "Keyframe detected at frame nr " << framesTotal << std::endl;
packet.flags = AV_PKT_FLAG_KEY;
break;
}
else
i += sizeof(uint32_t) + 1; //We parsed this already, no point in doing it again
--allowedNalsCount;
}
}
A KEYFRAME constant turns out to be 0x5 in my case (Slice IDR).
MP4 atoms for both files look very similiar (they have identical avcc
section for sure)
Double check that, The code supplied suggests otherwise to me.
What's interesting (but not sure if it's of any importance), both
files have different NALs format than input file (RPI camera produces
video stream in Annex-B format, while output MP4 files contain NALs in
AVCC format... or at least it looks like it's the case when I compare
mdat atoms with input H264 data).
It is very important, mp4 will not work with annex b.
You need to fill in extradata with AVC Decoder Configuration Record, not just SPS/PPS
Here's how the record should look like:
AVCDCR
We can find this explanation in [Chrome Source]
(https://chromium.googlesource.com/chromium/src/+/refs/heads/master/media/formats/mp4/mp4_stream_parser.cc#799) "chrome media source code":
// Copyright 2014 The Chromium Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
// Use |analysis.is_keyframe|, if it was actually determined, for logging
// if the analysis mismatches the container's keyframe metadata for
// |frame_buf|.
if (analysis.is_keyframe.has_value() &&
is_keyframe != analysis.is_keyframe.value()) {
LIMITED_MEDIA_LOG(DEBUG, media_log_, num_video_keyframe_mismatches_,
kMaxVideoKeyframeMismatchLogs)
<< "ISO-BMFF container metadata for video frame indicates that the "
"frame is "
<< (is_keyframe ? "" : "not ")
<< "a keyframe, but the video frame contents indicate the "
"opposite.";
// As of September 2018, it appears that all of Edge, Firefox, Safari
// work with content that marks non-avc-keyframes as a keyframe in the
// container. Encoders/muxers/old streams still exist that produce
// all-keyframe mp4 video tracks, though many of the coded frames are
// not keyframes (likely workaround due to the impact on low-latency
// live streams until https://crbug.com/229412 was fixed). We'll trust
// the AVC frame's keyframe-ness over the mp4 container's metadata if
// they mismatch. If other out-of-order codecs in mp4 (e.g. HEVC, DV)
// implement keyframe analysis in their frame_bitstream_converter, we'll
// similarly trust that analysis instead of the mp4.
is_keyframe = analysis.is_keyframe.value();
}
As the code comment show, chrome trust the AVC frame's keyframe-ness over the mp4 container's metadata. So nalu type in H264/HEVC should be more important than mp4 container box sdtp & trun description.

C++/C FFmpeg artifact build up across video frames

Context:
I am building a recorder for capturing video and audio in separate threads (using Boost thread groups) using FFmpeg 2.8.6 on Ubuntu 16.04. I followed the demuxing_decoding example here: https://www.ffmpeg.org/doxygen/2.8/demuxing_decoding_8c-example.html
Video capture specifics:
I am reading H264 off a Logitech C920 webcam and writing the video to a raw file. The issue I notice with the video is that there seems to be a build-up of artifacts across frames until a particular frame resets. Here is my frame grabbing, and decoding functions:
// Used for injecting decoding functions for different media types, allowing
// for a generic decode loop
typedef std::function<int(AVPacket*, int*, int)> PacketDecoder;
/**
* Decodes a video packet.
* If the decoding operation is successful, returns the number of bytes decoded,
* else returns the result of the decoding process from ffmpeg
*/
int decode_video_packet(AVPacket *packet,
int *got_frame,
int cached){
int ret = 0;
int decoded = packet->size;
*got_frame = 0;
//Decode video frame
ret = avcodec_decode_video2(video_decode_context,
video_frame, got_frame, packet);
if (ret < 0) {
//FFmpeg users should use av_err2str
char errbuf[128];
av_strerror(ret, errbuf, sizeof(errbuf));
std::cerr << "Error decoding video frame " << errbuf << std::endl;
decoded = ret;
} else {
if (*got_frame) {
video_frame->pts = av_frame_get_best_effort_timestamp(video_frame);
//Write to log file
AVRational *time_base = &video_decode_context->time_base;
log_frame(video_frame, time_base,
video_frame->coded_picture_number, video_log_stream);
#if( DEBUG )
std::cout << "Video frame " << ( cached ? "(cached)" : "" )
<< " coded:" << video_frame->coded_picture_number
<< " pts:" << pts << std::endl;
#endif
/*Copy decoded frame to destination buffer:
*This is required since rawvideo expects non aligned data*/
av_image_copy(video_dest_attr.video_destination_data,
video_dest_attr.video_destination_linesize,
(const uint8_t **)(video_frame->data),
video_frame->linesize,
video_decode_context->pix_fmt,
video_decode_context->width,
video_decode_context->height);
//Write to rawvideo file
fwrite(video_dest_attr.video_destination_data[0],
1,
video_dest_attr.video_destination_bufsize,
video_out_file);
//Unref the refcounted frame
av_frame_unref(video_frame);
}
}
return decoded;
}
/**
* Grabs frames in a loop and decodes them using the specified decoding function
*/
int process_frames(AVFormatContext *context,
PacketDecoder packet_decoder) {
int ret = 0;
int got_frame;
AVPacket packet;
//Initialize packet, set data to NULL, let the demuxer fill it
av_init_packet(&packet);
packet.data = NULL;
packet.size = 0;
// read frames from the file
for (;;) {
ret = av_read_frame(context, &packet);
if (ret < 0) {
if (ret == AVERROR(EAGAIN)) {
continue;
} else {
break;
}
}
//Convert timing fields to the decoder timebase
unsigned int stream_index = packet.stream_index;
av_packet_rescale_ts(&packet,
context->streams[stream_index]->time_base,
context->streams[stream_index]->codec->time_base);
AVPacket orig_packet = packet;
do {
ret = packet_decoder(&packet, &got_frame, 0);
if (ret < 0) {
break;
}
packet.data += ret;
packet.size -= ret;
} while (packet.size > 0);
av_free_packet(&orig_packet);
if(stop_recording == true) {
break;
}
}
//Flush cached frames
std::cout << "Flushing frames" << std::endl;
packet.data = NULL;
packet.size = 0;
do {
packet_decoder(&packet, &got_frame, 1);
} while (got_frame);
av_log(0, AV_LOG_INFO, "Done processing frames\n");
return ret;
}
Questions:
How do I go about debugging the underlying issue?
Is it possible that running the decoding code in a thread other than the one in which the decoding context was opened is causing the problem?
Am I doing something wrong in the decoding code?
Things I have tried/found:
I found this thread that is about the same problem here: FFMPEG decoding artifacts between keyframes
(I cannot post samples of my corrupted frames due to privacy issues, but the image linked to in that question depicts the same issue I have)
However, the answer to the question is posted by the OP without specific details about how the issue was fixed. The OP only mentions that he wasn't 'preserving the packets correctly', but nothing about what was wrong or how to fix it. I do not have enough reputation to post a comment seeking clarification.
I was initially passing the packet into the decoding function by value, but switched to passing by pointer on the off chance that the packet freeing was being done incorrectly.
I found another question about debugging decoding issues, but couldn't find anything conclusive: How is video decoding corruption debugged?
I'd appreciate any insight. Thanks a lot!
[EDIT] In response to Ronald's answer, I am adding a little more information that wouldn't fit in a comment:
I am only calling decode_video_packet() from the thread processing video frames; the other thread processing audio frames calls a similar decode_audio_packet() function. So only one thread calls the function. I should mention that I have set the thread_count in the decoding context to 1, failing which I would get a segfault in malloc.c while flushing the cached frames.
I can see this being a problem if the process_frames and the frame decoder function were run on separate threads, which is not the case. Is there a specific reason why it would matter if the freeing is done within the function, or after it returns? I believe the freeing function is passed a copy of the original packet because multiple decode calls would be required for audio packet in case the decoder doesnt decode the entire audio packet.
A general problem is that the corruption does not occur all the time. I can debug better if it is deterministic. Otherwise, I can't even say if a solution works or not.
A few things to check:
are you running multiple threads that are calling decode_video_packet()? If you are: don't do that! FFmpeg has built-in support for multi-threaded decoding, and you should let FFmpeg do threading internally and transparently.
you are calling av_free_packet() right after calling the frame decoder function, but at that point it may not yet have had a chance to copy the contents. You should probably let decode_video_packet() free the packet instead, after calling avcodec_decode_video2().
General debugging advice:
run it without any threading and see if that works;
if it does, and with threading it fails, use thread debuggers such as tsan or helgrind to help in finding race conditions that point to your code.
it can also help to know whether the output you're getting is reproduceable (this suggests a non-threading-related bug in your code) or changes from one run to the other (this suggests a race condition in your code).
And yes, the periodic clean-ups are because of keyframes.

FFmpeg + OpenAL - playback streaming sound from video won't work

I am decoding an OGG video (theora & vorbis as codecs) and want to show it on the screen (using Ogre 3D) while playing its sound. I can decode the image stream just fine and the video plays perfectly with the correct frame rate, etc.
However, I cannot get the sound to play at all with OpenAL.
Edit: I managed to make the playing sound resemble the actual audio in the video at least somewhat. Updated sample code.
Edit 2: I was able to get "almost" correct sound now. I had to set OpenAL to use AL_FORMAT_STEREO_FLOAT32 (after initializing the extension) instead of just STEREO16. Now the sound is "only" extremely high pitched and stuttering, but at the correct speed.
Here is how I decode audio packets (in a background thread, the equivalent works just fine for the image stream of the video file):
//------------------------------------------------------------------------------
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int bufferSize = av_samples_get_buffer_size(NULL, p_audioCodecContext->channels, p_frame->nb_samples,
p_audioCodecContext->sample_fmt, 0);
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
if (p_frame->channels == 2)
{
memcpy(frame->data, p_frame->data[0], bufferSize >> 1);
memcpy(frame->data + (bufferSize >> 1), p_frame->data[1], bufferSize >> 1);
}
else
{
memcpy(frame->data, p_frame->data, bufferSize);
}
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
So, as you can see, I decode the frame, memcpy it to my own struct, AudioFrame. Now, when the sound is played, I use these audio frame like this:
int numBuffers = 4;
ALuint buffers[4];
alGenBuffers(numBuffers, buffers);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alGenBuffers : " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
// Fill a number of data buffers with audio from the stream
std::vector<AudioFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numReturned = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffers, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
for (unsigned int i = 0; i < numReturned; ++i)
{
alBufferData(buffers[i], _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on alBufferData : " + Ogre::StringConverter::toString(success) + alGetString(success)
+ " size: " + Ogre::StringConverter::toString(audioBufferSizes[i]));
return;
}
}
// Queue the buffers into OpenAL
alSourceQueueBuffers(_source, numReturned, buffers);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error queuing streaming buffers: " + Ogre::StringConverter::toString(success) + alGetString(success));
return;
}
}
alSourcePlay(_source);
The format and frequency I give to OpenAL are AL_FORMAT_STEREO_FLOAT32 (it is a stereo sound stream, and I did initialize the FLOAT32 extension) and 48000 (which is the sample rate of the AVCodecContext of the audio stream).
And during playback, I do the following to refill OpenAL's buffers:
ALint numBuffersProcessed;
// Check if OpenAL is done with any of the queued buffers
alGetSourcei(_source, AL_BUFFERS_PROCESSED, &numBuffersProcessed);
if(numBuffersProcessed <= 0)
return;
// Fill a number of data buffers with audio from the stream
std::vector<AudiFrame*> audioBuffers;
std::vector<unsigned int> audioBufferSizes;
unsigned int numFilled = FFMPEG_PLAYER->getDecodedAudioFrames(numBuffersProcessed, audioBuffers, audioBufferSizes);
// Assign the data buffers to the OpenAL buffers
ALuint buffer;
for (unsigned int i = 0; i < numFilled; ++i)
{
// Pop the oldest queued buffer from the source,
// fill it with the new data, then re-queue it
alSourceUnqueueBuffers(_source, 1, &buffer);
ALenum success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error Unqueuing streaming buffers: " + Ogre::StringConverter::toString(success));
return;
}
alBufferData(buffer, _streamingFormat, audioBuffers[i]->data, audioBufferSizes[i], _streamingFrequency);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error on re- alBufferData: " + Ogre::StringConverter::toString(success));
return;
}
alSourceQueueBuffers(_source, 1, &buffer);
success = alGetError();
if(success != AL_NO_ERROR)
{
CONSOLE_LOG("Error re-queuing streaming buffers: " + Ogre::StringConverter::toString(success) + " "
+ alGetString(success));
return;
}
}
// Make sure the source is still playing,
// and restart it if needed.
ALint playStatus;
alGetSourcei(_source, AL_SOURCE_STATE, &playStatus);
if(playStatus != AL_PLAYING)
alSourcePlay(_source);
As you can see, I do quite heavy error checking. But I do not get any errors, neither from OpenAL nor from FFmpeg.
Edit: What I hear somewhat resembles the actual audio from the video, but VERY high pitched and stuttering VERY much. Also, it seems to be playing on top of TV noise. Very strange. Plus, it is playing much slower than the correct audio would.
Edit: 2 After using AL_FORMAT_STEREO_FLOAT32, the sound plays at the correct speed, but is still very high pitched and stuttering (though less than before).
The video itself is not broken, it can be played fine on any player. OpenAL can also play *.way files just fine in the same application, so it is also working.
Any ideas what could be wrong here or how to do this correctly?
My only guess is that somehow, FFmpeg's decode function does not produce data OpenGL can read. But this is as far as the FFmpeg decode example goes, so I don't know what's missing. As I understand it, the decode_audio4 function decodes the frame to raw data. And OpenAL should be able to work with RAW data (or rather, doesn't work with anything else).
So, I finally figured out how to do it. Gee, what a mess. It was a hint from a user on the libav-users mailing list that put me on the correct path.
Here are my mistakes:
Using the wrong format in the alBufferData function. I used AL_FORMAT_STEREO16 (as that is what every single streaming example with OpenAL uses). I should have used AL_FORMAT_STEREO_FLOAT32, as the video I stream is Ogg and vorbis is stored in floating points. And using swr_convert to convert from AV_SAMPLE_FMT_FLTP to AV_SAMPLE_FMT_S16 just crashes. No idea why.
Not using swr_convert to convert the decoded audio frame to the target format. After I was trying to use swr_convert to convert from FLTP to S16, and it would simply crash without a reason given, I assumed it was broken. But after figuring out my first mistake, I tried again, converting from FLTP to FLT (non-planar) and then it worked! So OpenAL uses interleaved format, not planar. Good to know.
So here is the decodeAudioPacket function that is working for me with Ogg video, vorbis audio stream:
int decodeAudioPacket( AVPacket& p_packet, AVCodecContext* p_audioCodecContext, AVFrame* p_frame,
SwrContext* p_swrContext, uint8_t** p_destBuffer, int p_destLinesize,
FFmpegVideoPlayer* p_player, VideoInfo& p_videoInfo)
{
// Decode audio frame
int got_frame = 0;
int decoded = avcodec_decode_audio4(p_audioCodecContext, p_frame, &got_frame, &p_packet);
if (decoded < 0)
{
p_videoInfo.error = "Error decoding audio frame.";
return decoded;
}
if(decoded <= p_packet.size)
{
/* Move the unread data to the front and clear the end bits */
int remaining = p_packet.size - decoded;
memmove(p_packet.data, &p_packet.data[decoded], remaining);
av_shrink_packet(&p_packet, remaining);
}
// Frame is complete, store it in audio frame queue
if (got_frame)
{
int outputSamples = swr_convert(p_swrContext,
p_destBuffer, p_destLinesize,
(const uint8_t**)p_frame->extended_data, p_frame->nb_samples);
int bufferSize = av_get_bytes_per_sample(AV_SAMPLE_FMT_FLT) * p_videoInfo.audioNumChannels
* outputSamples;
int64_t duration = p_frame->pkt_duration;
int64_t dts = p_frame->pkt_dts;
if (staticOgreLog)
{
staticOgreLog->logMessage("Audio frame bufferSize / duration / dts: "
+ boost::lexical_cast<std::string>(bufferSize) + " / "
+ boost::lexical_cast<std::string>(duration) + " / "
+ boost::lexical_cast<std::string>(dts), Ogre::LML_NORMAL);
}
// Create the audio frame
AudioFrame* frame = new AudioFrame();
frame->dataSize = bufferSize;
frame->data = new uint8_t[bufferSize];
memcpy(frame->data, p_destBuffer[0], bufferSize);
double timeBase = ((double)p_audioCodecContext->time_base.num) / (double)p_audioCodecContext->time_base.den;
frame->lifeTime = duration * timeBase;
p_player->addAudioFrame(frame);
}
return decoded;
}
And here is how I initialize the context and the destination buffer:
// Initialize SWR context
SwrContext* swrContext = swr_alloc_set_opts(NULL,
audioCodecContext->channel_layout, AV_SAMPLE_FMT_FLT, audioCodecContext->sample_rate,
audioCodecContext->channel_layout, audioCodecContext->sample_fmt, audioCodecContext->sample_rate,
0, NULL);
int result = swr_init(swrContext);
// Create destination sample buffer
uint8_t** destBuffer = NULL;
int destBufferLinesize;
av_samples_alloc_array_and_samples( &destBuffer,
&destBufferLinesize,
videoInfo.audioNumChannels,
2048,
AV_SAMPLE_FMT_FLT,
0);