Is there a way to decompress sound files using the FMOD library in C++?
I'm developing a sound editor using the FMOD Engine library, but I've run into a problem with compressed audio files, specifically OGG files.
For now I'm just reading the raw data using FMOD::Sound::readData(), then normalizing it and displaying it on screen using SFML. This works fine with WAV files, because they are not compressed, but I need more steps for compressed formats.
This is what I'm doing now:
FMOD_RESULT r;
FMOD::System* m_fmodSystem = nullptr;
int m_maxChannels = 64;
// Create fmod system
r = FMOD::System_Create(&m_fmodSystem);
FMOD_ERROR_CHECK(r);
// Initialize system
r = m_fmodSystem->init(m_maxChannels, FMOD_INIT_NORMAL, nullptr);
FMOD_ERROR_CHECK(r);
// Create sound
FMOD::Sound* soundResource = nullptr;
FMOD::Channel* channel = nullptr;
r = m_fmodSystem->createSound("640709__chobesha__laser-gun-sound.ogg",
FMOD_DEFAULT | FMOD_OPENONLY,
nullptr,
&soundResource);
FMOD_ERROR_CHECK(r);
// Get sound length in raw bytes
unsigned int audioLength = 0;
FMOD_TIMEUNIT timeUnit = FMOD_TIMEUNIT_RAWBYTES;
r = soundResource->getLength(&audioLength, timeUnit);
FMOD_ERROR_CHECK(r);
// Read sound data
char* audioBuffer = new char[audioLength];
unsigned int readData = 0;
r = soundResource->readData(reinterpret_cast<void*>(audioBuffer),
audioLength,
&readData);
FMOD_ERROR_CHECK(r);
signed short* interpretedData = reinterpret_cast<signed short*>(audioBuffer);
// Analyze data to normalize it
signed short maxValue = -32767;
signed short minValue = 32767;
int interpretedDataSize = readData / sizeof(signed short);
for (int i = 0; i < interpretedDataSize; ++i) {
if (interpretedData[i] > maxValue) {
maxValue = interpretedData[i];
}
if (interpretedData[i] < minValue) {
minValue = interpretedData[i];
}
}
float maxValF = static_cast<float>(maxValue);
float minValF = static_cast<float>(minValue);
float* normalizedArray = new float[interpretedDataSize];
// Normalize data
float maxAbsValF = std::fabs(maxValF); // std::fabs from <cmath>
maxAbsValF = maxAbsValF > std::fabs(minValF) ? maxAbsValF : std::fabs(minValF);
for (int i = 0; i < interpretedDataSize; ++i) {
normalizedArray[i] = interpretedData[i] / maxAbsValF;
}
I read in other posts and in the FMOD documentation that I can use the flag FMOD_CREATESAMPLE to tell createSound to decompress the data at load time instead of play time, but it doesn't work with the current structure of my code. I'm guessing that's because FMOD_OPENONLY prevents the file from being closed, and therefore it never gets the chance to decompress, or something like that. That's what I got from the documentation.
The problem with not using the FMOD_OPENONLY flag is that I cannot read the data using readData, or it returns an error code.
Searching, I found that I can use the lock function to help it decompress and to get a pointer to the sound's data, but even with all of this, it still appears to be compressed. I don't know if I'm missing something.
This is version 2 of the code, with these modifications:
// Create sound
FMOD::Sound* soundResource = nullptr;
FMOD::Channel* channel = nullptr;
r = m_fmodSystem->createSound("640709__chobesha__laser-gun-sound.ogg",
FMOD_DEFAULT | FMOD_CREATESAMPLE,
nullptr,
&soundResource);
FMOD_ERROR_CHECK(r);
// Get sound length in raw bytes
unsigned int audioLength = 0;
FMOD_TIMEUNIT timeUnit = FMOD_TIMEUNIT_RAWBYTES;
r = soundResource->getLength(&audioLength, timeUnit);
FMOD_ERROR_CHECK(r);
// Lock the sound to get a pointer to its sample data
char* audioBuffer = nullptr; // lock() fills this with a pointer into FMOD's own buffer
void* ptr2 = nullptr;
unsigned int len1, len2;
r = soundResource->lock(0, audioLength, reinterpret_cast<void**>(&audioBuffer), &ptr2, &len1, &len2);
FMOD_ERROR_CHECK(r);
r = soundResource->unlock(reinterpret_cast<void*>(audioBuffer), ptr2, len1, len2);
FMOD_ERROR_CHECK(r);
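To check what the lock actually returned, the sound's decoded format and PCM length can be queried as a sanity check (a sketch; Sound::getFormat and FMOD_TIMEUNIT_PCMBYTES are part of the FMOD Core API, and the rest matches the code above):
// Sanity check: ask FMOD what the sound contains after createSound
FMOD_SOUND_TYPE soundType;
FMOD_SOUND_FORMAT soundFormat;
int numChannels = 0, bitsPerSample = 0;
r = soundResource->getFormat(&soundType, &soundFormat, &numChannels, &bitsPerSample);
FMOD_ERROR_CHECK(r);
// With FMOD_CREATESAMPLE in effect, soundFormat should be a PCM format
// such as FMOD_SOUND_FORMAT_PCM16, not a compressed one.
unsigned int pcmBytes = 0;
r = soundResource->getLength(&pcmBytes, FMOD_TIMEUNIT_PCMBYTES);
FMOD_ERROR_CHECK(r);
// pcmBytes is the decoded size; when locking the whole sound, len1 should match it.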
This is the graph I get for a WAV sound.
The left side is the sound loaded in the Audacity app, and the right side is my graph.
This is the graph for the first try with the OGG file.
And this is the graph for the OGG file with the modifications.
From what I can see, the first is the same as the second, so I'm assuming both are still compressed and my changes did nothing.
Does anyone know a better way to decompress and read the raw data of a sound, preferably using FMOD? If it's not possible with FMOD, what is the best way to decompress any sound file, given that its format is known?
This answer is just a couple of minority-position takes and a sketchy description of a process I once used. Maybe the thoughts are worth considering.
One thought: a person who is editing sound (your target audience?) has the know-how to decompress files (e.g., using Audacity), so perhaps adding this capability (handling every possible incoming audio format) is a lower priority?
Another thought: there are likely many libraries available for decompressing sound. You could employ one of them before presenting the results to FMOD. I just did a search on GitHub for "ogg c++" and was shown 51 repositories.
In my own experience, for an application I wrote about seven years ago, I tweaked some code from a Vorbis decoder so that it handed me the raw PCM rather than writing it out as a .wav. With OGG, the audio is PCM before compression, so it decompresses back to PCM before being wrapped up as a .wav. I found the point in the code where that wrapping happens and edited it out, leaving the data in decompressed PCM form.
My application was built to accept PCM, so I actually ended up saving an intermediate step.
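For illustration, here is roughly what that approach looks like with an off-the-shelf decoder instead (a sketch using the single-header stb_vorbis library, which is an assumption on my part, not the decoder I modified back then): it decodes a whole OGG file to interleaved 16-bit PCM in one call, which could then feed the normalization code from the question.
#include <stdlib.h>     // free()
#include "stb_vorbis.c" // single-file Ogg Vorbis decoder
int channels = 0, sampleRate = 0;
short* pcm = NULL;
// Decodes the whole file to interleaved 16-bit PCM; returns frames per channel, -1 on error.
int frames = stb_vorbis_decode_filename("640709__chobesha__laser-gun-sound.ogg",
                                        &channels, &sampleRate, &pcm);
if (frames >= 0)
{
    // pcm holds frames * channels samples, ready for the min/max scan above.
    free(pcm); // stb_vorbis allocates the buffer with malloc
}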
I created an application a couple of years ago that allowed me to process audio by downmixing a 6-channel or 8-channel (a.k.a. 5.1 or 7.1) signal to matrixed stereo. For that purpose I used the PortAudio library, with great results. This is an example of the open-stream call and the callback that downmixes a 7.1 signal:
Pa_OpenStream(&Flujo, &inputParameters, &outParameters, SAMPLE_RATE, 1, paClipOff, ptrFunction, NULL);
Notice the use of a framesPerBuffer value of just one (1). This is my callback function:
int downmixed8channels(const void *input, void *output, unsigned long framesPerBuffer, const PaStreamCallbackTimeInfo *info, PaStreamCallbackFlags state, void *userData)
{
    (void)userData;
    (void)info;
    (void)state;
    (void)framesPerBuffer;
    float *ptrInput = (float*)input;
    float *ptrOutput = (float*)output;
    /* AudioSamples is a struct used to identify the samples */
    AudioSamples->L = ptrInput[0];
    AudioSamples->R = ptrInput[1];
    AudioSamples->C = ptrInput[2];
    AudioSamples->LFE = ptrInput[3];
    AudioSamples->RL = ptrInput[4];
    AudioSamples->RR = ptrInput[5];
    AudioSamples->SL = ptrInput[6];
    AudioSamples->SR = ptrInput[7];
    Encoder->encode8Channels(AudioSamples->L,
                             AudioSamples->R,
                             AudioSamples->C,
                             AudioSamples->LFE,
                             AudioSamples->SL,
                             AudioSamples->SR,
                             AudioSamples->RL,
                             AudioSamples->RR);
    ptrOutput[0] = Encoder->getLT();
    ptrOutput[1] = Encoder->getRT();
    return paContinue;
}
As you can see, the order set by the index in the output and input buffers corresponds to a discrete channel; in the case of the output, 0 = left channel, 1 = right channel. This used to work well until Windows 10 2004 arrived; since I updated my system to this new version, my audio glitches and I get artifacts like those.
Those are captures from the sound of the channel test window under the audio device panel of Windows. From the images it is clear my program is dropping frames, so my first attempt to solve this was to use a buffer larger than one to hold samples, process them, and send them on. The reason I did not use a buffer size larger than one in the first place was that the program would drop frames.
But before implementing that, I did a proof of concept that did not include audio processing at all, just a simple pass of data from input to output. For that I set the output channelCount parameter to 8, just like the input, resulting in something as simple as this:
for (int i = 0; i < FramesPerBuffer /*1000*/; i++)
{
    ptrOutput[i] = ptrInput[i]; // straight pass-through
}
But the program is still dropping samples.
Next I used two callbacks: one for writing into a buffer, and a second one to read it and send it to the output.
(void)info;
(void)userData;
(void)state;
(void)output;
float* ptrInput = (float*)input;
for (int i = 0; i < FRAME_SIZE; i++)
{
buffer_input[i] = ptrInput[i];
}
return paContinue;
Callback to store.
(void)info;
(void)userData;
(void)state;
(void)input;
float* ptrOutput = (float*)output;
for (int i = 0; i < FRAME_SIZE; i += 6)
{
    AudioSamples->L = buffer_input[i];
    AudioSamples->R = buffer_input[i + 1];
    AudioSamples->C = buffer_input[i + 2];
    AudioSamples->LFE = buffer_input[i + 3];
    AudioSamples->SL = buffer_input[i + 4];
    AudioSamples->SR = buffer_input[i + 5];
    Encoder->Encoder(AudioSamples->L, AudioSamples->R, AudioSamples->C, AudioSamples->LFE,
                     AudioSamples->SL, AudioSamples->SR);
    bufferTransformed[w++] = Encoder->getLT();
    bufferTransformed[w++] = Encoder->getRT();
}
w = 0;
for (int i = 0; i < FRAME_REDUCED; i++)
{
    ptrOutput[i] = bufferTransformed[i];
}
return paContinue;
Callback for processing
The processing callback uses a reduced frames-per-buffer count, since two channels are fewer than eight; it seems that in PortAudio a frame is composed of one sample for each audio channel.
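For reference, interleaving means a sample is addressed like this (an illustrative helper, not from the original code):
// Interleaved layout: frame f holds one sample per channel, back to back.
// So 8 input channels with framesPerBuffer = 256 means 2048 floats in the
// input buffer, while the 2-channel output holds 512 floats for the same count.
static float sampleAt(const float* buf, int channelCount, int frame, int channel)
{
    return buf[frame * channelCount + channel];
}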
This also did not work, and it raises the first problem: how do I synchronize the two callbacks? After all of this, what recommendation or advice can you give me to solve this issue?
Notes: the sample rate must be the same for both devices (I implemented logic in the program to enforce this), and the bit depth is also the same; I am using paFloat32.
The PortAudio build is the modified one used by Audacity, since I wanted to use their implementation of WASAPI loopback.
Thanks very much in advance!
In the end I did not have to change my callback functions in any way. What solved it was changing (increasing) the .suggestedLatency parameter of the input and output parameters to 1.0. Even the devices' defaultLowOutputLatency and defaultHighOutputLatency values were causing too much glitching. I tested until 1.0 turned out to be the sweet spot; higher values did not seem to improve anything.
TL;DR: Increase the suggestedLatency until the glitching is gone.
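For reference, the change amounts to this (a sketch; .suggestedLatency is a standard field of PaStreamParameters, and the rest of the setup is assumed to match the code above):
PaStreamParameters outParameters{};
outParameters.device = Pa_GetDefaultOutputDevice();
outParameters.channelCount = 2;
outParameters.sampleFormat = paFloat32;
outParameters.suggestedLatency = 1.0; // seconds; the value that stopped the glitching here
outParameters.hostApiSpecificStreamInfo = NULL;
// inputParameters set up the same way, then Pa_OpenStream(...) as before.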
I have no experience in audio programming, and C++ is quite a low-level language, so I have a few problems with it. I work with the ASIO SDK 2.3 downloaded from http://www.steinberg.net/en/company/developers.html.
I am writing my own host based on the example inside the SDK.
For now I've managed to go through the whole sample and it looks like it's working. I have an external sound card connected to my PC. I've successfully loaded the driver for this device, configured it, handled callbacks, converted the data from analog to digital, etc., common stuff.
And the part where I am stuck now:
When I play a track through my device, I can see bars moving in the mixer (the device's software). So the device is connected the right way. In my code I've picked the inputs and outputs with the names of the bars that are moving in the mixer. I've also used ASIOCreateBuffers() to create a buffer for each input/output.
Now correct me if I am wrong:
When ASIOStart() is called and the driver is in the running state, and I feed a sound signal into my external device, I believe the buffers get filled with data, right?
I am reading the documentation but I am a bit lost: how can I access the data sent by the device to the application, stored in the INPUT buffers? Or the signal? I need it for signal analysis, or maybe recording in the future.
EDIT: If I made that too complicated, then in a nutshell my question is: how can I access the input stream data from code? I don't see any objects/callbacks in the documentation that would let me do so.
The hostsample in the ASIO SDK is pretty close to what you need. In the bufferSwitchTimeInfo callback there is some code like this:
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
int ch = asioDriverInfo.bufferInfos[i].channelNum;
if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
{
char* buf = asioDriverInfo.bufferInfos[i].buffers[index];
....
Inside that if block, asioDriverInfo.bufferInfos[i].buffers[index] is a pointer to the raw audio data (index is a parameter to the method).
The format of the buffer depends on the driver, and it can be discovered by testing asioDriverInfo.channelInfos[i].type. The formats will be 32-bit int LSB first, 32-bit int MSB first, and so on. You can find the list of values in the ASIOSampleType enum in asio.h. At this point you'll want to convert the samples to some common format for downstream signal-processing code. If you're doing signal processing you'll probably want to convert to double. The file host\asioconvertsample.cpp will give you some idea of what's involved in the conversion. The most common format you're going to encounter is probably INT32 LSB. Here is how you'd convert it to double.
for (int i = 0; i < asioDriverInfo.inputBuffers + asioDriverInfo.outputBuffers; i++)
{
int ch = asioDriverInfo.bufferInfos[i].channelNum;
if (asioDriverInfo.bufferInfos[i].isInput == ASIOTrue)
{
switch (asioDriverInfo.channelInfos[i].type)
{
case ASIOSTInt32LSB:
{
double* pDoubleBuf = new double[_bufferSize];
for (int j = 0; j < _bufferSize; ++j)
{
    pDoubleBuf[j] = *(int*)asioDriverInfo.bufferInfos.buffers[index] / (double)0x7fffffff;
}
// now pDoubleBuf contains one channel's worth of samples in the range of -1.0 to 1.0.
break;
}
// and so on...
Thank you very much. Your answer helped quite a lot, but as I am a bit inexperienced with C++ :P I find it a bit problematic.
In general I've written my own host based on hostsample. I didn't implement the asioDriverInfo structure and use plain variables for now.
My first problem was:
char* buf = asioDriverInfo.bufferInfos[i].buffers[index];
as I got an error that (void*) cannot be converted to char*, but this probably solved the problem:
char* buf = static_cast<char*>(bufferInfos[i].buffers[doubleBufferIndex]);
My second problem is with the data conversion. I've checked the file you recommended, but I find it a bit like black magic. For now I am trying to follow your example:
for (int i = 0; i < inputBuffers + outputBuffers; i++)
{
if (bufferInfos[i].isInput)
{
switch (channelInfos[i].type)
{
case ASIOSTInt32LSB:
{
double* pDoubleBuf = new double[buffSize];
for (int j = 0 ; j < buffSize ; ++j)
{
pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
}
break;
}
}
}
I get an error there:
pDoubleBuf[j] = bufferInfos[i].buffers[doubleBufferIndex] / (double)0x7fffffff;
which is:
error C2296: '/' : illegal, left operand has type 'void *'
What I don't get is that in your example there is no array index after bufferInfos: asioDriverInfo.bufferInfos.buffers[index]. And even if I fix that, what type should I cast it to, to make it work?
PS. I am sure the ASIOSTInt32LSB data type is the right one for my PC.
The ASIO input and output buffers are accessible through void pointers, but using memcpy or memmove to access the I/O buffers creates a memory copy, which is to be avoided if you are doing real-time processing. I would suggest casting the pointer type to int* so you can access the samples directly.
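That is (a one-line sketch, using the same names as the answer above):
// Direct access, no copy: reinterpret the void* as the driver's int samples.
int* pSamples = (int*)asioDriverInfo.bufferInfos[i].buffers[index];
// pSamples[j] is now the j-th raw 32-bit sample of that channel.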
It's also very slow in real-time processing to convert samples one by one when you have 100+ audio channels, while AVX2 is supported on most CPUs: _mm256_loadu_si256() and _mm256_cvtepi32_ps() will do the conversion much faster.
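A sketch of that conversion (assumptions: AVX2 is available, the samples are 32-bit ints as in ASIOSTInt32LSB, and the sample count is a multiple of 8):
#include <immintrin.h>
#include <stdint.h>
// Convert int32 samples to floats in [-1, 1], eight at a time.
void int32_to_float_avx2(const int32_t* in, float* out, size_t count)
{
    const __m256 scale = _mm256_set1_ps(1.0f / 2147483648.0f); // 1 / 2^31
    for (size_t i = 0; i < count; i += 8)
    {
        __m256i v = _mm256_loadu_si256((const __m256i*)(in + i)); // load 8 ints
        __m256  f = _mm256_cvtepi32_ps(v);                        // int -> float
        _mm256_storeu_ps(out + i, _mm256_mul_ps(f, scale));       // scale, store
    }
}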
I'm recording sound and encoding it to MP3 with the FFmpeg library, then decoding the MP3 data right away and playing the decoded data, but it sounds very delayed.
Here is the code:
The encode function's first parameter accepts the raw PCM data; len = 44100.
Encoder parameters:
cntx_->channels = 1;
cntx_->sample_rate = 44100;
cntx_->sample_fmt = 6; // 6 == AV_SAMPLE_FMT_S16P
cntx_->channel_layout = AV_CH_LAYOUT_MONO;
cntx_->bit_rate = 8000;
err_ = avcodec_open2(cntx_, codec_, NULL);
vector<unsigned char> encode(unsigned char* encode_data, unsigned int len)
{
vector<unsigned char> ret;
AVPacket avpkt;
av_init_packet(&avpkt);
unsigned int len_encoded = 0;
int data_left = len / 2;
int miss_c = 0, i = 0;
while (data_left > 0)
{
// Feed the encoder at most one codec frame's worth of samples per iteration.
int sz = data_left > cntx_->frame_size ? cntx_->frame_size : data_left;
mp3_frame_->nb_samples = sz;
mp3_frame_->format = cntx_->sample_fmt;
mp3_frame_->channel_layout = cntx_->channel_layout;
int needed_size = av_samples_get_buffer_size(NULL, 1,
mp3_frame_->nb_samples, cntx_->sample_fmt, 1);
int r = avcodec_fill_audio_frame(mp3_frame_, 1, cntx_->sample_fmt, encode_data + len_encoded, needed_size, 0);
int gotted = -1;
// gotted stays 0 while the encoder buffers input without emitting a packet.
r = avcodec_encode_audio2(cntx_, &avpkt, mp3_frame_, &gotted);
if (gotted){
i++;
ret.insert(ret.end(), avpkt.data, avpkt.data + avpkt.size);
}
else if (gotted == 0){
miss_c++;
}
len_encoded += needed_size;
data_left -= sz;
av_free_packet(&avpkt);
}
return ret;
}
std::vector<unsigned char> decode(unsigned char* data, unsigned int len)
{
std::vector<unsigned char> ret;
AVPacket avpkt;
av_init_packet(&avpkt);
avpkt.data = data;
avpkt.size = len;
AVFrame* pframe = av_frame_alloc();
while (avpkt.size > 0){
int goted = -1;av_frame_unref(pframe);
int used = avcodec_decode_audio4(cntx_, pframe, &goted, &avpkt);
if (goted){
ret.insert(ret.end(), pframe->data[0], pframe->data[0] + pframe->linesize[0]);
avpkt.data += used;
avpkt.size -= used;
avpkt.dts = avpkt.pts = AV_NOPTS_VALUE;
}
else if (goted == 0){
avpkt.data += used;
avpkt.size -= used;
avpkt.dts = avpkt.pts = AV_NOPTS_VALUE;
}
else if(goted < 0){
break;
}
}
av_frame_free(&pframe);
return ret;
}
Suppose it's the 100th call to encode(data, len); this "frame" would appear in the 150th or later call to decode. That latency is not acceptable. It seems the mp3lame encoder keeps sample data back for later use, but that is not what I want.
I don't know what is going wrong. Thank you for any information.
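For what it's worth, the frames the encoder is holding back are not lost; with this old avcodec_encode_audio2 API they can be pulled out at the end of the stream by flushing with a NULL frame (a sketch, reusing the names from the code above):
// Drain the encoder once all input has been fed in.
int gotted = 0;
do {
    AVPacket avpkt;
    av_init_packet(&avpkt);
    avpkt.data = NULL; // let the encoder allocate the output packet
    avpkt.size = 0;
    // Passing NULL as the frame tells the encoder to flush its delayed frames.
    if (avcodec_encode_audio2(cntx_, &avpkt, NULL, &gotted) < 0)
        break;
    if (gotted) {
        ret.insert(ret.end(), avpkt.data, avpkt.data + avpkt.size);
        av_free_packet(&avpkt);
    }
} while (gotted);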
Today I debugged the code again, and here is some detail:
encode: each PCM sample frame has len = 23040, which is 10 times the MP3 frame size. Each call to encode only outputs 9 frames; this output causes decode to output 20736 samples, so 1 frame (2304 bytes) is lost, and the sound is noisy.
If MP3 or MP2 encoding is not suitable for real-time voice transfer, which encoder should I choose?
Suppose it's the 100th call to encode(data, len), this "frame" would appear in 150th or later in the decode call, the latency is not acceptable.
Understand how the codec works and adjust your expectations accordingly.
MP3 is a lossy codec. It works by converting your time-domain PCM data to the frequency domain. This conversion alone requires time (because frequency components do not exist at any instant of time; they can only exist over a period of time). At a simple level, it then uses a handful of algorithms to determine what spectral information to keep and what to throw away. Each MP3 frame is hundreds of samples long in duration. (576 samples is as low as you can typically go; twice that is a typical number. At 44.1 kHz, 1152 samples is about 26 ms per frame.)
On top of the minimum time for creating frames, MP3 also uses what is called a bit reservoir: if a complex passage requires more bandwidth, it borrows unused bandwidth from neighboring frames. To facilitate this, a buffer of many frames is required.
On top of all the codec work, FFmpeg itself has buffering (for detection of input and whatnot), and there are buffers in your pipes to and from FFmpeg. I would imagine the codec itself may also employ general buffering on the input and output.
Finally, you're decoding the stream and playing it back, which means that most of the same kinds of buffers used in encoding are now used for decoding. And we're not even talking about the several hundred milliseconds of latency you get pushing the audio data through a sound card and out the analog side to your speaker.
You have an unrealistic expectation, and while it is possible to tweak some things to reduce latency (such as disabling the bit reservoir), doing so will result in a poor-quality stream and will not be particularly low-latency anyway.
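For completeness, with FFmpeg's libmp3lame wrapper the reservoir can be switched off through a private option before opening the codec (a sketch; it assumes a reasonably recent FFmpeg where the wrapper exposes a "reservoir" option, and quality will suffer as described above):
#include <libavutil/opt.h>
// Disable the bit reservoir before avcodec_open2().
av_opt_set_int(cntx_->priv_data, "reservoir", 0, 0);
err_ = avcodec_open2(cntx_, codec_, NULL);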
I have a question regarding a sound-synthesis app that I'm working on. I am trying to read in an audio file, create randomized 'grains' using granular synthesis techniques, place them into an output buffer, and then play that back to the user using OpenAL. For testing purposes I am simply writing the output buffer to a file that I can then listen back to.
Judging by my results I am on the right track, but I am getting some aliasing issues and playback sounds that just don't seem quite right. There is usually a rather loud pop in the middle of the output file, and the volume levels are VERY loud at times.
Here are the steps I have taken to get the results I need, but I'm a little confused about a couple of things, namely the formats I am specifying for my AudioStreamBasicDescription.
Read in an audio file from my mainBundle, which is a mono file in .aiff format:
ExtAudioFileRef extAudioFile;
CheckError(ExtAudioFileOpenURL(loopFileURL,
&extAudioFile),
"couldn't open extaudiofile for reading");
memset(&player->dataFormat, 0, sizeof(player->dataFormat));
player->dataFormat.mFormatID = kAudioFormatLinearPCM;
player->dataFormat.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
player->dataFormat.mSampleRate = S_RATE;
player->dataFormat.mChannelsPerFrame = 1;
player->dataFormat.mFramesPerPacket = 1;
player->dataFormat.mBitsPerChannel = 16;
player->dataFormat.mBytesPerFrame = 2;
player->dataFormat.mBytesPerPacket = 2;
// tell extaudiofile about our format
CheckError(ExtAudioFileSetProperty(extAudioFile,
kExtAudioFileProperty_ClientDataFormat,
sizeof(AudioStreamBasicDescription),
&player->dataFormat),
"couldnt set client format on extaudiofile");
SInt64 fileLengthFrames;
UInt32 propSize = sizeof(fileLengthFrames);
ExtAudioFileGetProperty(extAudioFile,
kExtAudioFileProperty_FileLengthFrames,
&propSize,
&fileLengthFrames);
player->bufferSizeBytes = fileLengthFrames * player->dataFormat.mBytesPerFrame;
Next I declare my AudioBufferList and set some more properties
AudioBufferList *buffers;
UInt32 ablSize = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * 1);
buffers = (AudioBufferList *)malloc(ablSize);
player->sampleBuffer = (SInt16 *)malloc(player->bufferSizeBytes); // bufferSizeBytes is already a byte count
buffers->mNumberBuffers = 1;
buffers->mBuffers[0].mNumberChannels = 1;
buffers->mBuffers[0].mDataByteSize = player->bufferSizeBytes;
buffers->mBuffers[0].mData = player->sampleBuffer;
My understanding is that .mData will be whatever was specified in the format flags (in this case, SInt16). Since it is of type (void *), I want to convert it to float data, which is the obvious choice for audio manipulation. Before, I set up a for loop which just iterated through the buffer and cast each sample to a float. This seemed unnecessary, so now I pass my .mData buffer into a function I created, which then granularizes the audio:
float *theOutBuffer = [self granularizeWithData:(float *)buffers->mBuffers[0].mData with:framesRead];
In this function I dynamically allocate some buffers, create random-size grains, place them in my out buffer after windowing them with a Hamming window, and return that buffer (which is float data). Everything is cool up to this point.
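For reference, the windowing step described above amounts to something like this sketch (illustrative only, not the actual granularizeWithData:with: implementation):
#include <math.h> // M_PI (may need _USE_MATH_DEFINES on MSVC)
// Apply a Hamming window in place to one grain of grainLen samples.
static void applyHammingWindow(float *grain, int grainLen)
{
    for (int n = 0; n < grainLen; ++n) {
        float w = 0.54f - 0.46f * cosf(2.0f * (float)M_PI * n / (grainLen - 1));
        grain[n] *= w;
    }
}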
Next I set up all my output file ASBD and such:
AudioStreamBasicDescription outputFileFormat;
bzero(&outputFileFormat, sizeof(AudioStreamBasicDescription));
outputFileFormat.mFormatID = kAudioFormatLinearPCM;
outputFileFormat.mSampleRate = 44100.0;
outputFileFormat.mChannelsPerFrame = numChannels;
outputFileFormat.mBytesPerPacket = 2 * numChannels;
outputFileFormat.mFramesPerPacket = 1;
outputFileFormat.mBytesPerFrame = 2 * numChannels;
outputFileFormat.mBitsPerChannel = 16;
outputFileFormat.mFormatFlags = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
UInt32 flags = kAudioFileFlags_EraseFile;
ExtAudioFileRef outputAudioFileRef = NULL;
NSString *tmpDir = NSTemporaryDirectory();
NSString *outFilename = @"Decomp.caf";
NSString *outPath = [tmpDir stringByAppendingPathComponent:outFilename];
NSURL *outURL = [NSURL fileURLWithPath:outPath];
AudioBufferList *outBuff;
UInt32 abSize = offsetof(AudioBufferList, mBuffers[0]) + (sizeof(AudioBuffer) * 1);
outBuff = (AudioBufferList *)malloc(abSize);
outBuff->mNumberBuffers = 1;
outBuff->mBuffers[0].mNumberChannels = 1;
outBuff->mBuffers[0].mDataByteSize = framesRead * sizeof(float); // size of the sample data, not of the AudioBufferList
outBuff->mBuffers[0].mData = theOutBuffer;
CheckError(ExtAudioFileCreateWithURL((__bridge CFURLRef)outURL,
kAudioFileCAFType,
&outputFileFormat,
NULL,
flags,
&outputAudioFileRef),
"ErrorCreatingURL_For_EXTAUDIOFILE");
CheckError(ExtAudioFileSetProperty(outputAudioFileRef,
kExtAudioFileProperty_ClientDataFormat,
sizeof(outputFileFormat),
&outputFileFormat),
"ErrorSettingProperty_For_EXTAUDIOFILE");
CheckError(ExtAudioFileWrite(outputAudioFileRef,
framesRead,
outBuff),
"ErrorWritingFile");
The file is written correctly, in CAF format. My question is this: am I handling the .mData buffer correctly when I cast the samples to float data, manipulate (granulate) them with various window sizes, and then write the result to a file using ExtAudioFileWrite (in CAF format)? Is there a more elegant way to do this, such as declaring my ASBD format flag as kAudioFormatFlagIsFloat? My output CAF file has some clicks in it, and when I open it in Logic it looks like there is a lot of aliasing. This would make sense if I were sending it float data while some conversion I am unaware of is happening.
Thanks in advance for any advice on the matter! I have been an avid reader of pretty much all the source material online, including the Core Audio book, various blogs, tutorials, etc. The ultimate goal of my app is to play the granularized audio in real time to a user with headphones, so writing to a file is just being used for testing at the moment. Thanks!
What you say about step 3 suggests to me that you are interpreting an array of shorts as an array of floats. If that is so, we have found the reason for your trouble. Can you assign the short values one by one into an array of floats? That should fix it.
It looks like mData is a void * pointing to an array of shorts. Casting this pointer to a float * doesn't change the underlying data into floats, but your audio-processing function will treat it as if it did. However, float and short values are stored in totally different ways, so the math you do in that function will operate on very different values that have nothing to do with your true input signal. To investigate this experimentally, try the following:
short data[4] = {-27158, 16825, 23024, 15};
void *pData = data;
The void pointer doesn't indicate what kind of data it points to, so one can falsely assume it points to float values. Note that a short is 2 bytes wide, but a float is 4 bytes wide. It is a coincidence that your code did not crash with an access violation: interpreted as floats, the array above is only long enough for two values. Let's just look at the first value:
float *pfData = (float *)pData;
printf("%d == %f\n", data[0], pfData[0]);
The output of this will be -27158 == 23.198200, illustrating how instead of the expected -27158.0f you obtain roughly 23.2f. Two problematic things happened. First, sizeof(float) is not sizeof(short). Second, the "ones and zeros" of a floating-point number are stored very differently from those of an integer. See http://en.wikipedia.org/wiki/Single_precision_floating-point_format.
How do you solve the problem? There are at least two simple solutions. First, you could convert each element of the array before you feed it into your audio processor:
int k;
float *pfBuf = (float *)malloc(n_data * sizeof(float));
short *psiBuf = (short *)buffers->mBuffers[0].mData;
for (k = 0; k < n_data; k ++)
{
pfBuf[k] = psiBuf[k];
}
[self granularizeWithData:pfBuf with:framesRead];
for (k = 0; k < n_data; k ++)
{
psiBuf[k] = pfBuf[k];
}
free(pfBuf);
You see that most likely you will have to convert everything back to short after your call to granularizeWithData:with:. So the second solution would be to do all the processing in short, although from what you write, I imagine you would not like that approach.
So I'm picking up C++ after a long hiatus, and I had the idea to create a program which can generate music from strings of numbers at runtime (I was inspired by the composition of Pi some people did), with the eventual goal being some sort of procedural music generation software.
So far I have been able to make a really primitive version of this with the Beep() function, just feeding through the first so-many digits of Pi as a test. Works like a charm.
What I'm looking for now is how I could kick it up a notch and get some higher-quality sound (because Beep() is literally the most primitive sound... ever), and I realized I have absolutely no idea how to do this. What I need is either a library or some sort of API that can:
1) Generate sound without a pre-existing file. I want the result to be 100% generated by code and not rely on any samples, optimally.
2) Play multiple sounds at a time, so I can play chords or a melody with a beat; that would be nice.
3) Let me control the wave it plays (kind of like chiptune mixers can) via an equation or some other sort of data; that'd be super helpful.
I don't know if this is a weird request or if I just researched it using the wrong terms, but I wasn't able to find anything along these lines, or at least nothing that was well documented at all. :/
If anyone can help, I'd really appreciate it.
EDIT: Also, apparently I'm just super not used to asking stuff on forums; my target platform is Windows (7, specifically, although I wouldn't think that matters).
I use PortAudio (http://www.portaudio.com/). It will let you create PCM streams in a portable way. Then you just push samples into the stream and they will play.
Edit: using PortAudio is pretty easy. You initialize the library. I use floating-point samples to make it super easy. I do it like this:
PaError err = Pa_Initialize();
if ( err != paNoError )
return false;
mPaParams.device = Pa_GetDefaultOutputDevice();
if ( mPaParams.device == paNoDevice )
return false;
mPaParams.channelCount = NUM_CHANNELS;
mPaParams.sampleFormat = paFloat32;
mPaParams.suggestedLatency =
Pa_GetDeviceInfo( mPaParams.device )->defaultLowOutputLatency;
mPaParams.hostApiSpecificStreamInfo = NULL;
Then, later, when you want to play sounds, you create a stream: 2 channels for stereo, at 44.1 kHz, good for MP3 audio:
PaError err = Pa_OpenStream( &mPaStream,
NULL, // no input
&mPaParams,
44100, // sample rate
NUM_FRAMES, // frames per buffer
0,
sndCallback,
this
);
Then you implement the callback to fill the PCM audio stream. The callback is a C function, but I just call through to my C++ class to handle the audio. I ripped this from my code, and it may not be 100% correct now, as I removed a ton of stuff you won't care about. But it works kind of like this:
static int sndCallback( const void* inputBuffer,
void* outputBuffer,
unsigned long framesPerBuffer,
const PaStreamCallbackTimeInfo* timeInfo,
PaStreamCallbackFlags statusFlags,
void* userData )
{
Snd* snd = (Snd*)userData;
return snd->callback( (float*)outputBuffer, framesPerBuffer );
}
u32 Snd::callback( float* outbuf, u32 nFrames )
{
    mPlayMutex.lock(); // use mutexes because this is async code!
    // clear the output buffer
    memset( outbuf, 0, nFrames * NUM_CHANNELS * sizeof( float ));
    // mix all the sounds.
    if ( mChannels.size() )
    {
        // I have multiple audio sources I'm mixing. That's what mChannels is.
        for ( s32 i = mChannels.size(); i > 0; i-- )
        {
            for ( u32 j = 0; j < nFrames * NUM_CHANNELS; j++ )
            {
                float f = outbuf[j] + getNextSample( i ); // <-- your code here!!!
                if ( f > 1.0f ) f = 1.0f;   // clamp it so you don't get clipping.
                if ( f < -1.0f ) f = -1.0f;
                outbuf[j] = f;
            }
        }
    }
    mPlayMutex.unlock();
    return paContinue; // return paComplete when you are done playing audio.
}
I answered a very similar question earlier this week: Note Synthesis, Harmonics (Violin, Piano, Guitar, Bass), Frequencies, MIDI. In your case, if you don't want to rely on samples, then the wavetable method is out. So your simplest option would be to dynamically vary the frequency and amplitude of sinusoids over time, which is easy but will sound pretty terrible (like a cheap theremin). Your only real option is a more sophisticated synthesis algorithm, such as one of the physical-modelling ones (e.g. Karplus-Strong). That would be an interesting project, but be warned that it does require something of a mathematical background.
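To give a flavour of the physical-modelling route, here is a minimal Karplus-Strong pluck sketch (illustrative only; a real implementation would add proper tuning and damping control):
#include <cstdlib>
#include <vector>
// Karplus-Strong: fill a delay line with noise, then repeatedly average
// adjacent samples and feed the result back. Pitch is roughly sampleRate / delayLen.
std::vector<float> pluck(float freq, float sampleRate, int numSamples)
{
    int delayLen = (int)(sampleRate / freq);
    std::vector<float> delay(delayLen);
    for (int i = 0; i < delayLen; ++i)
        delay[i] = (float)rand() / RAND_MAX * 2.0f - 1.0f; // initial noise burst
    std::vector<float> out(numSamples);
    for (int n = 0; n < numSamples; ++n) {
        float cur  = delay[n % delayLen];
        float next = delay[(n + 1) % delayLen];
        out[n] = cur;
        delay[n % delayLen] = 0.996f * 0.5f * (cur + next); // lowpass + decay
    }
    return out;
}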
You can indeed use something like PortAudio, as Rafael has mentioned, to physically get the sound out of the PC; in fact I think PortAudio is the best option for that. But generating the data so that it sounds musical is by far your biggest challenge.