Read data from wav file before applying FFT - c++

This is the first time I'm working with WAV files.
The problem is that I don't fully understand how to read the stored data properly. My code for reading:
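size_t bytesRead = 0;
uint8_t* buffer = new uint8_t[BUFFER_SIZE];
std::cout << "Buffering data... " << std::endl;
// fread returns the number of elements read; 0 means end of file or error
while ((bytesRead = fread(buffer, sizeof buffer[0], BUFFER_SIZE / (sizeof buffer[0]), wavFile)) > 0)
{
// do something with the first bytesRead samples in buffer
}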
The header of my sample file says the data is PCM (1 channel) with 8 bits per sample and a sampling rate of 11025 Hz.
The output (after updates) gives me values from 0 to 255, so they are valid 8-bit PCM values. But any idea what BUFFER_SIZE would be preferable to read those values correctly?
WAV file I'm using: http://www.wavsource.com/movies/2001.htm (daisy.wav)
TXT output: https://paste.ee/p/pXGvm

You've got two common situations. The first is where the WAV file represents a short audio sample and you want to read the whole thing into memory and manipulate it. So BUFFER_SIZE is a variable. Basically you seek to the end of the file to get its size, then load it.
The second common situation is that the WAV file represents a fairly long audio recording, and you want to process it piecewise, often by writing to an output device in real time. So BUFFER_SIZE needs to be large enough to hold a bite-sized chunk, but not so large that you require excessive memory. Often the size of a "frame" of audio is dictated by the output device itself: for example, it may expect 25 chunks per second to synchronise with video. You generally need a double buffer to ensure that you can always meet the demand for more samples when the DAC (digital-to-analogue converter) runs out. Then, each time you hand over a chunk, you load the next chunk of data from disk. Sometimes there isn't a "right" value for the chunk size; you just have to go with something fairly sensible that balances memory footprint against the number of calls.
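If you need to do an FFT, it's normal to use a buffer size that is a power of two, to make the fast transform simpler. The size you need depends on the lowest frequency you are interested in: an N-point FFT at sample rate fs has bins spaced fs/N apart, so resolving lower frequencies requires a larger N.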
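As a rough illustration of the FFT sizing advice, here is a minimal sketch (not part of the original answer). It converts unsigned 8-bit PCM samples to floats centred on zero and picks a power-of-two block size from the lowest frequency of interest. The function names, and the rule of thumb that the bin spacing should be no larger than that lowest frequency, are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert unsigned 8-bit PCM (0..255, silence at 128) to floats in [-1, 1).
std::vector<float> toCenteredFloat(const std::vector<std::uint8_t>& pcm) {
    std::vector<float> out;
    out.reserve(pcm.size());
    for (std::uint8_t s : pcm) {
        out.push_back((static_cast<int>(s) - 128) / 128.0f);
    }
    return out;
}

// Smallest power of two that is >= n (n must be at least 1).
std::size_t nextPowerOfTwo(std::size_t n) {
    assert(n >= 1);
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Assumed rule of thumb: choose N so that the bin spacing sampleRate / N
// is no larger than the lowest frequency we want to resolve.
std::size_t fftSizeFor(double sampleRateHz, double lowestFreqHz) {
    assert(sampleRateHz > 0.0 && lowestFreqHz > 0.0);
    const auto minSamples =
        static_cast<std::size_t>(std::ceil(sampleRateHz / lowestFreqHz));
    return nextPowerOfTwo(minSamples);
}
```

For the 11025 Hz file in the question, fftSizeFor(11025.0, 20.0) gives 1024 samples. With 8-bit mono data that is also 1024 bytes per read.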

Related

How can I hand over CD audio data read by cdparanoia to the ALSA player?

I'm trying to playback an audio CD by using cd_paranoia (from the cdio package) and to hand over the data read to the ALSA sound output. Buffered, of course. My issue is now the following: As stated in this example program, a call to paranoia_read () returns an int16_t* containing one sector (2,352 bytes) of audio data, which can be then cast into a char*.
The ALSA snd_pcm_writei() method, on the other hand, needs a chunk of audio data in a char*, whose length is to be determined by using the snd_pcm_hw_params_get_period_size() method, which basically returns the count of bytes sent to the sound device until it triggers an interrupt. See also this example source code.
The two methods will almost certainly return different values, because an ALSA frame has a different size than a CD sector. This would mean I'd have to divide the data cd-paranoia delivers somehow, so that it fits into ALSA's frame structure. Or would it be sufficient just to stream the CD audio data into a big byte array (std::queue<char>) and then, step by step, read as many bytes from this array as I need to get a complete ALSA "frame"?
Any hints? Thank you.
snd_pcm_writei() handles any number of frames.
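To make the unit explicit: snd_pcm_writei() counts frames, not bytes, where one frame is one sample for every channel. A minimal sketch of the conversion follows, assuming the device was opened for interleaved 16-bit stereo (the usual CD audio layout). The constant and function names are made up for this example.

```cpp
#include <cstddef>

// CD audio: 16-bit samples, 2 channels, so one frame is 4 bytes.
constexpr std::size_t kBytesPerSample = 2;
constexpr std::size_t kChannels = 2;
constexpr std::size_t kSectorBytes = 2352;  // one CD sector, as stated in the question

// Number of complete frames contained in a byte buffer of the given size.
constexpr std::size_t framesInBytes(std::size_t bytes) {
    return bytes / (kBytesPerSample * kChannels);
}

static_assert(framesInBytes(kSectorBytes) == 588,
              "a 2,352-byte sector holds 588 stereo 16-bit frames");
```

Under that assumption, each sector could be passed to snd_pcm_writei() as it is, with a size argument of 588 frames.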

Raw files aren't playing, or are playing incorrectly - Oboe (Android-ndk)

I'm attempting to play a raw (int16 PCM) audio file in my Android application. I've been following and reading through the Oboe documentation/samples to try to get one of my own audio files to play.
The audio file I need to play is roughly 6 KB, or 1592 frames (stereo).
Either no sound plays, or sound/jitter plays on startup (with varying output - see below)
Troubleshooting
update
I have switched to floats for buffer queuing, instead of keeping everything to int16_t (and converting back to int16_t when done), although now I'm back to no sound.
The audio seems to be either not playing, or playing on startup (which is wrong). The sound should play after I press 'start'.
When the app was implemented with int16_t only, the premature sound was relative to how big the buffer size was. If the buffer size is smaller than the audio file, the sound is very fast and clipped (more drone-like at lower buffer sizes). Bigger than the Raw audio size it seems like it plays on a loop and gets quieter at higher buffer sizes. The sound would also get "softer" when the start button is pressed. I'm not even entirely sure this means the raw audio was playing, it could just be random nonsense jitters from Android.
When filling the buffers with floats, and converting to int16_t afterwards, no audio is played.
(I have tried running systrace, but I honestly don't know what I'm looking for)
The stream opens fine.
The buffer size fails to be adjusted in createPlaybackStream() (although somehow it still sets it to twice the burst size)
The stream starts fine.
The Raw resources are being loaded fine.
Implementation
What I am currently trying in the builder:
Setting the callback to this, or onAudioReady()
Setting the performance mode to LowLatency
Setting the sharing mode to Exclusive
Setting the buffer capacity to (anything bigger than my audio file frame count)
Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)
I am using the Player class and the AAssetManager class from the Rhythm Game sample here: https://github.com/google/oboe/blob/master/samples/RhythmGame. I am using these classes to load my resources and play the sound. Player.renderAudio writes the audio data to the output buffer.
Here are the relevant methods from my audio engine:
void AudioEngine::createPlaybackStream() {
// // Load the RAW PCM data files into memory
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
if (soundSource == nullptr) {
LOGE("Could not load source data for sound");
return;
}
sound = std::make_shared<Player>(soundSource);
AudioStreamBuilder builder;
builder.setCallback(this);
builder.setPerformanceMode(PerformanceMode::LowLatency);
builder.setSharingMode(SharingMode::Exclusive);
builder.setChannelCount(mChannelCount);
Result result = builder.openStream(&stream);
if (result == Result::OK && stream != nullptr) {
mSampleRate = stream->getSampleRate();
mFramesPerBurst = stream->getFramesPerBurst();
int channelCount = stream->getChannelCount();
if (channelCount != mChannelCount) {
LOGW("Requested %d channels but received %d", mChannelCount, channelCount);
}
// Set the buffer size to (burst size * 2) - this will give us the minimum possible latency while minimizing underruns
auto setBufferSizeResult = stream->setBufferSizeInFrames(mFramesPerBurst * 2);
if (setBufferSizeResult != Result::OK) {
LOGW("Failed to set buffer size. Error: %s", convertToText(setBufferSizeResult.error()));
}
// Start the stream - the dataCallback function will start being called
result = stream->requestStart();
if (result != Result::OK) {
LOGE("Error starting stream. %s", convertToText(result));
}
} else {
LOGE("Failed to create stream. Error: %s", convertToText(result));
}
}
DataCallbackResult AudioEngine::onAudioReady(AudioStream *audioStream, void *audioData, int32_t numFrames) {
int16_t *outputBuffer = static_cast<int16_t *>(audioData);
sound->renderAudio(outputBuffer, numFrames);
return DataCallbackResult::Continue;
}
// When the 'start' button is pressed, it calls this method with true
// There should be no sound on app start-up until this button is pressed
// Sound stops when 'stop' is pressed
void AudioEngine::setPlaying(bool isPlaying) {
sound->setPlaying(isPlaying);
}
"Setting the buffer capacity to (anything bigger than my audio file frame count)"
You don't need to set the buffer capacity. This will be set automatically at a reasonable level for you. Typically ~3000 frames. Note that buffer capacity is different from buffer size which defaults to 2*framesPerBurst.
"Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)"
Again, don't do this. onAudioReady will be called every time the stream requires more audio data and numFrames indicates how many frames you should supply. If you override this value with a value which isn't an exact ratio of the audio device's native burst size (typical values are 128, 192 and 240 frames depending on underlying hardware) then you may get audio glitches.
"I have switched to floats for buffer queuing"
The format which you need to supply data in is determined by the audio stream and it is only known after the stream has been opened. You can get it by calling stream->getFormat().
In the RhythmGame sample (at least the version you're referring to) here's how the formats work:
Source file is converted from 16-bit to float inside AAssetDataSource::newFromAssetManager (floats are the preferred format for any kind of signal processing)
If the stream format is 16-bit then convert it back inside onAudioReady
"1592 frames (stereo)."
You said that your source was stereo but you're specifying it as mono here:
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
Without doubt that will cause audio problems because the AAssetDataSource will have a value for numFrames which is double the correct value. This will cause audio glitches because half the time you'll be playing random parts of system memory.
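The effect of the channel count on the frame count can be checked with a little arithmetic. The sketch below is only an illustration: the helper names are hypothetical, and the 6368-byte file size is derived from the 1592 stereo frames mentioned in the question rather than stated there. It also shows one plain way to convert a float sample back to 16-bit, which is the step needed when the stream format turns out to be 16-bit.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Frames in a raw int16 PCM file: one frame holds one sample per channel.
std::size_t frameCount(std::size_t fileBytes, std::size_t channels) {
    return fileBytes / (sizeof(std::int16_t) * channels);
}

// Convert a float sample (nominally in [-1, 1]) to 16-bit PCM,
// clamping first so out-of-range input cannot overflow.
std::int16_t floatToInt16(float sample) {
    const float clamped = std::max(-1.0f, std::min(1.0f, sample));
    return static_cast<std::int16_t>(clamped * 32767.0f);
}
```

With 6368 bytes of data, frameCount(6368, 2) is 1592, while frameCount(6368, 1) is 3184, which is the doubled value described above.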

How to convert a WAV file to RAW Audio in C++?

I have searched for an answer to this question for several hours. I have already removed the 44-byte header and transferred the data using an ofstream. The input stereo WAV file is 16-bit PCM at a 44.1 kHz sample rate.
int szm;
char* buff = new char[szm];
ifstream ssn(f_infile,ios::binary);
ssn.seekg(0,ssn.end);
szm = ssn.tellg();
ssn.seekg(0,ssn.beg);
ssn.read(buff,szm);
ssn.close();
ofstream sso(f_outfile,ios::binary);
for(int i =0; i < szm; i++)
{
if(i > 44)
{
word_w(file, buff[i],1);
word_w(file, 0-(buff[i]), 1);
}
}
sso.close();
file.close();
I got the size of the file and read the data into a buffer. As far as I know, a RAW data file is nothing but binary data, so I thought this simple technique would work. However, I got mixed results.
This first one worked like a charm. It was the original sample I wanted to convert. It is a side-by-side comparison of the original WAV file [top] and the raw data [bottom] imported into Audacity at 44.1 kHz.
This next one distorted the right channel for some reason, and doubled the length of the file. It is also a stereo WAV file, 16-bit PCM, 44.1 kHz sample rate.
This third one is completely distorted, and the length has increased even more than the previous one.
Why did it work on the first file but not the other ones, when they are all in the exact same file format (16-bit, 44.1 kHz sample rate, 2 channels)?

How to optimize time for writing a lot of files (saving frames from video)

Background
I am currently working on a small application that grabs the RGB and depth map streams from a Microsoft Kinect device and saves them on disk for future analysis. When I run the program, it should output each frame as a separate image on disk.
The framerate of the Kinect is 30fps, but there are two sources so make this (approximately) 60fps. If I naively try to just save each frame when it arrives I will get dropped frames as is demonstrated by the bundled freenect/record.c application.
I rewrote the application to use one thread that grabs the frames from the device and pushes them to the back of a double ended list (std::deque). Then there are two threads that each pop frames from the front of the double ended list and saves the frames to disk.
When the recording is turned off, there is a potentially large number of frames left in the list that still need to be recorded, so before exiting we let the two save threads do their work until finished.
Now the actual problem
Although the problem of dropped frames is solved, writes to the filesystem are still quite slow. Is there any good way to speed up the file creation on disk?
Currently, the function dump_frame looks like this:
static void
dump_frame(struct frame* frame)
{
FILE* fp;
char filename[512]; /* plenty of space! */
sprintf(filename, "d-%f-%u.pgm", get_time(), frame->timestamp);
fp = fopen(filename, "w");
fprintf(fp, "P5 %d %d 65535\n", frame->width, frame->height);
fwrite(frame->data, frame->size, 1, fp);
fclose(fp);
}
I am running Fedora 14 x64, so the solution only has to work on Linux.
You need to measure what takes time in your specific case. Is it creating multiple files or actually writing the image data to disk?
When I tested on my local system with OSX and an Intel SSD X25M 2G I noticed a huge variation in writes when writing multiple 1MB files vs writing 1 multi MB file. This is probably due to housekeeping of the filesystem and will vary depending on the file system you have.
To avoid the housekeeping you could write all your images to the same file and split it later. However, the data you are saving needs a sustained write speed of about 60 MB/s, which is quite high.
An alternative if you have a lot of memory is to create a ram disk and store the images there first and later move them on to the persistent file system. With a 6GB ram disk you could store about 100 seconds of video.
A possible improvement would be to explicitly set the buffering of fp to full using setvbuf:
const size_t BUFFER_SIZE = 1024 * 16;
fp = fopen(filename, "w");
setvbuf(fp, NULL, _IOFBF, BUFFER_SIZE); /* Must be immediately after the open. */
fprintf(fp, "P5 %d %d 65535\n", frame->width, frame->height);
fwrite(frame->data, frame->size, 1, fp);
fclose(fp);
You could profile using different buffer sizes to determine which provides the best performance.
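The single-file idea mentioned earlier in this answer could look roughly like the sketch below. It is an illustration only: the record layout (a 4-byte size followed by the raw bytes) and the name appendFrame are assumptions, not something the answer specifies, and the size prefix is written in the machine's native byte order.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Append one frame to an already-open container file as
// [uint32 size][raw bytes]. Returns true if everything was written.
bool appendFrame(std::FILE* fp, const std::vector<std::uint8_t>& data) {
    const std::uint32_t size = static_cast<std::uint32_t>(data.size());
    if (std::fwrite(&size, sizeof size, 1, fp) != 1) {
        return false;
    }
    if (size == 0) {
        return true;  // nothing more to write for an empty frame
    }
    return std::fwrite(data.data(), 1, size, fp) == size;
}
```

The file would be opened once before recording starts and closed once at the end, so the per-frame cost is two fwrite calls instead of a file creation. Splitting it afterwards means reading each size prefix and then that many bytes.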

How to use ALSA's snd_pcm_writei()?

Can someone explain how snd_pcm_writei
snd_pcm_sframes_t snd_pcm_writei(snd_pcm_t *pcm, const void *buffer,
snd_pcm_uframes_t size)
works?
I have used it like so:
for (int i = 0; i < 1; i++) {
f = snd_pcm_writei(handle, buffer, frames);
...
}
Full source code at http://pastebin.com/m2f28b578
Does this mean, that I shouldn't give snd_pcm_writei() the number of
all the frames in buffer, but only
sample_rate * latency = frames
?
So if I e.g. have:
sample_rate = 44100
latency = 0.5 [s]
all_frames = 100000
The number of frames that I should give to snd_pcm_writei() would be
sample_rate * latency = frames
44100*0.5 = 22050
and the number of iterations the for-loop should be?:
(int) 100000/22050 = 4; with frames=22050
and one extra, but only with
100000 mod 22050 = 11800
frames?
Is that how it works?
Louise
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gf13067c0ebde29118ca05af76e5b17a9
frames should be the number of frames (one frame is one sample per channel) you want to write from the buffer. Your system's sound driver will start transferring those samples to the sound card right away, and they will be played at a constant rate.
The latency is introduced in several places. There's latency from the data buffered by the driver while waiting to be transferred to the card. There's at least one buffer full of data that's being transferred to the card at any given moment, and there's buffering on the application side, which is what you seem to be concerned about.
To reduce latency on the application side you need to write the smallest buffer that will work for you. If your application performs a DSP task, that's typically one window's worth of data.
There's no advantage in writing small buffers in a loop - just go ahead and write everything in one go - but there's an important point to understand: to minimize latency, your application should write to the driver no faster than the driver is writing data to the sound card, or you'll end up piling up more data and accumulating more and more latency.
For a design that makes producing data in lockstep with the sound driver relatively easy, look at jack (http://jackaudio.org/) which is based on registering a callback function with the sound playback engine. In fact, you're probably just better off using jack instead of trying to do it yourself if you're really concerned about latency.
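If you do decide to write in fixed-size chunks, for instance to produce data in step with playback, the arithmetic from the question works out as in the sketch below. The function name chunkSizes is made up for this example, and the ALSA call itself is left out so that the snippet stands alone.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Split totalFrames into writes of at most framesPerWrite frames each.
// The last entry holds the remainder, if there is one.
std::vector<std::size_t> chunkSizes(std::size_t totalFrames,
                                    std::size_t framesPerWrite) {
    std::vector<std::size_t> sizes;
    if (framesPerWrite == 0) {
        return sizes;  // avoid an endless loop on a zero chunk size
    }
    for (std::size_t done = 0; done < totalFrames;) {
        const std::size_t n = std::min(framesPerWrite, totalFrames - done);
        sizes.push_back(n);
        done += n;
    }
    return sizes;
}
```

For 100000 frames and 22050 frames per write, this yields four chunks of 22050 followed by one of 11800, matching the numbers in the question. Each entry would be the size argument of one snd_pcm_writei() call, with the buffer pointer advanced by the frames already written.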
I think the reason for the "premature" device closure is that you need to call snd_pcm_drain(handle); prior to snd_pcm_close(handle); to ensure that all data is played before the device is closed.
I did some testing to determine why snd_pcm_writei() didn't seem to work for me using several examples I found in the ALSA tutorials, and what I concluded was that the simple examples were doing a snd_pcm_close() before the sound device could play the complete stream sent to it.
I set the rate to 11025, used a 128-byte random buffer, and called snd_pcm_writei() in a for loop 11025/128 (about 86) times for each second of sound. Two seconds of sound required 86*2 calls to snd_pcm_writei().
In order to give the device sufficient time to convert the data to audio, I used a for loop after the snd_pcm_writei() loop to delay execution of the snd_pcm_close() function.
After testing, I had to conclude that the sample code didn't supply enough samples to overcome the device latency before the snd_pcm_close() function was called, which implies that the close function has less latency than the snd_pcm_writei() function.
If the ALSA driver's start threshold is not set properly (if in your case it is about 2 s), then you will need to call snd_pcm_start() to start the data rendering immediately after snd_pcm_writei().
Alternatively, you can set an appropriate start threshold in the software parameters (SW params) of the ALSA device.
ref:
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___s_w___params.html