Restarting Streaming OpenAL Source? - c++

Why does my streaming OpenAL source sometimes go to the AL_STOPPED state, forcing me to call alSourcePlay? This usually happens when I do not call send fast enough, e.g. in a debug build. Does an OpenAL source automatically stop when it runs out of queued buffers? How do I avoid that?
void send(audio_buffer audio) override
{
    ALenum state;
    alGetSourcei(source_, AL_SOURCE_STATE, &state);
    if (state != AL_PLAYING)
        alSourcePlay(source_); // This happens sometimes, usually when "send" is not called fast enough.

    ALuint buffer = 0;
    alSourceUnqueueBuffers(source_, 1, &buffer);
    if (buffer)
    {
        alBufferData(buffer, AL_FORMAT_STEREO16, audio.data(), static_cast<ALsizei>(audio.size() * sizeof(int16_t)), 48000);
        alSourceQueueBuffers(source_, 1, &buffer);
    }
    else
        LOG << "Dropped audio.";
}

It sounds like your basic problem is that your audio stream is starved. There are a few options you can use to mitigate this, but they all have their own side effects:
(1) You can configure it to play from a looping buffer, to which you are supplying the relevant data. The downside to this is that it will audibly repeat itself if you starve the buffer too long, but it will have some better performance characteristics (fragmentation, etc).
(2) You can increase the send buffer size. This will only cover up small problems, and potentially increases the latency in dynamic content.
(3) Finally, you can thread the audio send operation, so that as long as the audio thread isn't starved, it can continue to send data in the background.
The high-production-quality solution probably involves all three of these. Sorry for the lack of OpenAL-specific terminology, but every audio system I've seen has these capabilities.
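In OpenAL terms: yes, a streaming source does go to AL_STOPPED once every buffer in its queue has been processed, so after an underrun you have to restart it yourself with alSourcePlay. Below is a hedged sketch of the usual streaming pattern, reusing the names from the question (source_, the 48 kHz stereo format, the LOG macro); the key differences are that it only unqueues buffers reported by AL_BUFFERS_PROCESSED and restarts the source after requeueing:

void send(audio_buffer audio)
{
    // Only unqueue buffers that the source has actually finished playing.
    ALint processed = 0;
    alGetSourcei(source_, AL_BUFFERS_PROCESSED, &processed);

    if (processed > 0)
    {
        ALuint buffer = 0;
        alSourceUnqueueBuffers(source_, 1, &buffer);
        alBufferData(buffer, AL_FORMAT_STEREO16, audio.data(),
                     static_cast<ALsizei>(audio.size() * sizeof(int16_t)), 48000);
        alSourceQueueBuffers(source_, 1, &buffer);
    }
    else
    {
        LOG << "Dropped audio."; // no free buffer: the queue is still full
    }

    // If the source starved and stopped, restart it *after* requeueing,
    // otherwise it will immediately stop again on an empty queue.
    ALint state = 0;
    alGetSourcei(source_, AL_SOURCE_STATE, &state);
    if (state != AL_PLAYING)
        alSourcePlay(source_);
}

Queuing more buffers up front (say four or more) also gives the sender more headroom before the source starves.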

Related

Streaming images with ZMQ, message_t allocation takes too much time

I've been trying to find out how to stream images with ZeroMQ (I'm using the cppzmq wrapper, but raw-API answers are fine). Naively, I set up:
zmq::context_t ctx(4);
zmq::socket_t pub_image_socket(ctx, zmq::socket_type::pub);
pub_image_socket.bind("tcp://127.0.0.1:8001");
...
while (true) {
    //render to image...
    zmq::message_t image_message(image_size.x * image_size.y * element_size);
    copy_image_to(image, image_message);
    pub_image_socket.send(image_message, zmq::send_flags::none);
}
I thought that maybe some other part of the zmq chain was taking a bunch of time, so I did the following (in a debug build):
//zmq::context_t ctx(4);
//zmq::socket_t pub_image_socket(ctx, zmq::socket_type::pub);
//pub_image_socket.bind("tcp://127.0.0.1:8001");
...
while (true) {
    //render to image...
    zmq::message_t image_message(image_size.x * image_size.y * element_size);
    //commented out so only message creation is left
    //copy_image_to(image, image_message);
    //pub_image_socket.send(image_message, zmq::send_flags::none);
}
The runtime barely changed (and the Visual Studio profiler could not actually tell me where the slow-down was).
So then I decided to do this:
zmq::context_t ctx(4);
zmq::socket_t pub_image_socket(ctx, zmq::socket_type::pub);
pub_image_socket.bind("tcp://127.0.0.1:8001");
...
while (true) {
    //render to image...
    //zmq::message_t image_message(image_size.x * image_size.y * element_size);
    //message creation now commented out as well
    //copy_image_to(image, image_message);
    //pub_image_socket.send(image_message, zmq::send_flags::none);
}
Commenting out the image_message allocation cut the runtime by more than half... I'm not sure what I can do about this, though; in theory I could avoid it by re-using already-allocated memory in a message_t, but ZMQ seems to prohibit this. Allocating megabytes per frame just takes too much time.
I'm starting to think data streaming is impossible with ZeroMQ due to this limitation, and I should just use asio and TCP/UDP sockets instead. Is there a way to avoid this massive reallocation cost in ZMQ?
Use zmq_msg_init_data
http://api.zeromq.org/master:zmq-msg-init-data
You can provide the pointer/size of your already-allocated memory and ZeroMQ will take ownership of it (skipping the extra allocation). Once it has been processed and is no longer needed, ZeroMQ will call the associated free function, where your own code can clean up.
I have used this approach in the past with a memory pool / circular buffer and it worked well.
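For illustration, a minimal sketch of the zero-copy path via the raw API (send_frame, render_to and release_block are made-up names; in a real design the block would come from a pool rather than malloc, and pub_socket is the raw void* socket handle):

#include <zmq.h>
#include <cstdlib>

void render_to(void *block);   // hypothetical: renders the current frame into the block

// Called by ZeroMQ once the message is no longer needed; in a real design this
// would return the block to a memory pool / circular buffer instead of freeing it.
static void release_block(void *data, void * /*hint*/)
{
    std::free(data);
}

void send_frame(void *pub_socket, size_t image_bytes)
{
    // Obtain a block (ideally from a pool) and render the frame straight into it,
    // so there is no extra copy into a freshly allocated message_t.
    void *block = std::malloc(image_bytes);
    render_to(block);

    // zmq_msg_init_data wraps the existing block without copying it;
    // release_block is invoked once ZeroMQ has finished sending.
    zmq_msg_t msg;
    zmq_msg_init_data(&msg, block, image_bytes, release_block, nullptr);
    zmq_msg_send(&msg, pub_socket, 0);
}

If I recall correctly, cppzmq exposes the same mechanism through a message_t constructor that takes a data pointer, size, free function and hint, if you prefer to stay with the wrapper.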

Simple Fast read-process

I want to decompress data from a file on a very slow device (read speed = 1 MB/s). The decompression algorithm can run at least three times faster than that.
What is the fastest way to parallelize these tasks in C/C++ so that the reading process is never slowed down by decompression and the full read bandwidth is used?
I already tried two threads with regular pipes, but I don't know if that is the best solution; at the very least it is not zero-copy.
My current approach is broken because I cannot manage to get blocking I/O working on the pipes (I tried fcntl and fread/fdopen).
My unparallelized program is very simple. Something like:
while (remainingToRead > 0) {
    size_t nb = fread(buffer, 1, bufferSize, file); // read the next chunk from the slow device
    decompress(buffer, nb, bufferOut);              // then decompress it
    remainingToRead -= nb;
}
One solution here is to use separate threads to read and decompress and to use two buffers so that these operations can be overlapped.
Pseudo code:
Read thread:
while (not finished)
{
    while (no buffers free)
        wait on condition variable
    fill next buffer
    mark buffer in use
    set condition variable for decompression thread
}

Decompression thread:
while (not finished)
{
    while (no buffers full)
        wait on condition variable
    decompress next buffer
    mark buffer free
    set condition variable for read thread
}
Note that getting this to work involves getting quite a bit of detail right - multi-threaded programming is always tricky.
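To make the pseudo code concrete, here is a minimal double-buffered sketch using std::thread and a condition variable (read_chunk and decompress are stand-ins for the question's actual I/O and decompressor; here they just copy stdin to stdout so the example runs):

#include <array>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

// Stand-ins for the question's slow read and fast decompressor.
size_t read_chunk(std::vector<char> &buf) { return std::fread(buf.data(), 1, buf.size(), stdin); }
void decompress(const std::vector<char> &buf, size_t n) { std::fwrite(buf.data(), 1, n, stdout); }

struct Slot {
    std::vector<char> data = std::vector<char>(1 << 20); // 1 MiB per buffer
    size_t size = 0;
    bool full = false;
};

int main()
{
    std::array<Slot, 2> slots;              // two buffers so read and decompress overlap
    std::mutex m;
    std::condition_variable cv;

    std::thread reader([&] {
        for (size_t i = 0;; i = (i + 1) % slots.size()) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !slots[i].full; });   // wait for a free buffer
            lock.unlock();

            size_t n = read_chunk(slots[i].data);            // slow read, done without the lock

            lock.lock();
            slots[i].size = n;
            slots[i].full = true;                            // n == 0 marks end of input
            cv.notify_all();
            if (n == 0)
                return;
        }
    });

    // Decompression runs on this thread, consuming buffers in order.
    for (size_t i = 0;; i = (i + 1) % slots.size()) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return slots[i].full; });        // wait for a full buffer
        size_t n = slots[i].size;
        lock.unlock();

        if (n == 0)
            break;                                           // reader reached end of input
        decompress(slots[i].data, n);                        // overlaps with the next read

        lock.lock();
        slots[i].full = false;                               // hand the buffer back
        cv.notify_all();
    }

    reader.join();
    return 0;
}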

Raw files aren't playing, or are playing incorrectly - Oboe (Android-ndk)

I'm attempting to play a raw (int16 PCM) encoded audio file in my Android application. I've been following and reading through the Oboe documentation/samples to try to get one of my own audio files to play.
The audio file I need to play is roughly 6kb, or 1592 frames (stereo).
Either no sound plays, or sound/jitter plays on startup (with varying output - see below).
Troubleshooting
update
I have switched to floats for buffer queuing, instead of keeping everything as int16_t (and converting back to int16_t when done), although now I'm back to no sound.
The audio seems to be either not playing, or playing on startup (which is wrong). The sound should play after I press 'start'.
When the app was implemented with int16_t only, the premature sound depended on how big the buffer size was. If the buffer size was smaller than the audio file, the sound was very fast and clipped (more drone-like at lower buffer sizes). If it was bigger than the raw audio size, it seemed to play on a loop and got quieter at higher buffer sizes. The sound would also get "softer" when the start button was pressed. I'm not even entirely sure this means the raw audio was playing; it could just be random jitter from Android.
When filling the buffers with floats, and converting to int16_t afterwards, no audio is played.
(I have tried running systrace, but I honestly don't know what I'm looking for)
The stream opens fine.
The buffer size fails to be adjusted in createPlaybackStream() (although somehow it is still set to twice the burst size).
The stream starts fine.
The Raw resources are being loaded fine.
Implementation
What I am currently trying in the builder:
Setting the callback to this, or onAudioReady()
Setting the performance mode to LowLatency
Setting the sharing mode to Exclusive
Setting the buffer capacity to (anything bigger than my audio file frame count)
Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)
I am using the Player class and the AAssetManager class from the Rhythm Game sample here: https://github.com/google/oboe/blob/master/samples/RhythmGame. I am using these classes to load my resources and play the sound. Player.renderAudio writes the audio data to the output buffer.
Here are the relevant methods from my audio engine:
void AudioEngine::createPlaybackStream() {

    // Load the RAW PCM data files into memory
    std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
    if (soundSource == nullptr) {
        LOGE("Could not load source data for sound");
        return;
    }
    sound = std::make_shared<Player>(soundSource);

    AudioStreamBuilder builder;
    builder.setCallback(this);
    builder.setPerformanceMode(PerformanceMode::LowLatency);
    builder.setSharingMode(SharingMode::Exclusive);
    builder.setChannelCount(mChannelCount);

    Result result = builder.openStream(&stream);
    if (result == Result::OK && stream != nullptr) {

        mSampleRate = stream->getSampleRate();
        mFramesPerBurst = stream->getFramesPerBurst();

        int channelCount = stream->getChannelCount();
        if (channelCount != mChannelCount) {
            LOGW("Requested %d channels but received %d", mChannelCount, channelCount);
        }

        // Set the buffer size to (burst size * 2) - this will give us the minimum possible latency while minimizing underruns
        auto setBufferSizeResult = stream->setBufferSizeInFrames(mFramesPerBurst * 2);
        if (setBufferSizeResult != Result::OK) {
            LOGW("Failed to set buffer size. Error: %s", convertToText(setBufferSizeResult.error()));
        }

        // Start the stream - the dataCallback function will start being called
        result = stream->requestStart();
        if (result != Result::OK) {
            LOGE("Error starting stream. %s", convertToText(result));
        }

    } else {
        LOGE("Failed to create stream. Error: %s", convertToText(result));
    }
}

DataCallbackResult AudioEngine::onAudioReady(AudioStream *audioStream, void *audioData, int32_t numFrames) {
    int16_t *outputBuffer = static_cast<int16_t *>(audioData);
    sound->renderAudio(outputBuffer, numFrames);
    return DataCallbackResult::Continue;
}

// When the 'start' button is pressed, it calls this method with true
// There should be no sound on app start-up until this button is pressed
// Sound stops when 'stop' is pressed
void setPlaying(bool isPlaying) {
    sound->setPlaying(isPlaying);
}
Setting the buffer capacity to (anything bigger than my audio file frame count)
You don't need to set the buffer capacity. This will be set automatically at a reasonable level for you. Typically ~3000 frames. Note that buffer capacity is different from buffer size which defaults to 2*framesPerBurst.
Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)
Again, don't do this. onAudioReady will be called every time the stream requires more audio data, and numFrames indicates how many frames you should supply. If you override this value with one which isn't an exact ratio of the audio device's native burst size (typical values are 128, 192 and 240 frames, depending on the underlying hardware), then you may get audio glitches.
I have switched to floats for buffer queuing
The format which you need to supply data in is determined by the audio stream and it is only known after the stream has been opened. You can get it by calling stream->getFormat().
In the RhythmGame sample (at least the version you're referring to) here's how the formats work:
Source file is converted from 16-bit to float inside AAssetDataSource::newFromAssetManager (floats are the preferred format for any kind of signal processing)
If the stream format is 16-bit then convert it back inside onAudioReady
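As a rough illustration only (not code from the sample), the callback can branch on the stream's format; this sketch assumes a float-based renderAudio as in later versions of the sample, a hypothetical mRenderBuffer float scratch buffer sized for one burst, and oboe's convertFloatToPcm16 helper:

DataCallbackResult AudioEngine::onAudioReady(AudioStream *audioStream, void *audioData, int32_t numFrames) {
    int32_t numSamples = numFrames * audioStream->getChannelCount();

    if (audioStream->getFormat() == AudioFormat::Float) {
        // Stream wants floats: render straight into the output buffer.
        sound->renderAudio(static_cast<float *>(audioData), numFrames);
    } else {
        // Stream wants 16-bit PCM: render into a float scratch buffer, then convert.
        sound->renderAudio(mRenderBuffer.data(), numFrames);  // mRenderBuffer: hypothetical std::vector<float>
        convertFloatToPcm16(mRenderBuffer.data(), static_cast<int16_t *>(audioData), numSamples);
    }
    return DataCallbackResult::Continue;
}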
1592 frames (stereo).
You said that your source was stereo but you're specifying it as mono here:
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
Without doubt that will cause audio problems because the AAssetDataSource will have a value for numFrames which is double the correct value. This will cause audio glitches because half the time you'll be playing random parts of system memory.
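Assuming the asset really is stereo, the fix would presumably just be to declare it as such when loading it (ChannelCount::Stereo being the counterpart of the value used in the question):

std::shared_ptr<AAssetDataSource> soundSource(
        AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Stereo));

The mChannelCount passed to builder.setChannelCount() would then need to be 2 as well.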

Qt - how to set audio playback starts from the beginning of buffer?

I have a buffer with a size of, for example, 4096 bytes, and I store data into it; when the buffer is full, writing starts again from the beginning of the buffer. That part seems to work fine.
But I have a problem with playing data from the buffer.
QByteArray m_buffer;
QBuffer m_audioOutputIODevice;
QAudioOutput* m_audioOutput;
m_audioOutputIODevice.close();
m_audioOutputIODevice.setBuffer(&m_buffer);
m_audioOutputIODevice.open(QIODevice::ReadOnly);
m_audioOutput->start(&m_audioOutputIODevice);
Now I can play the sound from the buffer, but when it reaches the end of the buffer, playback stops.
How could I change the code so that when it reaches the end of the buffer, playback starts again from the beginning?
Thank you very much
update codes:
connect(m_audioOutput, SIGNAL(stateChanged(QAudio::State)), SLOT(resetPlayBuffer(QAudio::State)));

void bufferPlayback::resetPlayBuffer(QAudio::State state)
{
    if (state == QAudio::IdleState) {
        m_audioOutputIODevice.close();
        m_audioOutputIODevice.setBuffer(&m_buffer);
        m_audioOutputIODevice.open(QIODevice::ReadOnly);
    }
}
void stateChanged(QAudio::State state) <~ this signal fires when the player's state changes. Hook it to a slot in your class, and just repeat the playback process when the state is stopped. Simple. One of the reasons I LOVE Qt.
AFAICT QAudioOutput doesn't have any built-in support for audio looping, so I think you would have to simulate audio-looping by sending fresh audio buffers to the QAudioOutput device periodically so that it would never run out of audio bytes to play.
I think the easiest way to do that would be to write your own subclass of QIODevice that pretends to be a very long (infinite?) file, but returns your looping samples over and over again when queried. Then pass your QIODevice-subclass object as an argument to QAudioOutput::start().
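A rough sketch of such a subclass (LoopingBuffer is an illustrative name, not a Qt class): readData() keeps copying from the stored QByteArray and wraps the read position back to zero, so QAudioOutput never sees an end of stream.

#include <QByteArray>
#include <QIODevice>
#include <cstring>

class LoopingBuffer : public QIODevice
{
public:
    explicit LoopingBuffer(const QByteArray &data, QObject *parent = nullptr)
        : QIODevice(parent), m_data(data) {}

    bool isSequential() const override { return true; }  // behaves like a stream, no seeking

protected:
    qint64 readData(char *dest, qint64 maxSize) override
    {
        if (m_data.isEmpty())
            return 0;
        qint64 written = 0;
        while (written < maxSize) {
            qint64 chunk = qMin<qint64>(maxSize - written, m_data.size() - m_pos);
            std::memcpy(dest + written, m_data.constData() + m_pos, size_t(chunk));
            written += chunk;
            m_pos = (m_pos + chunk) % m_data.size();      // wrap back to the start of the buffer
        }
        return written;
    }

    qint64 writeData(const char *, qint64) override { return -1; } // read-only device

private:
    QByteArray m_data;
    qint64 m_pos = 0;
};

Usage would then be roughly: loop.open(QIODevice::ReadOnly); m_audioOutput->start(&loop); instead of starting the QAudioOutput with the QBuffer.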

How to use ALSA's snd_pcm_writei()?

Can someone explain how snd_pcm_writei
snd_pcm_sframes_t snd_pcm_writei(snd_pcm_t *pcm, const void *buffer,
snd_pcm_uframes_t size)
works?
I have used it like so:
for (int i = 0; i < 1; i++) {
    f = snd_pcm_writei(handle, buffer, frames);
    ...
}
Full source code at http://pastebin.com/m2f28b578
Does this mean that I shouldn't give snd_pcm_writei() the total number of frames in the buffer, but only
sample_rate * latency = frames
frames at a time?
So if I e.g. have:
sample_rate = 44100
latency = 0.5 [s]
all_frames = 100000
The number of frames that I should give to snd_pcm_writei() would be
sample_rate * latency = frames
44100*0.5 = 22050
and the number of iterations the for-loop should perform would be:
(int) 100000/22050 = 4; with frames=22050
and one extra, but only with
100000 mod 22050 = 11800
frames?
Is that how it works?
Louise
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gf13067c0ebde29118ca05af76e5b17a9
frames should be the number of frames (one sample per channel) you want to write from the buffer. Your system's sound driver will start transferring those samples to the sound card right away, and they will be played at a constant rate.
The latency is introduced in several places. There's latency from the data buffered by the driver while waiting to be transferred to the card. There's at least one buffer full of data that's being transferred to the card at any given moment, and there's buffering on the application side, which is what you seem to be concerned about.
To reduce latency on the application side you need to write the smallest buffer that will work for you. If your application performs a DSP task, that's typically one window's worth of data.
There's no advantage in writing small buffers in a loop - just go ahead and write everything in one go - but there's an important point to understand: to minimize latency, your application should write to the driver no faster than the driver is writing data to the sound card, or you'll end up piling up more data and accumulating more and more latency.
For a design that makes producing data in lockstep with the sound driver relatively easy, look at jack (http://jackaudio.org/) which is based on registering a callback function with the sound playback engine. In fact, you're probably just better off using jack instead of trying to do it yourself if you're really concerned about latency.
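For reference, a hedged sketch of the chunked-write loop described above (handle is assumed to be an already opened and configured snd_pcm_t*, and samples/all_frames are placeholders for the question's interleaved 16-bit data):

#include <alsa/asoundlib.h>
#include <cstdint>

// Writes "all_frames" frames of interleaved 16-bit stereo data in chunks and waits
// for playback to finish before closing. "handle" must already be opened/configured.
void play_all(snd_pcm_t *handle, const int16_t *samples, snd_pcm_uframes_t all_frames)
{
    const snd_pcm_uframes_t chunk = 22050;      // e.g. sample_rate * latency = 44100 * 0.5
    const int channels = 2;
    snd_pcm_uframes_t remaining = all_frames;   // e.g. 100000 in the question's example
    const int16_t *ptr = samples;

    while (remaining > 0) {
        snd_pcm_uframes_t todo = remaining < chunk ? remaining : chunk;
        snd_pcm_sframes_t written = snd_pcm_writei(handle, ptr, todo);
        if (written == -EPIPE) {                // underrun: recover and retry this chunk
            snd_pcm_prepare(handle);
            continue;
        }
        if (written < 0)
            break;                              // some other error
        ptr += written * channels;              // advance by frames * channels samples
        remaining -= static_cast<snd_pcm_uframes_t>(written);
    }

    snd_pcm_drain(handle);                      // let everything queued finish playing
    snd_pcm_close(handle);
}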
I think the reason for the "premature" device closure is that you need to call snd_pcm_drain(handle); prior to snd_pcm_close(handle); to ensure that all data is played before the device is closed.
I did some testing to determine why snd_pcm_writei() didn't seem to work for me, using several examples I found in the ALSA tutorials, and what I concluded was that the simple examples were calling snd_pcm_close() before the sound device could play the complete stream sent to it.
I set the rate to 11025, used a 128-byte random buffer, and looped snd_pcm_writei() 11025/128 ≈ 86 times for each second of sound, so two seconds of sound required 86*2 calls to snd_pcm_writei().
In order to give the device sufficient time to convert the data to audio, I used a delay loop after the snd_pcm_writei() loop to postpone execution of the snd_pcm_close() function.
After testing, I had to conclude that the sample code didn't supply enough samples to overcome the device latency before snd_pcm_close() was called, which implies that the close function has less latency than snd_pcm_writei().
If the ALSA driver's start threshold is not set properly (in your case it seems to be about 2 s), then you will need to call snd_pcm_start() to start the data rendering immediately after snd_pcm_writei().
Or you may set an appropriate threshold in the SW params of the ALSA device (see the sketch after the links below).
ref:
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___s_w___params.html
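A minimal sketch of that second suggestion (assuming handle is an already-configured snd_pcm_t*; a start threshold of 1 frame makes rendering begin as soon as the first write arrives):

// Lower the start threshold so playback begins on the first snd_pcm_writei()
// instead of waiting for the buffer to fill up.
snd_pcm_sw_params_t *sw;
snd_pcm_sw_params_alloca(&sw);
snd_pcm_sw_params_current(handle, sw);                 // start from the current SW params
snd_pcm_sw_params_set_start_threshold(handle, sw, 1);
snd_pcm_sw_params(handle, sw);                         // apply them to the device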