I'm attempting to use the SoundTouch C++ library for audio speed and pitch changes in an Android app. I have successfully pushed a Java byte[] array (from a .wav) through JNI, returned it, and played it back with an AudioTrack.
The next step is attempting to push a sample byte[] through the SoundTouch pipeline. I have dissected the source of the SoundStretch console program included with the library and have attempted to adapt it. I am using a stereo, 16-bit source for testing purposes.
With my current temporary setup, I am not stripping the RIFF header. I convert it along with the .wav data, because the Java AudioTrack object does not need to read the header; it just plays raw PCM. Playing the raw byte[] without sending it through SoundTouch just results in a small click where the header is.
After sending it through the SoundTouch pipeline, I hear white noise where the beginning of the audio should be. I suspect the problem is at the end of my write() function, where I cast shorts to signed chars. At this point, the console app writes to a file instead of pushing to a vector:
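Instead of converting the RIFF header along with the samples, you could skip it before handing the bytes to SoundTouch. A canonical header is 44 bytes, but many files carry extra chunks (LIST, fact), so a safer approach is to walk the chunk list until you reach `data`. This is a minimal sketch assuming a well-formed RIFF/WAVE buffer; `findWavData` and `readLE32` are hypothetical helper names, not part of SoundTouch or the Android API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a little-endian 32-bit value starting at p.
static uint32_t readLE32(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
           (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

// Walks the RIFF chunk list and locates the "data" chunk.
// On success, 'offset' is the index of the first PCM byte and 'size'
// the payload length (clamped to the buffer for truncated files).
bool findWavData(const std::vector<uint8_t>& wav, std::size_t& offset, std::size_t& size) {
    if (wav.size() < 12 || std::memcmp(wav.data(), "RIFF", 4) != 0 ||
        std::memcmp(wav.data() + 8, "WAVE", 4) != 0)
        return false;
    std::size_t pos = 12;  // sub-chunks start after "RIFF", size, "WAVE"
    while (pos + 8 <= wav.size()) {
        const uint32_t chunkSize = readLE32(&wav[pos + 4]);
        if (std::memcmp(&wav[pos], "data", 4) == 0) {
            offset = pos + 8;
            size = std::min<std::size_t>(chunkSize, wav.size() - offset);
            return true;
        }
        // Skip this chunk; odd-sized chunks are followed by one pad byte.
        pos += 8 + std::size_t(chunkSize) + (chunkSize & 1);
    }
    return false;
}
```

You would then pass `wav.data() + offset` and `size` bytes to the 16-bit-to-float conversion, so the header bytes never reach the audio pipeline.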
int res = (int)fwrite(temp, 1, numBytes, fptr);
I have read the documentation for fwrite, but I don't know enough about bit twiddling or audio processing to know how to get this data into a char[] correctly instead of writing it to a file. I know I am losing information with the cast, but I am unsure how to fix it.
In case anyone is extra motivated, the SoundStretch source can be found here:
extern "C" DLL_PUBLIC jbyteArray
(JNIEnv *env, jobject thiz, jbyteArray input, jint length)
const int BUFF_SIZE = 2048000;
SoundTouch soundTouch;
jboolean isCopy;
jbyte* ar = env->GetByteArrayElements(input, &isCopy);
signed char* cBufferIn = (signed char*)ar;
SAMPLETYPE* fBufferIn = new SAMPLETYPE[length];
vector<signed char> fBufferOut;
//converts the chars to floats per the SoundTouch console app.
convertInput16(cBufferIn, fBufferIn, length);
//channels, sampling rate, speed, pitch change
setup(&soundTouch, 2, 44100, 1.0, 0);
//transform floats from fBufferIn to fBufferout
process(&soundTouch, fBufferIn, fBufferOut, BUFF_SIZE);
signed char* res = &fBufferOut[0];
jbyteArray result = env->NewByteArray(length);
env->SetByteArrayRegion(result, 0, fBufferOut.size(), res);
LOGV("fBufferOut Size: %d", fBufferOut.size());
delete[] fBufferIn;
return result;
static void process(SoundTouch* soundTouch, SAMPLETYPE* fBufferIn, vector<signed char>& fBufferOut, int BUFF_SIZE)
int nSamples = BUFF_SIZE / 2; //2 bytes per sample, using 16 bit sample for testing
int buffSizeSamples = BUFF_SIZE / 2; //2 channel stereo
soundTouch->putSamples(fBufferIn, nSamples);
nSamples = soundTouch->receiveSamples(fBufferIn, buffSizeSamples);
write(fBufferIn, fBufferOut, nSamples / 2); //2 channels
} while (nSamples != 0);
nSamples = soundTouch->receiveSamples(fBufferIn, buffSizeSamples);
write(fBufferIn, fBufferOut, nSamples / 2);
LOGV("NUMBER OF SAMPLES: %d", nSamples);
} while (nSamples != 0);
static void write(const float *bufferIn, vector<signed char>& bufferOut, int numElems)
int numBytes;
int bytesPerSample;
if (numElems == 0) return;
bytesPerSample = 16 / 8; //16 bit test sample / bits in a byte
numBytes = numElems * bytesPerSample;
short *temp = (short*)getConvBuffer(numBytes);
switch (bytesPerSample)
case 2: //16 bit encoding per the SoundStretch console app
short *temp2 = (short *)temp;
for (int i = 0; i < numElems; i++)
short value = (short)saturate(bufferIn[i] * 32768.0f, -32768.0f, 32767.0f); //magic to me
temp2[i] = value; //works for little endian only.
for (int i = 0; i < numElems; ++i)
bufferOut.push_back((signed char)temp[i]); //I think my problem is here.
delete[] temp;
//bytesWritten += numBytes;

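I just needed to get all the bits into the char[]. Each short has to be pushed as two bytes, low byte first (little-endian), instead of being truncated to its low byte by the cast:
for (int i = 0; i < numElems; ++i)
{
    bufferOut.push_back(temp[i] & 0xff);
    bufferOut.push_back((temp[i] >> 8) & 0xff);
}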
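For reference, here is a self-contained sketch of the whole float-to-bytes step that write() performs, assuming `SAMPLETYPE` is `float` (SoundTouch's default build) with samples roughly in [-1.0, 1.0]. The `* 32768.0f` plus clamp is the "magic" from the question: it scales to the 16-bit range and keeps `1.0f` from wrapping around to -32768. `appendPcm16` and `clampf` are hypothetical names standing in for the SoundStretch helpers.

```cpp
#include <cstdint>
#include <vector>

// Clamps v into [lo, hi]; plays the role of saturate() in SoundStretch.
static float clampf(float v, float lo, float hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

// Converts float samples to 16-bit PCM and appends each sample as two
// bytes, low byte first (little-endian, the layout AudioTrack plays).
void appendPcm16(const float* in, int numElems, std::vector<signed char>& out) {
    for (int i = 0; i < numElems; ++i) {
        // Scale to the int16 range; clamp so 1.0f does not overflow.
        int16_t s = static_cast<int16_t>(clampf(in[i] * 32768.0f, -32768.0f, 32767.0f));
        uint16_t bits = static_cast<uint16_t>(s);              // two's-complement bit pattern
        out.push_back(static_cast<signed char>(bits & 0xFF));  // low byte
        out.push_back(static_cast<signed char>(bits >> 8));    // high byte
    }
}
```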


C++ - XAudio2 - Cracking sound when trying to play a continuous sine sound

Edit: Today I found out that I only encounter this problem when I use my headphones with a cord. The headphones themselves aren't the problem: the same headphones can also be used wirelessly, and then the problem is gone, just as it is with my other wireless headphones. I would prefer to use the cord, though, because it has lower latency, so I hope this additional information helps someone solve the mystery.
I have this code to play a sine wave. I constantly hear clicking sounds when I play it. I'm pretty sure it's because the buffers are not played back seamlessly: when I change the value of l to something larger (for instance 44100), the clicks are further apart. I think I have followed Microsoft's explanation of how to use the callbacks as closely as possible. I create three source voices that take turns playing: one is playing while the next is ready and the one after that is being made. I use a total time (tt) as the argument to sin(), so the first sample of the next buffer should line up perfectly with the last sample of the current one.
Does anyone know what's going wrong?
P.S.: The many similar questions did not answer my question. In short: I'm not modifying the playing buffer (I don't think so at least); there should be no discontinuity at the border of one buffer to another; I'm not adjusting the frequency either during playback. So I don't think this is a duplicate.
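The phase-continuity argument can be checked on its own, independent of XAudio2. This sketch (the function `sineBuffer` is hypothetical) generates each buffer from an absolute sample index, so two back-to-back buffers come out identical to one long buffer. It computes the phase in `double`; with a `float` `tt`, as in the question, the argument `2*PI*tt/p` gradually loses precision as `tt` grows, which is a separate small effect worth ruling out.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Fills one buffer of 16-bit mono sine samples. Sample t uses the absolute
// index startSample + t, so consecutive calls continue the waveform exactly
// where the previous buffer stopped (no phase jump at the boundary).
std::vector<int16_t> sineBuffer(long long startSample, int length,
                                double period, double amplitude) {
    const double kTwoPi = 6.283185307179586;
    std::vector<int16_t> buf(length);
    for (int t = 0; t < length; ++t) {
        double n = static_cast<double>(startSample + t);
        buf[t] = static_cast<int16_t>(std::lround(amplitude * std::sin(kTwoPi * n / period)));
    }
    return buf;
}
```

With the question's values, buffer `i` would be `sineBuffer(i * 4410LL, 4410, 660.0, 2000.0)`. If the samples are continuous like this and clicks remain, the gap is more likely in how the buffers are scheduled than in their contents.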
#include <xaudio2.h>
#include <iostream>
#define PI 3.14159265358979323846f
#define l 4410 //0.1 seconds
IXAudio2MasteringVoice* pMasterVoice;
IXAudio2* pXAudio2;
IXAudio2SourceVoice* pSourceVoice[3];
BYTE pDataBuffer[2*l];
BYTE bytw[2];
int pow16[2];
float w[l];
int i, p;
float tt, ampl;
class VoiceCallback : public IXAudio2VoiceCallback {
HANDLE hBufferEndEvent;
VoiceCallback() : hBufferEndEvent(CreateEvent(NULL, FALSE, FALSE, NULL)) {}
~VoiceCallback() { CloseHandle(hBufferEndEvent); }
//Called when the voice has just finished playing a contiguous audio stream.
void STDMETHODCALLTYPE OnStreamEnd() {SetEvent(hBufferEndEvent);}
//Unused methods are stubs
void STDMETHODCALLTYPE OnVoiceProcessingPassEnd() {}
void STDMETHODCALLTYPE OnVoiceProcessingPassStart(UINT32 SamplesRequired) {}
void STDMETHODCALLTYPE OnBufferEnd(void * pBufferContext) {}
void STDMETHODCALLTYPE OnBufferStart(void * pBufferContext) {}
void STDMETHODCALLTYPE OnLoopEnd(void * pBufferContext) {}
void STDMETHODCALLTYPE OnVoiceError(void * pBufferContext, HRESULT Error) {}
VoiceCallback voiceCallback[3];
int main() {
CoInitializeEx(nullptr, COINIT_MULTITHREADED);
pXAudio2 = nullptr;
XAudio2Create(&pXAudio2, 0, XAUDIO2_DEFAULT_PROCESSOR);
pMasterVoice = nullptr;
tt = 0, p = 660, ampl = 2000;
pow16[0] = 16;
pow16[1] = 4096;
wfx = {0};
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nChannels = (WORD)1; //mono
wfx.nSamplesPerSec = (DWORD)44100; //samplerate
wfx.wBitsPerSample = (WORD)16; //16 bit (signed)
wfx.nBlockAlign = (WORD)2; //2 bytes per sample
wfx.nAvgBytesPerSec = (DWORD)88200; //samplerate*blockalign
wfx.cbSize = (WORD)0;
i = 0;
while (true) {
for (int t = 0; t < l; t++) {
tt = (float)(t + i*l); //total time
w[t] = sin(2.f*PI*tt/p)*ampl;
int intw = (int)w[t];
if (intw < 0) {
intw += 65535;
bytw[0] = 0; bytw[1] = 0;
for (int k = 1; k >= 0; k--) {
//turn integer into a little endian byte array
bytw[k] += (BYTE)(16*(intw/pow16[k]));
intw -= bytw[k]*(pow16[k]/16);
bytw[k] += (BYTE)(intw/(pow16[k]/16));
intw -= (intw/(pow16[k]/16))*pow16[k]/16;
pDataBuffer[2*t] = bytw[0];
pDataBuffer[2*t + 1] = bytw[1];
buffer.AudioBytes = 2*l; //number of bytes per buffer
buffer.pAudioData = pDataBuffer;
buffer.Flags = XAUDIO2_END_OF_STREAM;
if (i > 2) {
pSourceVoice[i%3] = nullptr;
pXAudio2->CreateSourceVoice(&pSourceVoice[i%3], &wfx, 0, XAUDIO2_DEFAULT_FREQ_RATIO, &voiceCallback[i%3], NULL, NULL);
if (i > 1) {
//wait until the current one is done playing
while (pSourceVoice[(i - 2)%3]->GetState(&state), state.BuffersQueued > 0) {
WaitForSingleObjectEx(voiceCallback[(i - 2)%3].hBufferEndEvent, INFINITE, TRUE);
if (i > 0) {
//play the next one while you're writing the one after that in the next iteration
pSourceVoice[(i - 1)%3]->Start(0);
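A side note on the manual pow16 loop: in 16 bits, the two's-complement pattern of a negative value n is n + 65536, not n + 65535, so that loop encodes every negative sample one step too low (for example, -1 comes out as 0xFFFE, which is -2). A cast does the conversion directly. The helper name `putSampleLE` is hypothetical:

```cpp
#include <cstdint>

// Writes one signed 16-bit sample into dst[0..1], low byte first (the
// byte order WAVE_FORMAT_PCM uses). Casting to uint16_t yields the
// two's-complement bit pattern, i.e. value + 65536 for negative values.
void putSampleLE(int16_t value, uint8_t* dst) {
    uint16_t bits = static_cast<uint16_t>(value);
    dst[0] = static_cast<uint8_t>(bits & 0xFF);  // low byte
    dst[1] = static_cast<uint8_t>(bits >> 8);    // high byte
}
```

In the question's loop this would replace the whole bytw/pow16 block with `putSampleLE(static_cast<int16_t>(w[t]), &pDataBuffer[2*t]);`.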
If you want the sound to 'loop', submit multiple data packets to the same source voice, or set a loop count so it automatically restarts the existing audio packet. If you let a source voice run out of data, you are going to hear the gap in between, depending on the latency of your audio output system.
Furthermore, creating and destroying source voices is a relatively expensive operation, so doing it in a loop like this is not particularly efficient.
See DirectX Tool Kit for Audio for a complete example of XAudio2 usage, as well as the latest version of the XAudio2 samples from the legacy DirectX SDK on GitHub.

nvJPEG: encode packed BGR

Well, my goal is simple -- trying to create a JPEG encoded image from buffer with packed/interleaved BGR data (could be RGB as well).
The NVidia docs contain an example, and the proper image input is essentially described here.
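The two layouts that the following code switches between differ only in plane pointers and pitch. In an interleaved buffer (what `NVJPEG_INPUT_BGRI` describes), all three channels live in one plane, so only `channel[0]` is set and its pitch is `3 * width` bytes; in a planar buffer, each channel is its own plane with pitch `width`. A small sketch of the byte offsets, assuming 8 bits per channel and no row padding (the function names are hypothetical):

```cpp
#include <cstddef>

// Byte offset of channel c (0 = B, 1 = G, 2 = R) of pixel (x, y).
// Interleaved (BGRBGR...): one plane, row pitch = 3 * width.
std::size_t interleavedOffset(std::size_t x, std::size_t y, std::size_t c,
                              std::size_t width) {
    const std::size_t pitch = 3 * width;  // bytes per row, all channels together
    return y * pitch + 3 * x + c;
}

// Planar (BBB...GGG...RRR...): three planes, each with pitch = width.
std::size_t planarOffset(std::size_t x, std::size_t y, std::size_t c,
                         std::size_t width, std::size_t height) {
    const std::size_t pitch = width;      // bytes per row within one plane
    return c * width * height + y * pitch + x;
}
```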
So I tried the following:
#include <nvjpeg.h>
// very simple
typedef struct {
int width;
int height;
unsigned char *buffer;
unsigned long data_size;
} my_bitmap_type;
std::vector<unsigned char> BitmapToJpegCUDA(const my_bitmap_type *image)
nvjpegHandle_t nv_handle;
nvjpegEncoderState_t nv_enc_state;
nvjpegEncoderParams_t nv_enc_params;
cudaStream_t stream = NULL;
nvjpegStatus_t er;
nvjpegEncoderStateCreate(nv_handle, &nv_enc_state, stream);
nvjpegEncoderParamsCreate(nv_handle, &nv_enc_params, stream);
nvjpegImage_t nv_image;
nv_image.channel[0] = image->buffer;
nv_image.pitch[0] = 3 * image->width;
// Nope, that's for planar images!
// nv_image.channel[0] = image->buffer;
// nv_image.channel[1] = image->buffer + image->width * image->height;
// nv_image.channel[2] = image->buffer + 2 * image->width * image->height;
// nv_image.pitch[0] = image->width;
// nv_image.pitch[1] = image->width;
// nv_image.pitch[2] = image->width;
er = nvjpegEncodeImage(nv_handle, nv_enc_state, nv_enc_params, &nv_image,
NVJPEG_INPUT_BGRI, image->width, image->height, stream);
LOG(ERROR) << "enc " << er;
size_t length = 0;
nvjpegEncodeRetrieveBitstream(nv_handle, nv_enc_state, NULL, &length, stream);
std::vector<unsigned char> jpeg(length);
nvjpegEncodeRetrieveBitstream(nv_handle, nv_enc_state, jpeg.data(), &length, 0);
return jpeg;
The logger says that nvjpegEncodeImage just returns NVJPEG_STATUS_INVALID_PARAMETER, meaning nothing works. In case you suspect my_bitmap_type to be filled wrong, here's the similar turbojpeg-powered encoding:
#include <turbojpeg.h>
std::vector<unsigned char> BitmapToJpegBuffer(const my_bitmap_type *image)
std::vector<unsigned char> out_data(3 * image->width * image->height);
cudaError_t err = cudaMemcpy(out_data.data(), image->buffer, image->data_size, cudaMemcpyDeviceToHost);
if (cudaSuccess != err) {
LOG(ERROR) << "failed to copy CUDA memory: " << err;
tjhandle jpeg = tjInitCompress();
unsigned char *encoded_buf = nullptr;
long unsigned int encoded_sz = 0;
int tjres = tjCompress2(jpeg, out_data.data(),
image->width * 3,
if (tjres != 0) {
LOG(ERROR) << "jpeg compression failed!";
return {};
std::vector<unsigned char> result(encoded_buf, encoded_buf + encoded_sz);
return result;
... aaand it works pretty fine.
I'm desperately trying to figure out what's missing in the code. I would gratefully appreciate any help or advice.
Using CentOS 7 / libnvjpeg-11-1.x86_64 (CUDA 11.1) / gcc 4.8.5
Okaaay, that's a bit strange, but after some trial and error it turned out that the NVidia docs lack an essential detail:
nvjpegEncoderStateCreate(nv_handle, &nv_enc_state, stream);
nvjpegEncoderParamsCreate(nv_handle, &nv_enc_params, stream);
// This has to be done, default params are not sufficient
nvjpegEncoderParamsSetSamplingFactors(nv_enc_params, NVJPEG_CSS_444, stream);
Although the docs clearly state that the default subsampling for JPEG compression is 4:4:4, encoding doesn't work with the default encoder params; the subsampling has to be set explicitly.
So that one line of code fixes everything.

Crash trying to convert PCM to MP3 using AudioKit

I am trying to convert in real time the audio from my iPhone mic to MP3.
I have it setup as such:
let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16,
sampleRate: 44100.0,
channels: 1,
interleaved: true)
mic.avAudioUnitOrNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount((format?.sampleRate)!), format: format, block: { (buffer: AVAudioPCMBuffer!, time: AVAudioTime!) -> Void in
let audioBuffer : AVAudioBuffer = buffer
self.audioProcessor?.processBuffer( audioBuffer.mutableAudioBufferList)
-(void)processBuffer: (AudioBufferList*) audioBufferList;
const int PCM_SIZE = 8192;
const int MP3_SIZE = 8192;
short int pcm_buffer[PCM_SIZE*2];
unsigned char mp3_buffer[MP3_SIZE];
int write = lame_encode_buffer_interleaved(mLame, pcm_buffer,(int*) audioBufferList->mBuffers[0].mData, mp3_buffer, MP3_SIZE);
//some other stuff
but I am getting a crash as soon as I get to the encoding portion.
I got it to stop crashing, but the audio quality is pretty harsh:
int size = audioBufferList->mBuffers[0].mDataByteSize / 2;
unsigned char mp3_buffer[size * 4];
int write = lame_encode_buffer(mLame, audioBufferList->mBuffers[0].mData, audioBufferList->mBuffers[0].mData, size, mp3_buffer, size*4);
There was a mismatch between the sample rate of the source audio and the sample rate the encoder was configured for.
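One way to make the rates agree, assuming the encoder is the side to change, is to call LAME's `lame_set_in_samplerate` and `lame_set_num_channels` with the tap format's values (44100 Hz, mono in the question) before `lame_init_params`. Separately, the workaround sizes the output buffer as `size * 4`; lame.h documents a worst-case estimate of `1.25 * num_samples + 7200` bytes, which exceeds `size * 4` for buffers under about 2,600 samples. A small helper for that estimate (the name `lameWorstCaseMp3Bytes` is hypothetical):

```cpp
#include <cstddef>

// Worst-case MP3 output size for one lame_encode_buffer*() call, using the
// estimate documented in lame.h: 1.25 * num_samples + 7200 bytes.
std::size_t lameWorstCaseMp3Bytes(std::size_t numSamples) {
    // numSamples + ceil(numSamples / 4) == ceil(1.25 * numSamples), in integers.
    return numSamples + (numSamples + 3) / 4 + 7200;
}
```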

RtAudio - Playing samples from wav file

I am currently trying to learn audio programming. My goal is to open a wav file, extract everything and play the samples with RtAudio.
I made a WaveLoader class which lets me extract the samples and metadata. I used this guide to do that, and I checked with 010 Editor that everything is correct. Here is a snapshot of 010 Editor showing the structure and data.
And this is how I store the raw samples inside the WaveLoader class:
data = new short[wave_data.payloadSize]; // - Allocates memory size of chunk size
if (!fread(data, 1, wave_data.payloadSize, sound_file))
throw ("Could not read wav data");
If I print out each sample, I get 1, -3, 4, -5 ..., which seems OK.
The problem is that I am not sure how I can play them. This is what I've done:
* Using PortAudio to play samples
bool Player::Play()
RtAudio::StreamParameters oParameters; //, iParameters;
oParameters.deviceId = rt.getDefaultOutputDevice();
oParameters.firstChannel = 0;
oParameters.nChannels = mAudio.channels;
//iParameters.deviceId = rt.getDefaultInputDevice();
//iParameters.nChannels = 2;
unsigned int sampleRate = mAudio.sampleRate;
// Use a buffer of 512, we need to feed callback with 512 bytes everytime!
unsigned int nBufferFrames = 512;
RtAudio::StreamOptions options;
//&parameters, NULL, RTAUDIO_FLOAT64,sampleRate, &bufferFrames, &mCallback, (void *)&rawData
try {
rt.openStream(&oParameters, NULL, RTAUDIO_SINT16, sampleRate, &nBufferFrames, &mCallback, (void*) &mAudio);
catch (RtAudioError& e) {
std::cout << e.getMessage() << std::endl;
return false;
return true;
* RtAudio Callback
int mCallback(void * outputBuffer, void * inputBuffer, unsigned int nBufferFrames, double streamTime, RtAudioStreamStatus status, void * userData)
unsigned int i = 0;
short *out = static_cast<short*>(outputBuffer);
auto *data = static_cast<Player::AUDIO_DATA*>(userData);
// if i is more than our data size, we are done!
if (i > data->dataSize) return 1;
// First time callback is called data->ptr is 0, this means that the offset is 0
// Second time data->ptr is 1, this means offset = nBufferFrames (512) * 1 = 512
unsigned int offset = nBufferFrames * data->ptr++;
printf("Offset: %i\n", offset);
// First time callback is called offset is 0, we are starting from 0 and looping nBufferFrames (512) times, this gives us 512 bytes
// Second time, the offset is 1, we are starting from 512 bytes and looping to 512 + 512 = 1024
for (i = offset; i < offset + nBufferFrames; ++i)
short sample = data->rawData[i]; // Get raw sample from our struct
*out++ = sample; // Pass to output buffer for playback
printf("Current sample value: %i\n", sample); // this is showing 1, -3, 4, -5 check 010 editor
printf("Current time: %f\n", streamTime);
return 0;
Inside the callback, the sample values I print match exactly what 010 Editor shows. So why isn't RtAudio playing them? What is wrong here? Do I need to normalize the sample values to between -1 and 1?
The wav file I am trying to play:
Chunksize: 16
Format: 1
Channel: 1
SampleRate: 48000
ByteRate: 96000
BlockAlign: 2
BitPerSample: 16
Size of raw samples total: 2217044 bytes
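Two things in the callback above are worth checking regardless of the input-parameter puzzle. First, `i` is set to 0 just before `if (i > data->dataSize) return 1;`, so that end-of-data check can never fire and the copy loop eventually reads past the end of `rawData`. Second, `nBufferFrames` counts frames, not bytes; for this mono 16-bit file, 512 frames are 1024 bytes. Below is a bounds-checked version of the copy step, written as a plain function so it can be tested without RtAudio. `PlaybackState` and `fillOutput` are hypothetical names, and `total` is assumed to be the number of int16 values (2217044 bytes / 2 = 1108522 here), not the byte count.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical playback state: 'samples' holds interleaved int16 data,
// 'total' is the number of int16 values (bytes / 2), 'pos' the read cursor.
struct PlaybackState {
    const int16_t* samples;
    std::size_t total;
    std::size_t pos;
};

// Copies one callback's worth of audio (nFrames * channels values), pads
// with silence past the end of the data, and returns 1 (RtAudio's "stop
// and drain" value) once everything has been delivered, 0 otherwise.
int fillOutput(int16_t* out, unsigned int nFrames, unsigned int channels,
               PlaybackState& st) {
    const std::size_t wanted = std::size_t(nFrames) * channels;
    const std::size_t n = std::min(wanted, st.total - st.pos);
    std::copy(st.samples + st.pos, st.samples + st.pos + n, out);
    std::fill(out + n, out + wanted, int16_t(0));  // silence for the remainder
    st.pos += n;
    return st.pos >= st.total ? 1 : 0;
}
```

Inside `mCallback`, this would become something like `return fillOutput(static_cast<int16_t*>(outputBuffer), nBufferFrames, channels, state);`, with the state kept in the `userData` struct.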
For some reason, it works when I pass input parameters to openStream():
RtAudio::StreamParameters oParameters, iParameters;
oParameters.deviceId = rt.getDefaultOutputDevice();
oParameters.firstChannel = 0;
//oParameters.nChannels = mAudio.channels;
oParameters.nChannels = mAudio.channels;
iParameters.deviceId = rt.getDefaultInputDevice();
iParameters.nChannels = 1;
unsigned int sampleRate = mAudio.sampleRate;
// Use a buffer of 512, we need to feed callback with 512 bytes everytime!
unsigned int nBufferFrames = 512;
RtAudio::StreamOptions options;
//&parameters, NULL, RTAUDIO_FLOAT64,sampleRate, &bufferFrames, &mCallback, (void *)&rawData
try {
rt.openStream(&oParameters, &iParameters, RTAUDIO_SINT16, sampleRate, &nBufferFrames, &mCallback, (void*) &mAudio);
catch (RtAudioError& e) {
std::cout << e.getMessage() << std::endl;
return false;
return true;
I stumbled on this by accident while trying to play back my mic: I left the input parameters in, and my wav file was suddenly playing. Is this a bug?

Get audio samples from byte array

How do I get data samples from QAudioInput?
I found this code in the audioinput example:
void InputTest::readMore()
qint64 len = m_audioInput->bytesReady();
if(len > 4096)
len = 4096;
qint64 l = m_input->read(m_buffer.data(), len);
if(l > 0) {
m_audioInfo->write(m_buffer.constData(), l);
I understand that m_buffer contains the audio samples, but my audio processing library takes short samples.
How can I convert this to a short pointer?
My audio library function looks like this:
putSample( short *Sample, int numberOfSample)
I can get the number of samples like this:
Q_ASSERT(m_format.sampleSize() % 8 == 0);
const int channelBytes = m_format.sampleSize() / 8;
const int sampleBytes = m_format.channels() * channelBytes;
Q_ASSERT(len % sampleBytes == 0);
const int numSamples = len / sampleBytes;
This page indicates that read() expects a char* to store the data in. If you have set up the audio device's format properly, the data will indeed be laid out as shorts in the char array, and you can simply cast the char* to a short* before passing it to your library.
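If you would rather not rely on the cast (a `char*` is not guaranteed to be suitably aligned for `short` in general, and reading it through a `short*` is a strict-aliasing gray area), copying into an `int16_t` buffer with `memcpy` is a portable alternative. This is a sketch assuming the QAudioFormat is 16-bit signed PCM in the host's byte order, so no byte swapping is needed; `bytesToSamples` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copies raw bytes from QIODevice::read() into a properly aligned int16
// buffer. Any trailing odd byte is ignored.
std::vector<int16_t> bytesToSamples(const char* data, std::size_t len) {
    std::vector<int16_t> samples(len / sizeof(int16_t));
    if (!samples.empty())
        std::memcpy(samples.data(), data, samples.size() * sizeof(int16_t));
    return samples;
}
```

Usage would look like `auto s = bytesToSamples(m_buffer.constData(), l); putSample(s.data(), int(s.size()));`. Note that `s.size()` counts individual shorts; for stereo input, that is twice the `numSamples` computed above, so check which count your library expects.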