I am looking to use packet loss concealment (PLC) to conceal lost PCM frames in an audio stream. Unfortunately, I cannot find a library that is accessible without heavy licensing restrictions and code bloat (though I'm open to suggestions).
I have located some GPL code written by Steve Underwood for the Asterisk project which implements PLC. It has several limitations, although, as Steve suggests in his code, the algorithm can be applied to different streams with a bit of work. As written, the code works with 8 kHz, 16-bit signed mono streams.
Variations of the code can be found through a simple search of Google Code Search.
My hope is that I can adapt the code to work with other streams. Initially, the goal is to adjust the algorithm for 8+ kHz, 16-bit signed, multichannel audio (all in a C++ environment). Eventually, I'm looking to make the code available under the GPL license in the hope that it will benefit others.
Below is the code with my efforts so far. It includes a main function that will "drop" a number of frames with a given probability. Unfortunately, the code does not quite work as expected. I'm receiving EXC_BAD_ACCESS when running in gdb, but I don't get a trace when using the 'bt' command. Clearly, I'm trampling on memory somewhere, but I'm not sure exactly where. When I comment out the amdf_pitch function, the code runs without crashing...
#include <cstdint> //int16_t
#include <cstdlib> //rand, srand
#include <ctime> //time
#include <fstream>
#include <iostream>
#include "audio/PcmConcealer.hpp"
int main (int argc, char *argv[])
{
std::ifstream fin("C:\\cc32kHz.pcm");
if(!fin.is_open())
{
std::cout << "Failed to open input file" << std::endl;
return 1;
}
std::ofstream fout_repaired("C:\\cc32kHz_repaired.pcm");
if(!fout_repaired.is_open())
{
std::cout << "Failed to open output repaired file" << std::endl;
return 1;
}
std::ofstream fout_lossy("C:\\cc32kHz_lossy.pcm");
if(!fout_lossy.is_open())
{
std::cout << "Failed to open output lossy file" << std::endl;
return 1;
}
audio::PcmConcealer Concealer;
Concealer.Init(1, 16, 32000);
//Generate random numbers;
srand( time(NULL) );
int value = 0;
int probability = 5;
while(!fin.eof())
{
char arr[2];
fin.read(arr, 2);
//Generate a random number;
value = rand() % 100 + 1;
if(value <= probability)
{
char blank[2] = {0x00, 0x00};
fout_lossy.write(blank, 2);
//Fill in data;
Concealer.Fill((int16_t *)blank, 1);
fout_repaired.write(blank, 2);
}
else
{
//Write data to file;
fout_repaired.write(arr, 2);
fout_lossy.write(arr, 2);
Concealer.Receive((int16_t *)arr, 1);
}
}
fin.close();
fout_repaired.close();
fout_lossy.close();
return 0;
}
PcmConcealer.hpp
/*
* Code adapted from Steve Underwood of the Asterisk Project. This code inherits
* the same licensing restrictions as the Asterisk Project.
*/
#ifndef __PCMCONCEALER_HPP__
#define __PCMCONCEALER_HPP__
#include <vector>
/**
1. What does it do?
The packet loss concealment module provides a suitable synthetic fill-in signal,
to minimise the audible effect of lost packets in VoIP applications. It is not
tied to any particular codec, and could be used with almost any codec which does not
specify its own procedure for packet loss concealment.
Where a codec specific concealment procedure exists, the algorithm is usually built
around knowledge of the characteristics of the particular codec. It will, therefore,
generally give better results for that particular codec than this generic concealer will.
2. How does it work?
While good packets are being received, the plc_rx() routine keeps a record of the trailing
section of the known speech signal. If a packet is missed, plc_fillin() is called to produce
a synthetic replacement for the real speech signal. The average mean difference function
(AMDF) is applied to the last known good signal, to determine its effective pitch.
Based on this, the last pitch period of signal is saved. Essentially, this cycle of speech
will be repeated over and over until the real speech resumes. However, several refinements
are needed to obtain smooth pleasant sounding results.
- The two ends of the stored cycle of speech will not always fit together smoothly. This can
cause roughness, or even clicks, at the joins between cycles. To soften this, the
1/4 pitch period of real speech preceding the cycle to be repeated is blended with the last
1/4 pitch period of the cycle to be repeated, using an overlap-add (OLA) technique (i.e.
in total, the last 5/4 pitch periods of real speech are used).
- The start of the synthetic speech will not always fit together smoothly with the tail of
real speech passed on before the erasure was identified. Ideally, we would like to modify
the last 1/4 pitch period of the real speech, to blend it into the synthetic speech. However,
it is too late for that. We could have delayed the real speech a little, but that would
require more buffer manipulation, and hurt the efficiency of the no-lost-packets case
(which we hope is the dominant case). Instead we use a degenerate form of OLA to modify
the start of the synthetic data. The last 1/4 pitch period of real speech is time reversed,
and OLA is used to blend it with the first 1/4 pitch period of synthetic speech. The result
seems quite acceptable.
- As we progress into the erasure, the chances of the synthetic signal being anything like
correct steadily fall. Therefore, the volume of the synthesized signal is made to decay
linearly, such that after 50ms of missing audio it is reduced to silence.
- When real speech resumes, an extra 1/4 pitch period of synthetic speech is blended with the
start of the real speech. If the erasure is small, this smoothes the transition. If the erasure
is long, and the synthetic signal has faded to zero, the blending softens the start up of the
real signal, avoiding a kind of "click" or "pop" effect that might occur with a sudden onset.
3. How do I use it?
Before audio is processed, call plc_init() to create an instance of the packet loss
concealer. For each received audio packet that is acceptable (i.e. not including those being
dropped for being too late) call plc_rx() to record the content of the packet. Note this may
modify the packet a little after a period of packet loss, to blend real and synthetic data smoothly.
When a real packet is not available in time, call plc_fillin() to create a synthetic substitute.
That's it!
*/
/*! Minimum allowed pitch (66 Hz) */
#define PLC_PITCH_MIN(SAMPLE_RATE) ((double)(SAMPLE_RATE) / 66.6)
/*! Maximum allowed pitch (200 Hz) */
#define PLC_PITCH_MAX(SAMPLE_RATE) ((SAMPLE_RATE) / 200)
/*! Maximum pitch OLA window */
//#define PLC_PITCH_OVERLAP_MAX(SAMPLE_RATE) ((PLC_PITCH_MIN(SAMPLE_RATE)) >> 2)
/*! The length over which the AMDF function looks for similarity (20 ms) */
#define CORRELATION_SPAN(SAMPLE_RATE) ((20 * (SAMPLE_RATE)) / 1000)
/*! History buffer length. The buffer must also be at least 1.25 times
PLC_PITCH_MIN, but that is much smaller than the buffer needs to be for
the pitch assessment. */
//#define PLC_HISTORY_LEN(SAMPLE_RATE) ((CORRELATION_SPAN(SAMPLE_RATE)) + (PLC_PITCH_MIN(SAMPLE_RATE)))
namespace audio
{
typedef struct
{
/*! Consecutive erased samples */
int missing_samples;
/*! Current offset into pitch period */
int pitch_offset;
/*! Pitch estimate */
int pitch;
/*! Buffer for a cycle of speech */
float *pitchbuf;//[PLC_PITCH_MIN];
/*! History buffer */
short *history;//[PLC_HISTORY_LEN];
/*! Current pointer into the history buffer */
int buf_ptr;
} plc_state_t;
class PcmConcealer
{
public:
PcmConcealer();
~PcmConcealer();
void Init(int channels, int bit_depth, int sample_rate);
//Process a block of received audio samples.
int Receive(short amp[], int frames);
//Fill-in a block of missing audio samples.
int Fill(short amp[], int frames);
void Destroy();
private:
int amdf_pitch(int min_pitch, int max_pitch, short amp[], int channel_index, int frames);
void save_history(plc_state_t *s, short *buf, int channel_index, int frames);
void normalise_history(plc_state_t *s);
/** Holds the states of each of the channels **/
std::vector< plc_state_t * > ChannelStates;
int plc_pitch_min;
int plc_pitch_max;
int plc_pitch_overlap_max;
int correlation_span;
int plc_history_len;
int channel_count;
int sample_rate;
bool Initialized;
};
}
#endif
PcmConcealer.cpp
/*
* Code adapted from Steve Underwood of the Asterisk Project. This code inherits
* the same licensing restrictions as the Asterisk Project.
*/
#include "audio/PcmConcealer.hpp"
#include <climits> //INT_MAX
#include <cmath> //floor
#include <cstdlib> //abs
#include <cstring> //memset, memcpy
#include <iostream>
/* We do a straight line fade to zero volume in 50ms when we are filling in for missing data. */
#define ATTENUATION_INCREMENT 0.0025 /* Attenuation per sample; 0.0025 = 1/400 gives 50ms only at 8kHz - use 20.0 / sample_rate for other rates */
#if !defined(INT16_MAX)
#define INT16_MAX (32767)
#define INT16_MIN (-32767-1)
#endif
#ifdef WIN32
inline double rint(double x)
{
return floor(x + 0.5);
}
#endif
inline short fsaturate(double damp)
{
if (damp > 32767.0)
return INT16_MAX;
if (damp < -32768.0)
return INT16_MIN;
return (short)rint(damp);
}
namespace audio
{
PcmConcealer::PcmConcealer() : Initialized(false)
{
}
PcmConcealer::~PcmConcealer()
{
Destroy();
}
void PcmConcealer::Init(int channels, int bit_depth, int sample_rate)
{
if(Initialized)
return;
if(channels <= 0 || bit_depth != 16)
return;
Initialized = true;
channel_count = channels;
this->sample_rate = sample_rate;
//////////////
double min = PLC_PITCH_MIN(sample_rate);
int imin = (int)min;
double max = PLC_PITCH_MAX(sample_rate);
int imax = (int)max;
plc_pitch_min = imin;
plc_pitch_max = imax;
plc_pitch_overlap_max = (plc_pitch_min >> 2);
correlation_span = CORRELATION_SPAN(sample_rate);
plc_history_len = correlation_span + plc_pitch_min;
//////////////
for(int i = 0; i < channel_count; i ++)
{
plc_state_t *t = new plc_state_t;
memset(t, 0, sizeof(plc_state_t));
t->pitchbuf = new float[plc_pitch_min];
t->history = new short[plc_history_len];
ChannelStates.push_back(t);
}
}
void PcmConcealer::Destroy()
{
if(!Initialized)
return;
while(ChannelStates.size())
{
plc_state_t *s = ChannelStates.at(0);
if(s)
{
if(s->history) delete [] s->history;
if(s->pitchbuf) delete [] s->pitchbuf;
memset(s, 0, sizeof(plc_state_t));
delete s;
}
ChannelStates.erase(ChannelStates.begin());
}
ChannelStates.clear();
Initialized = false;
}
//Process a block of received audio samples.
int PcmConcealer::Receive(short amp[], int frames)
{
if(!Initialized)
return 0;
int j = 0;
for(int k = 0; k < ChannelStates.size(); k++)
{
int i;
int overlap_len;
int pitch_overlap;
float old_step;
float new_step;
float old_weight;
float new_weight;
float gain;
plc_state_t *s = ChannelStates.at(k);
if (s->missing_samples)
{
/* Although we have a real signal, we need to smooth it to fit well
with the synthetic signal we used for the previous block */
/* The start of the real data is overlapped with the next 1/4 cycle
of the synthetic data. */
pitch_overlap = s->pitch >> 2;
if (pitch_overlap > frames)
pitch_overlap = frames;
gain = 1.0 - s->missing_samples * ATTENUATION_INCREMENT;
if (gain < 0.0)
gain = 0.0;
new_step = 1.0/pitch_overlap;
old_step = new_step*gain;
new_weight = new_step;
old_weight = (1.0 - new_step)*gain;
for (i = 0; i < pitch_overlap; i++)
{
int index = (i * channel_count) + j;
amp[index] = fsaturate(old_weight * s->pitchbuf[s->pitch_offset] + new_weight * amp[index]);
if (++s->pitch_offset >= s->pitch)
s->pitch_offset = 0;
new_weight += new_step;
old_weight -= old_step;
if (old_weight < 0.0)
old_weight = 0.0;
}
s->missing_samples = 0;
}
save_history(s, amp, j, frames);
j++;
}
return frames;
}
//Fill-in a block of missing audio samples.
int PcmConcealer::Fill(short amp[], int frames)
{
if(!Initialized)
return 0;
int j =0;
for(int k = 0; k < ChannelStates.size(); k++)
{
short *tmp = new short[plc_pitch_overlap_max];
int i;
int pitch_overlap;
float old_step;
float new_step;
float old_weight;
float new_weight;
float gain;
short *orig_amp;
int orig_len;
orig_amp = amp;
orig_len = frames;
plc_state_t *s = ChannelStates.at(k);
if (s->missing_samples == 0)
{
// As the gap in real speech starts we need to assess the last known pitch,
//and prepare the synthetic data we will use for fill-in
normalise_history(s);
s->pitch = amdf_pitch(plc_pitch_min, plc_pitch_max, s->history + plc_history_len - correlation_span - plc_pitch_min, j, correlation_span);
// We overlap a 1/4 wavelength
pitch_overlap = s->pitch >> 2;
// Cook up a single cycle of pitch, using a single cycle of the real signal with 1/4
//cycle OLA'ed to make the ends join up nicely
// The first 3/4 of the cycle is a simple copy
for (i = 0; i < s->pitch - pitch_overlap; i++)
s->pitchbuf[i] = s->history[plc_history_len - s->pitch + i];
// The last 1/4 of the cycle is overlapped with the end of the previous cycle
new_step = 1.0/pitch_overlap;
new_weight = new_step;
for ( ; i < s->pitch; i++)
{
s->pitchbuf[i] = s->history[plc_history_len - s->pitch + i]*(1.0 - new_weight) + s->history[plc_history_len - 2*s->pitch + i]*new_weight;
new_weight += new_step;
}
// We should now be ready to fill in the gap with repeated, decaying cycles
// of what is in pitchbuf
// We need to OLA the first 1/4 wavelength of the synthetic data, to smooth
// it into the previous real data. To avoid the need to introduce a delay
// in the stream, reverse the last 1/4 wavelength, and OLA with that.
gain = 1.0;
new_step = 1.0/pitch_overlap;
old_step = new_step;
new_weight = new_step;
old_weight = 1.0 - new_step;
for (i = 0; i < pitch_overlap; i++)
{
int index = (i * channel_count) + j;
amp[index] = fsaturate(old_weight * s->history[plc_history_len - 1 - i] + new_weight * s->pitchbuf[i]);
new_weight += new_step;
old_weight -= old_step;
if (old_weight < 0.0)
old_weight = 0.0;
}
s->pitch_offset = i;
}
else
{
gain = 1.0 - s->missing_samples*ATTENUATION_INCREMENT;
i = 0;
}
for ( ; gain > 0.0 && i < frames; i++)
{
int index = (i * channel_count) + j;
amp[index] = s->pitchbuf[s->pitch_offset]*gain;
gain -= ATTENUATION_INCREMENT;
if (++s->pitch_offset >= s->pitch)
s->pitch_offset = 0;
}
for ( ; i < frames; i++)
{
int index = (i * channel_count) + j;
amp[index] = 0;
}
s->missing_samples += orig_len;
save_history(s, amp, j, frames);
delete [] tmp;
j++;
}
return frames;
}
void PcmConcealer::save_history(plc_state_t *s, short *buf, int channel_index, int frames)
{
if (frames >= plc_history_len)
{
/* Just keep the last part of the new data, starting at the beginning of the buffer */
//memcpy(s->history, buf + len - plc_history_len, sizeof(short)*plc_history_len);
int frames_to_copy = plc_history_len;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * (i + frames - plc_history_len)) + channel_index;
s->history[i] = buf[index];
}
s->buf_ptr = 0;
return;
}
if (s->buf_ptr + frames > plc_history_len)
{
/* Wraps around - must break into two sections */
//memcpy(s->history + s->buf_ptr, buf, sizeof(short)*(plc_history_len - s->buf_ptr));
short *hist_ptr = s->history + s->buf_ptr;
int frames_to_copy = plc_history_len - s->buf_ptr;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * i) + channel_index;
hist_ptr[i] = buf[index];
}
frames -= (plc_history_len - s->buf_ptr);
//memcpy(s->history, buf + (plc_history_len - s->buf_ptr), sizeof(short)*len);
frames_to_copy = frames;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * (i + (plc_history_len - s->buf_ptr))) + channel_index;
s->history[i] = buf[index];
}
s->buf_ptr = frames;
return;
}
/* Can use just one section */
//memcpy(s->history + s->buf_ptr, buf, sizeof(short)*len);
short *hist_ptr = s->history + s->buf_ptr;
int frames_to_copy = frames;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * i) + channel_index;
hist_ptr[i] = buf[index];
}
s->buf_ptr += frames;
}
void PcmConcealer::normalise_history(plc_state_t *s)
{
if (s->buf_ptr == 0)
return;
short *tmp = new short[plc_history_len];
memcpy(tmp, s->history, sizeof(short)*s->buf_ptr);
memcpy(s->history, s->history + s->buf_ptr, sizeof(short)*(plc_history_len - s->buf_ptr));
memcpy(s->history + plc_history_len - s->buf_ptr, tmp, sizeof(short)*s->buf_ptr);
s->buf_ptr = 0;
delete [] tmp;
}
int PcmConcealer::amdf_pitch(int min_pitch, int max_pitch, short amp[], int channel_index, int frames)
{
int i;
int j;
int acc;
int min_acc;
int pitch;
pitch = min_pitch;
min_acc = INT_MAX;
for (i = max_pitch; i <= min_pitch; i++)
{
acc = 0;
for (j = 0; j < frames; j++)
{
int index1 = (channel_count * (i+j)) + channel_index;
int index2 = (channel_count * j) + channel_index;
//std::cout << "Index 1: " << index1 << ", Index 2: " << index2 << std::endl;
acc += abs(amp[index1] - amp[index2]);
}
if (acc < min_acc)
{
min_acc = acc;
pitch = i;
}
}
std::cout << "Pitch: " << pitch << std::endl;
return pitch;
}
}
P.S. - I must confess that digital audio is not my forte...
Fixed the problem. The problem lay within the amdf_pitch function. There were some minor bugs elsewhere too (which have also been repaired). As a result, the code now runs the testbed, inserting blanks with the given probability.
Using Audacity, I have studied the raw PCM streams created by the testbed. When a blank set of frames is encountered, smoothing occurs from received to blank as expected; however, when we change from blank back to valid/received data, we get clicks because the smoothing doesn't appear to work during this phase. Any suggestions?
I have attached the updated code:
#include <cstdint> //int16_t
#include <cstdlib> //rand, srand
#include <ctime> //time
#include <fstream>
#include <iostream>
#include "audio/PcmConcealer.hpp"
int main (int argc, char *argv[])
{
std::ifstream fin("C:\\cc32kHz.pcm", std::ios::binary);
if(!fin.is_open())
{
std::cout << "Failed to open input file" << std::endl;
return 1;
}
std::ofstream fout_repaired("C:\\cc32kHz_repaired.pcm", std::ios::binary);
if(!fout_repaired.is_open())
{
std::cout << "Failed to open output repaired file" << std::endl;
return 1;
}
std::ofstream fout_lossy("C:\\cc32kHz_lossy.pcm", std::ios::binary);
if(!fout_lossy.is_open())
{
std::cout << "Failed to open output lossy file" << std::endl;
return 1;
}
audio::PcmConcealer Concealer;
Concealer.Init(1, 16, 32000); //1-channel, 16-bit, 32kHz
//Generate random numbers;
srand( time(NULL) );
int value = 0;
int probability = 3;
int old_bytes_read = 0;
while(!fin.eof())
{
char arr[1024];
fin.read(arr, 1024);
int total_bytes_read = fin.tellg();
int bytes_read = total_bytes_read - old_bytes_read;
old_bytes_read = total_bytes_read;
if(!bytes_read)
continue; //Probably reached EOF;
//Generate a random number;
value = rand() % 100 + 1;
if(value <= probability)
{
char blank[1024] = {0x00, 0x00};
fout_lossy.write(blank, 1024);
//Fill in data;
Concealer.Fill((int16_t *)blank, 512);
fout_repaired.write(blank, 1024);
}
else
{
//Write data to file;
fout_repaired.write(arr, 1024);
fout_lossy.write(arr, 1024);
Concealer.Receive((int16_t *)arr, 512);
}
}
fin.close();
fout_repaired.close();
fout_lossy.close();
return 0;
}
PcmConcealer.hpp
/*
* PcmConcealer.hpp
* Code adapted from Steve Underwood of the Asterisk Project. This code inherits
* the same licensing restrictions as the Asterisk Project.
*/
#ifndef __PCMCONCEALER_HPP__
#define __PCMCONCEALER_HPP__
#include <vector>
/**
1. What does it do?
The packet loss concealment module provides a suitable synthetic fill-in signal,
to minimise the audible effect of lost packets in VoIP applications. It is not
tied to any particular codec, and could be used with almost any codec which does not
specify its own procedure for packet loss concealment.
Where a codec specific concealment procedure exists, the algorithm is usually built
around knowledge of the characteristics of the particular codec. It will, therefore,
generally give better results for that particular codec than this generic concealer will.
2. How does it work?
While good packets are being received, the plc_rx() routine keeps a record of the trailing
section of the known speech signal. If a packet is missed, plc_fillin() is called to produce
a synthetic replacement for the real speech signal. The average mean difference function
(AMDF) is applied to the last known good signal, to determine its effective pitch.
Based on this, the last pitch period of signal is saved. Essentially, this cycle of speech
will be repeated over and over until the real speech resumes. However, several refinements
are needed to obtain smooth pleasant sounding results.
- The two ends of the stored cycle of speech will not always fit together smoothly. This can
cause roughness, or even clicks, at the joins between cycles. To soften this, the
1/4 pitch period of real speech preceding the cycle to be repeated is blended with the last
1/4 pitch period of the cycle to be repeated, using an overlap-add (OLA) technique (i.e.
in total, the last 5/4 pitch periods of real speech are used).
- The start of the synthetic speech will not always fit together smoothly with the tail of
real speech passed on before the erasure was identified. Ideally, we would like to modify
the last 1/4 pitch period of the real speech, to blend it into the synthetic speech. However,
it is too late for that. We could have delayed the real speech a little, but that would
require more buffer manipulation, and hurt the efficiency of the no-lost-packets case
(which we hope is the dominant case). Instead we use a degenerate form of OLA to modify
the start of the synthetic data. The last 1/4 pitch period of real speech is time reversed,
and OLA is used to blend it with the first 1/4 pitch period of synthetic speech. The result
seems quite acceptable.
- As we progress into the erasure, the chances of the synthetic signal being anything like
correct steadily fall. Therefore, the volume of the synthesized signal is made to decay
linearly, such that after 50ms of missing audio it is reduced to silence.
- When real speech resumes, an extra 1/4 pitch period of synthetic speech is blended with the
start of the real speech. If the erasure is small, this smoothes the transition. If the erasure
is long, and the synthetic signal has faded to zero, the blending softens the start up of the
real signal, avoiding a kind of "click" or "pop" effect that might occur with a sudden onset.
3. How do I use it?
Before audio is processed, call plc_init() to create an instance of the packet loss
concealer. For each received audio packet that is acceptable (i.e. not including those being
dropped for being too late) call plc_rx() to record the content of the packet. Note this may
modify the packet a little after a period of packet loss, to blend real and synthetic data smoothly.
When a real packet is not available in time, call plc_fillin() to create a synthetic substitute.
That's it!
*/
/*! Minimum allowed pitch (66 Hz) */
#define PLC_PITCH_MIN(SAMPLE_RATE) ((double)(SAMPLE_RATE) / 66.6)
/*! Maximum allowed pitch (200 Hz) */
#define PLC_PITCH_MAX(SAMPLE_RATE) ((SAMPLE_RATE) / 200)
/*! Maximum pitch OLA window */
//#define PLC_PITCH_OVERLAP_MAX(SAMPLE_RATE) ((PLC_PITCH_MIN(SAMPLE_RATE)) >> 2)
/*! The length over which the AMDF function looks for similarity (20 ms) */
#define CORRELATION_SPAN(SAMPLE_RATE) ((20 * (SAMPLE_RATE)) / 1000)
/*! History buffer length. The buffer must also be at least 1.25 times
PLC_PITCH_MIN, but that is much smaller than the buffer needs to be for
the pitch assessment. */
//#define PLC_HISTORY_LEN(SAMPLE_RATE) ((CORRELATION_SPAN(SAMPLE_RATE)) + (PLC_PITCH_MIN(SAMPLE_RATE)))
namespace audio
{
typedef struct
{
/*! Consecutive erased samples */
int missing_samples;
/*! Current offset into pitch period */
int pitch_offset;
/*! Pitch estimate */
int pitch;
/*! Buffer for a cycle of speech */
float *pitchbuf;//[PLC_PITCH_MIN];
/*! History buffer */
short *history;//[PLC_HISTORY_LEN];
/*! Current pointer into the history buffer */
int buf_ptr;
} plc_state_t;
class PcmConcealer
{
public:
PcmConcealer();
~PcmConcealer();
void Init(int channels, int bit_depth, int sample_rate);
//Process a block of received audio samples.
int Receive(short amp[], int frames);
//Fill-in a block of missing audio samples.
int Fill(short amp[], int frames);
void Destroy();
private:
inline int amdf_pitch(int min_pitch, int max_pitch, short amp[], int frames);
void save_history(plc_state_t *s, short *buf, int channel_index, int frames);
void normalise_history(plc_state_t *s);
/** Holds the states of each of the channels **/
std::vector< plc_state_t * > ChannelStates;
int plc_pitch_min;
int plc_pitch_max;
int plc_pitch_overlap_max;
int correlation_span;
int plc_history_len;
int channel_count;
int sample_rate;
bool Initialized;
};
}
#endif
PcmConcealer.cpp
/*
* PcmConcealer.cpp
*
* Code adapted from Steve Underwood of the Asterisk Project. This code inherits
* the same licensing restrictions as the Asterisk Project.
*/
#include "audio/PcmConcealer.hpp"
#include <climits> //INT_MAX
#include <cmath> //floor
#include <cstdlib> //abs
#include <cstring> //memset, memcpy
#include <iostream>
/* We do a straight line fade to zero volume in 50ms when we are filling in for missing data. */
#define ATTENUATION_INCREMENT 0.0025 /* Attenuation per sample; 0.0025 = 1/400 gives 50ms only at 8kHz - use 20.0 / sample_rate for other rates */
#ifndef INT16_MAX
#define INT16_MAX (32767)
#endif
#ifndef INT16_MIN
#define INT16_MIN (-32767-1)
#endif
#ifdef WIN32
inline double rint(double x)
{
return floor(x + 0.5);
}
#endif
inline short fsaturate(double damp)
{
if (damp > 32767.0)
return INT16_MAX;
if (damp < -32768.0)
return INT16_MIN;
return (short)rint(damp);
}
namespace audio
{
PcmConcealer::PcmConcealer() : Initialized(false)
{
}
PcmConcealer::~PcmConcealer()
{
Destroy();
}
void PcmConcealer::Init(int channels, int bit_depth, int sample_rate)
{
if(Initialized)
return;
if(channels <= 0 || bit_depth != 16)
return;
Initialized = true;
channel_count = channels;
this->sample_rate = sample_rate;
//////////////
double min = PLC_PITCH_MIN(sample_rate);
int imin = (int)min;
double max = PLC_PITCH_MAX(sample_rate);
int imax = (int)max;
plc_pitch_min = imin;
plc_pitch_max = imax;
plc_pitch_overlap_max = (plc_pitch_min >> 2);
correlation_span = CORRELATION_SPAN(sample_rate);
plc_history_len = correlation_span + plc_pitch_min;
//////////////
for(int i = 0; i < channel_count; i ++)
{
plc_state_t *t = new plc_state_t;
memset(t, 0, sizeof(plc_state_t));
t->pitchbuf = new float[plc_pitch_min];
t->history = new short[plc_history_len];
ChannelStates.push_back(t);
}
}
void PcmConcealer::Destroy()
{
if(!Initialized)
return;
while(ChannelStates.size())
{
plc_state_t *s = ChannelStates.at(0);
if(s)
{
if(s->history) delete [] s->history;
if(s->pitchbuf) delete [] s->pitchbuf;
memset(s, 0, sizeof(plc_state_t));
delete s;
}
ChannelStates.erase(ChannelStates.begin());
}
ChannelStates.clear();
Initialized = false;
}
//Process a block of received audio samples.
int PcmConcealer::Receive(short amp[], int frames)
{
if(!Initialized)
return 0;
int j = 0;
for(int k = 0; k < ChannelStates.size(); k++)
{
int i;
int overlap_len;
int pitch_overlap;
float old_step;
float new_step;
float old_weight;
float new_weight;
float gain;
plc_state_t *s = ChannelStates.at(k);
if (s->missing_samples)
{
/* Although we have a real signal, we need to smooth it to fit well
with the synthetic signal we used for the previous block */
/* The start of the real data is overlapped with the next 1/4 cycle
of the synthetic data. */
pitch_overlap = s->pitch >> 2;
if (pitch_overlap > frames)
pitch_overlap = frames;
gain = 1.0 - s->missing_samples * ATTENUATION_INCREMENT;
if (gain < 0.0)
gain = 0.0;
new_step = 1.0/pitch_overlap;
old_step = new_step*gain;
new_weight = new_step;
old_weight = (1.0 - new_step)*gain;
for (i = 0; i < pitch_overlap; i++)
{
int index = (i * channel_count) + j;
amp[index] = fsaturate(old_weight * s->pitchbuf[s->pitch_offset] + new_weight * amp[index]);
if (++s->pitch_offset >= s->pitch)
s->pitch_offset = 0;
new_weight += new_step;
old_weight -= old_step;
if (old_weight < 0.0)
old_weight = 0.0;
}
s->missing_samples = 0;
}
save_history(s, amp, j, frames);
j++;
}
return frames;
}
//Fill-in a block of missing audio samples.
int PcmConcealer::Fill(short amp[], int frames)
{
if(!Initialized)
return 0;
int j =0;
for(int k = 0; k < ChannelStates.size(); k++)
{
short *tmp = new short[plc_pitch_overlap_max];
int i;
int pitch_overlap;
float old_step;
float new_step;
float old_weight;
float new_weight;
float gain;
short *orig_amp;
int orig_len;
orig_amp = amp;
orig_len = frames;
plc_state_t *s = ChannelStates.at(k);
if (s->missing_samples == 0)
{
// As the gap in real speech starts we need to assess the last known pitch,
//and prepare the synthetic data we will use for fill-in
normalise_history(s);
s->pitch = amdf_pitch(plc_pitch_min, plc_pitch_max, s->history + (plc_history_len - correlation_span - plc_pitch_min), correlation_span);
// We overlap a 1/4 wavelength
pitch_overlap = s->pitch >> 2;
// Cook up a single cycle of pitch, using a single cycle of the real signal with 1/4
//cycle OLA'ed to make the ends join up nicely
// The first 3/4 of the cycle is a simple copy
for (i = 0; i < s->pitch - pitch_overlap; i++)
s->pitchbuf[i] = s->history[plc_history_len - s->pitch + i];
// The last 1/4 of the cycle is overlapped with the end of the previous cycle
new_step = 1.0/pitch_overlap;
new_weight = new_step;
for ( ; i < s->pitch; i++)
{
s->pitchbuf[i] = s->history[plc_history_len - s->pitch + i]*(1.0 - new_weight) + s->history[plc_history_len - 2*s->pitch + i]*new_weight;
new_weight += new_step;
}
// We should now be ready to fill in the gap with repeated, decaying cycles
// of what is in pitchbuf
// We need to OLA the first 1/4 wavelength of the synthetic data, to smooth
// it into the previous real data. To avoid the need to introduce a delay
// in the stream, reverse the last 1/4 wavelength, and OLA with that.
gain = 1.0;
new_step = 1.0/pitch_overlap;
old_step = new_step;
new_weight = new_step;
old_weight = 1.0 - new_step;
for (i = 0; (i < pitch_overlap) && (i < frames); i++)
{
int index = (i * channel_count) + j;
amp[index] = fsaturate(old_weight * s->history[plc_history_len - 1 - i] + new_weight * s->pitchbuf[i]);
new_weight += new_step;
old_weight -= old_step;
if (old_weight < 0.0)
old_weight = 0.0;
}
s->pitch_offset = i;
}
else
{
gain = 1.0 - s->missing_samples*ATTENUATION_INCREMENT;
i = 0;
}
for ( ; gain > 0.0 && i < frames; i++)
{
int index = (i * channel_count) + j;
amp[index] = s->pitchbuf[s->pitch_offset]*gain;
gain -= ATTENUATION_INCREMENT;
if (++s->pitch_offset >= s->pitch)
s->pitch_offset = 0;
}
for ( ; i < frames; i++)
{
int index = (i * channel_count) + j;
amp[index] = 0;
}
s->missing_samples += orig_len;
save_history(s, amp, j, frames);
delete [] tmp;
j++;
}
return frames;
}
void PcmConcealer::save_history(plc_state_t *s, short *buf, int channel_index, int frames)
{
if (frames >= plc_history_len)
{
/* Just keep the last part of the new data, starting at the beginning of the buffer */
//memcpy(s->history, buf + len - plc_history_len, sizeof(short)*plc_history_len);
int frames_to_copy = plc_history_len;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * (i + frames - plc_history_len)) + channel_index;
s->history[i] = buf[index];
}
s->buf_ptr = 0;
return;
}
if (s->buf_ptr + frames > plc_history_len)
{
/* Wraps around - must break into two sections */
//memcpy(s->history + s->buf_ptr, buf, sizeof(short)*(plc_history_len - s->buf_ptr));
short *hist_ptr = s->history + s->buf_ptr;
int frames_to_copy = plc_history_len - s->buf_ptr;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * i) + channel_index;
hist_ptr[i] = buf[index];
}
frames -= (plc_history_len - s->buf_ptr);
//memcpy(s->history, buf + (plc_history_len - s->buf_ptr), sizeof(short)*len);
frames_to_copy = frames;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * (i + (plc_history_len - s->buf_ptr))) + channel_index;
s->history[i] = buf[index];
}
s->buf_ptr = frames;
return;
}
/* Can use just one section */
//memcpy(s->history + s->buf_ptr, buf, sizeof(short)*len);
short *hist_ptr = s->history + s->buf_ptr;
int frames_to_copy = frames;
for(int i = 0; i < frames_to_copy; i ++)
{
int index = (channel_count * i) + channel_index;
hist_ptr[i] = buf[index];
}
s->buf_ptr += frames;
}
void PcmConcealer::normalise_history(plc_state_t *s)
{
if (s->buf_ptr == 0)
return;
short *tmp = new short[plc_history_len];
memcpy(tmp, s->history, sizeof(short)*s->buf_ptr);
memcpy(s->history, s->history + s->buf_ptr, sizeof(short)*(plc_history_len - s->buf_ptr));
memcpy(s->history + plc_history_len - s->buf_ptr, tmp, sizeof(short)*s->buf_ptr);
s->buf_ptr = 0;
delete [] tmp;
}
int PcmConcealer::amdf_pitch(int min_pitch, int max_pitch, short amp[], int frames)
{
int i;
int j;
int acc;
int min_acc;
int pitch;
pitch = min_pitch;
min_acc = INT_MAX;
for (i = max_pitch; i <= min_pitch; i++)
{
acc = 0;
/*for (j = 0; j < frames; j++)
{
int index1 = (channel_count * (i+j)) + channel_index;
int index2 = (channel_count * j) + channel_index;
//std::cout << "Index 1: " << index1 << ", Index 2: " << index2 << std::endl;
acc += abs(amp[index1] - amp[index2]);
}*/
for (j = 0; j < frames; j++)
acc += abs(amp[i + j] - amp[j]);
if (acc < min_acc)
{
min_acc = acc;
pitch = i;
}
}
//std::cout << "Pitch: " << pitch << std::endl;
return pitch;
}
}
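The commented-out interleaved loop hints at the likely crash: with interleaved multichannel data, the index `channel_count * (i + j) + channel_index` reaches up to `channel_count * (min_pitch + frames)` samples into the buffer, so reads go out of bounds unless the buffer really holds that many frames. Here is a hedged sketch of a channel-aware AMDF with an explicit bounds check (the function name and the `total_frames` parameter are my own additions, not part of the original code):

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical sketch of a channel-aware AMDF pitch search over an
// interleaved buffer. `total_frames` is the number of frames actually
// stored in `amp`; the bounds check keeps i + j inside the buffer, which
// is the likely source of the EXC_BAD_ACCESS in the interleaved version.
// Note min_pitch is the longest candidate period (lowest frequency), so
// it is the numerically larger lag, matching the original loop direction.
static int amdf_pitch_interleaved(int min_pitch, int max_pitch,
                                  const short amp[], int frames,
                                  int total_frames,
                                  int channel_count, int channel_index)
{
    int best = min_pitch;
    long min_acc = LONG_MAX;
    for (int i = max_pitch; i <= min_pitch; i++)
    {
        if (i + frames > total_frames)
            break;  // this lag would read past the end of amp
        long acc = 0;
        for (int j = 0; j < frames; j++)
        {
            int a = channel_count * (i + j) + channel_index;
            int b = channel_count * j + channel_index;
            acc += std::abs(amp[a] - amp[b]);
        }
        if (acc < min_acc)
        {
            min_acc = acc;
            best = i;
        }
    }
    return best;
}
```

Each channel would get its own search this way, at the cost of one pass per channel.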
I am trying to generate brown noise in C++ and play it. You can hear the brown noise, but I constantly hear clicking in the background and I don't know why.
Here is my code:
#include <xaudio2.h>
#include <iostream>
#include <random>
using namespace std;
#define PI2 6.28318530717958647692f
#define l 2205 //0.05 seconds
bool init();
bool loop();
random_device rd;
mt19937 gen(rd());
uniform_real_distribution<> dis(-.01, .01);
IXAudio2MasteringVoice* pMasterVoice;
IXAudio2* pXAudio2;
IXAudio2SourceVoice* pSourceVoice;
XAUDIO2_BUFFER buffer;
WAVEFORMATEX wfx;
XAUDIO2_VOICE_STATE state;
BYTE pDataBuffer[2*l];
BYTE bytw[2];
int pow16[2];
float w[l];
int frame, p;
float tt, ampl;
bool loop() {
w[0] = w[l - 1] + dis(gen)*ampl;
for (int t = 1; t < l; t++) {
tt = (float)(t + frame*l); //total time
w[t] = w[t - 1] + dis(gen)*ampl;
if (w[t] > ampl) {
cout << "upper edge ";
w[t] = ampl - fmod(w[t], ampl);
}
if (w[t] < -ampl) {
cout << "lower edge ";
w[t] = -fmod(w[t], ampl) - ampl;
}
//w[t] = sin(PI2*tt/p)*ampl;
//w[t] = (fmod(tt/p, 1) < .5 ? ampl : -ampl)*(.5f - 2.f*fmod(tt/p, .5f));
int intw = (int)w[t];
if (intw < 0) {
intw += 65535;
}
bytw[0] = 0; bytw[1] = 0;
for (int k = 1; k >= 0; k--) {
//turn integer into a little endian byte array
bytw[k] += (BYTE)(16*(intw/pow16[k]));
intw -= bytw[k]*(pow16[k]/16);
bytw[k] += (BYTE)(intw/(pow16[k]/16));
intw -= (intw/(pow16[k]/16))*pow16[k]/16;
}
pDataBuffer[2*t] = bytw[0];
pDataBuffer[2*t + 1] = bytw[1];
}
cout << endl << endl;
if (frame > 1) {
//wait until the current one is done playing
while (pSourceVoice->GetState(&state), state.BuffersQueued > 1) {}
}
buffer.AudioBytes = 2*l; //number of bytes per buffer
buffer.pAudioData = pDataBuffer;
buffer.Flags = XAUDIO2_END_OF_STREAM;
pSourceVoice->SubmitSourceBuffer(&buffer);
if (frame == 1) {
pSourceVoice->Start(0, 0);
}
frame++;
return true;
}
bool init() {
CoInitializeEx(nullptr, COINIT_MULTITHREADED);
pXAudio2 = nullptr;
XAudio2Create(&pXAudio2, 0, XAUDIO2_DEFAULT_PROCESSOR);
pMasterVoice = nullptr;
pXAudio2->CreateMasteringVoice(&pMasterVoice);
wfx = {0};
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nChannels = (WORD)1; //mono
wfx.nSamplesPerSec = (DWORD)44100; //samplerate
wfx.wBitsPerSample = (WORD)16; //16 bit (signed)
wfx.nBlockAlign = (WORD)2; //2 bytes per sample
wfx.nAvgBytesPerSec = (DWORD)88200; //samplerate*blockalign
wfx.cbSize = (WORD)0;
pSourceVoice = nullptr;
pXAudio2->CreateSourceVoice(&pSourceVoice, &wfx);
tt = 0, p = 1000, ampl = 10000;
pow16[0] = 16;
pow16[1] = 4096;
frame = 0;
return true;
}
int main() {
if (!init()) return 1;
cout << "start";
while (loop()) {}
return 0;
}
The line before the for-loop in loop() is to make sure that the first element nicely attaches itself to the last element of the previous iteration.
To make sure that w doesn't go over ampl or under -ampl, I have added a couple lines that make them bounce back, and I make it output "upper edge" or "lower edge" respectively so that you know when this is happening. As you notice, the clicking also happens when the w is not near the edges.
As a test to make sure it isn't because of XAudio2 being implemented wrongly, you can comment the first line in loop() that defines the first element of w; make the for-loop (in the next line) start from 0; comment the lines that create the brown noise; and uncomment one of the two lines after that: the first line to hear a sine wave sound, the second line to hear a square wave sound (both with a frequency of about 44100/1000 = 44.1 Hz, which you can change around by changing how p is initialized in init()). You will (hopefully) hear a clean sine/square wave sound.
So what is going wrong?
You have two issues in your code:
You only have a single buffer, so it is nearly impossible to submit a new buffer quickly enough after the previous one stops playing for there to be no gap between buffers. You are also modifying the buffer data while it is being played, which corrupts the output. You should use multiple buffers. With enough buffers, you could also add short sleeps to the while loop that checks BuffersQueued, reducing CPU usage.
You never set pDataBuffer[0] or pDataBuffer[1] so they will always be 0.
This code works:
#include <xaudio2.h>
#include <iostream>
#include <random>
#include <array>
#include <thread>
using namespace std;
#define PI2 6.28318530717958647692f
#define l 2205 //0.05 seconds
bool init();
bool loop();
random_device rd;
mt19937 gen(rd());
uniform_real_distribution<> dis(-.01, .01);
IXAudio2MasteringVoice* pMasterVoice;
IXAudio2* pXAudio2;
IXAudio2SourceVoice* pSourceVoice;
const size_t bufferCount = 64;
std::array<XAUDIO2_BUFFER, bufferCount> buffers;
WAVEFORMATEX wfx;
XAUDIO2_VOICE_STATE state;
std::array<std::array<BYTE,2 * l>, bufferCount> pDataBuffers;
BYTE bytw[2];
int pow16[2];
float w[l];
int frame, p;
float tt, ampl;
bool loop() {
float prevW = w[l - 1];
auto& pDataBuffer = pDataBuffers[frame & (bufferCount-1)];
auto& buffer = buffers[frame & (bufferCount - 1)];
for (int t = 0; t < l; t++) {
tt = (float)(t + frame * l); //total time
w[t] = prevW + dis(gen) * ampl;
if (w[t] > ampl) {
//cout << "upper edge ";
w[t] = ampl - fmod(w[t], ampl);
}
if (w[t] < -ampl) {
//cout << "lower edge ";
w[t] = -fmod(w[t], ampl) - ampl;
}
//w[t] = sin(PI2*tt/p)*ampl;
//w[t] = (fmod(tt/p, 1) < .5 ? ampl : -ampl)*(.5f - 2.f*fmod(tt/p, .5f));
prevW = w[t];
int intw = (int)w[t];
if (intw < 0) {
intw += 65535;
}
bytw[0] = 0; bytw[1] = 0;
for (int k = 1; k >= 0; k--) {
//turn integer into a little endian byte array
bytw[k] += (BYTE)(16 * (intw / pow16[k]));
intw -= bytw[k] * (pow16[k] / 16);
bytw[k] += (BYTE)(intw / (pow16[k] / 16));
intw -= (intw / (pow16[k] / 16)) * pow16[k] / 16;
}
pDataBuffer[2 * t] = bytw[0];
pDataBuffer[2 * t + 1] = bytw[1];
}
//cout << endl << endl;
if (frame > 1) {
//wait until the current one is done playing
while (pSourceVoice->GetState(&state), state.BuffersQueued > 1) { std::this_thread::sleep_for(std::chrono::milliseconds(1)); }
}
buffer.AudioBytes = 2 * l; //number of bytes per buffer
buffer.pAudioData = pDataBuffer.data();
buffer.Flags = 0;
pSourceVoice->SubmitSourceBuffer(&buffer);
if (frame == 1) {
pSourceVoice->Start(0, 0);
}
frame++;
return true;
}
bool init() {
CoInitializeEx(nullptr, COINIT_MULTITHREADED);
pXAudio2 = nullptr;
XAudio2Create(&pXAudio2, 0, XAUDIO2_DEFAULT_PROCESSOR);
pMasterVoice = nullptr;
pXAudio2->CreateMasteringVoice(&pMasterVoice);
wfx = { 0 };
wfx.wFormatTag = WAVE_FORMAT_PCM;
wfx.nChannels = (WORD)1; //mono
wfx.nSamplesPerSec = (DWORD)44100; //samplerate
wfx.wBitsPerSample = (WORD)16; //16 bit (signed)
wfx.nBlockAlign = (WORD)2; //2 bytes per sample
wfx.nAvgBytesPerSec = (DWORD)88200; //samplerate*blockalign
wfx.cbSize = (WORD)0;
pSourceVoice = nullptr;
pXAudio2->CreateSourceVoice(&pSourceVoice, &wfx);
tt = 0, p = 1000, ampl = 10000;
pow16[0] = 16;
pow16[1] = 4096;
frame = 0;
return true;
}
int main() {
if (!init()) return 1;
while (loop()) {}
return 0;
}
I haven't tried to follow all of your logic, but it seems overcomplicated and could definitely be simplified.
The heavy use of global variables is also not a great way to write a program. Move variables inside the functions where possible; otherwise, either pass them to the functions as arguments or use a class to hold the state.
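As one example of simplification: the per-sample byte packing (the pow16 gymnastics) reduces to a cast and two shifts for 16-bit signed little-endian PCM. A sketch (`packSampleLE` is a hypothetical name; it assumes the sample is already within the int16 range):

```cpp
#include <cstdint>

// Pack one float sample (already clamped to [-32768, 32767]) into two
// little-endian bytes; two's complement comes free from the int16_t cast.
inline void packSampleLE(float sample, uint8_t* out)
{
    int16_t s = static_cast<int16_t>(sample);
    out[0] = static_cast<uint8_t>(s & 0xFF);        // low byte
    out[1] = static_cast<uint8_t>((s >> 8) & 0xFF); // high byte
}
```

This also removes the need for the `if (intw < 0) intw += 65535;` adjustment, which is off by one anyway (two's complement wraps at 65536, not 65535).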
I have an assignment to optimize some C++ code. I'm not great at coding, but I made some attempts. The original is:
#include "stdafx.h"
#include "HistogramStretching.h"
void CHistogramStretching::HistogramStretching(BYTE** pImage, int nW, int nH)
{
//find minimal value
int nMin = pImage[0][0];
for(int j = 0; j < nW; j++)
for(int i = 0; i < nH; i++)
if(pImage[i][j] < nMin)
nMin = pImage[i][j];
//find maximal value
int nMax = pImage[0][0];
for(int j = 0; j < nW; j++)
for(int i = 0; i < nH; i++)
if(pImage[i][j] > nMax)
nMax = pImage[i][j];
//stretches histogram
for(int j = 0; j < nW; j++)
for(int i = 0; i < nH; i++)
{
if(nMax != nMin)
{
float fScale = (nMax - nMin)/100.0;//calculates scale
float fVal = (pImage[i][j] - nMin)/fScale;//scales pixel value
int nVal = (int)(fVal + 0.5);//rounds floating point number to integer
//checks BYTE range (must be 0-255)
if(nVal < 0)
nVal = 0;
if(nVal > 255)
nVal = 255;
pImage[i][j] = nVal;
}
else
pImage[i][j] = 0;//if all pixel values are the same, the image is changed to black
}
}
And my verison is:
#include "stdafx.h"
#include "HistogramStretching.h"
void CHistogramStretching::HistogramStretching(BYTE** pImage, int nW, int nH)
{
//find minimal value
int nMin = pImage[0][0];
int nMax = pImage[0][0];
for (int j = 0; j < nW; j++) {
for (int i = 0; i < nH; i++) {
if (pImage[i][j] < nMin)
nMin = pImage[i][j];
if (pImage[i][j] > nMax)
nMax = pImage[i][j];
}
}
if (nMax != nMin) {
float fScale = (nMax - nMin) / 100.0;//calculates scale
fScale = 1 / fScale;
//stretches histogram
for (int j = 0; j < nW; j++)
for (int i = 0; i < nH; i++)
{
float fVal = (pImage[i][j] - nMin) * fScale;//scales pixel value
int nVal = (int)(fVal + 0.5);//rounds floating point number to integer
//checks BYTE range (must be 0-255)
if (nVal < 0)
nVal = 0;
if (nVal > 255)
nVal = 255;
pImage[i][j] = nVal;
}
//if all pixel values are the same, the image is changed to black
}
else {
pImage[0][0] = 0;
}
}
So I merged the first two loops into one, but the first if still takes ~15% of CPU time. The next step was to pull the if statement outside the loops and to replace the division with a multiplication; now the division takes ~8% of CPU time and the float-to-int cast takes ~5%, but I don't think I can do much about the cast. Even with these corrections, my code is still some 6-7 times slower than the reference code. I test both versions on the same machine. Can you point me to something I can do better?
I think tadman gave you the correct answer.
Replace
for (int j = 0; j < nW; j++) {
for (int i = 0; i < nH; i++) {
if (pImage[i][j] < nMin)
...
}
}
with
for (int i = 0; i < nH; i++) {
for (int j = 0; j < nW; j++) {
if (pImage[i][j] < nMin)
...
}
}
This way your data access becomes sequential and cache-friendly, which should be considerably faster.
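Putting it together, the full min/max pass with swapped loops might look like this (a sketch; it assumes 8-bit pixels stored row-major as `pImage[row][col]`, and the function name is my own):

```cpp
#include <cstdint>
#include <utility>

// Swapped-loop min/max scan: the inner loop walks one row of consecutive
// bytes, so reads are sequential in memory and easy for the compiler to
// vectorize.
std::pair<uint8_t, uint8_t> findMinMax(uint8_t** pImage, int nW, int nH)
{
    uint8_t nMin = pImage[0][0];
    uint8_t nMax = pImage[0][0];
    for (int i = 0; i < nH; i++)          // rows: outer
    {
        for (int j = 0; j < nW; j++)      // columns: inner (contiguous)
        {
            uint8_t px = pImage[i][j];
            if (px < nMin) nMin = px;
            if (px > nMax) nMax = px;
        }
    }
    return {nMin, nMax};
}
```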
All modern compilers can vectorize this nicely, when compiled at full optimization (/O2 for MSVC, -O3 for gcc and clang).
The idea is to give the compiler some help so that it can see that the code can be in fact vectorized:
Let the inner loop operate on a single pointer, not on indices, and without accessing anything but the pointed-to value.
Perform the scaling as an integer operation - and don't forget rounding :)
Try to set up operations such that additional range checks are unnecessary, e.g. your checks for BYTE being less than 0. By having the offset and scale set up properly, the result will be guaranteed to fall into the desired range.
The inner loops will get unrolled, and will be vectorized to process 4 bytes at a time. I've tried the recent gcc, clang and MSVC releases and they produce pretty fast code for this.
You're doing something "weird" in that you purposefully scale the results to a 0-99 range. Thus you lose the resolution of the data - you've got a full byte to work with, so why not scale it to 255?
But if you want to scale to 100 values, it's fine. Note that 100(dec) = 0x64. We can make the outputSpan flexible - it will work for any value <= 255.
Thus:
/* Code Part 1 */
#include <cstdint>
constexpr uint32_t outputSpan = 100;
static constexpr uint32_t scale_16(uint8_t min, uint8_t max)
{
return (outputSpan * 0x10000) / (1+max-min);
}
// scale factor in 16.16 fixed point unsigned format
// empty histogram produces scale = outputSpan
static_assert(scale_16(10, 10) == outputSpan * 0x10000, "Scale calculation is wrong");
static constexpr uint8_t scale_pixel(uint8_t const pixel, uint8_t min, uint32_t const scale)
{
uint32_t px = (pixel - min) * scale;
// result in 16.16 fixed point format
return (px + 0x8080u) >> 16;
// round to an integer value
}
We work with fixed-point numbers instead of floating-point. The scale is in 16.16 format: 16 bits in the integer part and 16 bits in the fractional part, e.g. 0x1234.5678. The value 1.0(dec) would be 0x1.0000.
The pixel scaling simply multiplies the pixel by the scale, rounds it, and returns the truncated integer part.
The rounding is "interesting". You'd think that it'd suffice to add 0.5(dec) = 0x0.8 to the result to round it. That's not the case. The value needs to be a bit larger than that, and 0x0.808 does the job. It pre-biases the value, so that the error range around the exact value has a zero mean. In all cases, the error is at most ±0.5 - thus the result, rounded to an integer, does not lose accuracy.
We use scale_16 and scale_pixel functions to implement the stretcher:
/* Code Part 2 */
void stretchHistogram(uint8_t **pImage, int const nW, int const nH)
{
uint8_t nMin = 255, nMax = 0;
for (uint8_t **row = pImage, **rowEnd = pImage + nH; row != rowEnd; ++row)
for (const uint8_t *p = *row, *pEnd = p + nW; p != pEnd; ++p)
{
auto const px = *p;
if (px < nMin) nMin = px;
if (px > nMax) nMax = px;
}
auto const scale = scale_16(nMin, nMax);
for (uint8_t **row = pImage, **rowEnd = pImage + nH; row != rowEnd; ++row)
for (uint8_t *p = *row, *pEnd = p + nW; p != pEnd; ++p)
*p = scale_pixel(*p, nMin, scale);
}
This also produces decent code on architectures without FPU, such as FPU-less ARM and AVR.
We can also do some manual checks. Suppose that min = 0x10, max = 0xEF, and pixel = 0x32. Let's remember that the scale is in 16.16 format:
scale = 0x64.0000 / (1 + max - min)
= 0x64.0000 / (1 + 0xEF - 0x10)
= 0x64.0000 / (1 + 0xDF)
= 0x64.0000 / 0xE0
Long division:
0x .7249
0x64.0000 / 0xE0
---------
64.0
- 62.0
------
2.00
- 1.C0
-------
.400
- .380
--------
. 800
- . 7E0
---------
. 20
So, we have scale = 0x0.7249. It's less than one (0x1.0), and also a bit less than 1/2 (0x0.8), since we map 224 values onto 100 values - a bit less than half as many.
Now
px = (pixel - min) * scale
= (0x32 - 0x10) * 0x0.7249
= 0x22 * 0x0.7249
Long multiplication:
0x 0.7249
* 0x .0022
------------
.E492
+ E.492
------------
0x F.2DB2
Thus, px = 0xF.2DB2 ≈ 0xF. We have to round it to an integer:
return = (px + 0x0.8080u) >> 16
= (0xF.2DB2 + 0x0.8080) >> 16
= 0xF.AE32 >> 16
≈ 0xF
Let's check in decimal system:
100 / (max-min+1) * (pixel-min) =
= 100 / (239 - 16 + 1) * (50 - 16)
= 100 / 224 * 34
= 100 * 34 / 224
= 3400 / 224
≈ 15.17
≈ 15
≈ 0xF
Here's a test case that ensures that there's no rounding bias for all combinations of min, max, and input pixel value, and that the error is bounded to [-0.5, 0.5]. Just append it to the code above and it should compile and run and produce the following output:
-0.5 0.5 1
For scaling to outputSpan = 256 values (instead of 100), it'd output:
-0.498039 0.498039 0.996078
/* Code Part 3 */
#include <cassert>
#include <cmath>
#include <iostream>
int main()
{
double errMin = 0, errMax = 0;
for (uint16_t min = 0; min <= 255; ++min)
for (uint16_t max = min; max <= 255; ++max)
for (uint16_t val = min; val <= max; ++val)
{
uint8_t const nMin = min, nMax = max;
uint8_t const span = nMax - nMin;
uint8_t const val_src = val;
uint8_t p_val = val_src;
uint8_t *const p = &p_val;
assert(nMin <= nMax);
assert(val >= nMin && val <= nMax);
auto const scale = scale_16(nMin, nMax);
*p = scale_pixel(*p, nMin, scale);
auto pValTarget = (val_src - nMin) * double(outputSpan)/(1.0+span);
auto error = pValTarget - *p;
if (error < errMin) errMin = error;
if (error > errMax) errMax = error;
}
std::cout << '\n' << errMin << ' ' << errMax << ' ' << errMax-errMin << std::endl;
assert((errMax-errMin) <= 1.0); // constrain the error
assert(std::abs(errMax+errMin) == 0.0); // constrain the error average
}
Apologies if the answer is somewhere on the site (I couldn't find it). I'm a hobbyist trying to load a WAV file, get its magnitude and phase data (for modification), generate a spectrogram, and then save it back as a new WAV file.
I use C++ (Qt) and FFTW library.
My problem is that the resulting WAV differs from the original even when no modifications are made. If the FFT is performed on the whole sample sequence, the result matches the original. But I have to use STFTs with overlapping windows, and in that case I get distortions resulting in periodic crackling sounds; the waveform of the audio is also significantly changed.
This can be seen in following examples (viewed in Audacity):
original / processed in one chunk:
original
processed (windowSize=2048, hopSize=1024, no window function):
processed ws=2048, hs=1024, wf=none
I can't post more examples with my reputation, but applying the Hamming window after the ISTFT (rather than before the STFT), combined with the method I use to merge the resulting windowed samples, gives good sound. The waveform is still quite different, though; mainly a significant loss in the peaks is observed.
I think the way I combine the ISTFT results into the new sample sequence is the problem. What is the proper way to do this? An example in C++ would be really appreciated.
EDIT
As correctly pointed out by SleuthEye, I made a mistake in the code.
The code is adjusted. The waveform and sound now seem perfect even without applying a window function. Still, is this the correct method for such an operation?
Here's the relevant source:
// getSampleNormalized(n) returns sample n of 1 channel in -1.0 to 1.0 range
// getSampleCount() returns sample count of 1 channel
// quint32 is just unsigned int
quint32 windowSize = 2048;
quint32 windowSizeHalf = windowSize / 2 + 1;
quint32 slideWindowBy = 1024; // hopSize
quint32 windowCount = getSampleCount() / slideWindowBy;
if ( (windowCount * slideWindowBy) < getSampleCount()){
windowCount += 1;
}
quint32 newSampleCount = windowCount * slideWindowBy + ( windowSize - slideWindowBy );
double *window = new double[windowSize];
fftw_complex *fftResult = new fftw_complex[windowSizeHalf];
fftw_complex *fftWindow = new fftw_complex[windowSizeHalf];
double *result = new double[windowSize];
double **magnitudes = new double*[windowCount];
double **phases = new double*[windowCount];
double **signalWindows = new double*[windowCount];
for (int i = 0; i < windowCount; ++i){
magnitudes[i] = new double[windowSizeHalf];
phases[i] = new double[windowSizeHalf];
signalWindows[i] = new double[windowSize];
}
double *sampleSignals = new double[newSampleCount];
fftw_plan fftPlan = fftw_plan_dft_r2c_1d( windowSize, window, fftResult, FFTW_ESTIMATE );
fftw_plan ifftPlan = fftw_plan_dft_c2r_1d( windowSize, fftWindow, result, FFTW_ESTIMATE );
// STFT
for ( int currentWindow = 0; currentWindow < windowCount; ++currentWindow ){
for (int i = 0; i < windowSize; ++i){
quint32 currentSample = currentWindow * slideWindowBy + i;
if ( ( currentSample ) < getSampleCount() ){
window[i] = getSampleNormalized( currentSample ); // * ( windowHamming( i, windowSize ) );
}
else{
window[i] = 0.0;
}
}
fftw_execute(fftPlan);
for (int i = 0; i < windowSizeHalf; ++i){
magnitudes[currentWindow][i] = sqrt( fftResult[i][0]*fftResult[i][0] + fftResult[i][1]*fftResult[i][1] );
phases[currentWindow][i] = atan2( fftResult[i][1], fftResult[i][0] );
}
}
// INVERSE STFT
for ( int currentWindow = 0; currentWindow < windowCount; ++currentWindow ){
for ( int i = 0; i < windowSizeHalf; ++i ){
fftWindow[i][0] = magnitudes[currentWindow][i] * cos( phases[currentWindow][i] ); // Real
fftWindow[i][1] = magnitudes[currentWindow][i] * sin( phases[currentWindow][i] ); // Imaginary
}
fftw_execute(ifftPlan);
for ( int i = 0; i < windowSize; ++i ){
signalWindows[currentWindow][i] = result[i] / windowSize; // getting normalized result
//signalWindows[currentWindow][i] *= (windowHamming( i, windowSize )); // applying Hamming window function
}
}
quint32 pos;
// HERE WE COMBINE RESULTED WINDOWS
// COMBINE AND AVERAGE
// 1st window should be full replace
for ( int i = 0; i < windowSize; ++i ){
sampleSignals[i] = signalWindows[0][i];
}
// 2nd window and onwards: combine with previous ones
for ( int currentWindow = 1; currentWindow < windowCount; ++currentWindow ){
// combine and average with data from previous window
for ( int i = 0; i < (windowSize - slideWindowBy); ++i ){
pos = currentWindow * slideWindowBy + i;
sampleSignals[pos] = (sampleSignals[pos] + signalWindows[currentWindow][i]) * 0.5;
}
// simply replace for the rest
for ( int i = (windowSize - slideWindowBy); i < windowSize; ++i ){
pos = currentWindow * slideWindowBy + i;
sampleSignals[pos] = signalWindows[currentWindow][i];
}
}
// then just save the wav file...
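For reference, the standard way to combine the windows, rather than pairwise averaging, is weighted overlap-add (WOLA): window each frame on analysis, window it again on synthesis, accumulate, and divide each output sample by the summed squared window. This is a sketch under those assumptions (plain std::vector, no FFTW; `overlapAdd` and its arguments are hypothetical names); it reconstructs the signal exactly wherever the accumulated window energy is non-zero:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted overlap-add: frames[f] holds windowSize samples that were
// already multiplied by the analysis window before the FFT/IFFT round
// trip. We multiply by the same (periodic Hann) synthesis window,
// accumulate, and divide by the summed squared window so overlapping
// regions are not doubled.
std::vector<double> overlapAdd(const std::vector<std::vector<double>>& frames,
                               std::size_t windowSize, std::size_t hop)
{
    const double pi = 3.14159265358979323846;
    std::size_t total = (frames.size() - 1) * hop + windowSize;
    std::vector<double> out(total, 0.0);
    std::vector<double> wsum(total, 0.0);
    for (std::size_t f = 0; f < frames.size(); ++f)
    {
        for (std::size_t i = 0; i < windowSize; ++i)
        {
            double w = 0.5 - 0.5 * std::cos(2.0 * pi * i / windowSize);
            out[f * hop + i]  += frames[f][i] * w;
            wsum[f * hop + i] += w * w;
        }
    }
    for (std::size_t i = 0; i < total; ++i)
        if (wsum[i] > 1e-9)
            out[i] /= wsum[i];
    return out;
}
```

With hop = windowSize / 2 and a Hann window, the per-sample normalization varies only mildly across the overlap, and the division removes even that residue, which is why the averaging artifacts disappear.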
I'm trying to mix some audio samples with the following algorithm:
short* FilterGenerator::mixSources(std::vector<RawData>rawsources, int numframes)
{
short* output = new short[numframes * 2]; // multiply 2 for channels
for (int sample = 0; sample < numframes * 2; ++sample)
{
for (int sourceCount = 0; sourceCount < rawsources.size(); ++sourceCount)
{
if (sample <= rawsources.at(sourceCount).frames * 2)
{
short outputSample = rawsources.at(sourceCount).data[sample];
output[sample] += outputSample;
}
}
}
// post mixing volume compression
for (int sample = 0; sample < numframes; ++sample)
{
output[sample] /= (float)rawsources.size();
}
return output;
}
I get the output I want except for the fact that when one of the sources are done, the other sources start playing louder. I know why this is but I don't know how to solve it properly.
Also, this is a screenshot from Audacity from the audio I output:
As you can see there's definitely something wrong. You can see that the audio hasn't got zero at the center anymore and you can see the audio getting louder once one of the sources are done playing.
Most of all I'd like to fix the volume problem but any other tweaks I can do are very appreciated!
Some extra info: I know that this code doesn't allow mono sources but that's ok. I'm only going to use stereo interleaved audio samples.
Mixing usually doesn't divide by the number of sources: dividing means that mixing a normal track with a muted track halves its amplitude. If you want, you can normalize the result afterwards so that it stays within range.
The code is not tested; there may be errors:
#include <algorithm> // for std::max
#include <cmath> // for std::fabs
short* FilterGenerator::mixSources(std::vector<RawData>rawsources, int numframes)
{
// We can not use shorts immediately because can overflow
// I use floats because in the renormalization not have distortions
float *outputFloating = new float [numframes * 2];
// The maximum of the absolute value of the signal
float maximumOutput = 0;
for (int sample = 0; sample < numframes * 2; ++sample)
{
// makes sure that at the beginning is zero
outputFloating[sample] = 0;
for (int sourceCount = 0; sourceCount < rawsources.size(); ++sourceCount)
{
// I think that should be a '<'
if (sample < rawsources.at(sourceCount).frames * 2)
outputFloating[sample] += rawsources.at(sourceCount).data[sample];
}
// Calculates the maximum
maximumOutput = std::max (maximumOutput, std::fabs(outputFloating[sample]));
}
// A short buffer
short* output = new short [numframes * 2]; // multiply 2 for channels
float multiplier = maximumOutput > 32767 ? 32767 / maximumOutput : 1;
// Renormalize the track
for (int sample = 0; sample < numframes * 2; ++sample)
output[sample] = (short) (outputFloating[sample] * multiplier);
delete[] outputFloating;
return output;
}
Since you're adding everything up in a short before you divide, you're probably getting overflow. You need to accumulate into a bigger intermediate type. Also, the final scaling shouldn't depend on the number of sources; it should be a constant, determined before you call the function.
short* FilterGenerator::mixSources(std::vector<RawData>rawsources, int numframes, double gain = 0.5)
{
short* output = new short[numframes * 2]; // multiply 2 for channels
for (int sample = 0; sample < numframes * 2; ++sample)
{
long newSample = 0;
for (int sourceCount = 0; sourceCount < rawsources.size(); ++sourceCount)
{
if (sample <= rawsources.at(sourceCount).frames * 2)
{
short outputSample = rawsources.at(sourceCount).data[sample];
newSample += outputSample;
}
}
output[sample] = (short)(newSample * gain);
}
return output;
}
You don't really have to do the "post mixing volume compression". Simply add up all the sources and don't allow the sum to overflow. This should work:
short* FilterGenerator::mixSources(std::vector<RawData>rawsources, int numframes)
{
short* output = new short[numframes * 2]; // multiply 2 for channels
for (int sample = 0; sample < numframes * 2; ++sample)
{
long sum = 0;
for (int sourceCount = 0; sourceCount < rawsources.size(); ++sourceCount)
{
if (sample < rawsources.at(sourceCount).frames * 2)
{
sum += rawsources.at(sourceCount).data[sample];
}
}
// clamp after summing all sources, then store once
if (sum > 32767) sum = 32767;
if (sum < -32768) sum = -32768;
output[sample] = (short)sum;
}
return output;
}
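As an aside (not from the answers above): hard clamping flattens loud passages. A common alternative is a soft clip such as a scaled tanh, which is nearly linear for quiet mixes and saturates smoothly near full scale; a sketch:

```cpp
#include <cmath>
#include <cstdint>

// Soft clip: maps any long sum smoothly into the 16-bit range.
// Near zero the curve is almost linear, so quiet mixes pass unchanged;
// loud sums approach +/-32767 asymptotically instead of flat-topping.
inline int16_t softClip(long sum)
{
    return static_cast<int16_t>(32767.0 * std::tanh(sum / 32767.0));
}
```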
I have written an MPI code in C++ for my Raspberry Pi cluster, which generates an image of the Mandelbrot Set. What happens is on each node (excluding the master, processor 0) part of the Mandelbrot Set is calculated, resulting in each node having a 2D array of ints that indicates whether each xy point is in the set.
It appears to work well on each node individually, but when all the arrays are gathered to the master using this command:
MPI_Gather(&inside, 1, MPI_INT, insideFull, 1, MPI_INT, 0, MPI_COMM_WORLD);
it corrupts the data, and the result is an array full of garbage.
(inside is each node's 2D array holding its part of the set; insideFull is also a 2D array, holding the whole set.)
Why would it be doing this?
(This led me to wonder whether it is corrupting because the master isn't sending its array to itself (or at least I don't want it to). So part of my question also is: is there an MPI_Gather variant that doesn't send anything from the root process and just collects from everyone else?)
Thanks
EDIT: here's the whole code. If anyone can suggest better ways of how I'm transferring the arrays, please say.
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
// ONLY USE MULTIPLES OF THE NUMBER OF SLAVE PROCESSORS
#define ImageHeight 128
#define ImageWidth 128
double MinRe = -1.9;
double MaxRe = 0.5;
double MinIm = -1.2;
double MaxIm = MinIm + (MaxRe - MinRe)*ImageHeight / ImageWidth;
double Re_factor = (MaxRe - MinRe) / (ImageWidth - 1);
double Im_factor = (MaxIm - MinIm) / (ImageHeight - 1);
unsigned n;
unsigned MaxIterations = 50;
int red;
int green;
int blue;
// MPI variables ****
int processorNumber;
int processorRank;
//*******************//
int main(int argc, char** argv) {
// Initialise MPI
MPI_Init(NULL, NULL);
// Get the number of procesors
MPI_Comm_size(MPI_COMM_WORLD, &processorNumber);
// Get the rank of this processor
MPI_Comm_rank(MPI_COMM_WORLD, &processorRank);
// Get the name of this processor
char processorName[MPI_MAX_PROCESSOR_NAME];
int name_len;
MPI_Get_processor_name(processorName, &name_len);
// A barrier just to sync all the processors, make timing more accurate
MPI_Barrier(MPI_COMM_WORLD);
// Make an array that stores whether each point is in the Mandelbrot Set
int inside[ImageWidth / processorNumber][ImageHeight / processorNumber];
if(processorRank == 0) {
printf("Generating Mandelbrot Set\n");
}
// We don't want the master to process the Mandelbrot Set, only the slaves
if(processorRank != 0) {
// Determine which coordinates to test on each processor
int xMin = (ImageWidth / (processorNumber - 1)) * (processorRank - 1);
int xMax = ((ImageWidth / (processorNumber - 1)) * (processorRank - 1)) - 1;
int yMin = (ImageHeight / (processorNumber - 1)) * (processorRank - 1);
int yMax = ((ImageHeight / (processorNumber - 1)) * (processorRank - 1)) - 1;
// Check each value to see if it's in the Mandelbrot Set
for (int y = yMin; y <= yMax; y++) {
double c_im = MaxIm - y *Im_factor;
for (int x = xMin; x <= xMax; x++) {
double c_re = MinRe + x*Re_factor;
double Z_re = c_re, Z_im = c_im;
int isInside = 1;
for (n = 0; n <= MaxIterations; ++n) {
double Z_re2 = Z_re * Z_re, Z_im2 = Z_im * Z_im;
if (Z_re2 + Z_im2 > 10) {
isInside = 0;
break;
}
Z_im = 2 * Z_re * Z_im + c_im;
Z_re = Z_re2 - Z_im2 + c_re;
}
if (isInside == 1) {
inside[x][y] = 1;
}
else{
inside[x][y] = 0;
}
}
}
}
// Wait for all processors to finish computing
MPI_Barrier(MPI_COMM_WORLD);
int insideFull[ImageWidth][ImageHeight];
if(processorRank == 0) {
printf("Sending parts of set to master\n");
}
// Send all the arrays to the master
MPI_Gather(&inside[0][0], 1, MPI_INT, &insideFull[0][0], 1, MPI_INT, 0, MPI_COMM_WORLD);
// Output the data to an image
if(processorRank == 0) {
printf("Generating image\n");
FILE * image = fopen("mandelbrot_set.ppm", "wb");
fprintf(image, "P6 %d %d 255\n", ImageHeight, ImageWidth);
for(int y = 0; y < ImageHeight; y++) {
for(int x = 0; x < ImageWidth; x++) {
if(insideFull[x][y]) {
putc(0, image);
putc(0, image);
putc(255, image);
}
else {
putc(0, image);
putc(0, image);
putc(0, image);
}
// Just to see what values return, no actual purpose
printf("%d, %d, %d\n", x, y, insideFull[x][y]);
}
}
fclose(image);
printf("Complete\n");
}
MPI_Barrier(MPI_COMM_WORLD);
// Finalise MPI
MPI_Finalize();
}
You call MPI_Gather with the following parameters:
const void* sendbuf : &inside[0][0] Starting address of send buffer
int sendcount : 1 Number of elements in send buffer
const MPI::Datatype& sendtype : MPI_INT Datatype of send buffer elements
void* recvbuf : &insideFull[0][0]
int recvcount : 1 Number of elements for any single receive
const MPI::Datatype& recvtype : MPI_INT Datatype of recvbuffer elements
int root : 0 Rank of receiving process
MPI_Comm comm : MPI_COMM_WORLD Communicator (handle).
Sending/receiving only one element is not sufficient. Instead of 1 use
(ImageWidth / processorNumber)*(ImageHeight / processorNumber)
Then think about the different memory layout of your source and target 2D arrays:
int inside[ImageWidth / processorNumber][ImageHeight / processorNumber];
vs.
int insideFull[ImageWidth][ImageHeight];
As the copy is a flat memory-block copy, not an intelligent 2D array copy, all your source integers are transferred contiguously to the target address, regardless of the different row lengths.
I'd recommend sending the data first into an array of the same size as the source, and then, in the receiving process, copying the elements to the right rows and columns of the full array, for example with a small function like:
// assemble2D():
// copies a source int sarr[sli][sco] into a destination int darr[dli][dco],
// starting at offset darr[doffli][doffco].
// Elements that fall out of bounds are ignored. Negative offsets are possible.
void assemble2D(int *darr, int dli, int dco, int *sarr, int sli, int sco, int doffli = 0, int doffco = 0)
{
for (int i = 0; i < sli; i++)
for (int j = 0; j < sco; j++)
if ((i + doffli >= 0) && (j + doffco >= 0) && (i + doffli < dli) && (j + doffco < dco))
darr[(i + doffli)*dco + j + doffco] = sarr[i*sco + j];
}