Drawing waveform - converting to dB squashes it - C++

I have a wave file and a function that retrieves two samples per pixel, then I draw lines with them. Quick and painless before I deal with zooming. I can display the amplitude values no problem, and
that is an accurate image of the waveform. To do this I used the following code:
//tempAllChannels[numOfSamples] holds amplitude data for the entire wav
//oneChannel[numOfPixels*2] will hold 2 values per pixel in the display area: an average of the min amps and an average of the max amps
//j, min and max are declared and zero-initialised before the loop
for (int i = 0; i < numOfSamples; i++) //loop through all samples in the wave file
{
    if (tempAllChannels[i] < 0) min += tempAllChannels[i];  //if negative amp value, add to the min sum
    if (tempAllChannels[i] >= 0) max += tempAllChannels[i]; //if positive amp value, add to the max sum
    if (i % factor == 0 && i != 0) //factor is (num of samples in wav)/(num of pixels in display area)
    {
        min = min/factor; //get average amp value
        max = max/factor;
        oneChannel[j] = max;
        oneChannel[j+1] = min;
        j += 2;  //advance output index for next time
        min = 0; //reset for next time
        max = 0;
    }
}
and that's great, but I need to display in dB so that quieter wave images aren't ridiculously small. However, when I make the following change to the above code:
oneChannel[j]=10*log10(max);
oneChannel[j+1]=-10*log10(-min);
the wave image comes out looking squashed, which isn't accurate. Is there something wrong with what I'm doing? I need to find a way to convert from amplitude to decibels whilst maintaining the dynamics. I'm thinking I shouldn't be taking an average once the values are converted to dB.

Don't convert to dB for overviews; no one does that.
Instead of finding the average over a block, you should find the maximum of the absolute value. By averaging, you will lose a lot of amplitude in your high-frequency peaks.
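Along those lines, a minimal sketch (reusing tempAllChannels, oneChannel, factor and numOfSamples from the question) that keeps the extreme values of each block instead of the averages:
// Per block of 'factor' samples, keep the largest positive and most negative
// sample rather than the averages, so peaks are preserved in the overview.
float blockMin = 0.0f, blockMax = 0.0f;
int j = 0;
for (int i = 0; i < numOfSamples; i++)
{
    if (tempAllChannels[i] < blockMin) blockMin = tempAllChannels[i];
    if (tempAllChannels[i] > blockMax) blockMax = tempAllChannels[i];
    if (i % factor == 0 && i != 0)
    {
        oneChannel[j]     = blockMax; // peak positive value for this pixel
        oneChannel[j + 1] = blockMin; // peak negative value for this pixel
        j += 2;
        blockMin = 0.0f;
        blockMax = 0.0f;
    }
}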

Related

How can I smooth the rate of change of numeric data?

I'm a college student and I'm new here. I've been having a problem with my assignment, in which I have to calculate the speed of a car. I've got the right algorithm (using a 1/60 s timer) and formula, except that I have a problem with displaying the speed number. The output changes very fast within one second (yes, it will change very frequently since I use a 1/60 s timer). Is there any way to smooth the rate of change of that output?
I've tried rounding the number, but the rate of change is still very quick.
//For example, the Car1 object is moving along the x axis
//My method to calculate the speed with a 1/60s timer
//every 1/60s timeout:
if (distanceToggler == true) {
    vDistance[0] = car->getCarPos().x();
}
else {
    vDistance[1] = car->getCarPos().x();
}
//if true assign to vDistance[0], else assign to vDistance[1]
distanceToggler = !distanceToggler;
if ((vDistance[1] - vDistance[0]) >= 0) {
    defaultSetting.editCurrentCarSpeed((vDistance[1] - vDistance[0]) / (0.6f));
}
currentCarSpeed = (vDistance[0] - vDistance[1]) / (0.6f);
A simple way to smooth noisy values arriving frequently is to keep a kind of running average and only adjust it by a percentage of each new value:
const float smooth_factor = 0.05f;
// Assume the first sample is correct (alternatively you could initialize to 0)
float smooth_v;
std::cin >> smooth_v;
// Read samples and output filtered samples
for (float v; std::cin >> v; )
{
    smooth_v = (1.0f - smooth_factor) * smooth_v + smooth_factor * v;
    std::cout << smooth_v << std::endl;
}
The smaller you make smooth_factor, the more slowly the "smooth" value will change in response to new data. You can tweak this value to something suitable for your application.
This is a fast alternative to taking an unweighted windowed average (although such averages can also be computed in constant time); it is slightly different in that every historical value has some effect, which diminishes over time.
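Applied to the speed example from the question, a minimal sketch run once per timer tick might look like this (currentCarSpeed and defaultSetting.editCurrentCarSpeed are taken from the question; smoothedSpeed is a purely illustrative name):
// smoothedSpeed is a hypothetical float member, initialised to 0.0f once.
const float smooth_factor = 0.05f;
smoothedSpeed = (1.0f - smooth_factor) * smoothedSpeed + smooth_factor * currentCarSpeed;
// Display the filtered value instead of the raw one.
defaultSetting.editCurrentCarSpeed(smoothedSpeed);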

Fast, good quality pixel interpolation for extreme image downscaling

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale and although it's a better result than bilinear or bicubic interpolation, it also takes quite a hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100
// Variables
float fracx,fracy;
int Xnew,Ynew,p,q,Calc;
int x,y,z,p1,q1,i,j;
//New image dimensions
Xnew = image->width*ctl(0)/100;
Ynew = image->height*ctl(0)/100;
for (y=0; y<image->height; y++){ // rows
    for (x=0; x<image->width; x++){ // columns
        p1 = (int)(x*image->width/Xnew);
        q1 = (int)(y*image->height/Ynew);
        for (z=0; z<3; z++){ // channels
            Calc = 0; // reset the accumulator for each pixel/channel
            for (i=-3; i<=3; i++) {
                for (j=-3; j<=3; j++) {
                    Calc += (int)(src(p1-i,q1-j,z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
Thanks!
The first point is to use pointers to your data. Never use indexing at every pixel: when you write src(p1-i,q1-j,z) or pset(x, y, z, Calc), how much computation is being done behind the scenes? Use pointers to the data and manipulate those instead.
Second: your algorithm is wrong. You don't want an average filter; you want to lay a grid over your source image and, for every grid cell, compute the average of the pixels it covers and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
// needs <vector>, <cstring>, <algorithm>, <functional>, <cstdint>
std::vector<uint32_t> accum(Xnew);  // per-output-column accumulators for one output row
std::vector<uint32_t> count(Xnew);  // number of source pixels added to each accumulator
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
    memset(paccum = accum.data(), 0, Xnew * sizeof(uint32_t));
    memset(pcount = count.data(), 0, Xnew * sizeof(uint32_t));
    // accumulate all source rows that map to destination row dr
    while (sr * Ynew / h == dr) {
        paccum = accum.data();
        pcount = count.data();
        for (int dc = 0, sc = 0; sc < w; ++sc) {
            *paccum += *pin;  // add the source pixel to the current cell
            *pcount += 1;
            ++pin;
            if (sc * Xnew / w > dc) {  // source column crossed into the next cell
                ++dc;
                ++paccum;
                ++pcount;
            }
        }
        sr++;
    }
    // divide each cell sum by its pixel count and write one output row
    std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
    pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but later I changed the variables names in order to make it simpler here, so I don't guarantee anything!
The idea is to have a local buffer of 32 bit ints which can hold the partial sum of all pixels in the rows which fall in a row of the output image. Then you divide by the cell count and save the output to the final image.
The first thing you should do is set up a performance measurement system so you can tell how much any change actually affects speed.
As said previously, you should use pointers rather than per-pixel indexing for a (probably substantial) speed-up, and you should not simply average, since basic averaging of pixels is essentially a blur filter.
I would highly advise you to rework your code to use "kernels", i.e. the matrix giving the weight of each pixel used. That way you will be able to test different strategies and optimize quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note: the code you posted averages a 7x7 neighbourhood, which corresponds to a 7x7 box kernel (all ones, scaled by 1/49). The same idea as a 3x3 kernel would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
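As a minimal sketch of applying such a kernel to one pixel (src, w and h are assumed names here for a single-channel, row-major float image and its dimensions; they are not from the original code):
#include <algorithm>

// 3x3 box kernel: every neighbour contributes equally.
static const float kKernel[3][3] = {
    { 1.0f/9, 1.0f/9, 1.0f/9 },
    { 1.0f/9, 1.0f/9, 1.0f/9 },
    { 1.0f/9, 1.0f/9, 1.0f/9 }
};

float applyKernel3x3(const float* src, int w, int h, int x, int y)
{
    float sum = 0.0f;
    for (int ky = -1; ky <= 1; ++ky) {
        for (int kx = -1; kx <= 1; ++kx) {
            // clamp the sample coordinates at the image border
            int sx = std::min(std::max(x + kx, 0), w - 1);
            int sy = std::min(std::max(y + ky, 0), h - 1);
            sum += kKernel[ky + 1][kx + 1] * src[sy * w + sx];
        }
    }
    return sum;
}
Swapping the weight matrix is then enough to try a different kernel without touching the traversal code.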

FFTW3 compute cross-correlation in the same signal

I am currently writing C code which takes as input a wav file (specifically just one channel of the original wav file) and performs the short-time Fourier transform.
The main part of the code is this one:
stft_data = (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*windowSize));
fft_result= (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*windowSize));
storage = (fftw_complex*)(fftw_malloc(sizeof(fftw_complex)*storage_capacity));
//define the fftw plan
fftw_plan plan_forward;
plan_forward = fftw_plan_dft_1d(windowSize, stft_data, fft_result, FFTW_FORWARD, FFTW_ESTIMATE);
//integer indexes
int i,counter ;
counter = 0 ;
//create a Hamming window
double hamming_result[windowSize];
hamming(windowSize, hamming_result);
//implement the stft position indexes
int chunkPosition = 0; //actual chunk position
int readIndex ; //read the index of the wav file
while (chunkPosition < wav_length) {
    //read the window
    for (i=0; i<windowSize; i++) {
        readIndex = chunkPosition + i;
        if (readIndex < wav_length) {
            stft_data[i] = wav_data[readIndex]*hamming_result[i]*_Complex_I + 0.0*I;
        }
        else {
            //if we are beyond the wav_length
            stft_data[i] = 0.0*_Complex_I + 0.0*I;//padding
            break;
        }
    }
    //compute the fft
    fftw_execute(plan_forward);
    //store the stft in a data structure
    for (i=0; i<windowSize; i++)
    {
        //printf("RE: %.2f IM: %.2f\n", creal(fft_result[i]),cimag(fft_result[i]));
        storage[counter] = creal(fft_result[i]) + cimag(fft_result[i]);
        counter += 1;
    }
    //update indexes
    chunkPosition += hop_size;
    printf("Chunk Position %d\n", chunkPosition);
    printf("Counter position %d\n", counter);
    printf("Fourier transform done\n");
}
Once the FFT has been computed on the selected window, I store the real and imaginary parts of the FFT in a storage variable.
After that I would like to compute the cross-correlation among the data points in each of the N windows I have at the end.
As an example, I would like to compute the correlation between the first data point of the first window (storage[0]) and the first element of the second window (storage[windowSize+1]).
However, I am facing some problems and I don't get reasonable values. From what I have studied, correlation in the Fourier domain is just the complex multiplication of two Fourier terms. Thus, what I am doing is something like:
correlation = storage[0]*conj(storage[windowSize+1]);
However, I get very large values, which makes me wonder whether I am really computing a correlation.
Where am I wrong?
How should I scale my correlation results?
How should I compute the correlation with the Fourier values?
And then, how should I plot the Fourier values I get from the FFTW3 calculations? Should I shift all the values, or are they already shifted?
Thanks very much
The line storage[counter] = creal(fft_result[i]) + cimag(fft_result[i]); makes storage purely real. Since correlation = storage[0]*conj(storage[windowSize+1]); is the next step in the computation of the cross-correlation, there is a problem: there is no point in conjugating a real number.
Trying storage[counter] = fft_result[i]; instead could partly resolve the issue.
In addition, correlation = storage[0]*conj(storage[windowSize+1]); should be modified to correlation = storage[0]*conj(storage[windowSize]);
By performing correlation = storage[0]*conj(storage[windowSize]);, the value at index [0] of the DFT of the correlation is obtained. Indeed, storage[0] corresponds to the average of the first frame, while storage[windowSize] corresponds to the average of the second frame. It is not exactly equal to the averages, but much larger, as it is scaled by the length of the frame, windowSize.
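Concretely, for an unnormalised DFT of length N, bin zero is X[0] = sum_{n=0}^{N-1} x[n] = N * mean(x); with N = windowSize this is why storage[0] and storage[windowSize] come out much larger than the frame averages themselves.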
To compute the correlation between the two signals, the next step should be:
for (i=0; i<windowSize; i++)
{
    dftofcorrelation[i] = storage[i]*conj(storage[i+windowSize]);
}
Then, the inverse DFT must be applied to the array dftofcorrelation to get the correlation as an array. Keep in mind that neither the forward nor the backward DFT of FFTW includes any scaling (see what FFTW really computes):
fftw_execute(plan_backward);
If two scalars are to be retained from this correlation array, they are its maximum (high if the signals are similar up to a delay) and the index of that maximum, which is the estimated time offset between the two signals.
The overall scaling induced by FFTW is a power of windowSize (windowSize^3?). It can be checked by computing the autocorrelation of a uniform signal, which should again be uniform.
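For reference, a minimal sketch of that inverse-transform step (assuming dftofcorrelation and correlation are fftw_complex buffers of length windowSize, and treating fftw_complex as a two-element array of doubles, which is FFTW's default layout when <complex.h> is not included before <fftw3.h>; further normalisation of the forward transforms may be needed, as noted above):
// Inverse DFT of the product spectrum to obtain the cross-correlation.
fftw_plan plan_backward = fftw_plan_dft_1d(windowSize, dftofcorrelation, correlation,
                                           FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(plan_backward);

double peak = 0.0;
int peak_index = 0;
for (int i = 0; i < windowSize; i++) {
    double re = correlation[i][0] / windowSize;  // undo the backward transform's missing 1/N
    double im = correlation[i][1] / windowSize;
    double magnitude = sqrt(re * re + im * im);
    if (magnitude > peak) {
        peak = magnitude;    // height of the correlation peak
        peak_index = i;      // estimated lag between the two frames
    }
}
fftw_destroy_plan(plan_backward);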

FFTW analysing frequencies from mic input on PC

I am using FFTW to analyse the frequency spectrum of audio input to a computer from the mic input. I am using the PortAudio C++ libraries to capture windows of time-domain audio data, and then FFTW to do a real-to-complex (r2c) transformation of this data to the frequency domain. Below is the function which I call every time I receive a block of data.
The sample rate is 44100 samples per second, the sample type is short (signed 16-bit integer), and I am taking 250 ms blocks of data in each window (44100 × 0.25 = 11025 samples per block). The FFT resolution is therefore 1/0.25 s = 4 Hz.
The problem is, I'm not sure how to interpret the data which I am receiving after the transformation. When no audio is played, I am getting amplitudes of around 1000 to 4000 for every frequency component; as soon as audio is played, from an instrument for example, all of the amplitudes go negative.
I have tried doing a normalisation before the FFT by dividing by the average power, and then the data makes more sense. All amplitudes are from 200 to 500 when nothing is played; then, for example, if I play a tone of 76 Hz, the amplitude for this component increases to around 2000. So that is something along the lines of what I expect, but I'm still not sure if this process can be implemented better.
My question is, am I doing the right thing here? Must the data be normalised, and am I doing it right? Why am I still receiving high amplitudes on the frequencies that are not being played? Has anyone any experience of doing something similar who could give some tips? Many thanks in advance.
void AudioProcessor::GetFFT(void* inputData, void* freqSpectrum)
{
    double* input = (double*)inputData;
    short* freq_spectrum = (short*)freqSpectrum;
    fftPlan = fftw_plan_dft_r2c_1d(FRAMES_PER_BUFFER, input, complexOut, FFTW_ESTIMATE);
    fftw_execute(fftPlan);
    ////
    for (int k = 0; k < (FRAMES_PER_BUFFER + 1) / 2; ++k)
    {
        freq_spectrum[k] = (short)(sqrt(complexOut[k][0] * complexOut[k][0] + complexOut[k][1] * complexOut[k][1]));
    }
    if (FRAMES_PER_BUFFER % 2 == 0) /* frames per buffer is even number */
    {
        freq_spectrum[FRAMES_PER_BUFFER / 2] = (short)(sqrt(complexOut[FRAMES_PER_BUFFER / 2][0] * complexOut[FRAMES_PER_BUFFER / 2][0] + complexOut[FRAMES_PER_BUFFER / 2][1] * complexOut[FRAMES_PER_BUFFER / 2][1])); /* Nyquist freq. */
    }
}

Color balance in an image using C++ and OpenCV

I'm trying to score the color balance of an image using C++ and OpenCV.
To do this, the easiest way is to count the number of pixels of each color and then see whether one of the colors is more prevalent.
I figured I should probably use calcHist, and with the split function I can split an image into R, G, and B histograms. However, I am unsure about what to do next. I could probably walk through all the bins and just see how many pixels are in there, but this seems like a lot of work (I currently use 256 bins).
Is there a faster way to count the pixels in a color range? Also, I am not sure how it would work if white or black are the more prevalent colors.
An automatic color balance algorithm is described at this link: http://web.stanford.edu/~sujason/ColorBalancing/simplestcb.html
For C++ code you can refer to this link: https://www.morethantechnical.com/2015/01/14/simplest-color-balance-with-opencv-wcode/
/// perform the Simplest Color Balancing algorithm
/// (assumes using namespace cv and using namespace std, as in the linked code)
void SimplestCB(Mat& in, Mat& out, float percent) {
    assert(in.channels() == 3);
    assert(percent > 0 && percent < 100);
    float half_percent = percent / 200.0f;
    vector<Mat> tmpsplit; split(in, tmpsplit);
    for (int i = 0; i < 3; i++) {
        //find the low and high percentile values (based on the input percentile)
        Mat flat; tmpsplit[i].reshape(1,1).copyTo(flat);
        cv::sort(flat, flat, CV_SORT_EVERY_ROW + CV_SORT_ASCENDING);
        int lowval = flat.at<uchar>(cvFloor(((float)flat.cols) * half_percent));
        int highval = flat.at<uchar>(cvCeil(((float)flat.cols) * (1.0 - half_percent)));
        cout << lowval << " " << highval << endl;
        //saturate below the low percentile and above the high percentile
        tmpsplit[i].setTo(lowval, tmpsplit[i] < lowval);
        tmpsplit[i].setTo(highval, tmpsplit[i] > highval);
        //scale the channel
        normalize(tmpsplit[i], tmpsplit[i], 0, 255, NORM_MINMAX);
    }
    merge(tmpsplit, out);
}
// Usage example
int main() {
    Mat tmp, im = imread("lily.png");
    SimplestCB(im, tmp, 1);
    imshow("orig", im);
    imshow("balanced", tmp);
    waitKey(0);
    return 0;
}
Colour balance normally means looking at a white (or grey) surface and checking the ratios of red and blue to green. A perfectly balanced system would have equal signal levels in red and blue.
You can then simply work out the average red and blue scaling from the test grey-card image and apply the same scaling to your real image.
Doing it on a live image with no reference is trickier; you have to find areas that are probably white (i.e. bright and with nearly r=g=b) and use them as the reference.
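As a rough OpenCV sketch of that grey-card idea (greyCardRegion is a hypothetical rectangle assumed to cover the reference card; it is not from the answer above):
#include <opencv2/opencv.hpp>
#include <vector>

// Estimate per-channel gains from a grey-card region and apply them to a BGR image.
cv::Mat balanceFromGreyCard(const cv::Mat& bgr, const cv::Rect& greyCardRegion)
{
    cv::Scalar mean = cv::mean(bgr(greyCardRegion)); // average B, G, R over the card
    double gainB = mean[1] / mean[0];                // scale blue towards green
    double gainR = mean[1] / mean[2];                // scale red towards green

    std::vector<cv::Mat> channels;
    cv::split(bgr, channels);
    channels[0].convertTo(channels[0], -1, gainB);   // apply blue gain (saturating)
    channels[2].convertTo(channels[2], -1, gainR);   // apply red gain (saturating)

    cv::Mat balanced;
    cv::merge(channels, balanced);
    return balanced;
}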
There's no definitive algorithm for colour balance, so anything you might implement, however good it is, will probably fail in some conditions.
One of the simplest algorithms is called Grey World; it assumes that, statistically, the average colour of a scene should be grey, and if it isn't, the image needs to be corrected towards grey. So, very simply (in pseudo-Python), if you have an image RGB:
cc[0] = np.mean(RGB[:,0]) # calculating channel-wise average
cc[1] = np.mean(RGB[:,1])
cc[2] = np.mean(RGB[:,2])
cc = cc / np.sqrt((cc**2).sum()) # normalise the light (you might want to play with this a bit)
RGB /= cc # divide every pixel by the estimated light
Note that here I'm assuming that RGB is an array of floats with values between 0 and 1. Something else that helps is to exclude from the averages any pixels with values below or above certain thresholds (e.g., below 0.05 and above 0.95). This way you ignore pixels whose value is heavily influenced by noise (small values) and pixels that saturated the camera sensor, whose colour may not be reliable (large values).
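A rough C++/OpenCV equivalent of that Grey World correction (a sketch only, assuming a CV_32FC3 BGR image with values in [0, 1], matching the assumption above):
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Grey World white balance: scale each channel so the scene average becomes grey.
cv::Mat greyWorld(const cv::Mat& bgr)
{
    cv::Scalar cc = cv::mean(bgr);  // channel-wise averages
    double norm = std::sqrt(cc[0]*cc[0] + cc[1]*cc[1] + cc[2]*cc[2]);

    std::vector<cv::Mat> channels;
    cv::split(bgr, channels);
    for (int i = 0; i < 3; ++i)
        channels[i].convertTo(channels[i], -1, norm / cc[i]); // divide by the estimated light

    cv::Mat balanced;
    cv::merge(channels, balanced);
    return balanced;
}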