Interpretation of DirectSound buffer elements from mic capture device - c++

I am doing some maintenance work involving DirectSound buffers. I would like to know how to interpret the elements in the buffer, that is, to know what each value in the buffer represents. This data is coming from a microphone.
This wave format is being used:
WAVEFORMATEXTENSIBLE format = {
    { WAVE_FORMAT_EXTENSIBLE, 1, sample_rate, sample_rate * 4, 4, 32, 22 },
    { 32 }, 0, KSDATAFORMAT_SUBTYPE_IEEE_FLOAT
};
My goal is to detect microphone silence. I am currently accomplishing this by simply determining if all values in the buffer fail to exceed some threshold volume value, assuming that the intensity of each buffer element directly corresponds to volume.
This is what I am currently trying:
// requires <algorithm> for std::max_element
bool is_mic_silent(float * data, unsigned int num_samples, float threshold)
{
    if(num_samples == 0) { // std::max_element never returns null, so test for an empty buffer instead
        return true;
    }
    float max = *std::max_element(data, data + num_samples);
    if(max < threshold) {
        return true;
    }
    return false; // At least one value is sufficiently loud.
}

As MSN said, the samples are 32-bit floats. To detect silence you would normally calculate the RMS value: take the average of the squared sample values over some time interval (say 20-50 ms) and compare the square root of this average to a threshold.
The noise inherent in the microphone signal may let single samples reach above the threshold while the ambient sound would still be considered silence. The averaging over a short interval will result in a value that corresponds better with our perception.
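For illustration, a minimal RMS-based variant of the is_mic_silent function from the question could look like the sketch below (is_mic_silent_rms is a hypothetical name; it assumes the same [-1, 1] float buffer as above):

#include <cmath>

// Sketch: compare the RMS level of the block against the threshold instead of the peak.
bool is_mic_silent_rms(const float* data, unsigned int num_samples, float threshold)
{
    if (num_samples == 0) return true;
    double sum_sq = 0.0;
    for (unsigned int i = 0; i < num_samples; ++i)
        sum_sq += static_cast<double>(data[i]) * data[i];
    double rms = std::sqrt(sum_sq / num_samples);
    return rms < threshold;
}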

From here, floating-point PCM values are in the range [-1, 1].

In addition to Han's suggestion to average samples, also consider calibrating your threshold value. Under different environments, with different microphones and different audio channels, "silence" can mean a lot of things.
The simple way would be to allow the user to configure the threshold. Alternatively, offer a "noise floor measurement" from which you acquire a threshold value.
Note that the samples are linear, but levels in audio processing are usually given in dB. So depending on your target audience, you may want to convert readings and inputs to/from dB.
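As a rough illustration only (the helper names are hypothetical, not from the answer), converting between a linear full-scale level in [0, 1] and dB could look like this; the -100 dB floor is an arbitrary guard against log(0):

#include <cmath>

// Sketch: linear <-> dB conversion for full-scale levels.
inline float linear_to_db(float level) { return level > 0.0f ? 20.0f * std::log10(level) : -100.0f; }
inline float db_to_linear(float db)    { return std::pow(10.0f, db / 20.0f); }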

Related

What are the fastest algorithms for rendering the mandelbrot set?

I've tried many algorithms for rendering the Mandelbrot set, including the naive escape time algorithm as well as the optimized escape time algorithm. But are there faster algorithms that are used to produce really deep zooms efficiently, like the ones we see on YouTube? Also, I would love to get some ideas on how to increase my precision beyond the C/C++ double.
Even a high-end CPU will be much slower than an average GPU. You can get to real-time rendering even with the naive iteration algorithm on a GPU. So using better algorithms on the GPU could get you to high zooms; however, for any decent algorithm you need:
multi-pass rendering, as we cannot self-modify a texture on the GPU
high-precision floating point, as floats/doubles are not enough.
Here are a few related QAs:
GLSL RT Mandelbrot
Interior Distance Estimate algorithm for the Mandelbrot set
I get infinitely small numbers for fractals
Perturbation theory
which might get you kick-started...
One way to speed things up is to use fractional escape like I did in the first link. It improves image quality while keeping the max iteration count low.
The second link will get you an approximation of which parts of the fractal are in and out, and how far. It's not very accurate, but it can be used to avoid computing iterations for parts that are "outside for sure".
The next link will show you how to achieve better precision.
The last link is about perturbation. The idea is that you use high-precision math only for some reference points and use those to compute their neighbor points with low-precision math without losing precision. I have never used that, but it looks promising.
And finally, once you have achieved fast rendering, you might want to aim for this:
How to adjust panning while zooming Mandelbrot set
Here is a small example of 3 * 64-bit doubles used for a single value in GLSL:
// high precision float (very slow)
dvec3 fnor(dvec3 a)
{
    dvec3 c=a;
    if (abs(c.x)>1e-5){ c.y+=c.x; c.x=0.0; }
    if (abs(c.y)>1e+5){ c.z+=c.y; c.y=0.0; }
    return c;
}
double fget(dvec3 a){ return a.x+a.y+a.z; }
dvec3 fset(double a){ return fnor(dvec3(a,0.0,0.0)); }
dvec3 fadd(dvec3 a,double b){ return fnor(a+fset(b)); }
dvec3 fsub(dvec3 a,double b){ return fnor(a-fset(b)); }
dvec3 fmul(dvec3 a,double b){ return fnor(a*b); }
dvec3 fadd(dvec3 a,dvec3 b){ return fnor(a+b); }
dvec3 fsub(dvec3 a,dvec3 b){ return fnor(a-b); }
dvec3 fmul(dvec3 a,dvec3 b)
{
    dvec3 c;
    c =fnor(a*b.x);
    c+=fnor(a*b.y);
    c+=fnor(a*b.z);
    return fnor(c);
}
So each high-precision value is a dvec3 ... the thresholds in fnor can be changed to any ranges. You can convert this to vec3 and float ...
[Edit1] "fast" C++ example
OK, I wanted to try my new SSD1306 driver along with my AVR32 MCU to compute the Mandelbrot set so I can compare speed with this Arduino + 3D + Pong + Mandelbrot. I used an AT32UC3A3256 at ~66MHz with no FPU, no GPU, and a 128x64x1bpp display. No external memory, only internal 16+32+32 KByte. The naive Mandelbrot was way too slow (~2.5 sec per frame), so I put together something like this (taking advantage of the fact that the position and zoom of the view are more or less continuous):
1. reduce resolution by 2
to make room for dithering, as my output is just B&W
2. use a variable max iteration n based on zoom
On a change of n, invalidate the last frame to enforce a full recompute. I know this is slow, but it happens only 3 times, on transitions between zoom ranges.
Scaling the count from the last frame does not look good, as it is not linear.
It is possible to reuse the last counts, but for that we would also need to remember the complex variables used for the iteration, and that would take too much memory.
3. remember the last frame and also which x,y screen coordinate mapped to which Mandelbrot coordinate
4. on each frame compute the mapping between screen coordinates and Mandelbrot coordinates
5. remap the last frame to adjust to the new position and zoom
so simply look at the data from #3 and #4, and if the same position appears in both the last and the current frame (closer than half a pixel size), copy the pixel and recompute the rest.
This will hugely improve performance if your view is smooth (so position and zoom do not change a lot on a per-frame basis).
I know this is a bit of a vague description, so here is the C++ code where you can resolve any doubts:
//---------------------------------------------------------------------------
//--- Fast Mandelbrot set ver: 1.000 ----------------------------------------
//---------------------------------------------------------------------------
template<int xs,int ys,int sh> void mandelbrot_draw(float mx,float my,float zoom)
{
// xs,ys - screen resolution
// sh - log2(pixel_size) ... dithering pixel size
// mx,my - Mandelbrot position (center of view) <-1.5,+0.5>,<-1.0,+1.0>
// zoom - zoom
// ----------------
// (previous/actual) frame
static U8 p[xs>>sh][ys>>sh]; // intensities (raw Mandelbrot image)
static int n0=0; // max iterations
static float px[(xs>>sh)+1]={-1000.0}; // pixel x position in Mandelbrot
static float py[(ys>>sh)+1]; // pixel y position in Mandelbrot
// temp variables
U8 shd; // just pattern for dithering
int ix,iy,i,n,jx,jy,kx,ky,sz; // index variables
int nx=xs>>sh,ny=ys>>sh; // real Mandelbrot resolution
float fx,fy,fd; // floating Mandelbrot position and pixel step
float x,y,xx,yy,q; // Mandelbrot iteration stuff (this needs to be high precision)
int qx[xs>>sh],qy[ys>>sh]; // mapping of pixels between last and actual frame
float px0[xs>>sh],py0[ys>>sh]; // pixel position in Mandelbrot from last frame
// init vars
if (zoom< 10.0) n= 31;
else if (zoom< 100.0) n= 63;
else if (zoom< 1000.0) n=127;
else n=255;
sz=1<<sh;
ix=xs; if (ix>ys) ix=ys; ix/=sz;
fd=2.0/(float(ix-1)*zoom);
mx-=float(xs>>(1+sh))*fd;
my-=float(ys>>(1+sh))*fd;
// init buffers
if ((px[0]<-999.0)||(n0!=n))
{
n0=n;
for (ix=0;ix<nx;ix++) px[ix]=-999.0;
for (iy=0;iy<ny;iy++) py[iy]=-999.0;
for (ix=0;ix<nx;ix++)
for (iy=0;iy<ny;iy++)
p[ix][iy]=0;
}
// store old and compute new float positions of pixels in Mandelbrot to px[],py[],px0[],py0[]
for (fx=mx,ix=0;ix<nx;ix++,fx+=fd){ px0[ix]=px[ix]; px[ix]=fx; qx[ix]=-1; }
for (fy=my,iy=0;iy<ny;iy++,fy+=fd){ py0[iy]=py[iy]; py[iy]=fy; qy[iy]=-1; }
// match old and new x coordinates to qx[]
for (ix=0,jx=0;(ix<nx)&&(jx<nx);)
{
x=px[ix]; y=px0[jx];
xx=(x-y)/fd; if (xx<0.0) xx=-xx;
if (xx<=0.5){ qx[ix]=jx; px[ix]=y; }
if (x<y) ix++; else jx++;
}
// match old and new y coordinates to qy[]
for (ix=0,jx=0;(ix<ny)&&(jx<ny);)
{
x=py[ix]; y=py0[jx];
xx=(x-y)/fd; if (xx<0.0) xx=-xx;
if (xx<=0.5){ qy[ix]=jx; py[ix]=y; }
if (x<y) ix++; else jx++;
}
// remap p[][] by qx[]
for (ix=0,jx=nx-1;ix<nx;ix++,jx--)
{
i=qx[ix]; if ((i>=0)&&(i>=ix)) for (iy=0;iy<ny;iy++) p[ix][iy]=p[i][iy];
i=qx[jx]; if ((i>=0)&&(i<=jx)) for (iy=0;iy<ny;iy++) p[jx][iy]=p[i][iy];
}
// remap p[][] by qy[]
for (iy=0,jy=ny-1;iy<ny;iy++,jy--)
{
i=qy[iy]; if ((i>=0)&&(i>=iy)) for (ix=0;ix<nx;ix++) p[ix][iy]=p[ix][i];
i=qy[jy]; if ((i>=0)&&(i<=jy)) for (ix=0;ix<nx;ix++) p[ix][jy]=p[ix][i];
}
// Mandelbrot
for (iy=0,ky=0,fy=py[iy];iy<ny;iy++,ky+=sz,fy=py[iy]) if ((fy>=-1.0)&&(fy<=+1.0))
for (ix=0,kx=0,fx=px[ix];ix<nx;ix++,kx+=sz,fx=px[ix]) if ((fx>=-1.5)&&(fx<=+0.5))
{
// invalid qx,qy ... recompute Mandelbrot
if ((qx[ix]<0)||(qy[iy]<0))
{
for (x=0.0,y=0.0,xx=0.0,yy=0.0,i=0;(i<n)&&(xx+yy<4.0);i++)
{
q=xx-yy+fx;
y=(2.0*x*y)+fy;
x=q;
xx=x*x;
yy=y*y;
}
i=(16*i)/(n-1); if (i>16) i=16; if (i<0) i=0;
i=16-i; p[ix][iy]=i;
}
// use stored intensity
else i=p[ix][iy];
// render point with intensity i corresponding to ix,iy position in map
for (i<<=3 ,jy=0;jy<sz;jy++)
for (shd=shade8x8[i+(jy&7)],jx=0;jx<sz;jx++)
lcd.pixel(kx+jx,ky+jy,shd&(1<<(jx&7)));
}
}
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
//---------------------------------------------------------------------------
The lcd and shade8x8 stuff can be found in the linked SSD1306 QA. However, you can ignore it; it is just dithering and outputting a pixel, so you can instead output the i directly (even without the scaling to <0..16>).
Here is a preview (on PC, as I was too lazy to connect the camera ...):
So it is 64x32 Mandelbrot pixels displayed as a 128x64 dithered image. On my AVR32 this is maybe 8x faster than the naive method (maybe 3-4 fps)... The code might be optimized further; however, keep in mind the Mandelbrot is not the only thing running, as I have some ISR handlers in the background to handle the LCD and also my TTS engine based on this, which I upgraded a lot since then and use for debugging (yes, it can speak in parallel to rendering). Also I am low on memory, as my 3D engine takes a lot away, ~11 KByte (mostly depth buffer).
The preview was done with this code (inside timer):
static float zoom=1.0;
mandelbrot_draw<128,64,1>(+0.37,-0.1,zoom);
zoom*=1.02; if (zoom>100000) zoom=1.0;
Also, for a non-AVR32 C++ environment use this:
//------------------------------------------------------------------------------------------
#ifndef _AVR32_compiler_h
#define _AVR32_compiler_h
#include <cstdint>   // fixed-width integer types used below
//------------------------------------------------------------------------------------------
typedef int32_t S32;
typedef int16_t S16;
typedef int8_t S8;
typedef uint32_t U32;
typedef uint16_t U16;
typedef uint8_t U8;
//------------------------------------------------------------------------------------------
#endif
//------------------------------------------------------------------------------------------
[Edit2] higher float precision in GLSL
The main problem with the Mandelbrot set is that it needs to add numbers with a very big exponent difference. For +,- operations we need to align the mantissas of both operands, add them as integers, and normalize back to scientific notation. However, if the exponent difference is big, the result mantissa needs more bits than can fit into a 32-bit float, so only the 24 most significant bits are preserved. This creates the rounding errors causing your pixelation. If you look at a 32-bit float in binary, you will see this:
float a=1000.000,b=0.00001,c=a+b;
//           012345678901234567890123456789 ... just to easily count bits
a=1111101000b                                        // a=1000
b=         0.00000000000000001010011111000101101011b // b=0.00000999999974737875
c=1111101000.00000000000000001010011111000101101011b // not rounded result
c=1111101000.00000000000000b                         // c=1000 rounded to 24 bits of mantissa
Now the idea is to enlarge the number of mantissa bits. The easiest trick is to have 2 floats instead of one:
//           012345678901234567890123 ... just to easily count bits
a=1111101000b                                          // a=1000
b=         0.0000000000000000101001111100010110101100b // b=0.00000999999974737875
c=1111101000.00000000000000001010011111000101101011b   // not rounded result
c=1111101000.00000000000000b                           // c=1000 rounded to 24 bits of mantissa
 +          .0000000000000000101001111100010110101100b
So some part of the result is in one float and the rest in the other... The more floats per single value we have, the bigger the mantissa. However, doing this as a bit-exact division of a big mantissa into 24-bit chunks would be complicated and slow in GLSL (if even possible, due to GLSL limitations). Instead, we can select for each of the floats some range of exponents (just like in the example above).
So in the example we get 3 floats (vec3) per single (float) value. Each of the coordinates represents a different range:
abs(x) <= 1e-5
1e-5 < abs(y) <= 1e+5
1e+5 < abs(z)
and value = (x+y+z), so we kind of have a 3*24-bit mantissa; however, the ranges do not exactly match 24 bits. For that, the exponent range should be divided by:
log10(2^24)=7.2247198959355486851297334733878
instead of 10... for example something like this:
abs(x) <= 1e-7
1e-7 < abs(y) <= 1e+0
1e+0 < abs(z)
Also, the ranges must be selected so they cover the range of values you actually use, otherwise it is all for nothing. So if your numbers are < 4, it is pointless to have a range > 10^+5. So first you need to see what bounds your values have, then dissect them into exponent ranges (as many as you have floats per value).
Beware, some rounding (though much less than with a native float) still occurs!!!
Now doing operations on such numbers is slightly more complicated than on normal floats, as you need to handle each value as a bracketed sum of all components, so:
(x0+y0+z0) + (x1+y1+z1) = (x0+x1 + y0+y1 + z0+z1)
(x0+y0+z0) - (x1+y1+z1) = (x0-x1 + y0-y1 + z0-z1)
(x0+y0+z0) * (x1+y1+z1) = x0*(x1+y1+z1) + y0*(x1+y1+z1) + z0*(x1+y1+z1)
And do not forget to normalize the values back to the defined ranges. Avoid adding values of small and big absolute magnitude, so avoid x0+z0 etc ...
[Edit3] new win32 demo CPU vs. GPU
win32 Mandelbrot demo 64bit floats
Both executables are preset to the same location and zoom to show where the doubles start to round off. I had to slightly upgrade the way the px,py coordinates are computed, as around 10^9 the y axis started to deviate at this location (the threshold still might be too big for other locations).
Here is a preview of CPU vs. GPU for high zoom (n=1600):
RT GIF capture of CPU (n=62++, GIF 4x scaled down):
The optimized escape algorithm should be fast enough to draw the Mandelbrot set in real time. You can use multiple threads so that your implementation will be faster (this is very easy using OpenMP for example). You can also manually vectorize your code using SIMD instructions to make it even faster if needed. You could even run this directly on the GPU using either shaders and/or GPU computing frameworks (OpenCL or CUDA) if this is still not fast enough for you (although this is a bit complex to do efficiently). Finally, you should tune the number of iterations so that it stays rather small.
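As a rough sketch of the multi-threading suggestion (not code from the answer; the pixel-to-plane mapping and iteration limits are arbitrary example values), an OpenMP version of the per-pixel escape-time loop could look like this; compile with OpenMP enabled (e.g. -fopenmp):

#include <cstdio>
#include <vector>

// Naive escape-time iteration for one point c = (cx, cy).
static int escape_time(double cx, double cy, int max_iter)
{
    double x = 0.0, y = 0.0;
    int i = 0;
    while (i < max_iter && x*x + y*y < 4.0)
    {
        double t = x*x - y*y + cx;
        y = 2.0*x*y + cy;
        x = t;
        ++i;
    }
    return i;
}

int main()
{
    const int width = 800, height = 600, max_iter = 256;
    std::vector<int> image(width * height);

    // Rows near the set cost far more iterations, so dynamic scheduling balances the threads.
    #pragma omp parallel for schedule(dynamic)
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
        {
            double cx = -2.5 + 3.5 * x / (width  - 1); // map pixel to the complex plane
            double cy = -1.0 + 2.0 * y / (height - 1);
            image[y * width + x] = escape_time(cx, cy, max_iter);
        }

    std::printf("center pixel iterations: %d\n", image[(height/2)*width + width/2]);
    return 0;
}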
Zooming should not have any direct impact on the performance. It just changes the input window of the computation. However, it does have an indirect impact since the actual number of iterations will change. Points outside the window should not be computed.
Double precision should also be enough for drawing the Mandelbrot set correctly. But if you really want more precise calculations, you can use double-double precision, which gives quite good precision with not too bad performance. However, implementing double-double precision manually is a bit tricky, and it is still significantly slower than using just double precision.
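For reference, the core of a double-double addition is the classic two-sum trick; a minimal sketch (not the answer author's code, and ignoring multiplication, division and edge cases) could look like this:

// Sketch: double-double addition via Knuth's two-sum plus a quick renormalization.
struct dd { double hi, lo; };

static dd dd_add(dd a, dd b)
{
    double s   = a.hi + b.hi;                     // rounded sum of the high parts
    double bb  = s - a.hi;                        // the portion of b.hi that made it into s
    double err = (a.hi - (s - bb)) + (b.hi - bb); // exact rounding error of s
    double lo  = err + a.lo + b.lo;               // fold in both low parts
    dd r;
    r.hi = s + lo;                                // renormalize so |r.lo| stays tiny vs. r.hi
    r.lo = lo - (r.hi - s);
    return r;
}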
My fastest solutions avoid iterating over large areas of the same depth by following a contour boundary and filling. There is a penalty that it is possible to nip off small buds instead of going around them, but all-in-all a small price to pay for a quick zoom.
One possible efficiency is that if a zoom doubles the scale, you already have 1/4 of the points.
For animation, I file each frame's values, doubling the scale each time, and interpolate the in-between frames on playback in real time, so the animation doubles once per second. The double type allows more than 50 key frames to be stored, giving an animation that lasts more than a minute (in and then back out).
The actual iteration is done by hand-crafted assembler, so one pixel is iterated entirely in the FPU.

Intel integrated performance primitives Fourier Transform magnitudes

When I am using Intel IPP's ippsFFTFwd_RToCCS_64f and then ippsMagnitude_64fc I get a massive peak at zero index in magnitudes array.
My sine wave is long, and the main component I am interested in is somewhere between 0.15 Hz and 0.25 Hz. I sample with a 500 Hz sampling frequency. If I remove the mean from the signal before the FFT, I get a really small zero component, not that peak anymore. Below is a pic of the head of the magnitudes array:
Also, the magnitude scaling seems to be 10 times the amplitude I see in the time series of the signal, e.g. if the amplitude is 29, in magnitudes it is 290.
I am not sure why this is so, and my questions are: 1. Do I really need to address the zero-index peak with mean reduction, and 2. Where does this scale of 10 come from?
void CalculateForwardTransform(array<double> ^signal, array<double> ^transformedSignal, array<double> ^magnitudes)
{
    // source signal
    pin_ptr<double> pinnedSignal = &signal[0];
    double *pSignal = pinnedSignal;
    int order = (int)Math::Round(Math::Log(signal->Length, 2));
    // get sizes
    int sizeSpec = 0, sizeInit = 0, sizeBuf = 0;
    int status = ippsFFTGetSize_R_64f(order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, &sizeSpec, &sizeInit, &sizeBuf);
    // memory allocation
    IppsFFTSpec_R_64f* pSpec;
    Ipp8u *pSpecMem = (Ipp8u*)ippMalloc(sizeSpec);
    Ipp8u *pMemInit = (Ipp8u*)ippMalloc(sizeInit);
    // FFT specification structure initialized
    status = ippsFFTInit_R_64f(&pSpec, order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, pSpecMem, pMemInit);
    // transform
    pin_ptr<double> pinnedTransformedSignal = &transformedSignal[0];
    double *pDst = pinnedTransformedSignal;
    Ipp8u *pBuffer = (Ipp8u*)ippMalloc(sizeBuf);
    status = ippsFFTFwd_RToCCS_64f(pSignal, pDst, pSpec, pBuffer);
    // get magnitudes
    pin_ptr<double> pinnedMagnitudes = &magnitudes[0];
    double *pMagn = pinnedMagnitudes;
    status = ippsMagnitude_64fc((Ipp64fc*)pDst, pMagn, magnitudes->Length); // magnitudes is half of signal len
    // free memory
    ippFree(pSpecMem);
    ippFree(pMemInit);
    ippFree(pBuffer);
}
Do I really need to address the zero index peak with mean reduction?
For low-frequency signal analysis, a small bias can really interfere (especially due to spectral leakage). For the sake of illustration, consider the following low-frequency signal tone and another one with a constant bias tone_with_bias:
// needs <math.h> for cos() and M_PI
const int N = 100;                 // length of the illustration window (100 samples, as plotted below)
double fs = 1;
double f0 = 0.15;
double tone[N], tone_with_bias[N];
for (int i = 0; i < N; i++)
{
    tone[i] = 0.001*cos(2*M_PI*i*f0/fs);
    tone_with_bias[i] = 1 + tone[i];
}
If we plot the frequency spectrum of a 100 sample window of these signals, you should notice that the spectrum of tone_with_bias completely drowns out the spectrum of tone:
So yes it's better if you can remove that bias. However, it should be emphasized that this is possible provided that you know the nature of the bias. If you know that the bias is indeed a constant, removing it will reveal the low-frequency component. Otherwise, removing the mean from the signal may not achieve the desired effect if the bias is just a very low-frequency component.
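As a minimal sketch of that mean removal (RemoveMean is a hypothetical helper, written against a raw pointer such as the pSignal buffer in the question's code), it could be done before calling the FFT:

// Sketch: subtract the mean (an estimate of a constant bias) from the buffer before the FFT.
static void RemoveMean(double* pSignal, int len)
{
    double mean = 0.0;
    for (int i = 0; i < len; ++i) mean += pSignal[i];
    mean /= len;
    for (int i = 0; i < len; ++i) pSignal[i] -= mean;
}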
Where does this scale of 10 come?
Scaling of the magnitude by the FFT should be expected; as described in this answer, it is approximately 0.5*N (where N is the FFT size). If you were processing a small chunk of 20 samples, then you would get such a factor-of-10 scaling. If you scale the output of the FFT by 2/N (or equivalently scale by 2 while also using the IPP_FFT_DIV_FWD_BY_N flag), you should get results with magnitudes similar to the time-domain signal.
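As an illustration only (NormalizeMagnitudes is a hypothetical helper, not an IPP routine), the 2/N scaling could be applied to the magnitude buffer like this:

// Sketch: scale the magnitude bins by 2/N so they roughly match time-domain amplitudes.
// Strictly, the DC bin (and the Nyquist bin, if present) should only be scaled by 1/N, not 2/N.
static void NormalizeMagnitudes(double* pMagn, int numBins, int fftSize)
{
    for (int k = 0; k < numBins; ++k)
        pMagn[k] *= 2.0 / fftSize;
}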

Fast, good quality pixel interpolation for extreme image downscaling

In my program, I am downscaling an image of 500px or larger to an extreme level of approx 16px-32px. The source image is user-specified so I do not have control over its size. As you can imagine, few pixel interpolations hold up and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results but the smaller it gets, the larger the sampling radius has to be. As a result, it gets quite slow - slower than the other interpolation methods.
I have also tried an adaptive square average sampling so that the smaller it gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, it produces problems and I am not convinced this is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried as requested (hopefully without errors from my pseudo-code cut and paste). This performs a 7x7 average downscale and although it's a better result than bilinear or bicubic interpolation, it also takes quite a hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100
// Variables
float fracx,fracy;
int Xnew,Ynew,p,q,Calc;
int x,y,z,p1,q1,i,j;
//New image dimensions
Xnew=image->width*ctl(0)/100;
Ynew=image->height*ctl(0)/100;
for (y=0; y<image->height; y++){ // rows
    for (x=0; x<image->width; x++){ // columns
        p1=(int)x*image->width/Xnew;
        q1=(int)y*image->height/Ynew;
        for (z=0; z<3; z++){ // channels
            Calc=0; // reset the accumulator for each channel
            for (i=-3;i<=3;i++) {
                for (j=-3;j<=3;j++) {
                    Calc += (int)(src(p1-i,q1-j,z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
Thanks!
The first point is to use pointers to your data. Never use indexes at every pixel. When you write src(p1-i,q1-j,z) or pset(x, y, z, Calc), how much computation is being done? Use pointers to the data and manipulate those.
Second: your algorithm is wrong. You don't want an average filter, but you want to make a grid on your source image and for every grid cell compute the average and put it in the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
// needs <vector>, <cstdint>, <cstring>, <algorithm> and <functional>
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
    memset(paccum = accum.data(), 0, Xnew*4);
    memset(pcount = count.data(), 0, Xnew*4);
    while (sr * Ynew / h == dr) {
        paccum = accum.data();
        pcount = count.data();
        for (int dc = 0, sc = 0; sc < w; ++sc) {
            *paccum += *pin; // accumulate the source pixel into the current output column
            *pcount += 1;
            ++pin;
            if (sc * Xnew / w > dc) {
                ++dc;
                ++paccum;
                ++pcount;
            }
        }
        sr++;
    }
    std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
    pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but I later changed the variable names in order to make it simpler here, so I don't guarantee anything!
The idea is to have a local buffer of 32 bit ints which can hold the partial sum of all pixels in the rows which fall in a row of the output image. Then you divide by the cell count and save the output to the final image.
The first thing you should do is set up a performance evaluation system to measure how much any change impacts performance.
As said previously, you should not use indexes but pointers, for a (probably) substantial speed-up, and you should not simply average, as a basic averaging of pixels is essentially a blur filter.
I would highly advise you to rework your code to use "kernels". A kernel is the matrix representing the weight of each pixel used. That way, you will be able to test different strategies and optimize quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note, from the code it seems you apply a 3x3 kernel but initially done on a 7x7 kernel. The equivalent 3x3 kernel as posted would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]

fftw analysing frequencies from mic input on pc

I am using fftw to analyse the frequency spectrum of audio input to a computer from the mic input. I am using the portaudio c++ libraries to capture windows of time-domain audio data and then fftw to do a real-to-complex r2c transformation of this data to the frequency domain. Below is my function, which I call every time I receive a block of data.
The sample rate is 44100 samples per second, the sample type is short (signed 16-bit integer), and I am taking 250 ms blocks of data in each window. The FFT resolution is therefore 4 Hz.
The problem is, I'm not sure how to interpret the data which I am receiving after the transformation. When no audio is played, I am getting amplitudes of around 1000 to 4000 for every frequency component; as soon as audio is played, from an instrument for example, all of the amplitudes go negative.
I have tried doing a normalisation before the FFT by dividing by the average power, and then the data makes more sense. All amplitudes are from 200 to 500 when nothing is played; then, for example, if I play a tone of 76 Hz, the amplitude for this component increases to around 2000. So that is something along the lines of what I expect, but I am still not sure if this process can be implemented better.
My question is, am I doing the right thing here? Must the data be normalised, and am I doing it right? Why am I still receiving high amplitudes on the frequencies that are not being played? Has anyone any experience of doing something similar who could give some tips? Many thanks in advance.
void AudioProcessor::GetFFT(void* inputData, void* freqSpectrum)
{
    double* input = (double*)inputData;
    short* freq_spectrum = (short*)freqSpectrum;
    fftPlan = fftw_plan_dft_r2c_1d(FRAMES_PER_BUFFER, input, complexOut, FFTW_ESTIMATE);
    fftw_execute(fftPlan);
    ////
    for (int k = 0; k < (FRAMES_PER_BUFFER + 1) / 2; ++k)
    {
        freq_spectrum[k] = (short)(sqrt(complexOut[k][0] * complexOut[k][0] + complexOut[k][1] * complexOut[k][1]));
    }
    if (FRAMES_PER_BUFFER % 2 == 0) /* frames per buffer is even number */
    {
        freq_spectrum[FRAMES_PER_BUFFER / 2] = (short)(sqrt(complexOut[FRAMES_PER_BUFFER / 2][0] * complexOut[FRAMES_PER_BUFFER / 2][0] + complexOut[FRAMES_PER_BUFFER / 2][1] * complexOut[FRAMES_PER_BUFFER / 2][1])); /* Nyquist freq. */
    }
}

How to efficiently determine the minimum necessary size of a pre-rendered sine wave audio buffer for looping?

I've written a program that generates a sine-wave at a user-specified frequency, and plays it on a 96kHz audio channel. To save a few CPU cycles I employ the old trick of pre-rendering a short section of audio into a buffer, and then playing back the buffer in a loop, so that I can avoid calling the sin() function 96000 times per second for the duration of the program and just do simple memory-copying instead.
My problem is efficiently determining what the minimum usable size of this pre-rendered buffer would be. For some frequencies it is easy -- for example, an 8kHz sine wave can be perfectly represented by generating a 12-sample buffer and playing it in a loop, because (8000*12 == 96000). For other frequencies, however, a single cycle of the sine wave requires a non-integral number of samples to represent, and therefore looping a single cycle's worth of samples would cause unacceptable glitching.
For some of those frequencies, however, it's possible to get around that problem by pre-rendering more than one cycle of the sine wave and looping that -- if I can figure out how many cycles are required so that the number of cycles present in the buffer will be integral, while also guaranteeing that the number of samples in the buffer are integral. For example, a sine-wave frequency of 12.8kHz translates to a single-cycle buffer-size of 7.5 samples, which won't loop cleanly, but if I render two consecutive cycles of the sine wave into a 15-sample buffer, then I can cleanly loop the result.
My current approach to solving this issue is brute force: I try all possible cycle-counts and see if any of them result in a buffer size with an integral number of samples in it. I think that approach is unsatisfactory for the following reasons:
1) It's very inefficient. For example, the program shown below (which prints buffer-size results for 480,000 possible frequency values between 0Hz and 48kHz) takes 35 minutes to complete on my 2.7GHz machine. I think there must be a much faster way to do this.
2) I suspect that the results are not 100% accurate, due to floating-point errors.
3) The algorithm gives up if it can't find an acceptable buffer size less than 10 seconds long. (I could make the limit higher, but of course that would make the algorithm even slower).
So, is there any way to calculate the minimum-usable-buffer-size analytically, preferably in O(1) time? It seems like it should be easy, but I haven't been able to figure out what kind of math I should use.
Thanks in advance for any advice!
#include <stdio.h>
#include <math.h>
static const long long SAMPLES_PER_SECOND = 96000;
static const long long MAX_ALLOWED_BUFFER_SIZE_SAMPLES = (SAMPLES_PER_SECOND * 10);
// Returns the number of sine-wave cycles that must be pre-rendered so that the
// buffer loops cleanly at the given frequency, or -1 on failure.
static int GetNumCyclesNeededForPreRenderedBuffer(float freqHz)
{
    double oneCycleLengthSamples = SAMPLES_PER_SECOND/freqHz;
    for (int count=1; (count*oneCycleLengthSamples) < MAX_ALLOWED_BUFFER_SIZE_SAMPLES; count++)
    {
        double remainder = fmod(oneCycleLengthSamples*count, 1.0);
        if (remainder > 0.5) remainder = 1.0-remainder;
        if (remainder <= 0.0) return count;
    }
    return -1;
}
int main(int, char **)
{
    for (int i=0; i<48000*10; i++)
    {
        double freqHz = ((double)i)/10.0f;
        int numCyclesNeeded = GetNumCyclesNeededForPreRenderedBuffer(freqHz);
        if (numCyclesNeeded >= 0)
        {
            double oneCycleLengthSamples = SAMPLES_PER_SECOND/freqHz;
            printf("For %.1fHz, use a pre-render-buffer size of %f samples (%i cycles, %f samples/cycle)\n", freqHz, (numCyclesNeeded*oneCycleLengthSamples), numCyclesNeeded, oneCycleLengthSamples);
        }
        else printf("For %.1fHz, there was no suitable pre-render-buffer size under the allowed limit!\n", freqHz);
    }
    return 0;
}
number_of_cycles/size_of_buffer = frequency/samples_per_second
This implies that if you can simplify your frequency/samples_per_second fraction, you can find the size of your buffer and the number of cycles in the buffer. If frequency and samples_per_second are integers, you can simplify the fraction by finding the greatest common divisor, otherwise you can use the method of continued fractions.
Example:
Say your frequency is 1234.5, and your samples_per_second is 96000. We can make these into two integers by multiplying by 10, so we get the ratio:
frequency/samples_per_second = 12345/960000
The greatest common divisor is 15, so it can be reduced to 823/64000.
So you would need 823 cycles in a 64000 sample buffer to reproduce the frequency exactly.
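A small sketch of that computation (mirroring the example above; the factor of 10 matches the 0.1 Hz steps used in the question's brute-force program, and std::gcd requires C++17):

#include <cstdio>
#include <numeric> // std::gcd

int main()
{
    const long long samples_per_second = 96000;
    double freqHz = 1234.5;                           // example frequency from above

    long long num = (long long)(freqHz * 10.0 + 0.5); // 12345 (frequency scaled to an integer)
    long long den = samples_per_second * 10;          // 960000
    long long g   = std::gcd(num, den);               // 15

    long long cycles        = num / g;                // 823 cycles ...
    long long bufferSamples = den / g;                // ... in a 64000-sample buffer
    std::printf("%lld cycles in a %lld-sample buffer\n", cycles, bufferSamples);
    return 0;
}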