Currently I am working in the Tizen IDE.
I read the input data from the microphone and try to apply an FFT to it, but every time I get NaN output from the FFT.
Here is my code:
ShortBuffer *pBuffer1 = pData->AsShortBufferN();
fft = new KissFFT(BUFFER_SIZE);
std::vector<short> input(pBuffer1->GetPointer(),
                         pBuffer1->GetPointer() + BUFFER_SIZE); // this contains audio data
std::vector<float> specturm(BUFFER_SIZE);
fft->spectrum(input, specturm);
Applying the FFT:
void KissFFT::spectrum(KissFFTO* fft, std::vector<short>& samples2,
                       std::vector<float>& spectrum) {
    int len = fft->numSamples / 2 + 1;
    kiss_fft_scalar* samples = (kiss_fft_scalar*) &samples2[0];
    kiss_fftr(fft->config, samples, fft->spectrum);
    for (int i = 0; i < len; i++) {
        float re = scale(fft->spectrum[i].r) * fft->numSamples;
        float im = scale(fft->spectrum[i].i) * fft->numSamples;
        if (i > 0)
            spectrum[i] = sqrtf(re * re + im * im) / (fft->numSamples / 2);
        else
            spectrum[i] = sqrtf(re * re + im * im) / fft->numSamples;
        AppLog("specturm %d", spectrum[i]); // always prints nan
    }
}
KissFFTO* KissFFT::create(int numSamples) {
    KissFFTO* fft = new KissFFTO();
    fft->config = kiss_fftr_alloc(numSamples / 2, 0, NULL, NULL);
    fft->spectrum = new kiss_fft_cpx[numSamples / 2 + 1];
    fft->numSamples = numSamples;
    return fft;
}
Scaling:
static inline float scale(kiss_fft_scalar val) {
    if (val < 0)
        return val * (1 / 32768.0f);
    else
        return val * (1 / 32767.0f);
}
AppLog("specturm %d", spectrum[i]); // always prints nan
Try using %f rather than %d: spectrum[i] is a float, which is promoted to double when passed through a printf-style variadic call, and reading it back with %d reinterprets those bits as an integer, so the logged value is garbage even when the computed spectrum is fine.
Also check the size used to create the FFT: kiss_fftr_alloc should be given a power of 2 such as 2048 or 4096, and increasing that size gives you more resolution in frequency.
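A minimal sketch of the corrected logging call (assuming AppLog accepts printf-style format strings, which the %d/%f suggestion implies):

AppLog("spectrum[%d] = %f", i, spectrum[i]);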
Related
I have a small program that outputs an RGB image, and I need it to be in .pfm format.
So, I have some data in the range [0, 255].
float *data;
data = new float[PixelWidth * PixelHeight * 3];
for (int i = 0; i < PixelWidth * PixelHeight * 3; i += 3) {
    int idx = i / 3;
    data[i] = img[idx].x;
    data[i + 1] = img[idx].y;
    data[i + 2] = img[idx].z;
}
(img[] here is a Vec3[] of unsigned char.)
Now I generate the image.
char sizes[256];
f = fopen("outputimage.pfm", "wb");
double scale = -1.0;
fprintf(f, "PF\n%d %d\n%lf\n", PixelWidth, PixelHeight, scale);
for (int i = 0; i < PixelWidth * PixelHeight * 3; i++) {
    float d = data[i];
    fwrite((void *)&d, 1, 4, f);
}
fclose(f);
But somehow I get a grayscale image instead of RGB.
The data is fine; I tried outputting it as .ppm and it works.
I guess the problem is with scaling, but I am not really sure how it should be done correctly.
To close the question: I just had to convert all the values from the [0, 255] range to [0.0, 1.0], so I divided each RGB value by 255.
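A minimal sketch of that fix applied to the fill loop from the question (same variables as above; the negative scale in the PF header already declares little-endian floats, so only the value range needed changing):

for (int i = 0; i < PixelWidth * PixelHeight * 3; i += 3) {
    int idx = i / 3;
    data[i]     = img[idx].x / 255.0f; // normalize each 8-bit channel to [0.0, 1.0]
    data[i + 1] = img[idx].y / 255.0f;
    data[i + 2] = img[idx].z / 255.0f;
}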
I am trying to compute real-world XYZ coordinates using a Kinect v2 camera (on Linux), but my computation gives me wrong results.
Here is the code:
cv::Point3f xyzWorld = {0.0f};
xyzWorld.z = pointDepth;
xyzWorld.x = ((float)x - depthcx) * xyzWorld.z / depthfx;
xyzWorld.y = ((float)y - depthcy) * xyzWorld.z / depthfy;
return xyzWorld;
I think the problem is due to the values of fx, fy, cx and cy.
Can someone help me?
I am using freenect2.
Why not just use the OpenNI implementation?
OniStatus VideoStream::convertDepthToWorldCoordinates(float depthX, float depthY, float depthZ, float* pWorldX, float* pWorldY, float* pWorldZ)
{
    if (m_pSensorInfo->sensorType != ONI_SENSOR_DEPTH)
    {
        m_errorLogger.Append("convertDepthToWorldCoordinates: Stream is not from DEPTH\n");
        return ONI_STATUS_NOT_SUPPORTED;
    }

    float normalizedX = depthX / m_worldConvertCache.resolutionX - .5f;
    float normalizedY = .5f - depthY / m_worldConvertCache.resolutionY;

    OniVideoMode videoMode;
    int size = sizeof(videoMode);
    getProperty(ONI_STREAM_PROPERTY_VIDEO_MODE, &videoMode, &size);

    float const convertToMillimeters = (videoMode.pixelFormat == ONI_PIXEL_FORMAT_DEPTH_100_UM) ? 10.f : 1.f;
    *pWorldX = (normalizedX * depthZ * m_worldConvertCache.xzFactor) / convertToMillimeters;
    *pWorldY = (normalizedY * depthZ * m_worldConvertCache.yzFactor) / convertToMillimeters;
    *pWorldZ = depthZ / convertToMillimeters;

    return ONI_STATUS_OK;
}
and
OniStatus VideoStream::convertWorldToDepthCoordinates(float worldX, float worldY, float worldZ, float* pDepthX, float* pDepthY, float* pDepthZ)
{
    if (m_pSensorInfo->sensorType != ONI_SENSOR_DEPTH)
    {
        m_errorLogger.Append("convertWorldToDepthCoordinates: Stream is not from DEPTH\n");
        return ONI_STATUS_NOT_SUPPORTED;
    }

    *pDepthX = m_worldConvertCache.coeffX * worldX / worldZ + m_worldConvertCache.halfResX;
    *pDepthY = m_worldConvertCache.halfResY - m_worldConvertCache.coeffY * worldY / worldZ;
    *pDepthZ = worldZ;
    return ONI_STATUS_OK;
}
and the world conversion cache:
void VideoStream::refreshWorldConversionCache()
{
    if (m_pSensorInfo->sensorType != ONI_SENSOR_DEPTH)
    {
        return;
    }

    OniVideoMode videoMode;
    int size = sizeof(videoMode);
    getProperty(ONI_STREAM_PROPERTY_VIDEO_MODE, &videoMode, &size);

    size = sizeof(float);
    float horizontalFov;
    float verticalFov;
    getProperty(ONI_STREAM_PROPERTY_HORIZONTAL_FOV, &horizontalFov, &size);
    getProperty(ONI_STREAM_PROPERTY_VERTICAL_FOV, &verticalFov, &size);

    m_worldConvertCache.xzFactor = tan(horizontalFov / 2) * 2;
    m_worldConvertCache.yzFactor = tan(verticalFov / 2) * 2;
    m_worldConvertCache.resolutionX = videoMode.resolutionX;
    m_worldConvertCache.resolutionY = videoMode.resolutionY;
    m_worldConvertCache.halfResX = m_worldConvertCache.resolutionX / 2;
    m_worldConvertCache.halfResY = m_worldConvertCache.resolutionY / 2;
    m_worldConvertCache.coeffX = m_worldConvertCache.resolutionX / m_worldConvertCache.xzFactor;
    m_worldConvertCache.coeffY = m_worldConvertCache.resolutionY / m_worldConvertCache.yzFactor;
}
struct WorldConversionCache
{
    float xzFactor;
    float yzFactor;
    float coeffX;
    float coeffY;
    int resolutionX;
    int resolutionY;
    int halfResX;
    int halfResY;
} m_worldConvertCache;
All of the above is taken from the OpenNI GitHub repository.
The horizontal and vertical FOV you can get directly from the description of each frame.
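As a concrete illustration, here is a hedged sketch that plugs nominal Kinect v2 numbers into the formulas above. The 512x424 depth resolution and the roughly 70.6 x 60 degree field of view are assumed datasheet values, not something read from the poster's device, so prefer calibrated values when available:

#include <cmath>

struct Point3f { float x, y, z; };

// depthX/depthY in pixels, depthZ in the depth frame's units (mm for freenect2)
Point3f depthToWorld(float depthX, float depthY, float depthZ)
{
    const float PI = 3.14159265f;
    const float hFov = 70.6f * PI / 180.0f;   // assumed horizontal FOV
    const float vFov = 60.0f * PI / 180.0f;   // assumed vertical FOV
    const float resX = 512.0f, resY = 424.0f; // Kinect v2 depth resolution

    const float xzFactor = std::tan(hFov / 2) * 2;
    const float yzFactor = std::tan(vFov / 2) * 2;
    const float normalizedX = depthX / resX - 0.5f;
    const float normalizedY = 0.5f - depthY / resY;

    Point3f p;
    p.x = normalizedX * depthZ * xzFactor;
    p.y = normalizedY * depthZ * yzFactor;
    p.z = depthZ;
    return p;
}

Note how this maps onto the question's variables: coeffX = resolutionX / xzFactor plays the role of the focal length fx, and halfResX plays cx (likewise for y), so wrong results usually mean the fx/fy/cx/cy values do not match the resolution of the stream they were computed for.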
Here is the latest version, which produces an effect close to the desired one:
void DeleteFrequencies(short *audioDataBuffer, const int bufferSize, int lowestFrequency, int highestFrequency, int sampleRate)
{
    int frequencyInHzPerSample = sampleRate / bufferSize;
    /* filter kernel: pass band between lowestFrequency and highestFrequency */
    int nOfPointsInFilterKernel = (lowestFrequency / frequencyInHzPerSample) + (bufferSize - highestFrequency / frequencyInHzPerSample);
    U u;
    double *RealX = new double[bufferSize];
    double *ImmX = new double[bufferSize];
    ShortArrayToDoubleArray(audioDataBuffer, RealX, bufferSize);
    // pad with zeroes, so that inputSignalSamplesNumber + kernelLength - 1 = bufferSize
    // convert to frequency domain
    ForwardRealFFT(RealX, ImmX, bufferSize);
    // cut frequencies < 300 && > 3400
    int Multiplyer = 1;
    for (int i = 0; i < 512; ++i)
    {
        if (i * 8000 / 1024 > 3400 || i * 8000 / bufferSize < 300)
        {
            RealX[i] = 0;
            ImmX[i] = 0;
        }
        if (i < lowestFrequency / frequencyInHzPerSample || i > highestFrequency / frequencyInHzPerSample)
            Multiplyer = 0;
        else
            Multiplyer = 1;
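        // note: the next line overwrites RealX[i], and the ImmX[i] update below
        // then reads the already-modified real part; a per-bin gain with a real
        // multiplier would just be RealX[i] *= Multiplyer; ImmX[i] *= Multiplyer;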
        RealX[i] = RealX[i] * Multiplyer /*ReH[f]*/ - ImmX[i] * Multiplyer;
        ImmX[i] = ImmX[i] * Multiplyer + RealX[i] * Multiplyer;
    }
    ReverseRealFFT(RealX, ImmX, bufferSize);
    DoubleArrayToShortArray(RealX, audioDataBuffer, bufferSize);
    delete [] RealX;
    delete [] ImmX;
}
But why does it work this way?
Note that I have just started learning DSP, so I may be unaware of some important ideas (I apologise for that). The task I need to solve is reducing background noise in recorded speech. I tried to approach it by cutting from the recorded speech all frequencies outside the [300; 3700] Hz range, as the human voice lies roughly in [300; 3700] Hz. I started with this method because it is simple, but I found out it can't be applied this way (please see https://dsp.stackexchange.com/questions/6220/why-is-it-a-bad-idea-to-filter-by-zeroing-out-fft-bins/6224#6224 - thanks to @SleuthEye for the reference).
So can you please suggest a simple FFT-based solution that will at least let me remove the given ranges of frequencies?
I am trying to implement an ideal band-pass filter, but it isn't working as I expect: only the high frequencies are cut.
Here is my implementation description:
Read amplitude values from PCM (raw) 16-bit data with a sampling rate of 8000 Hz into a buffer of 1024 shorts
Apply the FFT to go from the time domain to the frequency domain
Zero all frequencies < 300 and > 3700
Inverse FFT
union U
{
    char ch[2];
    short sh;
};

std::fstream in;
std::fstream out;
short audioDataBuffer[1024];

in.open("mySound.pcm", std::ios::in | std::ios::binary);
out.open("mySoundFilteres.pcm", std::ios::out | std::ios::binary);

int i = 0;
bool isDataInBuffer = true;
U u;

while (in.good())
{
    int j = 0;
    for (int i = 0; i < 1024 * 2; i += 2)
    {
        if (false == in.good() && j < 1024) // pad with zeroes
        {
            audioDataBuffer[j] = 0;
        }
        in.read((char*)&audioDataBuffer[j], 2);
        cout << audioDataBuffer[j];
        ++j;
    }

    // Algorithm
    double RealX[1024] = {0};
    double ImmX[1024] = {0};
    ShortArrayToDoubleArray(audioDataBuffer, RealX, 1024);
    // convert to frequency domain
    ForwardRealFFT(RealX, ImmX, 1024);
    // cut frequencies < 300 && > 3400
    for (int i = 0; i < 512; ++i)
    {
        if (i * 8000 / 1024 > 3400 || i * 8000 / 1024 < 300)
        {
            RealX[i] = 0;
            ImmX[i] = 0;
        }
    }
    ReverseRealFFT(RealX, ImmX, 1024);
    DoubleArrayToShortArray(RealX, audioDataBuffer, 1024);
    for (int i = 0; i < 1024; ++i) // 7 6 5 4 3 2 1 0 - byte order, hence we write ch[1] then ch[0]
    {
        u.sh = audioDataBuffer[i];
        out.write(&u.ch[1], 1);
        out.write(&u.ch[0], 1);
    }
}
in.close();
out.close();
When I write the result to a file, open it in Audacity and run a spectrum analysis, I see that the high frequencies are cut but the low ones still remain (they start from 0).
What am I doing wrong?
Here is the sound frequency spectrum before:
Here is the sound frequency spectrum after I zeroed the needed values:
Please help!
Update:
Here is the code I came up with. What should I pad with zeroes?
void DeleteFrequencies(short *audioDataBuffer, const int bufferSize, int lowestFrequency, int highestFrequency, int sampleRate)
{
    // FFT must be the same length as the output segment - to prevent circular convolution
    int frequencyInHzPerSample = sampleRate / bufferSize;
    /* filter kernel: pass band between lowestFrequency and highestFrequency */
    int nOfPointsInFilterKernel = (lowestFrequency / frequencyInHzPerSample) + (bufferSize - highestFrequency / frequencyInHzPerSample);
    U u;
    double *RealX = new double[bufferSize];
    double *ImmX = new double[bufferSize];
    ShortArrayToDoubleArray(audioDataBuffer, RealX, bufferSize);
    // pad with zeroes, so that inputSignalSamplesNumber + kernelLength - 1 = bufferSize
    // convert to frequency domain
    ForwardRealFFT(RealX, ImmX, bufferSize);
    // cut frequencies < 300 && > 3400
    int Multiplyer = 1;
    for (int i = 0; i < 512; ++i)
    {
        /*if (i * 8000 / 1024 > 3400 || i * 8000 / bufferSize < 300)
        {
            RealX[i] = 0;
            ImmX[i] = 0;
        }*/
        if (i < lowestFrequency / frequencyInHzPerSample || i > highestFrequency / frequencyInHzPerSample)
            Multiplyer = 0;
        else
            Multiplyer = 1;
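        // note: as in the version above, RealX[i] is overwritten by the next line
        // and then read by the ImmX[i] update, so this is not a plain per-bin gain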
        RealX[i] = RealX[i] * Multiplyer /*ReH[f]*/ - ImmX[i] * Multiplyer;
        ImmX[i] = ImmX[i] * Multiplyer + RealX[i] * Multiplyer;
    }
    ReverseRealFFT(RealX, ImmX, bufferSize);
    DoubleArrayToShortArray(RealX, audioDataBuffer, bufferSize);
    delete [] RealX;
    delete [] ImmX;
}
It produces the following spectrum (the low frequencies are cut, but the high ones are not):
void ForwardRealFFT(double* RealX, double* ImmX, int nOfSamples)
{
    short nh, i, j, nMinus1, nDiv2, nDiv4Minus1, im, ip, ip2, ipm, nOfCompositionSteps, LE, LE2, jm1;
    double ur, ui, sr, si, tr, ti;

    // Step 1: separate even from odd points
    nh = nOfSamples / 2 - 1;
    for (i = 0; i <= nh; ++i)
    {
        RealX[i] = RealX[2 * i];
        ImmX[i] = RealX[2 * i + 1];
    }

    // Step 2: calculate nOfSamples/2 points using a complex FFT
    // (an efficiency advantage, as an nOfSamples/2-point FFT takes half the time of an nOfSamples-point FFT)
    nOfSamples /= 2;
    ForwardDiscreteFT(RealX, ImmX, nOfSamples);
    nOfSamples *= 2;

    // Step 3: even/odd frequency domain decomposition
    nMinus1 = nOfSamples - 1;
    nDiv2 = nOfSamples / 2;
    nDiv4Minus1 = nOfSamples / 4 - 1;
    for (i = 1; i <= nDiv4Minus1; ++i)
    {
        im = nDiv2 - i;
        ip2 = i + nDiv2;
        ipm = im + nDiv2;
        RealX[ip2] = (ImmX[i] + ImmX[im]) / 2;
        RealX[ipm] = RealX[ip2];
        ImmX[ip2] = -(RealX[i] - RealX[im]) / 2;
        ImmX[ipm] = -ImmX[ip2];
        RealX[i] = (RealX[i] + RealX[im]) / 2;
        RealX[im] = RealX[i];
        ImmX[i] = (ImmX[i] - ImmX[im]) / 2;
        ImmX[im] = -ImmX[i];
    }
    RealX[nOfSamples * 3 / 4] = ImmX[nOfSamples / 4];
    RealX[nDiv2] = ImmX[0];
    ImmX[nOfSamples * 3 / 4] = 0;
    ImmX[nDiv2] = 0;
    ImmX[nOfSamples / 4] = 0;
    ImmX[0] = 0;

    // Step 4: combine the nOfSamples frequency spectra in the exact reverse order
    // that the time domain decomposition took place
    nOfCompositionSteps = log((double)nOfSamples) / log(2.0);
    LE = pow(2.0, nOfCompositionSteps);
    LE2 = LE / 2;
    ur = 1;
    ui = 0;
    sr = cos(M_PI / LE2);
    si = -sin(M_PI / LE2);
    for (j = 1; j <= LE2; ++j)
    {
        jm1 = j - 1;
        for (i = jm1; i <= nMinus1; i += LE)
        {
            ip = i + LE2;
            tr = RealX[ip] * ur - ImmX[ip] * ui;
            ti = RealX[ip] * ui + ImmX[ip] * ur;
            RealX[ip] = RealX[i] - tr;
            ImmX[ip] = ImmX[i] - ti;
            RealX[i] = RealX[i] + tr;
            ImmX[i] = ImmX[i] + ti;
        }
        tr = ur;
        ur = tr * sr - ui * si;
        ui = tr * si + ui * sr;
    }
}
Fast convolution filtering with an FFT/IFFT requires zero padding to at least twice the length of the filter (and usually to the next power of 2 for performance reasons) and then using overlap add or overlap save methods to remove circular convolution artifacts.
You may want to have a look at this answer for an explanation for the effects you are observing.
Otherwise, the 'ideal' filter you are trying to achieve is more a mathematical tool than a practical implementation, since a rectangular function in the frequency domain (with zero transition width and infinite stopband attenuation) corresponds to an infinite-length impulse response in the time domain.
To obtain a more practical filter, you must first define desired filter characteristics such as transition width and stopband attenuation based on your specific application needs.
Based on these specifications, the filter coefficients can be derived using one of various filter design methods such as:
Window method
Frequency sampling method
Parks-McClellan method
Application of the bilinear-transform on an analog template
...
Perhaps the closest to what you're doing is the Window method. Using that method, something as simple as a triangular window can help increase the stopband attenuation, but you may want to experiment with other window choices (many available from the same link). Increasing the window length would help reduce the transition width.
Once you have completed your filter design, you can apply the filter in the frequency-domain using the overlap-add method or the overlap-save method. Using either of these methods, you would split your input signal in chunks of length L, and pad to some convenient size N >= L+M-1, where M is the number of filter coefficients (for example if you have a filter with 42 coefficients, you might choose N = 128, from which L = N-M+1 = 87).
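To make the overlap-add recipe concrete, here is a rough sketch (an illustration under assumptions, not drop-in code). It assumes a filter kernel h of length M has already been designed with one of the methods above, and that the ForwardRealFFT/ReverseRealFFT routines from the question produce and consume a full N-point spectrum with the mirror bins filled in, as the posted implementation does:

#include <algorithm>
#include <vector>

// assumed to be the routines from the question
void ForwardRealFFT(double* RealX, double* ImmX, int nOfSamples);
void ReverseRealFFT(double* RealX, double* ImmX, int nOfSamples);

// Overlap-add filtering: input chunks of length L, kernel of length M,
// FFT size N chosen so that N >= L + M - 1 (no circular wrap-around).
void OverlapAddFilter(const double* x, int xLen, const double* h, int M, double* y)
{
    const int N = 1024;      // FFT size (power of 2)
    const int L = N - M + 1; // input chunk length

    // transform the zero-padded kernel once
    std::vector<double> Hre(N, 0.0), Him(N, 0.0);
    std::copy(h, h + M, Hre.begin());
    ForwardRealFFT(Hre.data(), Him.data(), N);

    std::vector<double> tail(M - 1, 0.0); // overlap carried into the next chunk
    for (int start = 0; start < xLen; start += L)
    {
        const int n = std::min(L, xLen - start);
        std::vector<double> re(N, 0.0), im(N, 0.0);
        std::copy(x + start, x + start + n, re.begin()); // zero-padded chunk
        ForwardRealFFT(re.data(), im.data(), N);

        for (int k = 0; k < N; ++k) // bin-wise complex multiply Y[k] = X[k] * H[k]
        {
            const double r = re[k] * Hre[k] - im[k] * Him[k];
            const double g = re[k] * Him[k] + im[k] * Hre[k];
            re[k] = r;
            im[k] = g;
        }
        ReverseRealFFT(re.data(), im.data(), N);

        for (int k = 0; k < M - 1; ++k) // add the overlap from the previous chunk
            re[k] += tail[k];
        for (int k = 0; k < M - 1; ++k) // save the new overlap (the last M-1 samples)
            tail[k] = re[n + k];
        std::copy(re.begin(), re.begin() + n, y + start); // emit n clean samples
    }
}

With N = 1024 and, say, M = 101 coefficients, each iteration consumes L = 924 input samples; the M - 1 samples of convolution tail that spill past each chunk are simply added into the head of the next one, which is exactly what removes the wrap-around artifacts of the single-block approach above.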
After doing the real FFT you get your spectral data twice: once in the bins from 0 to 512, and a mirror spectrum in the bins from 513 to 1023. Your code, however, only clears the lower spectrum.
Try this:
for (int i = 0; i < 512; ++i)
{
    if (i * 8000 / 1024 > 3400 || i * 8000 / 1024 < 300)
    {
        RealX[i] = 0;
        ImmX[i] = 0;
        // clear the mirror spectrum as well (bin i mirrors to bin 1024 - i):
        if (i > 0)
        {
            RealX[1024 - i] = 0;
            ImmX[1024 - i] = 0;
        }
    }
}
That may help, unless your FFT implementation does this step automatically.
By the way, just zeroing out frequency bins like you did is not a good way to build such filters. Expect a very nasty phase response and a lot of ringing in your signal.
I'm working on a skin detection algorithm according to this article. There are two models on page 21: mixture-of-Gaussians skin and non-skin color models.
The first model, for skin detection, works excellently.
Here are examples:
1) Original image:
2) Skin mask:
But the non-skin model gives wrong results:
Here is my code:
ipl_image_wrapper NudityDetector::filterPixelsWithGMM(const float covarinceMatrix[][3], const float meanMatrix[][3], const float weightVector[], const float probValue) const
{
    ipl_image_wrapper mask = cvCreateImage(cvGetSize(m_image.get()), IPL_DEPTH_8U, 1);
    double probability = 0.0;
    float x[3] = { 0, 0, 0 };
    for (int i = 0; i < m_image.get()->height; ++i)
    {
        for (int j = 0; j < m_image.get()->width; ++j)
        {
            if (m_image.get()->nChannels == 3)
            {
                x[0] = (reinterpret_cast<uchar*>(m_image.get()->imageData + i * m_image.get()->widthStep))[j * 3 + 2];
                x[1] = (reinterpret_cast<uchar*>(m_image.get()->imageData + i * m_image.get()->widthStep))[j * 3 + 1];
                x[2] = (reinterpret_cast<uchar*>(m_image.get()->imageData + i * m_image.get()->widthStep))[j * 3];
                double cov_det = 0.0;
                double power = 0.0;
                double A1 = 0.0;
                double A2 = 0.0;
                double A3 = 0.0;
                probability = 0;
                for (int k = 0; k < 16; ++k)
                {
                    // diagonal covariance: the determinant is the product of the diagonal entries
                    cov_det = covarinceMatrix[k][0] * covarinceMatrix[k][1] * covarinceMatrix[k][2];
                    A1 = covarinceMatrix[k][1] * covarinceMatrix[k][2];
                    A2 = covarinceMatrix[k][0] * covarinceMatrix[k][2];
                    A3 = covarinceMatrix[k][0] * covarinceMatrix[k][1];
                    power = (std::pow((x[0] - meanMatrix[k][0]), 2) * A1 +
                             std::pow((x[1] - meanMatrix[k][1]), 2) * A2 +
                             std::pow((x[2] - meanMatrix[k][2]), 2) * A3) / (2 * cov_det);
                    // exponents written as 3.0/2.0 and 0.5: the original 3/2 and 1/2
                    // are integer divisions that truncate to 1 and 0
                    probability += 100 * weightVector[k] * std::exp(-power) / (std::pow(2 * M_PI, 3.0 / 2.0) * std::pow(cov_det, 0.5));
                }
                if (probability < probValue)
                {
                    (reinterpret_cast<uchar*>(mask.get()->imageData + i * mask.get()->widthStep))[j] = 0;
                }
                else
                {
                    (reinterpret_cast<uchar*>(mask.get()->imageData + i * mask.get()->widthStep))[j] = 255;
                }
            }
        }
    }
    cvDilate(mask.get(), mask.get(), NULL, 2);
    cvErode(mask.get(), mask.get(), NULL, 1);
    return mask;
}
ipl_image_wrapper NudityDetector::detectSkinWithGMM(const float probValue) const
{
    // matrices are from the article
    ipl_image_wrapper mask = filterPixelsWithGMM(COVARIANCE_SKIN_MATRIX, MEAN_SKIN_MATRIX, SKIN_WEIGHT_VECTOR, probValue);
    return mask;
}

ipl_image_wrapper NudityDetector::detectNonSkinWithGMM(const float probValue) const
{
    // matrices are from the article
    ipl_image_wrapper mask = filterPixelsWithGMM(COVARIANCE_NON_SKIN_MATRIX, MEAN_NON_SKIN_MATRIX, NON_SKIN_WEIGHT_VECTOR, probValue);
    return mask;
}
What am I doing wrong? Maybe I misunderstood the meaning of the article, or translated the formula into code incorrectly?
Thank you in advance!
In fact, there seems to be nothing wrong with the results: the non-skin model correctly identifies non-skin regions as 255 and skin regions as 0. You may just need to tune the parameter probValue to a lower value to get rid of some false negatives (small non-skin regions).
A GMM may not be an effective approach for skin detection, though; you may want to employ some edge intensity information as a regularization term so that the detected regions are not fragmented.
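For reference, the quantity the inner loop over k in filterPixelsWithGMM computes per pixel is the article's 16-component diagonal-covariance mixture density (the factor of 100 and the comparison against probValue are the poster's additions):

p(x) = \sum_{k=1}^{16} \frac{w_k}{(2\pi)^{3/2} \, |\Sigma_k|^{1/2}} \exp\!\left( -\frac{1}{2} (x - \mu_k)^\top \Sigma_k^{-1} (x - \mu_k) \right)

where |\Sigma_k| is the product of the three diagonal covariance entries (cov_det in the code). One C++ pitfall worth flagging: the exponents must be written as 3.0/2.0 and 0.5, as in the corrected listing above, because 3/2 and 1/2 are integer divisions that evaluate to 1 and 0 and silently drop the |\Sigma_k|^{1/2} normalization.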