I have 1024 samples and I split them into 32 chunks of 32 samples each in order to perform an FFT on each chunk. Below is the output from the FFT:
(3.13704,2.94588) (12.9193,14.7706) (-4.4401,-6.21331) (-1.60103,-2.78147) (-0.84114,-1.86292) (-0.483564,-1.43068) (-0.272469,-1.17551) (-0.130891,-1.00437) (-0.0276415,-0.879568) (0.0523422,-0.782884) (0.117249,-0.704425) (0.171934,-0.638322) (0.219483,-0.580845) (0.261974,-0.529482) (0.300883,-0.48245) (0.337316,-0.438409) (0.372151,-0.396301) (0.40613,-0.355227) (0.439926,-0.314376) (0.474196,-0.27295) (0.509637,-0.23011) (0.54704,-0.184897) (0.587371,-0.136145) (0.631877,-0.0823468) (0.682262,-0.021441) (0.740984,0.0495408) (0.811778,0.135117) (0.900701,0.242606) (1.01833,0.384795) (1.18506,0.586337) (1.44608,0.901859) (1.92578,1.48171)
(-3.48153,2.52948) (-16.9298,9.92273) (6.93524,-3.19719) (3.0322,-1.05148) (1.98753,-0.477165) (1.49595,-0.206915) (1.20575,-0.047374) (1.01111,0.0596283) (0.869167,0.137663) (0.759209,0.198113) (0.669978,0.247168) (0.594799,0.288498) (0.52943,0.324435) (0.471015,0.356549) (0.417524,0.385956) (0.367437,0.413491) (0.319547,0.439819) (0.272834,0.4655) (0.226373,0.491042) (0.17926,0.516942) (0.130538,0.543728) (0.0791167,0.571997) (0.0236714,0.602478) (-0.0375137,0.636115) (-0.106782,0.674195) (-0.18751,0.718576) (-0.284836,0.772081) (-0.407084,0.839288) (-0.568795,0.928189) (-0.798009,1.0542) (-1.15685,1.25148) (-1.81632,1.61402)
(-1.8323,-3.89383) (-6.57464,-18.4893) (1.84103,7.4115) (0.464674,3.17552) (0.0962861,2.04174) (-0.0770633,1.50823) (-0.1794,1.19327) (-0.248036,0.982028) (-0.29809,0.827977) (-0.336865,0.708638) (-0.368331,0.611796) (-0.394842,0.530204) (-0.417894,0.459259) (-0.438493,0.395861) (-0.457355,0.337808) (-0.475018,0.283448) (-0.491906,0.231473) (-0.508378,0.180775) (-0.524762,0.130352) (-0.541376,0.0792195) (-0.558557,0.0263409) (-0.57669,-0.0294661) (-0.596242,-0.089641) (-0.617818,-0.156045) (-0.642245,-0.231222) (-0.670712,-0.318836) (-0.705033,-0.424464) (-0.748142,-0.55714) (-0.805167,-0.732645) (-0.885996,-0.981412) (-1.01254,-1.37087) (-1.24509,-2.08658)
I only included 3 of the 32 chunks to show that each chunk contains different values.
After passing this output to the abs() function to calculate the magnitudes, I noticed I get the same output for every chunk! (example below)
4.3034 19.6234 7.63673 3.20934 2.04401 1.51019 1.20668 1.01287 0.880002 0.784632 0.714117 0.661072 0.62093 0.590747 0.568584 0.553159 0.543646 0.539563 0.54071 0.547141 0.559178 0.577442 0.602943 0.63722 0.682599 0.742638 0.822946 0.932803 1.08861 1.32218 1.70426 2.42983
4.3034 19.6234 7.63673 3.20934 2.04401 1.51019 1.20668 1.01287 0.880002 0.784632 0.714117 0.661072 0.62093 0.590747 0.568584 0.553159 0.543646 0.539563 0.54071 0.547141 0.559178 0.577442 0.602943 0.63722 0.682599 0.742638 0.822946 0.932803 1.08861 1.32218 1.70426 2.42983
4.3034 19.6234 7.63673 3.20934 2.04401 1.51019 1.20668 1.01287 0.880002 0.784632 0.714117 0.661072 0.62093 0.590747 0.568584 0.553159 0.543646 0.539563 0.54071 0.547141 0.559178 0.577442 0.602943 0.63722 0.682599 0.742638 0.822946 0.932803 1.08861 1.32218 1.70426 2.42983
Why am I getting the exact same output from different inputs? Is this normal?
Here is the part of my code where I'm performing all of these calculations:
int main(int argc, char** argv)
{
    const double Fs = 100;   // sampling frequency (time points per second)
    const double T = 1 / Fs; // interval at which time points are sampled
    const double f = 4;      // signal frequency
    int chunk_end = 32;      // index at which the current 32-sample chunk ends (N / 32 = 32 chunks)
    Complex chunk[32];
    int j = 0;
    int counter = 0;
    // N, t[], in[], magnitude[], Complex, CArray and fft() are defined elsewhere
    for (int i = 0; i < N; i++)
    {
        t[i] = i * T;
        in[i] = { 0.7 * cos(2 * M_PI * f * t[i]), 0.7 * sin(2 * M_PI * f * t[i]) }; // generate a (complex) sine waveform
        chunk[j] = in[i];
        // compute the FFT for each chunk
        if (i + 1 == chunk_end) // every 32 samples, apply the FFT and store the magnitudes in a 1-D array
        {
            chunk_end += 32;
            CArray data(chunk, 32);
            fft(data);
            j = 0;
            for (int h = 0; h < 32; h++)
            {
                magnitude[counter] = abs(data[h]);
                std::cout << abs(data[h]) << " ";
                counter++;
            }
            printf("\n\n");
        }
        else
            j++;
    }
}
spectrogram (normalized): [image]
Your signal is a pure sine wave. You chop it up. Each segment therefore contains the same frequency components, just with a different phase (shift). The FFT gives you both the magnitude and the phase of each frequency component, but after abs() only the magnitude remains, and these magnitudes are necessarily the same for all of your chunks.
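A minimal sketch illustrating this (plain C++ with a naive single-bin DFT; the 0.7 amplitude, f = 4 and Fs = 100 are taken from the question's code): shifting the window start changes only the phase of a bin, never its magnitude.

#include <cmath>
#include <complex>
#include <cstdio>
#include <initializer_list>
#include <vector>

int main() {
    const double PI = std::acos(-1.0);
    const int N = 32;                    // chunk length, as in the question
    const double f = 4.0, Fs = 100.0;    // frequency and sample rate from the code above
    for (int shift : {0, 32, 64}) {      // first sample index of three consecutive chunks
        // build one chunk of the 0.7-amplitude complex sinusoid, starting at 'shift'
        std::vector<std::complex<double>> x(N);
        for (int n = 0; n < N; n++) {
            double t = (n + shift) / Fs;
            x[n] = { 0.7 * std::cos(2 * PI * f * t), 0.7 * std::sin(2 * PI * f * t) };
        }
        // naive DFT of bin 1: the magnitude is identical for every shift,
        // only the phase angle changes
        std::complex<double> X1 = 0;
        for (int n = 0; n < N; n++)
            X1 += x[n] * std::polar(1.0, -2 * PI * n / double(N));
        std::printf("shift %2d: |X[1]| = %.4f  arg = %+.4f\n", shift, std::abs(X1), std::arg(X1));
    }
}

This holds even though 4 Hz falls between bins here: each shifted chunk is the original chunk times a unit-magnitude complex constant, so every bin's magnitude is preserved.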
I am trying to recognise a sequence of audio frames on an embedded system, where an audio frame is a single frequency, or a linear interpolation between two frequencies, played for a variable amount of time. I know the sounds I am trying to recognise (i.e. the start and end frequencies being linearly interpolated and the duration of each audio frame), but they are produced by another embedded system, so the microphone and speaker are cheap and somewhat inaccurate. The output is a square wave. Any suggestions on how to go about doing this?
What I am trying to do now is use the FFT to get the magnitude of all frequencies, detect the peaks, look back at the detections from duration/2 ms ago to check whether they roughly match an audio frame, and finally check whether any sound I am looking for matches the whole sequence.
So far I have used the FFT to process the microphone input, after applying a Hann window, and then assigned each frequency bin a score for being a peak based on how many standard deviations it is away from the mean. This hasn't worked well, since it reported peaks even when the room was silent. Any ideas on how to detect the peaks more accurately? I also think there are a lot of harmonics because of the square wave / interpolation. Can I use a harmonic product spectrum if the peaks don't line up exactly at double the frequency?
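For reference, a minimal sketch of the standard-deviation scoring described above (a hypothetical helper in plain C++; the magnitudes would come from the processor code below):

#include <cmath>

// Score each FFT bin by how many standard deviations its magnitude
// sits above the mean magnitude of the frame (a simple z-score).
// mag: nBins input magnitudes, score: nBins output scores.
void zscoreBins(const float *mag, int nBins, float *score)
{
    float mean = 0.0f, var = 0.0f;
    for (int i = 0; i < nBins; i++) mean += mag[i];
    mean /= nBins;
    for (int i = 0; i < nBins; i++) var += (mag[i] - mean) * (mag[i] - mean);
    float sd = std::sqrt(var / nBins);
    // in silence sd is tiny, so even noise bins get large scores,
    // which matches the false peaks described above
    for (int i = 0; i < nBins; i++)
        score[i] = (sd > 0.0f) ? (mag[i] - mean) / sd : 0.0f;
}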
Here I graphed noise (an almost silent room) together with a recording somewhere in the interpolation between 2226 Hz and 1624 Hz.
https://i.stack.imgur.com/R5Gs2.png
I sample every 91 microseconds, i.e. at about 10989 Hz. Should I sample more often?
Here are samples of how the interpolation sounds when recorded on my laptop and on the embedded system:
https://easyupload.io/m/5l72b0
#define MIC_SAMPLE_RATE 10989 // Hz
#define AUDIO_SAMPLES_NUMBER 1024

MicroBitAudioProcessor::MicroBitAudioProcessor(DataSource& source) : audiostream(source)
{
    arm_rfft_fast_init_f32(&fft_instance, AUDIO_SAMPLES_NUMBER);
    buf = (float *)malloc(sizeof(float) * (AUDIO_SAMPLES_NUMBER * 2));
    output = (float *)malloc(sizeof(float) * AUDIO_SAMPLES_NUMBER);
    mag = (float *)malloc(sizeof(float) * AUDIO_SAMPLES_NUMBER / 2);
}

// Hann window coefficient for sample i
float hann(int i)
{
    return 0.5 * (1 - arm_cos_f32(2 * 3.14159265 * i / AUDIO_SAMPLES_NUMBER));
}

int MicroBitAudioProcessor::pullRequest()
{
    auto mic_samples = audiostream.pull();
    if (!recording)
        return DEVICE_OK;

    int8_t *data = (int8_t *) &mic_samples[0];
    int samples = mic_samples.length() / 2;
    for (int i = 0; i < samples; i++)
    {
        buf[position++] = (float) *data++;
        if (position % AUDIO_SAMPLES_NUMBER == 0)
        {
            position = 0;
            // apply a Hann window, then compute the real FFT and the bin magnitudes
            for (int j = 0; j < AUDIO_SAMPLES_NUMBER; j++)
                buf[j] *= hann(j);
            arm_rfft_fast_f32(&fft_instance, buf, output, 0);
            arm_cmplx_mag_f32(output, mag, AUDIO_SAMPLES_NUMBER / 2);
        }
    }
    return DEVICE_OK;
}

// Map a frequency in Hz to its FFT bin index.
// Note: MIC_SAMPLE_RATE / AUDIO_SAMPLES_NUMBER is integer division
// (10989 / 1024 = 10), so the bin width is rounded down to 10 Hz.
uint32_t frequencyToIndex(int freq)
{
    return freq / ((uint32_t)MIC_SAMPLE_RATE / AUDIO_SAMPLES_NUMBER);
}

float MicroBitAudioProcessor::getFrequencyIntensity(int freq)
{
    uint32_t index = frequencyToIndex(freq);
    if (index <= 0 || index >= (AUDIO_SAMPLES_NUMBER / 2) - 1) return 0;
    return mag[index];
}
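On the harmonic product spectrum question, here is a minimal sketch of the idea over the mag array above (a hypothetical helper; K is the number of harmonics folded in):

#include <cstdint>

// Harmonic product spectrum over nBins magnitudes: for each candidate
// fundamental bin i, multiply the magnitudes at its first K harmonics
// (i, 2i, ..., Ki) and return the bin where the product is largest.
uint32_t harmonicProductPeak(const float *mag, int nBins, int K)
{
    uint32_t best = 1;
    float bestVal = 0.0f;
    for (int i = 1; i < nBins / K; i++)
    {
        float prod = 1.0f;
        for (int h = 1; h <= K; h++)
            prod *= mag[i * h];
        if (prod > bestVal) { bestVal = prod; best = i; }
    }
    return best; // bin index of the estimated fundamental
}

Since the bins here are about 10 Hz wide, exact 2x alignment is rare; one common variant takes the maximum magnitude in a small neighbourhood around i * h rather than the single bin.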
So I am writing a beat detection algorithm, and it works, but it detects every beat (drum, voice, hi-hat, etc.).
I am trying to pick out only the hi-hat beats.
Here is the part of the code where I am using the FFT and trying to filter:
for (int channel = 0; channel < numChannels; ++channel) {
    for (int j = k * smallbuf_samples; j < (k + 1) * smallbuf_samples; ++j) {
        smallbuffer[channel].push_back(bigbuffer[channel][j]);
    }
}

fftw_complex x[smallbuf_samples];
fftw_complex y[smallbuf_samples];
for (int i = 0; i < smallbuf_samples; ++i) {
    x[i][REAL] = smallbuffer[0][i];
    x[i][IMAG] = smallbuffer[1][i];
}

fftw_plan plan = fftw_plan_dft_1d(smallbuf_samples, x, y, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(plan);
fftw_destroy_plan(plan);
fftw_cleanup();

// "filter": zero out every bin from index 80 upwards
std::vector<double> b;
for (int i = 80; i < smallbuf_samples; ++i) {
    y[i][REAL] = 0;
    y[i][IMAG] = 0;
}
// power of each bin
for (int i = 0; i < smallbuf_samples; ++i) {
    b.push_back(y[i][REAL] * y[i][REAL] + y[i][IMAG] * y[i][IMAG]);
}
// accumulate the power over progressively wider sub-bands
for (int i = 0; i < smallbuf_samples / very_smallbuf_samples; ++i) {
    double sum = 0;
    int j;
    for (j = i*(i+1)/2 * 108/13 + 22/13; j < (i+1)*(i+2)/2 * 108/13 + 22/13 && j < smallbuf_samples; ++j) {
        sum += b[j];
    }
    Es[k].push_back((float) (j - (i*(i+1)/2 * 108/13 + 22/13)) / (float) smallbuf_samples * sum);
}

for (int channel = 0; channel < numChannels; ++channel) {
    smallbuffer[channel].clear();
}
So, as you can see, I am filtering by setting all the y samples with index 80 or higher to 0 (because the frequency of a hi-hat is around 300..3000 Hz).
Still, my beat algorithm detects voice, drums, and other beats as well.
How do I fix it, and what am I doing wrong?
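A side note that may be relevant here: whether "bins below 80" really corresponds to 300..3000 Hz depends on the FFT length and the sample rate (neither is shown above), because bin k of an N-point FFT at sample rate Fs sits at k * Fs / N Hz. A tiny check, with hypothetical values for both:

#include <cstdio>

// Bin k of an N-point FFT at sample rate Fs corresponds to k * Fs / N Hz.
double binToHz(int k, int N, double Fs) { return k * Fs / N; }

int main() {
    const int N = 1024;          // hypothetical smallbuf_samples
    const double Fs = 44100.0;   // hypothetical sample rate
    std::printf("bin 80 = %.0f Hz\n", binToHz(80, N, Fs)); // ~3445 Hz with these values
}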
If I were you, I'd approach it differently. What you do now is filter frequencies in the audible range, but you should filter them in the inaudible range, where the beat rate lives. I.e. not "give me frequencies less than 300 Hz (less than 300 cycles per second)", but "give me the frequencies between 40 cycles per minute and, say, 200 cycles per minute", which is roughly 0.6 Hz to 3.3 Hz.
But you can't analyze the audible signal for that. You need to create an inaudible "peaks" signal first:
1. Go through the signal and take only the peaks, building a second signal (it will be inaudible, because its frequency is too low, and even if you could hear it, it wouldn't make any sense to your ear).
2. Analyze the resulting signal with an FFT set up for a lower frequency range (say, 128 times slower than the 20-20000 Hz range you use for the audible signal, which gives you a 0.15-150 Hz result).
3. Filter it down to 0.6 to 3 Hz.
4. Find the loudest peak in this range (or the lowest; here you'll need to experiment). This will be your beat. Multiply it by 60 to convert Hz to BPM.
Of course, the window for your FFT must cover a much longer time span than for an audible signal. Here it must be:
- at least 2 seconds long, to detect frequencies above 0.5 Hz;
- large, to increase the resolution at the lower frequencies.
With this approach it doesn't really matter what exactly makes the beat: it could be a bass drum, just a bass guitar, or a piano, i.e. the frequency of the beat-making instrument doesn't matter (with your approach, where you keep only the low frequencies, songs with an "only hi-hat" beat won't be detected).
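A rough sketch of this pipeline (plain C++ with a naive DFT and a synthetic 120 BPM test tone; the block size, sample rate, and pulsed input are illustrative assumptions, not values prescribed by the answer):

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double PI = std::acos(-1.0);
    const double Fs = 44100.0;          // audio sample rate (assumption)
    const int block = 512;              // one envelope sample per block
    const double envRate = Fs / block;  // ~86 envelope samples per second

    // synthetic test input: a 1 kHz tone pulsed every 0.5 s (120 BPM), 10 s long
    std::vector<double> audio(10 * (int)Fs);
    for (size_t i = 0; i < audio.size(); i++)
        if (i % 22050 < 2205)
            audio[i] = std::sin(2 * PI * 1000.0 * i / Fs);

    // 1) inaudible "peaks" signal: peak absolute value of each block
    std::vector<double> env;
    for (size_t i = 0; i + block <= audio.size(); i += block) {
        double peak = 0;
        for (int j = 0; j < block; j++)
            peak = std::max(peak, std::fabs(audio[i + j]));
        env.push_back(peak);
    }

    // 2) naive DFT of the envelope, restricted to the 0.6-3 Hz beat band
    const int N = (int)env.size();
    double bestFreq = 0, bestMag = 0;
    for (int k = 1; k < N / 2; k++) {
        double freq = k * envRate / N;
        if (freq < 0.6 || freq > 3.0) continue;
        double re = 0, im = 0;
        for (int n = 0; n < N; n++) {
            re += env[n] * std::cos(2 * PI * k * n / N);
            im -= env[n] * std::sin(2 * PI * k * n / N);
        }
        double mag = std::hypot(re, im);
        if (mag > bestMag) { bestMag = mag; bestFreq = freq; }
    }

    // 3) Hz -> BPM; prints roughly 120 for the synthetic input
    std::printf("estimated tempo: %.1f BPM\n", bestFreq * 60.0);
}

The per-block peak is just one way to build the "peaks" signal; a rectified-and-smoothed or RMS envelope works similarly.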
I'm trying to create a simple spectrum via QAudioProbe, but my spectrum does not "feel the beat": every bin in the spectrum just jumps up and down.
Here is my code processing the buffer from QAudioProbe:
void Waveform::bufferReady(QAudioBuffer buffer)
{
    int n = buffer.frameCount();
    cfg = kiss_fft_alloc(n, 0 /*is_inverse_fft*/, NULL, NULL);
    QAudioBuffer::S16U *frames = buffer.data<QAudioBuffer::S16U>();
    qDeleteAll(m_finalData);
    m_finalData.clear();

    kiss_fft_cpx output[n], input[n];
    for (int i = 0; i < n; i++)
    {
        // frames[i].right holds the i-th sample of the right channel and
        // frames[i].left the i-th sample of the left channel; if the signal
        // is mono rather than stereo, only one of the channels carries data
        qreal hannWindow = 0.5 * (1 - qCos((2 * M_PI * i) / (n - 1)));
        input[i].r = frames[i].right * hannWindow; // window function
        input[i].i = 0;
    }
    kiss_fft(cfg, input, output); // do the FFT
    kiss_fft_free(cfg); // without this, each buffer leaks an FFT configuration

    int step = n / (2 * 60); // stride through the first n/2 bins to get 60 bins
    for (int i = 0; i < n / 2; i += step)
    {
        qreal magnitude = qSqrt(output[i].i * output[i].i + output[i].r * output[i].r);
        qreal amplitude = 0.15 * log10(magnitude);
        amplitude = qMax(qreal(0.0), amplitude);
        amplitude = qMin(qreal(1.0), amplitude);
        m_finalData.append(new Sample(amplitude));
    }
    qDebug() << "Number of Bins : " << m_finalData.count();
    emit dataReady();
}
I don't know what the problems with the above code are. I've tried a lot of other approaches, but the spectrum still looks weird.
I'm currently trying to display an audio spectrum using FFTW3 and SFML. I've followed the directions found here and looked at numerous references on FFTs and spectrums, yet somehow my bars are almost all aligned to the left, as shown below. Another issue is that I can't find information on what the scale of the FFT output is: currently I'm dividing it by 64, yet it still occasionally goes beyond that. And I have found no information on why the output from FFTW has to be the same size as the input. So my questions are:
1. Why is the majority of my spectrum aligned to the left, unlike the second image below?
2. Why isn't the output between 0.0 and 1.0?
3. Why is the input sample count related to the FFT output count?
What I get: [image]
What I'm looking for: [image]
const int bufferSize = 256 * 8;

void init() {
    sampleCount = (int)buffer.getSampleCount();
    channelCount = (int)buffer.getChannelCount();
    for (int i = 0; i < bufferSize; i++) {
        window.push_back(0.54f - 0.46f * cos(2.0f * GMath::PI * (float)i / (float)bufferSize));
    }
    plan = fftwf_plan_dft_1d(bufferSize, signal, results, FFTW_FORWARD, FFTW_ESTIMATE);
}

void update() {
    int mark = (int)(sound.getPlayingOffset().asSeconds() * sampleRate);
    for (int i = 0; i < bufferSize; i++) {
        float s = 0.0f;
        if (i + mark < sampleCount) {
            s = (float)buffer.getSamples()[(i + mark) * channelCount] / (float)SHRT_MAX * window[i];
        }
        signal[i][0] = s;
        signal[i][1] = 0.0f;
    }
}

void draw() {
    int inc = bufferSize / 2 / size.x;
    int y = size.y - 1;
    int max = size.y;
    for (int i = 0; i < size.x; i++) {
        float total = 0.0f;
        for (int j = 0; j < inc; j++) {
            int index = i * inc + j;
            total += std::sqrt(results[index][0] * results[index][0] + results[index][1] * results[index][1]);
        }
        total /= (float)(inc * 64);
        Rectangle2I rect = Rectangle2I(i, y, 1, -(int)(total * max)).absRect();
        g->setPixel(rect, Pixel(254, toColor(BLACK, GREEN)));
    }
}
All of your questions are related to FFT theory. Study the properties of the FFT in any standard text or reference book and you will be able to answer them yourself.
The least you can start from is here:
https://en.wikipedia.org/wiki/Fast_Fourier_transform
Many FFT implementations are energy preserving. That means the scale of the output is linearly related to the scale and/or size of the input.
An FFT is a DFT, which is a square matrix transform, so the number of outputs will always equal the number of inputs (or half that, if you ignore the redundant complex-conjugate half that a strictly real input produces), unless some outputs are thrown away; if the counts differ otherwise, it is not an FFT. If you want fewer outputs, there are ways to downsample the FFT output or post-process it in other ways.
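To make the scaling concrete, a minimal sketch (assuming FFTW3's single-precision API, matching the fftwf_ calls in the question): FFTW computes an unnormalized DFT, so a length-N transform of a unit-amplitude cosine puts a magnitude of about N/2 into its bin, and dividing the magnitudes by N/2 brings a sinusoid's peak back near 1.0.

#include <cmath>
#include <cstdio>
#include <fftw3.h>

int main() {
    const int N = 256;
    const double PI = std::acos(-1.0);
    fftwf_complex *in  = fftwf_alloc_complex(N);
    fftwf_complex *out = fftwf_alloc_complex(N);
    fftwf_plan plan = fftwf_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);

    // unit-amplitude cosine with exactly 8 cycles in the buffer
    for (int i = 0; i < N; i++) {
        in[i][0] = (float)std::cos(2 * PI * 8 * i / N);
        in[i][1] = 0.0f;
    }
    fftwf_execute(plan);

    // FFTW is unnormalized: the peak bin carries ~N/2, not ~1
    float mag = std::hypot(out[8][0], out[8][1]);
    std::printf("raw |X[8]| = %.1f, divided by N/2 = %.3f\n", mag, mag / (N / 2));

    fftwf_destroy_plan(plan);
    fftwf_free(in);
    fftwf_free(out);
}

fftwf_alloc_complex is the FFTW 3.3+ allocation helper; with older versions, fftwf_malloc(sizeof(fftwf_complex) * N) does the same.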