This is for a Python computational physics class. We are given two .wav files that contain recordings of a harp and a piano playing the same note. We are supposed to "load the files and take the FFT of the amplitude. From the FFT determine the frequency of the fundamental for both instruments to 4 sig figs."
Here is what I have done.
import numpy as np
import scipy.io.wavfile as sciwav
import matplotlib.pyplot as plt
#import data from .wav file. This function returns the sampling rate and the data in an array.
harp_rate,harp_data=sciwav.read('/Users/williamweiss2/Desktop/Test2/harp.wav',mmap=False)
piano_rate,piano_data=sciwav.read('/Users/williamweiss2/Desktop/Test 2/piano.wav',mmap=False)
#perform the FFT on both sets of data and graph to find the index of the first harmonic.
plt.figure(1)
p=np.fft.rfft(piano_data)
h=np.fft.rfft(harp_data)
plt.subplot(121)
plt.plot(abs(p),'b')
plt.title('Piano FFT')
plt.xlim(0,100000)
plt.subplot(122)
plt.plot(abs(h),'g')
plt.title('Harp FFT')
This all works just fine. Now, to find freq. of note played this is what I was taught to do.
x value of first spike in FFT graph = Index.
deltaF = Sampling Rate / # of samples.
Index * deltaF = Freq. of note played.
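The three steps above can be sketched in code as follows. This is a minimal illustration on a synthetic tone, not the class's reference solution: the helper name `fundamental_frequency` is hypothetical, it picks the strongest peak (which for a real instrument may be an overtone rather than the fundamental), and it averages stereo channels on the assumption that a file may have more than one channel. Note that deltaF must be computed separately for each file from that file's own sampling rate and sample count.

```python
import numpy as np

def fundamental_frequency(samples, sample_rate):
    """Estimate the strongest frequency component of a signal, in Hz."""
    samples = np.asarray(samples, dtype=float)
    if samples.ndim > 1:                     # stereo file: average the channels to mono
        samples = samples.mean(axis=1)
    spectrum = np.abs(np.fft.rfft(samples))
    spectrum[0] = 0.0                        # ignore the DC offset
    index = int(np.argmax(spectrum))         # index of the tallest spike
    delta_f = sample_rate / len(samples)     # bin spacing in Hz
    return index * delta_f

# Synthetic check: one second of a 440 Hz tone sampled at 44.1 kHz.
rate = 44100
t = np.arange(rate) / rate
tone = np.sin(2 * np.pi * 440.0 * t)
estimate = fundamental_frequency(tone, rate)
```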
I followed these steps and got two drastically different notes. Does anyone see a misstep in my process? Any ideas are appreciated even if they go over my head. I am just a junior getting a Physics degree. Thanks very much in advance.
I am writing code on a Raspberry Pi in Python to compare two images using mean squared error. The project is a personal home security system.
My main goal is to detect a change between the images that I capture from the Pi camera (something added to or removed from the current image), but right now my code is too sensitive. It is affected by changes in background lighting, which I do not want.
I have two options in front of me: either scrap my current logic and start a new one, or improve my current logic to account for this noise (if I can call it that). I am searching for ways to improve my logic, but I wanted some guidance on how to go about it.
My biggest fear is that I am wasting time beating a dead horse. Should I just look for some other algorithm to detect a change in the image, or should I use edge detection?
import numpy as np
import cv2
import os
from threading import Thread
######Function Definition########################################
def mse(imageA, imageB):
    # the 'Mean Squared Error' between the two images is the
    # sum of the squared difference between the two images,
    # divided by the number of pixels;
    # NOTE: the two images must have the same dimension
    err = np.sum((imageA.astype("int") - imageB.astype("int")) ** 2)
    err /= int(imageA.shape[0] * imageA.shape[1])
    # return the MSE, the lower the error, the more "similar"
    # the two images are
    return err
def compare_images(imageA, imageB):
    # compute the mean squared error
    m = mse(imageA, imageB)
    print(m)
def capture_image():
    ##shell command to click photos
    os.system(image_args)
##original image Path variable
original_image_path= "/home/pi/Downloads/python-compare-two-images/originalimage.png"
##original_image_args is a shell command to click photos
original_image_args="raspistill -o "+original_image_path+" -w 320 -h 240 -q 50 -t 500"
os.system(original_image_args)
##read the greyscale of the image in to the variable original_image
original_image=cv2.imread(original_image_path, 0)
##Three images
image_args="raspistill -o /home/pi/Downloads/python-compare-two-images/Test_Images/image.png -w 320 -h 240 -q 50 --nopreview -t 10 --exposure sports"
image_path="/home/pi/Downloads/python-compare-two-images/Test_Images/"
image1_name="image.png"
#created a new thread to take pictures
My_Thread=Thread(target=capture_image)
#Thread started
My_Thread.start()
while True:
    # is_alive() replaces isAlive(), which was removed in Python 3.9
    if not My_Thread.is_alive():
        # the capture has finished: read it, start the next capture, then compare
        image1 = cv2.imread((image_path+image1_name), 0)
        My_Thread=Thread(target=capture_image)
        My_Thread.start()
        compare_images(original_image, image1)
A first improvement is to adjust a gain to compensate for the global variation of the light, for example by taking the average intensity of the two images and correcting one with the ratio of the intensities.
This can fail in case of a change in the foreground, which will influence the global average. If that change in the foreground doesn't cover too large an area, you can get an estimate by robust fitting of a linear model y = a.x.
A worse, but unfortunately common, scenario is when the background illumination changes in a non-uniform way. A partial solution is to try and fit a non-uniform gain model, such as one obtained by bilinear interpolation between gains estimated at the corners, or at a finer subdivision of the image.
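As a rough sketch of the global-gain idea, the snippet below estimates a single gain a in y = a.x and applies it before computing the MSE. The function name `gain_compensated_mse` is hypothetical, and using the median of per-pixel ratios as the "robust fit" is an assumption made for illustration; other robust estimators would work too. The images here are synthetic stand-ins for the camera frames.

```python
import numpy as np

def gain_compensated_mse(reference, current):
    """MSE after scaling `current` so its brightness matches `reference`."""
    ref = reference.astype(float)
    cur = current.astype(float)
    valid = cur > 0                              # avoid dividing by zero
    # Robust gain estimate: median of per-pixel ratios (model y = a * x).
    gain = np.median(ref[valid] / cur[valid])
    return float(np.mean((ref - gain * cur) ** 2)), gain

rng = np.random.default_rng(0)
scene = rng.integers(50, 200, size=(240, 320)).astype(np.uint8)
brighter = np.clip(scene * 1.2, 0, 255).astype(np.uint8)   # same scene, more light
error, gain = gain_compensated_mse(scene, brighter)
plain_error = float(np.mean((scene.astype(float) - brighter.astype(float)) ** 2))
```

In this synthetic case the compensated error stays small while the plain MSE is large, even though nothing in the scene has changed.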
Change detection is a heavily studied field. One of the basic options is to model each pixel as a Gaussian distribution: sample a lot of images and calculate the mean and variance of each pixel.
For the pixels that tend to change when the lighting changes, the variance will be bigger than for the ones that don't change as much.
In order to detect movement at a certain pixel, you just need to choose the probability that you consider to be an extraordinary change in the pixel value, and use the Gaussian distribution you calculated to find the corresponding value that is considered extraordinary.
To make this solution efficient for your Raspberry Pi, you will first need to do an "offline" calculation of the threshold values for each pixel, beyond which a change in the pixel value is considered movement, and store them in a file. Then, in the "online" stage, you just compare each pixel to the calculated value.
For the "offline" stage I recommend using images that were recorded during the entire day in order to get all the variation you need per pixel. This stage can of course be done on your computer, and only the output file needs to be uploaded to the Raspberry Pi.
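The offline and online stages described above could look roughly like this. It is a sketch under assumptions: the function names, the threshold of k = 3 standard deviations, and the floor of 1.0 on the standard deviation are all illustrative choices, and the frames are simulated rather than captured.

```python
import numpy as np

def fit_background(frames, k=3.0):
    """Offline stage: per-pixel mean and threshold from a stack of frames."""
    stack = np.asarray(frames, dtype=float)       # shape (n_frames, height, width)
    mean = stack.mean(axis=0)
    std = stack.std(axis=0)
    return mean, k * np.maximum(std, 1.0)         # floor avoids zero thresholds

def changed_pixels(frame, mean, threshold):
    """Online stage: boolean mask of pixels outside their usual range."""
    return np.abs(frame.astype(float) - mean) > threshold

rng = np.random.default_rng(1)
background = rng.normal(100.0, 2.0, size=(200, 24, 32))   # 200 training frames
mean, threshold = fit_background(background)

test_frame = rng.normal(100.0, 2.0, size=(24, 32))
test_frame[5:10, 5:10] += 60.0                            # simulated new object
mask = changed_pixels(test_frame, mean, threshold)
```

On the Pi, `mean` and `threshold` would be saved once (for example with `np.save`) and only `changed_pixels` would run per captured frame.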
In working on a project I came across the need to generate various waves, accurately. I thought that a simple sine wave would be the easiest to begin with, but it appears that I am mistaken. I made a simple program that generates a vector of samples and then plays those samples back so that the user hears the wave, as a test. Here is the relevant code:
vector<short> genSineWaveSample(int nsamples, float freq, float amp) {
    vector<short> samples;
    for(float i = 0; i <= nsamples; i++) {
        samples.push_back(amp * sinx15(freq*i));
    }
    return samples;
}
I'm not sure what the issue with this is. I understand that there could be some issue with the vector being made of shorts, but that's what my audio framework wants, and I am inexperienced with that kind of library and so do not know what to expect.
The symptoms are as follows:
frequency not correct
ie: given freq=440, A4 is not the note played back
strange distortion
Most frequencies do not generate a clean wave. 220, 440, 880 are all clean, most others are distorted
Most frequencies are shifted upwards considerably
Can anyone give advice as to what I may be doing wrong?
Here's what I've tried so far:
Making my own sine function, for greater accuracy.
I used a 15th degree Taylor Series expansion for sin(x)
Changed the sample rate, anything from 256 to 44100, no change can be heard given the above errors, the waves are simply more distorted.
Thank you. If there is any information that can help you, I'd be obliged to provide it.
I suspect that you are passing incorrect values to your sinx15 function. If you are familiar with the basics of signal processing, the Nyquist rate is the minimum sampling rate at which you can faithfully reconstruct (or in your case construct) a sampled signal. It is defined as 2x the highest frequency component present in the signal.
What this means for your program is that you need at least 2 values per cycle of the highest frequency you want to reproduce. At 20 kHz you'd need 40,000 samples per second. It looks like you are just packing a vector with values and letting the playback program sort out the timing.
We will assume you use 44.1 kHz as your playback sampling frequency. This means that a snippet of code producing one second of a 1 kHz wave would look like
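const double pi = 3.14159265358979323846;
const int sampleRate = 44100; // playback sampling rate in Hz
const double frequency = 1000.0; // tone frequency in Hz
std::vector<double> wave(sampleRate); // one second of samples
for(int i = 0; i < sampleRate; i++)
{
    wave[i] = std::sin(2*pi * i * (frequency / sampleRate) + pi / 2); // sin is in radians
}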
You need to scale the sample index by the frequency divided by the sampling rate, not multiply by the frequency alone. To see this, take the case where a frequency value of 22,050 Hz is passed. For i = 0, you get sin(pi/2) = 1. For i = 1, sin(3pi/2) = -1, and so on and so forth. This gives you a repeating sequence of 1, -1, 1, -1... which is the correct representation of a 22,050 Hz wave sampled at 44.1 kHz. This works as you go down in frequency, but you get more and more samples per cycle. Interestingly though, this does not make a difference. In principle, a sine wave sampled at 2 samples per cycle is just as accurately recreated as one that is sampled 1000 times per cycle. This doesn't take into account noise, but for most purposes it works well enough.
I would suggest looking into the basics of digital signal processing, as it is a very interesting field and very useful to understand.
Edit: This assumes all of those parameters are evaluated as floating point numbers.
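For comparison, here is a small sketch of the same idea in Python, where the phase at sample n is 2*pi*freq*n/sample_rate. The function name `sine_samples` and the default rate of 44100 Hz are assumptions for illustration; the 16-bit output type mirrors the vector of shorts in the question. The FFT at the end is only a sanity check that the generated tone lands on the requested frequency.

```python
import numpy as np

def sine_samples(n_samples, freq, amp, sample_rate=44100):
    """Return n_samples of a sine wave as 16-bit integers."""
    n = np.arange(n_samples)
    phase = 2.0 * np.pi * freq * n / sample_rate    # radians at sample n
    return np.round(amp * np.sin(phase)).astype(np.int16)

samples = sine_samples(44100, 440.0, 10000)         # one second of A4
# With one second of data the FFT bin spacing is 1 Hz, so the bin index is the frequency.
peak_bin = int(np.argmax(np.abs(np.fft.rfft(samples))))
```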
Fundamentally, you're missing a piece of information. You don't specify the amount of time over which you want your samples taken. This could also be thought of as the rate at which the samples will be played by your system. Something roughly in this direction will get you closer, for now, though.
samples.push_back(amp * std::sin(2 * M_PI * freq * i / sampleRate)); // sampleRate: playback rate in Hz, e.g. 44100
How can I draw a spectrum for a given audio file with the Bass library?
I mean the chart similar to what Audacity generates:
I know that I can get the FFT data for given time t (when I play the audio) with:
float fft[1024];
BASS_ChannelGetData(chan, fft, BASS_DATA_FFT2048); // get the FFT data
That way I get 1024 values in array for each time t. Am I right that the values in that array are signal amplitudes (dB)? If so, how the frequency (Hz) is associated with those values? By the index?
I am a programmer, but I am not experienced with audio processing at all, so I don't know what to do with the data I have to plot the needed spectrum.
I am working with C++ version, but examples in other languages are just fine (I can convert them).
From the documentation, that flag will cause the FFT magnitude to be computed, and from the sounds of it, it is the linear magnitude.
dB = 10 * log10(intensity);
dB = 20 * log10(pressure);
(I'm not sure whether audio file samples are a measurement of intensity or pressure. What's a microphone output linearly related to?)
Also, it indicates the length of the input and the length of the FFT match, but half the FFT (corresponding to negative frequencies) is discarded. Therefore the highest FFT frequency will be one-half the sampling frequency. This occurs at N/2. The docs actually say
For example, with a 2048 sample FFT, there will be 1024 floating-point values returned. If the BASS_DATA_FIXED flag is used, then the FFT values will be in 8.24 fixed-point form rather than floating-point. Each value, or "bin", ranges from 0 to 1 (can actually go higher if the sample data is floating-point and not clipped). The 1st bin contains the DC component, the 2nd contains the amplitude at 1/2048 of the channel's sample rate, followed by the amplitude at 2/2048, 3/2048, etc.
That seems pretty clear.
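To make the index-to-frequency mapping from the quoted documentation concrete, here is a small sketch. The helper names are hypothetical, and the dB conversion assumes the values are linear amplitudes (hence 20 * log10); the floor of -120 dB for silent bins is an arbitrary choice to avoid taking the log of zero.

```python
import math

def bin_to_hz(index, sample_rate, fft_size=2048):
    """Frequency of FFT bin `index`: index/fft_size of the sample rate."""
    return index * sample_rate / fft_size

def magnitude_to_db(magnitude, floor_db=-120.0):
    """Convert a linear amplitude to decibels, clamping silence to floor_db."""
    if magnitude <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(magnitude))

freq = bin_to_hz(1, 44100)        # second value in the array, for a 44.1 kHz channel
level = magnitude_to_db(0.5)      # about -6 dB
```

Plotting `magnitude_to_db(fft[i])` against `bin_to_hz(i, sample_rate)` for each of the 1024 values gives a spectrum for one moment in time; a chart for a whole file would average or accumulate these over many calls.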
I am writing an application in which I must process a digital signal (an array of doubles). I must decimate the signal, filter it, etc. I found the gnuradio project, which has functions for this problem, but I can't figure out how to use them correctly.
I need to decimate the signal (for example from 250Hz to 200Hz). The function should be similar to the resample function in Matlab. I found that the classes for it are:
rational_resampler_base_fff Class source
fir_filter_fff Class source
...
Unfortunately I can't figure out how to use them.
I have gnuradio and the shared library installed.
Thanks for any advice.
EDIT to #jcoppens
Thank you very much for your help.
But I must process the signal in my own code. I found classes in gnuradio which can solve my problem, but I need help with how to set them up.
The function whose parameters I must set is:
low_pass(doub gain, doub sampling_freq, doub cutoff_freq, doub transition_width, window, beta)
where:
use "window method" to design a low-pass FIR filter
gain: overall gain of filter (typically 1.0)
sampling_freq: sampling freq (Hz)
cutoff_freq: center of transition band (Hz)
transition_width: width of transition band (Hz).
The normalized width of the transition band is what sets the number of taps required. Narrow –> more taps
window_type: What kind of window to use. Determines maximum attenuation and passband ripple.
beta: parameter for Kaiser window
I know, I must use window = KAISER and beta = 5, but for the rest I'm not sure.
The func which I use are: low_pass and pfb_arb_resampler_fff::filter
UPDATE:
I solved the resampling using libsamplerate
I need to decimate the signal (for example from 250Hz to 200Hz)
WARNING: I expressed the original introductory paragraph incorrectly - my apologies.
As 250 Hz is not an integer multiple of 200 Hz, you have to do some tricks to convert 250 Hz into 200 Hz. Interpolating by a factor of 4 (inserting 3 interpolated samples between each pair of 250 Hz samples) raises the sample rate to 1000 Hz. Then you can bring the rate down to 200 Hz by decimating by a factor of 5.
For this you need the "Rational Resampler", where you can define the interpolation and decimation factors. Something like this:
This means you would have to do something similar if you use the library. Maybe it's even simpler to do it without the library. Interpolate linearly between the 250 Hz samples (i.e. insert 3 extra samples between each pair), then decimate by selecting every 5th sample.
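The "interpolate, then decimate" approach without the library can be sketched as below. This is an illustration only: the function name `resample_linear` is hypothetical, and linear interpolation with no low-pass filter is assumed to be acceptable because the test signal is far below the new Nyquist frequency of 100 Hz. A proper resampler would filter before decimating.

```python
import numpy as np

def resample_linear(signal, up, down):
    """Resample by up/down using linear interpolation, then decimation."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    fine_positions = np.arange((n - 1) * up + 1) / up    # grid `up` times denser
    upsampled = np.interp(fine_positions, np.arange(n), signal)
    return upsampled[::down]                             # keep every `down`-th sample

fs_in = 250
t = np.arange(fs_in) / fs_in                  # one second at 250 Hz
x = np.sin(2 * np.pi * 5.0 * t)               # 5 Hz test tone
y = resample_linear(x, up=4, down=5)          # 250 Hz * 4 / 5 = 200 Hz
expected = np.sin(2 * np.pi * 5.0 * np.arange(len(y)) / 200)
```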
Note: There is a Signal Processing forum on stackexchange - maybe this question might fall in that category...
More information: If you only have to resample your input data, and you do not need the actual gnuradio program, then have a look at this document:
https://ccrma.stanford.edu/~jos/resample/resample.pdf
There are several links to other documents, and a link to libresample, libresample4, and others, which may be of use to you. Another, very interesting, page is:
http://www.dspguru.com/dsp/faqs/multirate/resampling
Finally, from the same source as the pdf above, check their snd program. It may solve your problem without writing any software. It can load floating point samples, resample, and save again:
http://ccrma.stanford.edu/planetccrma/software/soundapps.html#SECTION00062100000000000000
EDIT: And yet another solution - maybe the simplest of all: Use Matlab (or the free Octave version):
pkg load signal
t = linspace(0, 10*pi, 50); % Generate a timeline - 5 cycles
s = sin(t); % and the sines -> 250 Hz
tr = resample(s, 4, 5); % Convert to 200 Hz (rate * 4/5)
ttr = (0:numel(tr)-1) * (t(2) - t(1)) * 5/4; % Timeline for the resampled signal
plot(t, s, 'r') % Plot 250 Hz in red
hold on
plot(ttr, tr, 'b') % and resampled in blue
Will give you:
How might one generate audio at runtime using C++? I'm just looking for a starting point. Someone on a forum suggested I try to make a program play a square wave of a given frequency and amplitude.
I've heard that modern computers encode audio using PCM samples: at a given rate (e.g. 48 kHz), the amplitude of a sound is recorded at a given resolution (e.g. 16 bits). If I generate such a sample, how do I get my speakers to play it? I'm currently using Windows. I'd prefer to avoid any additional libraries if at all possible, but I'd settle for a very light one.
Here is my attempt to generate a square wave sample using this principle:
signed short* Generate_Square_Wave(
signed short a_amplitude ,
signed short a_frequency ,
signed short a_sample_rate )
{
signed short* sample = new signed short[a_sample_rate];
for( signed short c = 0; c == a_sample_rate; c++ )
{
if( c % a_frequency < a_frequency / 2 )
sample[c] = a_amplitude;
else
sample[c] = -a_amplitude;
}
return sample;
}
Am I doing this correctly? If so, what do I do with the generated sample to get my speakers to play it?
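Your loop has to use c < a_sample_rate as its condition. As written, c == a_sample_rate is false on the first check, so the loop body never runs and the buffer is never filled; c < a_sample_rate also keeps the index inside the buffer.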
To output the sound you call waveOutOpen and other waveOut... functions. They are all listed here:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd743834(v=vs.85).aspx
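For reference, a naive square wave with the loop bounds and the period handled explicitly might look like the sketch below. The function name and parameters are illustrative, not from the question; the key assumption is that one cycle lasts sample_rate / frequency samples, so the position within the cycle is computed from that ratio rather than from the frequency alone.

```python
import numpy as np

def square_wave(amplitude, frequency, sample_rate, seconds=1.0):
    """Naive (non-band-limited) square wave as 16-bit PCM samples."""
    n = np.arange(int(sample_rate * seconds))
    cycle_position = (n * frequency / sample_rate) % 1.0    # 0..1 within each cycle
    return np.where(cycle_position < 0.5, amplitude, -amplitude).astype(np.int16)

# 100 Hz at 48 kHz: each cycle is 480 samples, 240 high then 240 low.
samples = square_wave(amplitude=8000, frequency=100, sample_rate=48000)
```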
The code you are using generates a wave that is truly square, binary kind of square, in short the type of waveform that does not exist in real life. In reality most (pretty sure all) of the sounds you hear are a combination of sine waves at different frequencies.
Because of the way your samples are created, they will produce aliasing, where a higher frequency masquerades as a lower frequency, causing audio artefacts. To demonstrate this to yourself, write a little program which sweeps the frequency of your code from 20 to 20,000 Hz. You will hear that the sound does not go up smoothly as it rises in frequency. You will hear artefacts.
Wikipedia has an excellent article on square waves: https://en.m.wikipedia.org/wiki/Square_wave
One way to generate a square wave is to perform an inverse Fast Fourier Transform, which transforms a series of frequency measurements into a series of time-based samples. Generating a square wave is then a matter of supplying the routine with the amplitudes of the sine waves at the different frequencies that make up a square wave; the output is a buffer containing a single cycle of the waveform.
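A minimal sketch of that inverse-FFT approach, using NumPy's `irfft`, is shown below. The table size of 2048 and the cutoff at the 31st harmonic are arbitrary choices for illustration; the amplitudes 4 / (pi * k) for odd k are the standard Fourier series coefficients of a unit square wave.

```python
import numpy as np

def bandlimited_square_cycle(table_size=2048, max_harmonic=31):
    """One cycle of a square wave built from odd harmonics via an inverse FFT."""
    spectrum = np.zeros(table_size // 2 + 1, dtype=complex)
    for k in range(1, max_harmonic + 1, 2):             # odd harmonics only
        amplitude = 4.0 / (np.pi * k)                   # Fourier series of a square wave
        # -1j selects a sine phase; table_size / 2 undoes irfft's scaling
        spectrum[k] = -1j * amplitude * table_size / 2
    return np.fft.irfft(spectrum, n=table_size)

cycle = bandlimited_square_cycle()
```

Because the harmonics stop at a chosen limit, the resulting table contains no content above that limit, which is what keeps it from aliasing when it is played back within the range the limit was chosen for.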
To generate audio waves is computationally expensive so what is often done is to generate arrays of audio samples and play them back at varying speeds to play different frequencies. This is called wave table synthesis.
Have a look at the following link:
https://www.earlevel.com/main/2012/05/04/a-wavetable-oscillator%E2%80%94part-1/
And some more about band limiting a signal and why it’s necessary:
https://dsp.stackexchange.com/questions/22652/why-band-limit-a-signal