The best algorithm for filtering noise from a PCM 16-bit wave? - pcm

I have produced the FFT from the PCM wave. What is the best way to filter out noise?
Thanks for your time and responses,
dk

A very broad and very technical question. A quick and dirty way to get rid of hiss would be to get rid of the high frequencies (low-pass filter).

If the noise is hissy, you don't need linear phase, and you are working in C, a quick-and-dirty low-pass filter could look like this...
signed short lowpass(signed short input)
{
    /* One-pole IIR smoother: output = (input + 7 * previous_output) / 8.
       The arithmetic happens in int (integer promotion), so it cannot overflow;
       note that >> on a negative value rounds towards minus infinity. */
    static signed short last_sample = 0;
    signed short retvalue = (input + (last_sample * 7)) >> 3;
    last_sample = retvalue;
    return retvalue;
}
If the noise is power-grid/mains hum, you can delay the audio by 735 samples at 44,100 samples per second (one period of 60 Hz mains) and return delayed_sample - input; that puts notches at 60 Hz and its harmonics.
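A sketch of that idea, assuming 16-bit mono samples at 44,100 Hz and 60 Hz mains (the function name and the clamping are illustrative; note that the comb also notches the wanted signal at every multiple of 60 Hz):

#include <cstdint>

// Feed-forward comb filter: subtract the sample from exactly one mains period ago.
int16_t hum_filter(int16_t input)
{
    static const int DELAY = 735;           // 44100 / 60 = one 60 Hz period
    static int16_t history[DELAY] = {0};    // circular delay line
    static int pos = 0;

    int16_t delayed = history[pos];
    history[pos] = input;
    pos = (pos + 1) % DELAY;

    int out = (int)delayed - (int)input;    // widen so the subtraction cannot wrap
    if (out >  32767) out =  32767;         // clamp back into the 16-bit range
    if (out < -32768) out = -32768;
    return (int16_t)out;
}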

Noise is mostly the stationary spectral component of the signal; hopefully your wanted signal is speech or music. The idea is to subtract the estimated noise spectrum from the noisy spectrum.
Typical noise suppression needs a VAD (voice activity detection) module. Make sure your FFT frames are windowed and overlapped; if they are not, please do so. One of the simpler noise-suppression methods uses minimum statistics, as described by Rainer Martin:
The algorithm tracks spectral minima in each frequency band without any distinction between speech activity and speech pause. By minimizing a conditional mean-square estimation error criterion in each time step, the optimal smoothing parameter for recursive smoothing of the power spectral density of the noisy speech signal is derived. Details are in http://www.ind.rwth-aachen.de/fileadmin/publications/martin01c.pdf
“Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics”
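Once you have a per-bin noise estimate, a minimal magnitude spectral-subtraction sketch could look like the following (the over-subtraction factor and spectral floor are illustrative defaults, not values from the paper; windowing, FFT and overlap-add are assumed to happen outside this function):

#include <cmath>
#include <complex>
#include <vector>

// One frame of spectral subtraction: shrink each bin's magnitude by the estimated
// noise magnitude, keep the original phase, and floor the result so bins never go
// negative (which would cause "musical noise" artifacts).
void spectral_subtract(std::vector<std::complex<float>>& frame,
                       const std::vector<float>& noise_mag,
                       float alpha = 1.0f,            // over-subtraction factor
                       float spectral_floor = 0.05f)  // fraction of the noisy magnitude to keep
{
    for (size_t k = 0; k < frame.size(); ++k) {
        float mag   = std::abs(frame[k]);
        float phase = std::arg(frame[k]);
        float clean = mag - alpha * noise_mag[k];
        if (clean < spectral_floor * mag) clean = spectral_floor * mag;
        frame[k] = std::polar(clean, phase);
    }
}
// Afterwards, inverse-FFT each frame and overlap-add to reconstruct the time signal.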
You could also use an audio editor such as Audacity (http://audacity.sourceforge.net/about/) to test whether this noise can be removed; it has a noise-removal feature under the Effect menu.

Related

How to plot spectrogram from an array or (vector,list etc) containing raw data?

I have been working on finding the temporal displacement between audio signals using a spectrogram. I have a short array containing data of a sound wave (pulses at specific frequencies). Now I want to plot a spectrogram from that array. I have followed these steps (from Spectrogram C++ library):
It would be fairly easy to put together your own spectrogram. The steps are:
window function (fairly trivial, e.g. Hanning)
FFT (FFTW would be a good choice, but if licensing is an issue then go for Kiss FFT or similar)
calculate the log magnitude of the frequency-domain components (trivial: log(sqrt(re * re + im * im)))
Now, after performing these 3 steps, I am stuck at how to plot the spectrogram from the available data. Being naive in this field, I need some clear steps ahead to plot the spectrogram.
I know that a simple spectrogram has frequency on the Y axis, time on the X axis and magnitude as the color intensity.
But how do I get these three things together to plot the spectrogram? (I want to observe and analyze the data behind spectral peaks, i.e. their values on the X and Y axes, which is the main purpose of plotting the spectrogram.)
Regards,
Khubaib
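One way to do the time/frequency/intensity mapping asked about above is to treat the per-frame log magnitudes as a 2-D array (columns = frames, rows = FFT bins) and dump it as a grayscale image. A sketch, assuming the magnitudes have already been computed; the PGM output format and min/max normalization are just one convenient choice:

#include <algorithm>
#include <cstdio>
#include <vector>

// spectrogram[frame][bin] holds log magnitudes (time runs along frames,
// frequency along bins). Write it as a portable graymap: x = time, y = frequency
// (row 0 = highest bin so low frequencies end up at the bottom of the image).
void write_pgm(const char* path,
               const std::vector<std::vector<float>>& spectrogram)
{
    size_t frames = spectrogram.size();
    size_t bins   = spectrogram[0].size();

    // Find min/max so log magnitude can be mapped to a 0..255 intensity.
    float lo = spectrogram[0][0], hi = spectrogram[0][0];
    for (const auto& col : spectrogram)
        for (float v : col) { lo = std::min(lo, v); hi = std::max(hi, v); }

    FILE* f = std::fopen(path, "w");
    std::fprintf(f, "P2\n%zu %zu\n255\n", frames, bins);
    for (size_t y = 0; y < bins; ++y) {            // top row = highest frequency
        size_t bin = bins - 1 - y;
        for (size_t x = 0; x < frames; ++x) {
            int v = (int)(255.0f * (spectrogram[x][bin] - lo) / (hi - lo + 1e-9f));
            std::fprintf(f, "%d ", v);
        }
        std::fprintf(f, "\n");
    }
    std::fclose(f);
}

The X axis is then frame index * hop size / sample rate (seconds), and the Y axis is bin index * sample rate / FFT size (Hz), so the values behind spectral peaks can be read off directly.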

Double integrating acceleration in C++ using a 9DOF IMU with sensor fusion

I've spent a few hours doing research on numeric integration and velocity/position estimation but I couldn't really find an answer that would be either understandable by my brain or appropriate for my situation.
I have an IMU (Inertial Measurement Unit) that has a gyro, an accelerometer and a magnetometer. All of those sensors are fused, which means, for example, that using the gyro I'm able to compensate for gravity in the accelerometer readings, and the magnetometer compensates for drift.
In other words, I can get pure acceleration readings from such a setup.
Now, I'm trying to accurately estimate the position based on acceleration, which as you may know requires double integration, and there are various methods of doing that. But I don't know which would be the most appropriate here.
Could somebody please share some information about this?
Also, I'd appreciate it if you could explain it to me without using any complex math formulas/symbols, I'm not a mathematician and this was one of my problems when looking for information.
Thanks
You can integrate the accelerations by simply summing the acceleration vectors multiplied by the timestep (period of the IMU) to get the velocity, then sum the velocities times the timestep to get the position. You can propagate (not integrate) the orientation by using various methods depending on which orientation representation you choose (Euler angles, Quaternions, Attitude Matrix (DCM), Axis-Angle, etc..).
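A sketch of that plain (rectangle-rule) integration, assuming the accelerations are already expressed in the world frame with gravity removed (the Vec3 type and names are just for illustration):

struct Vec3 { double x, y, z; };

// Accumulate velocity and position from one IMU sample.
// accel: gravity-compensated acceleration in the world frame [m/s^2]
// dt:    IMU sample period [s]
void integrate_sample(const Vec3& accel, double dt, Vec3& vel, Vec3& pos)
{
    vel.x += accel.x * dt;  vel.y += accel.y * dt;  vel.z += accel.z * dt;
    pos.x += vel.x * dt;    pos.y += vel.y * dt;    pos.z += vel.z * dt;
}

(A trapezoidal rule, averaging the current and previous acceleration, is slightly better, but as explained below the dominant error source is the sensor itself, not the integration scheme.)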
However you have a bigger problem.
Long story short: unless you have a navigation-grade IMU (USD $200,000+), you cannot simply integrate the accelerations and angular rates to get accurate pose (position and orientation) estimates.
I assume you are using a low-cost (under USD $1,000) IMU - your accelerometer and gyro are subject to both noise and bias. These make it impossible to get an accurate pose by simply integrating.
In practice, to do what you are intending, you have to fuse in 'correcting' measurements of the position and optionally the orientation. The IMU 'predicts' the position/orientation, while another sensor model (camera features, GPS, altimeter, range/bearing measurements) takes the predicted state and 'corrects' it. There are various methods of fusing this data, the most widely used being the Extended Kalman Filter or the Error-State (Indirect) Kalman Filter.
Back to your original question; I would represent the orientation as quaternions, and you can propagate the quaternion orientation by using the error-quaternion derivative and the angular rates from your gyro.
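For reference, a sketch of propagating a quaternion from body-frame gyro rates over one timestep (this is the standard quaternion kinematics in the Hamilton, scalar-first convention; an error-state Kalman filter adds its own machinery on top of this):

#include <cmath>

struct Quat { double w, x, y, z; };   // unit quaternion, scalar first

// q:          current orientation (body to world)
// wx, wy, wz: gyro rates in rad/s (body frame)
// dt:         timestep in seconds
Quat propagate(const Quat& q, double wx, double wy, double wz, double dt)
{
    double rate  = std::sqrt(wx*wx + wy*wy + wz*wz);  // |omega| in rad/s
    double theta = rate * dt;                         // rotation angle this step
    double c = std::cos(theta / 2.0);
    double s = (rate > 1e-12) ? std::sin(theta / 2.0) / rate : 0.5 * dt;
    Quat d{c, wx * s, wy * s, wz * s};                // incremental rotation as a quaternion

    // Hamilton product q * d, then renormalize to fight rounding drift.
    Quat r{
        q.w*d.w - q.x*d.x - q.y*d.y - q.z*d.z,
        q.w*d.x + q.x*d.w + q.y*d.z - q.z*d.y,
        q.w*d.y - q.x*d.z + q.y*d.w + q.z*d.x,
        q.w*d.z + q.x*d.y - q.y*d.x + q.z*d.w
    };
    double n = std::sqrt(r.w*r.w + r.x*r.x + r.y*r.y + r.z*r.z);
    r.w /= n; r.x /= n; r.y /= n; r.z /= n;
    return r;
}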
EDIT:
"The noise problem can be partially worked around by using a high pass filter, but what kind of bias are you exactly talking about?"
You should read up on the sources of error in MEMS accelerometers: constant alignment bias, random-walk bias, white noise and temperature-dependent bias. As you said, you can high-pass filter to reduce the effect of noise - however this is not perfect, so there is significant residual error. Double integration of that residual gives a quadratically increasing position error: a constant residual bias b produces roughly 0.5*b*t^2 of position error, so even a small 0.01 m/s^2 residual becomes about half a metre after 10 seconds, and larger biases ruin the estimate much sooner. Even after removing the acceleration due to gravity there will be significant accelerations measured due to these error sources, which will render the position estimate inaccurate within less than 1 second of integration.

How to exploit periodicity to reduce noise of a signal?

100 periods have been collected from a 3-dimensional periodic signal. The wavelength varies slightly. The noise in the wavelength follows a Gaussian distribution with zero mean. A good estimate of the wavelength is known; that is not an issue here. The noise in the amplitude may not be Gaussian and may be contaminated with outliers.
How can I compute a single period that approximates 'best' all of the collected 100 periods?
Time-series, ARMA, ARIMA, Kalman Filter, autoregression and autocorrelation seem to be keywords here.
UPDATE 1: I have no idea how time-series models work. Are they prepared for varying wavelengths? Can they handle non-smooth true signals? If a time-series model is fitted, can I compute a 'best estimate' for a single period? How?
UPDATE 2: A related question is this. Speed is not an issue in my case. Processing is done off-line, after all periods have been collected.
Origin of the problem: I am measuring acceleration during human steps at 200 Hz. After that I am trying to double integrate the data to get the vertical displacement of the center of gravity. Of course the noise introduces a HUGE error when you integrate twice. I would like to exploit periodicity to reduce this noise. Here is a crude graph of the actual data (y: acceleration in g, x: time in seconds) of 6 steps, corresponding to 3 periods (1 left and 1 right step make one period):
My interest is now purely theoretical, as http://jap.physiology.org/content/39/1/174.abstract gives a pretty good recipe for what to do.
We have used wavelets for noise suppression on a similar signal measured from cows during walking.
I don't think the noise is so much of a problem here; the biggest peaks represent actual changes in the acceleration during walking.
I suppose that the angle of the leg, and thus of the accelerometer, changes during your experiment, and you need to account for that in order to calculate the distance, i.e. you need to know the orientation of the accelerometer at each time step. See e.g. this technical note for one way to account for the angle.
If you need accurate measures of the position, the best solution would be to get an accelerometer with a magnetometer, which also measures orientation. Something like this should work: http://www.sparkfun.com/products/10321.
EDIT: I have looked into this a bit more in the last few days, because a similar project is on my to-do list as well... We have not used gyros in the past, but we will in the next project.
The inaccuracy in the positioning doesn't come from the white noise, but from the inaccuracy and drift of the gyro, and the error then accumulates very quickly due to the double integration. Intersense has a product called Navshoe, which addresses this problem by zeroing the error after each step (see this paper). And this is a good introduction to inertial navigation.
A periodic signal without noise has the following property:
f(a) = f(a+k), where k is the wavelength.
The next piece of information needed is that your signal is made of discrete samples; everything you have collected is samples, i.e. values of the function f(). From the 100 collected periods you can get the mean value:
1/n * sum(s_i), where i is in range [0..n-1] and n = 100.
This needs to be done for every dimension of your data; with 3-D data it is applied 3 times, and the result is (x, y, z) points. You can find the value of s_i from the periodicity property simply by doing
s_i(a).x = f(a+k*i).x
s_i(a).y = f(a+k*i).y
s_i(a).z = f(a+k*i).z
If the wavelength is not accurate, this gives you an additional source of error, or you'll need to adjust it to match the real wavelength of each period. Since
k*i = k+k+...+k
if the wavelength varies, you'll need to use
k_1+k_2+k_3+...+k_i
instead of k*i.
Unfortunately, with errors in the wavelength there will be big problems keeping this k_1..k_i chain in sync with the actual data. You'd actually need to know how to recognize the starting position of each period from your actual data; you may need to mark them by hand.
Now, all the mean values you calculated would be functions like this:
m(a) :: R->(x,y,z)
Now this is a curve in 3d space. More complex error models are left as an exercise for the reader.
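A sketch of the plain averaging described above, assuming the period boundaries have already been identified (by hand or otherwise) and each period has been resampled to the same length, so that sample j of every period corresponds to the same phase; the names here are illustrative:

#include <vector>

struct Point3 { double x, y, z; };

// periods[i][j] = sample j of the i-th collected period (all periods resampled
// to the same length N). Returns the per-position mean, i.e. one "best" period.
std::vector<Point3> average_period(const std::vector<std::vector<Point3>>& periods)
{
    size_t n = periods.size();        // e.g. 100 periods
    size_t N = periods[0].size();     // samples per period after resampling
    std::vector<Point3> mean(N, Point3{0, 0, 0});

    for (size_t j = 0; j < N; ++j) {
        for (size_t i = 0; i < n; ++i) {
            mean[j].x += periods[i][j].x;
            mean[j].y += periods[i][j].y;
            mean[j].z += periods[i][j].z;
        }
        mean[j].x /= n;  mean[j].y /= n;  mean[j].z /= n;
    }
    return mean;
}

Since the question mentions outliers in the amplitude, replacing the mean at each position with a median (or trimmed mean) over the n values is a simple robust variant.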
If you have a copy of Curve Fitting Toolbox, localized regression might be a good choice.
Curve Fitting Toolbox supports both lowess and loess localized regression models for curve and surface fitting.
There is also an option for robust localized regression.
The following blog post shows how to use cross-validation to estimate an optimal span parameter for a localized regression model, as well as techniques to estimate confidence intervals using a bootstrap.
http://blogs.mathworks.com/loren/2011/01/13/data-driven-fitting/

Kalman Tracking - Measurement variance

I'm doing some work on tracking moving objects using a ceiling mounted downward facing camera. I've got to the point where I can detect the position of the desired object in each frame.
I'm looking into using a Kalman filter to track the object's position and speed through the scene and I've reached a stumbling block. I've set up my system and have all the required parts of the Kalman filter except the measurement variance.
I want to be able to assign a meaningful variance to each measurement to allow the correction phase to use the new information in a sensible manner. I have several measures assigned to my detected objects which could in theory be useful in determining how accurate the position should be and it seems logical to try and combine them to derive a suitable variance.
Am I approaching this in the right manner and if so, can anyone point me in the right direction to continue?
Any help greatly appreciated.
I think you are right. According to this post:
Sensor fusioning with Kalman filter
determining the variance is 100% experimental. It seems to me you have everything you need to get good estimates of the variance.
Sorry for the late reply. I personally encountered the same problem in a previous project. I found the advice given by Gustaf Hendeby in his Sensor Fusion lecture slides (page 10) extremely valuable.
To summarize:
(1) The ratio of your process noise to your measurement noise determines the filter's behavior. Making the measurement noise large relative to the process noise makes your filter slower (more low-pass), which usually gives smoother tracking; conversely, if you set your measurement noise low, the filter trusts each measurement more (behaving like a high-pass filter) and tends to show more jitter.
(2) There are numerous papers in the literature discussing how to set these noise models properly, but usually a lot of "tuning" is still needed, depending on your application. The measurement noise is typically what we can measure/characterize from the hardware specification, so a common recommendation is to fix R (the measurement noise covariance) and tune Q (the process noise covariance).
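To make the Q-versus-R trade-off concrete, here is a minimal scalar Kalman filter sketch (random-walk state, e.g. one coordinate of the tracked object; the numbers are only illustrative): raising r relative to q gives a smoother but laggier track, lowering r makes the track follow the raw detections, jitter included.

// Scalar Kalman filter for a random-walk state.
// q = process noise variance, r = measurement noise variance.
struct Kalman1D {
    double x = 0;    // state estimate
    double p = 1;    // estimate variance
    double q, r;

    Kalman1D(double q_, double r_) : q(q_), r(r_) {}

    double update(double z)          // z = new measurement
    {
        p += q;                      // predict: state unchanged, uncertainty grows
        double k = p / (p + r);      // Kalman gain: near 1 if r << p, near 0 if r >> p
        x += k * (z - x);            // correct towards the measurement
        p *= (1 - k);
        return x;
    }
};

// Kalman1D smooth(0.01, 4.0);   // large r relative to q: heavy smoothing, laggy
// Kalman1D tight(1.0, 0.1);     // small r relative to q: responsive, more jitter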

detecting pauses in a spoken word audio file using pymad, pcm, vad, etc

First I am going to broadly state what I'm trying to do and ask for advice. Then I will explain my current approach and ask for answers to my current problems.
Problem
I have an MP3 file of a person speaking. I'd like to split it up into segments roughly corresponding to a sentence or phrase. (I'd do it manually, but we are talking hours of data.)
If you have advice on how to do this programmatically or know of some existing utilities, I'd love to hear it. (I'm aware of voice activity detection and I've looked into it a bit, but I didn't see any freely available utilities.)
Current Approach
I thought the simplest thing would be to scan the MP3 at certain intervals and identify places where the average volume was below some threshold. Then I would use some existing utility to cut up the mp3 at those locations.
I've been playing around with pymad and I believe that I've successfully extracted the PCM (pulse code modulation) data for each frame of the mp3. Now I am stuck because I can't really seem to wrap my head around how the PCM data translates to relative volume. I'm also aware of other complicating factors like multiple channels, big endian vs little, etc.
Advice on how to map a group of pcm samples to relative volume would be key.
Thanks!
PCM is a time-based encoding of sound: for each sample instant you get an amplitude value. (If you want a physical reference for this: the value corresponds to how far the microphone membrane was displaced from its resting position at that instant.)
Let's forget that PCM can use unsigned values for 8-bit samples, and focus on signed values. If the value is > 0, the membrane was on one side of its resting position; if it is < 0, it was on the other side. The bigger the displacement from rest (no matter to which side), the louder the sound.
Most voice classification methods start with one very simple step: They compare the peak level to a threshold level. If the peak level is below the threshold, the sound is considered background noise.
Looking at the parameters in Audacity's Silence Finder, the silence level should be that threshold. The next parameter, Minimum silence duration, is obviously the length of a silence period that is required to mark a break (or in your case, the end of a sentence).
If you want to code a similar tool yourself, I recommend the following approach (a short code sketch follows these steps):
Divide your sound sample into discrete sets of a specific duration. I would start with 1/10, 1/20 or 1/100 of a second.
For each of these sets, compute the maximum peak level.
Compare this maximum peak to a threshold (the silence level in Audacity). The threshold is something you have to determine yourself, based on the specifics of your sound sample (loudness, background noise, etc.). If the max peak is below your threshold, this set is silence.
Now analyse the series of classified sets: calculate the length of each silence in your recording (length = number of consecutive silent sets * length of a set). If it is above your minimum silence duration, assume that you have the end of a sentence there.
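A sketch of the windowing, peak measurement and threshold comparison described above, assuming mono 16-bit PCM (for stereo, take the max over both channels first); the window length and threshold are just starting values to tune:

#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Classify fixed-length windows of 16-bit PCM samples as silent or not.
// samples:   decoded PCM, one int16 per sample (mono)
// window:    samples per window, e.g. sample_rate / 20 for 50 ms
// threshold: peak level below which a window counts as silence (0..32767)
std::vector<bool> classify_silence(const std::vector<int16_t>& samples,
                                   size_t window, int threshold)
{
    std::vector<bool> silent;
    for (size_t start = 0; start + window <= samples.size(); start += window) {
        int peak = 0;
        for (size_t i = start; i < start + window; ++i)
            peak = std::max(peak, std::abs((int)samples[i]));   // max |amplitude|
        silent.push_back(peak < threshold);
    }
    return silent;
}

A run of consecutive silent windows longer than your minimum silence duration then marks a candidate sentence break.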
The main point in coding this yourself instead of continuing to use Audacity is that you can improve your classification by using more advanced analysis methods. One very simple metric you can apply is the zero-crossing rate; it just counts how often the sign switches within a given set of samples (i.e. how often the values cross the 0 line). There are many more, all of them more complex, but it may be worth the effort. Have a look at discrete cosine transforms, for example...
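The zero-crossing rate itself is only a few lines (a sketch; voiced speech typically has a lower rate than hiss-like noise, which is why it can help refine the peak-level decision):

#include <cstdint>
#include <vector>

// Fraction of adjacent sample pairs whose sign differs within a window of len samples.
double zero_crossing_rate(const std::vector<int16_t>& s, size_t start, size_t len)
{
    size_t crossings = 0;
    for (size_t i = start + 1; i < start + len; ++i)
        if ((s[i - 1] >= 0) != (s[i] >= 0))
            ++crossings;
    return (double)crossings / (len - 1);
}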
Just wanted to update this. I'm having moderate success using Audacity's Silence Finder. However, I'm still interested in this problem. Thanks.
PCM encodes the waveform as a sequence of quantized amplitude samples taken at a fixed rate; for 16-bit audio, each sample is a signed integer giving the instantaneous amplitude (a bit stream of up/down increments would be delta modulation, not PCM).
To estimate the volume, take the samples over a short window and compute their peak or RMS amplitude; you should then be able to pick out the spots where the amplitude is lower.
You may also try a Fourier transform to estimate where the signals are most distinct.