Smooth first derivative of noisy input signal - c++

I have attached images of an input signal (shown in blue), which is actually a continuous input stream and whose trend I do not know, and the signal smoothed using a moving average filter of span 5 (shown in red).
[Figure: raw input and smoothed input signal]
[Figure: first derivative of the raw input and smoothed input signal]
My aim is to calculate the ratio of this signal to its first derivative. However, the first derivative is clearly noisy and does not give good results. I realize I must change the filter from a moving average to a more robust one. I have looked up the Savitzky-Golay filter, but I read on another site that it is better at retaining the shape of the signal than at reducing noise. http://terpconnect.umd.edu/~toh/spectrum/Smoothing.html
A Kalman filter would be my next guess, but it needs an initial state estimate, which I cannot know for this type of signal.
Any other suggestions on how to smooth the first derivative of a noisy input?

First of all, don't expect any miracles from any of these filters. Numerical differentiation of noisy data is inherently problematic because the differentiation operation itself acts as a high-pass filter and thus amplifies noise.
Yes, there are differences between moving average, Savitzky-Golay and Kalman filtering, but they are subtle. The main advantage of Savitzky-Golay is that it fits a low-order polynomial in each window, which preserves the shape of the signal better than a plain moving average of the same width.
Looking at your data, it seems you should use a much larger window size, which results in a lower cut-off frequency. But then, I don't know whether your data sets always look like this.
Another hint: As long as your filters are effectively linear, it does not matter if you first apply the filter and then calculate the derivative or if you calculate the derivative from the original signal and then apply the filtering.
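For illustration, here is a minimal, hedged C++ sketch of that point (the function names are made up for this example): a centered moving average followed by a central-difference derivative. Because both operations are linear, smoothing first and differentiating afterwards gives the same result (up to boundary effects) as doing it the other way around, and a wider averaging span directly lowers the cutoff frequency.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Centered moving average of the given (odd) span; the window is shortened at the borders.
std::vector<double> movingAverage(const std::vector<double>& x, std::size_t span)
{
    std::vector<double> y(x.size(), 0.0);
    const std::size_t half = span / 2;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const std::size_t lo = (i >= half) ? i - half : 0;
        const std::size_t hi = std::min(x.size() - 1, i + half);
        double sum = 0.0;
        for (std::size_t j = lo; j <= hi; ++j) sum += x[j];
        y[i] = sum / static_cast<double>(hi - lo + 1);
    }
    return y;
}

// Central-difference first derivative, assuming a fixed sample spacing dt.
std::vector<double> centralDifference(const std::vector<double>& x, double dt)
{
    std::vector<double> d(x.size(), 0.0);
    for (std::size_t i = 1; i + 1 < x.size(); ++i)
        d[i] = (x[i + 1] - x[i - 1]) / (2.0 * dt);
    return d;
}

// Usage (span of 21 is only an example; pick it for the cutoff you need):
// std::vector<double> dy = centralDifference(movingAverage(signal, 21), dt);
```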

Related

OpenCV - PCA analysis on BW image - area vs shape - is there already an implementation?

The shape of an object is detected on a bw image. The object is a black continuous shape, the background is white.
We use PCA (http://docs.opencv.org/3.1.0/d1/dee/tutorial_introduction_to_pca.html) to get the object direction and align the object. Currently the shape itself (the points on the contour) is the input to the opencv PCA implementation. This usually works very well. But from time to time there is small dirt on the object border, causing the shape to pass around the dirt. This causes more points and more weight on one side, slightly turning the object.
Idea: Instead of the contour, we use the area of the object as input for our PCA analysis. The issue is that checking every point to see whether it is inside the contour, and then using those points for PCA, slows the application down. This part is about 52352 times slower.
New Approach: We take random points in the image, check if they are inside the shape and if so, use them for our PCA. We have to see if we can get the consistent quality needed from this approach.
Is there already a similar implementation in opencv which is using the area instead of the shape?
Another approach would be to put a mesh over the object and use the mesh points inside the object for PCA.
Is there already something similar available that one can just use, or does one need to quickly implement something like this?
Going for straight lines around the object isn't an option.
Given that we have received very limited information about your problem (posting images would help a lot) and you do not seem to know the probability density function of the noise, your best bet is to consider the noise to be Gaussian.
As such, and following your intuition, my suggested approach is to take a few random points that lie inside the object (by "a few" I mean statistically relevant, but not so many that they raise the computation time too much) and compute the PCA on them.
Repeat this procedure in an iterative loop and store somewhere the resulting rotation angles you get from the application of the PCA to the object shape.
Stop once you have enough points, then compute the mean of the rotation angles: this is a decent estimate of the true angle. Compute also the standard deviation to get a measure of the quality of your estimate. As for "enough points", ~30 is usually considered enough to be representative of the underlying population according to the central limit theorem.
If you want, you can improve on this approach in many ways, for example doing robust estimation of the true angle once you have collected enough points. It all depends on the data you have at hand...take my suggestion just as a starting point.
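As a starting point, here is a hedged OpenCV sketch of one such sampling iteration (the function name and the sample count are made up for illustration): random points are drawn inside the contour's bounding box, kept if they lie inside the shape, and PCA is run on that point cloud to get one angle estimate. Calling it repeatedly and averaging the returned angles gives the estimate described above.

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <random>
#include <vector>

// One iteration of the area-based PCA: sample points inside the shape and
// return the orientation (radians) of the principal axis.
double estimateAngleFromAreaSamples(const std::vector<cv::Point>& contour,
                                    int numSamples, std::mt19937& rng)
{
    cv::Rect box = cv::boundingRect(contour);
    std::uniform_real_distribution<float> ux(static_cast<float>(box.x),
                                             static_cast<float>(box.x + box.width));
    std::uniform_real_distribution<float> uy(static_cast<float>(box.y),
                                             static_cast<float>(box.y + box.height));

    cv::Mat data(0, 2, CV_64F);
    int attempts = 0;
    while (data.rows < numSamples && attempts++ < numSamples * 100) {
        cv::Point2f p(ux(rng), uy(rng));
        if (cv::pointPolygonTest(contour, p, false) >= 0) {   // inside or on the edge
            cv::Mat row = (cv::Mat_<double>(1, 2) << p.x, p.y);
            data.push_back(row);
        }
    }

    cv::PCA pca(data, cv::Mat(), cv::PCA::DATA_AS_ROW);
    // Angle of the first eigenvector = object orientation.
    return std::atan2(pca.eigenvectors.at<double>(0, 1),
                      pca.eigenvectors.at<double>(0, 0));
}
```

Run this in a loop with a fresh set of random points each time, then take the mean and standard deviation of the returned angles as suggested above.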
There are a few parameters that you could change which may improve your system.
First is the threshold you use to binarize your image. I don't know what your application is about, but you could use other color spaces, or normalize your image by chromaticity, and after that apply the new threshold.
Another option is to exclude shapes (contours) whose area is larger or smaller than what you are expecting.
In addition, you may apply a blur filter before detecting contours, as in the sketch below.
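A minimal sketch of those suggestions, under the assumption of a dark object on a light background (the function name and the area limits are placeholders you would choose for your own data):

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Blur, binarize (Otsu, inverted so the dark object becomes foreground),
// then keep only contours whose area is in the expected range.
std::vector<std::vector<cv::Point>> findCandidateShapes(const cv::Mat& gray,
                                                        double minArea,
                                                        double maxArea)
{
    cv::Mat blurred, binary;
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0);
    cv::threshold(blurred, binary, 0, 255, cv::THRESH_BINARY_INV | cv::THRESH_OTSU);

    std::vector<std::vector<cv::Point>> contours, kept;
    cv::findContours(binary, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    for (const auto& c : contours) {
        const double area = cv::contourArea(c);
        if (area >= minArea && area <= maxArea)
            kept.push_back(c);
    }
    return kept;
}
```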
I don't know what the noise looks like, but when you say "small dirt" I think it might be only a few pixels, a lot smaller than the object itself, though possibly attached to the object. To reduce this noise, you could perform an opening (morphology) on the binary image.
http://docs.opencv.org/2.4/doc/tutorials/imgproc/opening_closing_hats/opening_closing_hats.html
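A minimal sketch of that morphological opening (the 5x5 elliptical kernel is an assumption; tune it to the size of the "dirt"):

```cpp
#include <opencv2/opencv.hpp>

// Opening = erosion followed by dilation: removes foreground specks that are
// smaller than the structuring element while leaving the main shape largely intact.
cv::Mat removeSmallDirt(const cv::Mat& binaryMask)
{
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5));
    cv::Mat opened;
    cv::morphologyEx(binaryMask, opened, cv::MORPH_OPEN, kernel);
    return opened;
}
```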

Kalman filters for Multiple Object Tracking in videos

From what I've understood, tracking algorithms predict where a given object will be in the next frame (after object detection is already performed). The object is then recognized again in the next frame. What isn't clear is how the tracker then knows to associate the object in the 2nd frame with the one in the 1st, especially when there are multiple objects in the frame.
I've seen in a few places that a cost matrix is created using Euclidean distance between the prediction and all detections, and the problem is framed as an assignment problem (Hungarian algorithm).
Is my understanding of tracking correct? Are there other ways of establishing that an object in one frame is the same as an object in the next frame?
Your understanding is correct. You have described a simple cost function, which is likely to work well in many situations. However, there will be times when it fails.
Assuming you have the computational resources, you can try to make your tracker more robust by making the cost function more complicated.
The simplest thing you can do is take into account the error covariance of the Kalman filter, rather than just using the Euclidean distance. See the distance equation in the documentation for the vision.KalmanFilter object in MATLAB. Also see the Motion-based Multiple Object Tracking example.
You can also include other information in the cost function. You could account for the fact that the size of the object should not change too much between frames, or that the object's appearance should stay the same. For example, you could compute color histograms of your detections, and define your cost function as a weighted sum of the "Kalman filter distance" and some distance between color histograms.
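As an illustration (this is a hedged sketch, not the MATLAB example's code; the struct, function name and the weight `alpha` are assumptions), the cost matrix for the assignment step could mix the Euclidean distance between each track's Kalman prediction and each detection with an appearance term such as a color-histogram distance. The resulting matrix is then handed to a Hungarian (Munkres) solver, which is not shown here.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Point2D { double x, y; };

// cost[i][j] = (1 - alpha) * motion distance + alpha * appearance distance,
// where appearanceDist(i, j) is any distance between track i's and detection j's
// appearance models (e.g. color histograms).
std::vector<std::vector<double>> buildCostMatrix(
    const std::vector<Point2D>& predictions,
    const std::vector<Point2D>& detections,
    const std::function<double(std::size_t, std::size_t)>& appearanceDist,
    double alpha)
{
    std::vector<std::vector<double>> cost(
        predictions.size(), std::vector<double>(detections.size(), 0.0));
    for (std::size_t i = 0; i < predictions.size(); ++i) {
        for (std::size_t j = 0; j < detections.size(); ++j) {
            const double dx = predictions[i].x - detections[j].x;
            const double dy = predictions[i].y - detections[j].y;
            const double motion = std::sqrt(dx * dx + dy * dy);
            cost[i][j] = (1.0 - alpha) * motion + alpha * appearanceDist(i, j);
        }
    }
    return cost;
}
```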

Opencv - How to differentiate jitter from panning?

I'm working on a video stabilizer using Opencv in C++.
At this point in the project I am able to correctly find the translation between two consecutive frames with 3 different techniques (optical flow, phase correlation, BFMatcher on points of interest).
To obtain a stabilized image, I sum all the translation vectors (from consecutive frames) into one, which is used in the warpAffine function to correct the output image.
I'm getting good results with a fixed camera, but the results with a translating camera are really bad: the image disappears from the screen.
I think I have to distinguish the jitter movement that I want to remove from the panning movement that I want to keep. But I'm open to other solutions.
Actually the whole problem is a bit more complex than you might have thought in the beginning. Let's look at it this way: when you move your camera through the world, things close to the camera move faster than those in the background, so objects at different depths change their relative distance (look at your finger while moving your head and see how it points to different things). This means the image actually transforms and does not only translate (move in x or y) - so how do you want to compensate for that? What you need to do is infer how much the camera moved (translation along x, y and z) and how much it rotated (yaw, pan and tilt angles). This is not a trivial task, but OpenCV comes with a very nice package: http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html
So I recommend reading as much as possible on homography (http://en.wikipedia.org/wiki/Homography), camera models and calibration, and then thinking about what you actually want to stabilize for. If it is only the rotation angles, the task is much simpler than if you also want to stabilize for translational jitter.
If you don't want to go fancy and prefer to neglect the third dimension, I suggest that you average the optical flow, high-pass filter it and compensate for this movement with an image translation in the opposite direction. This will keep your image more or less in the middle of the frame, and only small, fast changes will be counteracted.
I would suggest the following possible approaches (in order of complexity):
Apply some easy-to-implement IIR low-pass filter on the translation vectors before applying the stabilization. This will separate the high-frequency component (jitter) from the low-frequency one (panning).
The same idea, a bit more complex: use Kalman filtering to track a motion with constant velocity or acceleration. You can use OpenCV's Kalman filter for that.
A bit more tricky: put a threshold on the motion amplitude to decide between two states (moving vs. static camera) and filter the translation or not.
Finally, you can use some elaborate machine learning technique to try to identify the user's desired motion (static, panning, etc.) and decide whether or not to filter the motion vectors used for the stabilization.
Just a threshold is not a low pass filter.
Possible low pass filters (that are easy to implement):
There is the well-known averaging, which is already a low-pass filter whose cutoff frequency depends on the number of samples that go into the averaging equation (the more samples, the lower the cutoff frequency).
One frequently used filter is the exponential filter (so called because it forgets the past with an exponential decay rate). It is simply computed as x_filt(k) = a*x_nofilt(k) + (1-a)*x_filt(k-1), with 0 <= a <= 1; a minimal sketch is given after this list.
Another popular filter (and that can be computed beyond order 1) is the Butterworth filter.
See also: low-pass filters on Wikipedia, IIR filters, etc.
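Here is a minimal sketch of approach 1 above, using that exponential filter on the per-frame translation (the struct name and the choice of smoothing factor `a` are assumptions to be tuned): the low-passed part is treated as intentional panning and kept, while the residual high-frequency part is the jitter that the stabilizer compensates.

```cpp
#include <utility>

// First-order IIR (exponential) low-pass filter on the 2D translation.
struct ExpFilter2D {
    double a;                 // smoothing factor in [0,1]; smaller = stronger smoothing
    double fx = 0.0, fy = 0.0;
    bool initialized = false;

    // Feed the raw inter-frame translation (tx, ty); returns the jitter
    // (raw minus low-pass) that should be counteracted.
    std::pair<double, double> update(double tx, double ty) {
        if (!initialized) { fx = tx; fy = ty; initialized = true; }
        fx = a * tx + (1.0 - a) * fx;   // x_filt(k) = a*x(k) + (1-a)*x_filt(k-1)
        fy = a * ty + (1.0 - a) * fy;
        return { tx - fx, ty - fy };
    }
};
```

The returned jitter is what you would subtract in the warpAffine translation; the low-passed panning component passes through untouched, so the image no longer drifts out of the frame.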

How to exploit periodicity to reduce noise of a signal?

100 periods have been collected from a 3 dimensional periodic signal. The wavelength slightly varies. The noise of the wavelength follows Gaussian distribution with zero mean. A good estimate of the wavelength is known, that is not an issue here. The noise of the amplitude may not be Gaussian and may be contaminated with outliers.
How can I compute a single period that approximates 'best' all of the collected 100 periods?
Time-series, ARMA, ARIMA, Kalman Filter, autoregression and autocorrelation seem to be keywords here.
UPDATE 1: I have no idea how time-series models work. Are they prepared for varying wavelengths? Can they handle non-smooth true signals? If a time-series model is fitted, can I compute a 'best estimate' for a single period? How?
UPDATE 2: A related question is this. Speed is not an issue in my case. Processing is done off-line, after all periods have been collected.
Origin of the problem: I am measuring acceleration during human steps at 200 Hz. After that, I am trying to double integrate the data to get the vertical displacement of the center of gravity. Of course the noise introduces a HUGE error when you integrate twice. I would like to exploit periodicity to reduce this noise. A crude graph of the actual data (y: acceleration in g, x: time in seconds) shows 6 steps corresponding to 3 periods (one left and one right step make one period).
My interest is now purely theoretical, as http://jap.physiology.org/content/39/1/174.abstract gives a pretty good recipe what to do.
We have used wavelets for noise suppression with similar signal measured from cows during walking.
I don't think the noise is so much of a problem here; the biggest peaks represent actual changes in the acceleration during walking.
I suppose that the angle of the leg, and thus of the accelerometer, changes during your experiment, and you need to account for that in order to calculate the distance, i.e. you need to know the orientation of the accelerometer at each time step. See e.g. this technical note for one way to account for the angle.
If you need accurate measurements of the position, the best solution would be to get an accelerometer with a magnetometer, which also measures orientation. Something like this should work: http://www.sparkfun.com/products/10321.
EDIT: I have looked into this a bit more in the last few days because a similar project is in my to do list as well... We have not used gyros in the past, but we are doing so in the next project.
The inaccuracy in the positioning doesn't come from the white noise, but from the inaccuracy and drift of the gyro. And the error then accumulates very quickly due to the double integration. Intersense has a product called Navshoe, that addresses this problem by zeroing the error after each step (see this paper). And this is a good introduction to inertial navigation.
A periodic signal without noise has the following property:
f(a) = f(a+k), where k is the wavelength.
The next bit of information needed is that your signal is composed of discrete samples. Everything you've collected is based on samples, which are values of the function f(). From 100 samples, you can get the mean value:
1/n * sum(s_i), where i is in range [0..n-1] and n = 100.
This needs to be done for every dimension of your data. If you use 3d data, it will be applied 3 times. Result would be (x,y,z) points. You can find value of s_i from the periodic signal equation simply by doing
s_i(a).x = f(a+k*i).x
s_i(a).y = f(a+k*i).y
s_i(a).z = f(a+k*i).z
If the wavelength is not accurate, this introduces an additional source of error, or you'll need to adjust it to match the real wavelength of each period. Since
k*i = k+k+...+k
if the wavelength varies, you'll need to use
k_1+k_2+k_3+...+k_i
instead of k*i.
Unfortunately, with errors in the wavelength, it will be hard to keep this k_1..k_i chain in sync with the actual data. You'd actually need to know how to recognize the starting position of each period from your actual data; possibly you'll need to mark them by hand.
Now, all the mean values you calculated would be functions like this:
m(a) :: R->(x,y,z)
Now this is a curve in 3D space. More complex error models are left as an exercise for the reader.
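A minimal sketch of that averaging step, assuming the start index of each period has already been marked (by hand or otherwise); the nearest-neighbour resampling to a common length and the plain mean are simplifications, and replacing the mean with a median across periods would be more robust to the amplitude outliers mentioned in the question.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Sample = std::array<double, 3>;  // (x, y, z) acceleration

// periodStarts holds the start index of each period plus one final end index,
// i.e. periodStarts.size() == numPeriods + 1.
std::vector<Sample> averagePeriods(const std::vector<Sample>& signal,
                                   const std::vector<std::size_t>& periodStarts,
                                   std::size_t outLen)
{
    std::vector<Sample> mean(outLen, Sample{0.0, 0.0, 0.0});
    const std::size_t numPeriods = periodStarts.size() - 1;
    for (std::size_t p = 0; p < numPeriods; ++p) {
        const std::size_t begin = periodStarts[p];
        const std::size_t len = periodStarts[p + 1] - begin;
        for (std::size_t i = 0; i < outLen; ++i) {
            // Nearest-neighbour resampling of this period onto the common length.
            const std::size_t src = begin + (i * len) / outLen;
            for (int d = 0; d < 3; ++d)
                mean[i][d] += signal[src][d];
        }
    }
    for (auto& s : mean)
        for (int d = 0; d < 3; ++d)
            s[d] /= static_cast<double>(numPeriods);
    return mean;
}
```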
If you have a copy of Curve Fitting Toolbox, localized regression might be a good choice.
Curve Fitting Toolbox supports both lowess and loess localized regression models for curve and surface fitting.
There is also an option for robust localized regression.
The following blog post shows how to use cross-validation to estimate an optimal span parameter for a localized regression model, as well as techniques to estimate confidence intervals using a bootstrap.
http://blogs.mathworks.com/loren/2011/01/13/data-driven-fitting/

detecting pauses in a spoken word audio file using pymad, pcm, vad, etc

First I am going to broadly state what I'm trying to do and ask for advice. Then I will explain my current approach and ask for answers to my current problems.
Problem
I have an MP3 file of a person speaking. I'd like to split it up into segments roughly corresponding to a sentence or phrase. (I'd do it manually, but we are talking hours of data.)
If you have advice on how to do this programmatically, or pointers to existing utilities, I'd love to hear it. (I'm aware of voice activity detection and I've looked into it a bit, but I didn't see any freely available utilities.)
Current Approach
I thought the simplest thing would be to scan the MP3 at certain intervals and identify places where the average volume was below some threshold. Then I would use some existing utility to cut up the mp3 at those locations.
I've been playing around with pymad and I believe I've successfully extracted the PCM (pulse-code modulation) data for each frame of the MP3. Now I am stuck because I can't quite wrap my head around how the PCM data translates to relative volume. I'm also aware of other complicating factors like multiple channels, big-endian vs. little-endian, etc.
Advice on how to map a group of pcm samples to relative volume would be key.
Thanks!
PCM is a time-frame-based encoding of sound. For each time frame, you get a peak level. (If you want a physical reference for this: the peak level corresponds to the distance the microphone membrane was moved out of its resting position at that given time.)
Let's forget that PCM can use unsigned values for 8-bit samples, and focus on signed values. If the value is > 0, the membrane was on one side of its resting position; if it is < 0, it was on the other side. The bigger the displacement from rest (no matter to which side), the louder the sound.
Most voice classification methods start with one very simple step: They compare the peak level to a threshold level. If the peak level is below the threshold, the sound is considered background noise.
Looking at the parameters in Audacity's Silence Finder, the silence level should be that threshold. The next parameter, Minimum silence duration, is obviously the length of a silence period that is required to mark a break (or in your case, the end of a sentence).
If you want to code a similar tool yourself, I recommend the following approach (a minimal sketch in code follows below):
Divide your sound sample in discrete sets of a specific duration. I would start with 1/10, 1/20 or 1/100 of a second.
For each of these sets, compute the maximum peak level
Compare this maximum peak to a threshold (the silence level in Audacity). The threshold is something you have to determine yourself, based on the specifics of your sound sample (loudness, background noise etc.). If the max peak is below your threshold, this set is silence.
Now analyse the series of classified sets: Calculate the length of silence in your recording. (length = number of silent sets * length of a set). If it is above your Minimum silence duration, assume that you have the end of a sentence here.
The main point of coding this yourself instead of continuing to use Audacity is that you can improve your classification by using more advanced analysis methods. One very simple metric you can add is the zero-crossing rate; it just counts how often the sign switches in your given set of peak levels (i.e. how often your values cross the 0 line). There are many more, all of them more complex, but it may be worth the effort. Have a look at discrete cosine transforms, for example...
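The window-classification loop described above could look roughly like this (shown in C++, but it ports directly to Python/pymad; the function name, the threshold and the window/duration parameters are assumptions to be tuned for your recording):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// samples: signed 16-bit PCM values of one channel.
// windowSize: samples per window (e.g. sampleRate / 10 for 1/10 s windows).
// A window is "silent" if its maximum absolute level is below threshold;
// a run of at least minSilentWindows silent windows marks a break.
std::vector<std::size_t> findBreaks(const std::vector<int16_t>& samples,
                                    std::size_t windowSize,
                                    int threshold,
                                    std::size_t minSilentWindows)
{
    std::vector<std::size_t> breakPositions;   // sample index where each break starts
    std::size_t silentRun = 0;
    for (std::size_t start = 0; start + windowSize <= samples.size(); start += windowSize) {
        int peak = 0;
        for (std::size_t i = start; i < start + windowSize; ++i)
            peak = std::max(peak, std::abs(static_cast<int>(samples[i])));

        if (peak < threshold) {
            ++silentRun;
            if (silentRun == minSilentWindows)                 // long enough: record the break
                breakPositions.push_back(start - (minSilentWindows - 1) * windowSize);
        } else {
            silentRun = 0;                                     // sound detected: reset the run
        }
    }
    return breakPositions;
}
```

The returned positions are the points at which you would cut the MP3 with your external utility.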
Just wanted to update this. I'm having moderate success using Audacity's Silence Finder. However, I'm still interested in this problem. Thanks.
PCM is a way of encoding a sinusoidal wave. It will be encoded as a series of bits, where one of the bits (1, I'd guess) indicates an increase in the function, and 0 indicates a decrease. The function can stay roughly constant by alternating 1 and 0.
To estimate amplitude, plot the sine wave, then normalize it over the x axis. Then you should be able to estimate the amplitude of the sine wave at different points. Once you've done that, you should be able to pick out the spots where the amplitude is lower.
You may also try to use a Fourier transform to estimate where the signals are most distinct.