Analyzing noisy data - data-mining

I recently launched a rocket with a barometric altimeter that is accurate to roughly 10 ft (calculated via data acquired during flight). The recorded data is in time increments of 0.05 sec per sample and a graph of altitude vs. time looks pretty much like it should when zoomed out over the entire flight.
The problem is that when I try to calculate other values such as velocity or acceleration from the data, the limited accuracy of the measurements makes the calculated values pretty much worthless. What techniques can I use to smooth the data so that I can calculate (or approximate) reasonable values for the velocity and acceleration? It is important that major events remain in place in time, most notably the 0 for the first entry and the highest point during flight (2707).
The altitude data follows and is measured in ft above ground level. The first time would be 0.00 and each sample is 0.05 seconds after the previous sample. The spike at the beginning of the flight is due to a technical problem that occurred during liftoff and removing the spike is optimal.
I originally tried using linear interpolation, averaging nearby data points, but it took many iterations to smooth the data enough for integration and the flattening of the curve removed the important apogee and ground level events.
All help is greatly appreciated. Please note this is not the complete data set and I am looking for suggestions on better ways to analyze the data, not for someone to reply with a transformed data set. It would be nice to use an algorithm on board future rockets which can predict current altitude/velocity/acceleration without knowing the full flight data, though that is not required.
00000
00000
00000
00076
00229
00095
00057
00038
00048
00057
00057
00076
00086
00095
00105
00114
00124
00133
00152
00152
00171
00190
00200
00219
00229
00248
00267
00277
00286
00305
00334
00343
00363
00363
00382
00382
00401
00420
00440
00459
00469
00488
00517
00527
00546
00565
00585
00613
00633
00652
00671
00691
00710
00729
00759
00778
00798
00817
00837
00856
00885
00904
00924
00944
00963
00983
01002
01022
01041
01061
01080
01100
01120
01139
01149
01169
01179
01198
01218
01238
01257
01277
01297
01317
01327
01346
01356
01376
01396
01415
01425
01445
01465
01475
01495
01515
01525
01545
01554
01574
01594
01614
01614
01634
01654
01664
01674
01694
01714
01724
01734
01754
01764
01774
01794
01804
01814
01834
01844
01854
01874
01884
01894
01914
01924
01934
01954
01954
01975
01995
01995
02015
02015
02035
02045
02055
02075
02075
02096
02096
02116
02126
02136
02146
02156
02167
02177
02187
02197
02207
02217
02227
02237
02237
02258
02268
02278
02278
02298
02298
02319
02319
02319
02339
02349
02359
02359
02370
02380
02380
02400
02400
01914
02319
02420
02482
02523
02461
02502
02543
02564
02595
02625
02666
02707
02646
02605
02605
02584
02574
02543
02543
02543
02543
02543
02543
02554
02543
02554
02554
02554
02554
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02543
02533
02543
02543
02543
02543
02543
02543
02543
02543
02533
02523
02523
02523
02523
02523
02523
02523
02523
02543
02523
02523
02523
02523
02523
02523
02523
02523
02513
02513
02502
02502
02492
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02482
02472
02472
02472
02461
02461
02461
02461
02451
02451
02451
02461
02461
02451
02451
02451
02451
02451
02451
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02441
02431
02441
02431
02441
02431
02420
02431
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02420
02410
02420
02410
02410
02410
02410
02400
02400
02410
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02400
02390
02390
02390
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02380
02370
02370
02380
02370
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02359
02349
02349
02349
02349
02349
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339
02339

Here is my solution, using a Kalman filter. You will need to tune the parameters (possibly by orders of magnitude) if you want to smooth more or less.
#!/usr/bin/env octave
% Kalman filter to smooth measures of altitude and estimate
% speed and acceleration. The continuous time model is more or less as follows:
% derivative of altitude := speed
% derivative of speed := acceleration
% acceleration is a Wiener process
%------------------------------------------------------------
% Discretization of the continuous-time linear system
%
%   d  |x|   | 0 1 0 | |x|
%  --- |v| = | 0 0 1 | |v| + "noise"
%   dt |a|   | 0 0 0 | |a|
%
%              |x|
%  y = [1 0 0] |v| + "measurement noise"
%              |a|
%
st = 0.05; % Sampling time
A = [1 st st^2/2;
0 1 st ;
0 0 1];
C = [1 0 0];
%------------------------------------------------------------
% Fine-tune these parameters! (in particular qa and R)
% The acceleration follows a "random walk". The greater is the variance qa,
% the more "reactive" the system is expected to be, i.e.
% the more the acceleration is expected to vary
% The greater is R, the more noisy is your measurement instrument
% (less "accuracy" of the barometric altimeter);
% if you increase R, you will smooth the estimate more
qx = 1.0; % Variance of model noise for position
qv = 1.0; % Variance of model noise for speed
qa = 50.0; % Variance of model noise for acceleration
Q = diag([qx, qv, qa]);
R = 100.0; % Variance of measurement noise
% (10^2, if 10ft is the standard deviation)
load data.txt % Put your measures in this file
est_position = zeros(length(data), 1);
est_speed = zeros(length(data), 1);
est_acceleration = zeros(length(data), 1);
%------------------------------------------------------------
% Kalman filter
xhat = [0;0;0]; % Initial estimate
P = zeros(3,3); % Initial error variance
for i = 1:length(data)
    y = data(i);
    xpred = A*xhat; % Prediction
    Ppred = A*P*A' + Q; % Prediction error variance
    Lambdainv = 1/(C*Ppred*C' + R);
    xhat = xpred + Ppred*C'*Lambdainv*(y - C*xpred); % Update estimation
    P = Ppred - Ppred*C'*Lambdainv*C*Ppred; % Update estimation error variance
    est_position(i) = xhat(1);
    est_speed(i) = xhat(2);
    est_acceleration(i) = xhat(3);
end
%------------------------------------------------------------
% Plot
figure(1);
hold on;
plot(data, 'k'); % Black: real data
plot(est_position, 'b'); % Blue: estimated position
plot(est_speed, 'g'); % Green: estimated speed
plot(est_acceleration, 'r'); % Red: estimated acceleration
pause
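For readers without Octave, the same filter can be sketched in Python with NumPy. This is a port under the same assumptions as the script above (constant-acceleration model, identical tuning values). The function name kalman_smooth and the synthetic climb used to exercise it are illustrative choices, not part of the original answer.

```python
import numpy as np

def kalman_smooth(z, dt=0.05, q=(1.0, 1.0, 50.0), r=100.0):
    """Causal Kalman filter; returns rows of [altitude, speed, acceleration]."""
    A = np.array([[1.0, dt, dt**2 / 2],
                  [0.0, 1.0, dt],
                  [0.0, 0.0, 1.0]])      # constant-acceleration transition
    C = np.array([[1.0, 0.0, 0.0]])      # only altitude is measured
    Q = np.diag(q)                       # model noise variances (qx, qv, qa)
    xhat = np.zeros((3, 1))              # initial estimate
    P = np.zeros((3, 3))                 # initial error covariance
    out = np.empty((len(z), 3))
    for i, y in enumerate(z):
        xpred = A @ xhat                              # prediction
        Ppred = A @ P @ A.T + Q                       # prediction covariance
        K = Ppred @ C.T / (C @ Ppred @ C.T + r)       # Kalman gain (3x1)
        xhat = xpred + K * (y - (C @ xpred).item())   # measurement update
        P = Ppred - K @ C @ Ppred                     # updated covariance
        out[i] = xhat.ravel()
    return out

# Synthetic check (assumed data): steady 100 ft/s climb with 10 ft of noise.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 0.05)
est = kalman_smooth(100.0 * t + rng.normal(0.0, 10.0, t.size))
print("final [altitude, speed, acceleration]:", est[-1])
```

Because each estimate uses only past samples, the same loop body could in principle run on board during flight, which is what the question asks about as a nice-to-have.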

You could try running the data through a low-pass filter. This will smooth out high frequency noise. Maybe a simple FIR.
Also, you could pull your major events from the raw data, but use a polynomial fit for velocity and acceleration data.
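The polynomial-fit idea can be sketched as a local fit: around each sample, fit a low-order polynomial to the neighbouring altitudes and read velocity and acceleration from its derivatives. The window size, polynomial order, and function name below are assumptions to be tuned, not values from the answer.

```python
import numpy as np

def local_poly_derivatives(z, dt=0.05, half_window=10, order=2):
    """Fit a polynomial around each sample; return (velocity, acceleration)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    vel = np.zeros(n)
    acc = np.zeros(n)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        tt = (np.arange(lo, hi) - i) * dt        # time relative to sample i
        p = np.poly1d(np.polyfit(tt, z[lo:hi], order))
        vel[i] = p.deriv(1)(0.0)                 # slope of the fit at sample i
        acc[i] = p.deriv(2)(0.0)                 # curvature of the fit at sample i
    return vel, acc

# Assumed test signal: constant 32 ft/s^2 acceleration from rest.
t = np.arange(0, 5, 0.05)
vel, acc = local_poly_derivatives(0.5 * 32.0 * t**2)
print(vel[:3], acc[:3])
```

The raw data can still be used to locate launch and apogee, as the answer suggests; the fit is only used for the derivatives.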

Have you tried performing a sliding-window average of your values? Basically you take a window of, say, 10 values (from 0 to 9) and calculate its average. Then you slide the window by one point (from 1 to 10) and recalculate. This will smooth the values while keeping the number of points relatively unchanged. Larger windows give smoother data at the price of losing more high-frequency information.
You can use the median instead of the average if your data happen to contain outlier spikes.
You can also try autocorrelation.
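A minimal sketch of the sliding window, assuming a centred window with an odd length so that events stay at the same sample index. The helper name rolling and the edge padding are illustrative choices.

```python
import numpy as np

def rolling(z, window=11, stat=np.mean):
    """Centred sliding-window statistic; edges are padded by repeating the end values."""
    z = np.asarray(z, dtype=float)
    half = window // 2
    padded = np.pad(z, half, mode="edge")
    return np.array([stat(padded[i:i + window]) for i in range(len(z))])

alt = [0, 0, 0, 76, 229, 95, 57, 38, 48, 57, 57, 76, 86]  # first samples from the question
print(rolling(alt, 5, np.mean))     # the liftoff spike is smeared into its neighbours
print(rolling(alt, 5, np.median))   # the liftoff spike is rejected
```

On these samples the median replaces the 229 ft spike with 76 ft, while the mean spreads it over the surrounding window.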

One way to approach analyzing your data is to match it to some model, generate a function, and then test its fitness to your data set. This can be rather complicated and is probably unnecessary, but the point is that instead of generating acceleration/velocity data directly from your data you can match it to your model (rather simple for a rocket: some acceleration upwards followed by a slow constant-speed descent). At least that is how I would do it in a physics experiment.
As for generating some sense of velocity and acceleration during flight, this should be as simple as averaging the velocity from several different results. Something along the lines of:
EstimatedV = Vmeasured*(1/n) + (1 - 1/n)*EstimatedV. Set n based on how quickly you want your velocity to adjust.
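The update rule above is an exponential moving average. A short sketch, assuming Vmeasured is the difference between consecutive altitude samples divided by the 0.05 s sample interval (the function name is hypothetical):

```python
def ema_velocity(alt, dt=0.05, n=10):
    """Running velocity estimate: EstimatedV = Vmeasured/n + (1 - 1/n)*EstimatedV."""
    est = 0.0
    prev = alt[0]
    out = []
    for a in alt:
        v_measured = (a - prev) / dt          # raw finite-difference velocity
        est = v_measured / n + (1 - 1 / n) * est
        out.append(est)
        prev = a
    return out

# Assumed test signal: steady 100 ft/s climb sampled every 0.05 s.
climb = [100.0 * 0.05 * i for i in range(200)]
print(ema_velocity(climb)[-1])
```

Larger n smooths more but makes the estimate lag further behind the true velocity.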

I know nothing about rockets. I plotted your points and they look lovely.
Based on what I see in that plot let me assume that there is usually a single apogee and that the function that gave rise to your points has no derivative wrt time at that apogee.
Suggestion:
Monitor maximum altitude throughout the flight.
Continuously watch for the apogee by (say, simply) comparing the most recent few points with the current maximum.
Until you reach the maximum, with (0,0) fixed and some arbitrary set of knots calculate a collection of natural splines up to the current altitude. Use the residuals wrt the splines to decide which data to discard. Recalculate the splines.
At the maximum, retain the most recently calculated splines. Start calculating a new set of splines for the curve beyond the apogee.
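The apogee watch in the second and third steps can be sketched as a running maximum plus a confirmation count. The thresholds drop and confirm are assumptions that would need tuning against the sensor noise, and the liftoff spike has to be removed first (for example with a median filter), otherwise it would be reported as a false apogee.

```python
def detect_apogee(alt, drop=20.0, confirm=5):
    """Return the index of the maximum once `confirm` consecutive samples
    lie at least `drop` below it; return None if that never happens."""
    best_i, below = 0, 0
    for i, a in enumerate(alt):
        if a > alt[best_i]:
            best_i, below = i, 0        # new maximum, restart the count
        elif alt[best_i] - a >= drop:
            below += 1
            if below >= confirm:
                return best_i
        else:
            below = 0                   # back within `drop` of the maximum
    return None

near_apogee = [2595, 2625, 2666, 2707, 2646, 2605, 2605, 2584, 2574, 2543]
print(detect_apogee(near_apogee))
```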

Fitting an ARIMA model and looking for autocorrelation in the residuals is standard procedure. A volatility model is another option.

Related

Algorithm for 'pixelated circle' image recognition

Here are three sample images. In these images I want to find:
Coordinates of those small pixelated partial circles.
Rotation of these circles. These circles have a 'pointy' side; I want to find its direction.
For example, coordinates and the angle with positive x axis of that small partial circle in the
first image is (51 px, 63 px), 240 degrees, respectively.
second image is (50 px, 52 px), 300 degrees, respectively.
third image is (80 px, 29 px), 225 degrees, respectively.
I don't care about scale invariance.
Methods I have tried:
ORB feature detection
SIFT feature detection
Feature detection doesn't seem to work here.
Above is the example of ORB feature detector finding similar features in 1st and 2nd image.
It finds one correct match; the rest are wrong.
Probably because these images are too low-resolution to contain any meaningful corners or blobs. The corners and blobs it does find are not much different from the other pixelated objects present.
I have seen people use erosion and dilation to remove noise, but my objects are too small for that to work.
Perhaps some other feature detector can help?
I am also thinking about the generalized Hough transform, but I can't find a complete tutorial on implementing it with OpenCV (C++). I also want something fast, ideally real time.
Any help is appreciated.

If the small circles have constant size, then you might try a convolution.
This is a quick and dirty test I ran with ImageMagick for speed, and coefficients basically pulled out of thin air:
convert test1.png -define convolve:scale='!' -morphology Convolve \
"12x12: \
-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 \
-9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 \
-9,-2,-1,0,9,9,9,9,0,-1,-2,-9 \
-9,-1,0,9,7,7,7,7,9,0,-1,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,0,9,7,-9,-9,-9,-9,7,9,0,-9 \
-9,-1,0,9,7,7,7,7,9,0,-1,-9 \
-9,-2,0,0,9,9,9,9,0,0,-2,-9 \
-9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 \
-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9" \
test2.png
I then ran a simple level stretch plus contrast to bring out what already were visibly more luminous pixels, and a sharpen/reduction to shrink pixel groups to their barycenters (these last operations could be done by multiplying the matrix by the proper kernel), and got this.
The source image on the left is converted to the output on the right, the pixels above a certain threshold mean "circle detected".
Once this is done, I imagine the "pointy" end can be refined with a modified quincunx - use a 3x3 square grid centered on the center pixel, count the total luminosity in each of the eight peripheral squares, and that ought to give you a good idea of where the "point" is. You might want to apply thresholding to offset a possible blurring of the border (the centermost circle in the example below, the one inside the large circle, could give you a false reading).
For example, if we know the coordinates of the center in the grayscale matrix M, and we imagine the circle having diameter of 7 pixels (this is more or less what the convolution above says), we would do
unsigned int quic[3][3] = { { 0, 0, 0 }, { 0, 0, 0 }, { 0, 0, 0 } };
int x, y;
for (y = -3; y <= 3; y++) {
    for (x = -3; x <= 3; x++) {
        if (matrix[cy+y][cx+x] > threshold) {
            // Map offsets -3..3 onto grid cells 0..2 (bands of 2, 3 and 2 pixels)
            quic[(y+4)/3][(x+4)/3] += matrix[cy+y][cx+x];
        }
    }
}
// Now, we find which quadrant in quic holds the maximum:
// if it is, say, quic[2][0], the point is southeast.
// 0 1 2 x
// 0 NE N NW
// 1 E X W
// 2 SE S SW
// y
// Value X (1,1) is totally unlikely - the convolution would
// not have found the circle in the first place if it was so
For an accurate result you would have to use "sub-pixel" addressing, which is slightly more complicated. With the method above, one of the circles results in these quincunx values, which give a point to the southeast:
Needless to say, with this kind of resolution the use of a finer grid is pointless, you'd get an error of the same order of magnitude.
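The 3x3 grid tally can also be sketched with NumPy. This assumes a 7x7 patch around a detected center, split into bands of 2, 3 and 2 pixels; the function name and the band edges are illustrative choices, and the result is a (row, column) cell index rather than a compass direction.

```python
import numpy as np

def pointy_direction(img, cy, cx, threshold=0):
    """Sum above-threshold luminosity of the 7x7 patch around (cy, cx) into a
    3x3 grid and return (row, col) of the brightest peripheral cell."""
    patch = img[cy - 3:cy + 4, cx - 3:cx + 4].astype(float)
    patch[patch <= threshold] = 0.0
    edges = [0, 2, 5, 7]                      # 7 pixels -> bands of 2, 3, 2
    grid = np.array([[patch[edges[r]:edges[r + 1], edges[c]:edges[c + 1]].sum()
                      for c in range(3)] for r in range(3)])
    grid[1, 1] = -np.inf                      # the centre cell is not a direction
    r, c = np.unravel_index(np.argmax(grid), grid.shape)
    return int(r), int(c)

demo = np.zeros((20, 20), dtype=np.uint8)
demo[13, 13] = 255                            # bright pixel below and right of centre
print(pointy_direction(demo, 10, 10))
```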
I've tried with some random doodles and the convolution matrix has a good rejection of non-signal shapes, but of course this is due to information about the target's size and shape - if that assumption fails, this approach will be a dead end.
It would help to know the image source: there're several tricks used in astronomy and medicine to detect specific shapes or features.
Python opencv2
The above can be implemented with Python:
#!/usr/bin/python3
import cv2
import numpy as np
# Scaling factor
d = 240
kernel1 = np.array([
[ -9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 ],
[ -9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 ],
[ -9,-2,-1,0,9,9,9,9,0,-1,-2,-9 ],
[ -9,-1,0,9,7,7,7,7,9,0,-1,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,0,9,7,-9,-9,-9,-9,7,9,0,-9 ],
[ -9,-1,0,9,7,7,7,7,9,0,-1,-9 ],
[ -9,-2,0,0,9,9,9,9,0,0,-2,-9 ],
[ -9,-7,-2,-1,0,0,0,0,-1,-2,-7,-9 ],
[ -9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9 ]
], dtype = np.single)
sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])
image = cv2.imread('EuDpD.png')
# Scale kernel
kernel1 /= d
identify = cv2.filter2D(src=image, ddepth=-1, kernel=kernel1)
# Sharpen image
identify = cv2.filter2D(src=identify, ddepth=-1, kernel=sharpen)
# Cut at ~90% of maximum
ret,thresh = cv2.threshold(identify, 220, 255, cv2.THRESH_BINARY)
cv2.imwrite('identify.png', thresh)
The above, run on the grayscaled image (left), gives the following result (right). A better sharpening or adaptive thresholding could come up with a single pixel.

Using Eigen to solve a dense, constrained least squares fit

I need to solve a classic problem of the form Ax = b for a vector x that is of size 4. A is on the order of ~500 data points and thus is a dense 500x4 matrix.
Currently I can solve this using the normal equations described here, and it works fine; however, I would like to constrain one of my parameters in x to never exceed a certain value.
Is there a good way to do this programmatically with Eigen?

You can try my quadratic programming solver based on Eigen. You'll still have to form the normal equations.

Here is some python-demo (using numpy which is not that far away from Eigen) showing an accelerated projected-gradient algorithm for your kind of problem. Typically this approach is used for large-scale problems (where other algorithms incorporating second-order information might struggle), but it's also nice to implement.
This version, which is a small modification of some old code of mine, is not the simplest approach that can be used, as we are using:
acceleration / momentum (faster iteration)
line-search (saves us some step-size tuning trouble)
You could remove the line-search and tune the step-size. Momentum is also not needed.
As I'm not doing much C++ right now, I don't think I will port this to Eigen. But I'm sure that if you want to port it, it's not that hard; Eigen should not be too different from numpy.
I did not measure the performance, but the results are calculated instantly (perceptually).
Edit: some non-scientific timings (more momentum; lesser tolerance than in following code):
A=(500,4):
Solve non-bounded with scipys lsq_linear
used: 0.004898870004675975
cost : 244.58267993
Solve bounded (0.001, 0.05) with scipys lsq_linear
used: 0.005605718416479959
cost : 246.990611225
Solve bounded (0.001, 0.05) with accelerated projected gradient
early-stopping # it: 3
used: 0.002282825315435914
cost: 246.990611225
A=(50000, 500):
Solve non-bounded with scipys lsq_linear
used: 4.118898701951786 secs
cost : 24843.7115776
Solve bounded (0.001, 0.05) with scipys lsq_linear
used: 14.727660030288007 secs
cost : 25025.0328661
Solve bounded (0.001, 0.05) with accelerated projected gradient
early-stopping # it: 14
used: 5.319953458329618 secs
cost: 25025.0330754
The basic idea is to use some gradient-descent-like algorithm, and project onto our constraints after each gradient-step. This approach is very powerful, if that projection can be done efficiently. Box-constraint-projections are simple!
Page 4 in this pdf shows you the box-constraint projection.
We just clip our solution-vector to lower_bound, upper_bound. Clipping in numpy is described as: Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified, values smaller than 0 become 0, and values larger than 1 become 1.
It's an iterative algorithm approximating the solution, and I think every algorithm in use will be an iterative one.
Code
import numpy as np
from scipy.optimize import lsq_linear
np.random.seed(1)
A = np.random.normal(size=(500, 4))
b = np.random.normal(size=500)
""" Solve Ax=b
----------
"""
print('Solve non-bounded with scipys lsq_linear')
sol = lsq_linear(A, b)
print('Ax=b sol: ', sol['x'])
print('cost : ', sol['cost'])
print()
""" Solve Ax=b with box-constraints
-------------------------------
"""
print('Solve bounded (0.001, 0.05) with scipys lsq_linear')
sol = lsq_linear(A, b, bounds=(0.001, 0.05))
print('Ax=b constrained sol: ', sol['x'])
print('cost : ', sol['cost'])
print()
""" Solve Ax=b with box-constraints using a projected gradient algorithm
--------------------------------------------------------------------
"""
def solve_pg(A, b, bounds=(-np.inf, np.inf), momentum=0.9, maxiter=1000):
    """ remarks:
    algorithm: accelerated projected gradient
    projection: proj onto box constraints
    line-search: armijo-rule along projection-arc (Bertsekas book)
    stopping-criterion: naive
    gradient-calculation: precomputes AtA
    """
    lb = np.empty(A.shape[1])
    ub = np.empty(A.shape[1])
    if len(bounds) == 2:
        # apply lb & ub to all variables
        lb = bounds[0]
        ub = bounds[1]
    else:
        # assume dimensions are ok
        lb = np.array(bounds[0])
        ub = np.array(bounds[1])
    M, N = A.shape
    x = np.zeros(N)
    AtA = A.T.dot(A)
    Atb = A.T.dot(b)
    stop_count = 0
    def gradient(x):
        return AtA.dot(x) - Atb
    def obj(x):
        return 0.5 * np.linalg.norm(A.dot(x) - b)**2
    it = 0
    while True:
        grad = gradient(x)
        # line search
        alpha = 1
        beta = 0.5
        sigma = 1e-2
        old_obj = obj(x)
        while True:
            new_x = x - alpha * grad
            new_obj = obj(new_x)
            if old_obj - new_obj >= sigma * grad.dot(x - new_x):
                break
            else:
                alpha *= beta
        x_old = x[:]
        x = x - alpha*grad
        # projection
        np.clip(x, lb, ub, out=x) # Projection onto box constraints
        # see SO-text
        # in-place clipping
        # extrapolated (momentum) point; not used further in this listing
        y = x + momentum * (x - x_old)
        if np.abs(old_obj - obj(x)) < 1e-2:
            stop_count += 1
        else:
            stop_count = 0
        if stop_count == 3:
            print('early-stopping # it: ', it)
            return x
        it += 1
        if it == maxiter:
            return x
print('Solve bounded (0.001, 0.05) with accelerated projected gradient')
sol = solve_pg(A, b, bounds=(0.001, 0.05))
print(sol)
print('cost: ', 0.5 * (np.square(np.linalg.norm(A.dot(sol) - b))))
Output
Solve non-bounded with scipys lsq_linear
Ax=b sol: [ 0.06627173 -0.06104991 -0.07010355 0.04024075]
cost : 244.58267993
Solve bounded (0.001, 0.05) with scipys lsq_linear
Ax=b constrained sol: [ 0.05 0.001 0.001 0.03902291]
cost : 246.990611225
Solve bounded (0.001, 0.05) with accelerated projected gradient
early-stopping # it: 3
[ 0.05 0.001 0.001 0.03902229]
cost: 246.990611225
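Stripped of the line search and momentum, the projection idea described above reduces to a few lines. This is a minimal sketch, assuming a fixed step of 1/L where L is the largest eigenvalue of A^T A. The function name and iteration count are illustrative, and an upper bound on a single parameter (as in the question) is expressed by using infinity for the others.

```python
import numpy as np

def projected_gradient(A, b, lb, ub, iters=500):
    """Minimise 0.5*||Ax - b||^2 subject to lb <= x <= ub with a fixed step."""
    AtA, Atb = A.T @ A, A.T @ b
    step = 1.0 / np.linalg.norm(AtA, 2)       # 1/L, L = largest eigenvalue of A^T A
    x = np.clip(np.zeros(A.shape[1]), lb, ub)
    for _ in range(iters):
        # gradient step, then projection onto the box by clipping
        x = np.clip(x - step * (AtA @ x - Atb), lb, ub)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 4))
b = rng.normal(size=500)
ub = np.array([np.inf, np.inf, np.inf, 0.01])  # cap only the last parameter
x = projected_gradient(A, b, -np.inf, ub)
print(x)
```

The same loop maps onto Eigen fairly directly: the matrix products are ordinary expressions and the clip can be written with cwiseMin and cwiseMax.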

Computational Physics, FFT analysis

I solved the following questions for a computational assignment and got a really bad grade on it (67%). I would like to understand how to do these questions properly, in particular Q1.b and Q3. Please be as detailed as possible; I would really like to understand my mistakes.
Generate data (sinusoidal functions). Use fft to analyze:
a) A superposition of three waves with constant, but different frequencies
b) A wave whose frequency depends on time
Plot the graphs, sample frequencies, amplitude and power spectra with appropriate axes.
Use the 3 waves from Exercise 1a), but change them to have the same frequency, phase and amplitude. Contaminate each of them with successively increasing amounts of random, Gaussian-distributed noise.
1) Perform an FFT on the superposition of the three noise-contaminated waves. Analyze and plot the output.
2) Filter the signal with a Gaussian function, plot the “clean” wave, and analyze the result. Is the resultant wave 100% clean? Explain.
#1(b)
tmin = -2*pi
tmax - 2*pi
delta = 0.01
t = arange(tmin, tmax, delta)
y = sin(2.5*t*t)
plot(t, y, '-')
title('Figure 2: Plotting a wave whose frequency depends on time ')
xlabel('Time (s)')
ylabel('Y(t)')
show()
#b.2
Fs = 150.0; # sampling rate
Ts = 1.0/Fs; # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 5; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
n = len(y) # length of the signal
k = np.arange(n)
T = n/Fs
frq = k/T # two sides frequency range
frq = frq[range(n/2)] # one side frequency range
Y = np.fft.fft(y)/n # fft computing and normalization
Y = Y[range(n/2)]
#Time vs. Amplitude
plot(t,y)
title('Figure 2: Time vs. Amplitude')
xlabel('Time')
ylabel('Amplitude')
plt.show()
#Amplitude Spectrum
plot(frq,abs(Y),'r')
title('Figure 2a: Amplitude Spectrum')
xlabel('Freq (Hz)')
ylabel('amplitude spectrum')
plt.show()
#Power Spectrum
plot(frq,abs(Y)**2,'r')
title('Figure 2b: Power Spectrum')
xlabel('Freq (Hz)')
ylabel('power spectrum')
plt.show()
#Exercise 3:
#part 1
t = np.linspace(-0.5*pi,0.5*pi,1000)
#contaminating our waves with successively increasing white noise
y_1 = sin(15*t) + np.random.normal(0,0.2*pi,1000)
y_2 = sin(15*t) + np.random.normal(0,0.3*pi,1000)
y_3 = sin(15*t) + np.random.normal(0,0.4*pi,1000)
y = y_1 + y_2 + y_3 # superposition of three contaminated waves
#Plotting the figure
plot(t,y,'-')
title('A superposition of three waves contaminated with Gaussian Noise')
xlabel('Time (s)')
ylabel('Y(t)')
show()
delta = pi/1000.0
n = len(y) ## calculate frequency in Hz
freq = fftfreq(n, delta) # Computing the FFT
Freq = fftfreq(len(y), delta) #Using Fast Fourier Transformation to #calculate frequencies
N = len(Freq)
fr = Freq[1:len(Freq)/2.0]
A = fft(y)
XF = A[1:len(A)/2.0]/float(len(A[1:len(A)/2.0]))
# Amplitude spectrum for contaminated waves
plt.plot(fr, abs(XF))
title('Figure 3a : Amplitude spectrum with Gaussian Noise')
xlabel('frequency')
ylabel('Amplitude')
show()
# Power spectrum for contaminated waves
plt.plot(fr,abs(XF)**2)
title('Figure 3b: Power spectrum with Gaussian Noise')
xlabel('frequency(cycles/year)')
ylabel('Power')
show()
# part 2
F_v = exp(-(abs(freq)-2)**2/2*0.5**2)
spectrum = A*F_v #Applying the Gaussian Filter to clean our waves
new_y = ifft(spectrum) #Computing the inverse FFT
plot(t,new_y,'-')
title('A superposition of three waves after Noise Filtering')
xlabel('Time (s)')
ylabel('Y(t)')
show()

Something like the code/images below would have been expected. I deviated in the plot of the sum of the three noisy waves to show off all three waves and the sum. Note that in the intensity spectrum of the noisy wave you don't see much. For those cases it can be instructive to also plot the logarithm of the spectrum (np.log) so you can see the noise better.
In the last plot I plotted both the Gaussian filter and the spectrum (different sizes) w/o rescaling just to show where the filter applies. It is effectively a low pass filter (lets low frequencies through), removing the higher frequency noise by multiplying it with numbers close to zero.
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
#1(b)
p.figure(figsize=(20,16))
p.subplot(431)
t = np.arange(0,10, 0.001) #units in seconds
#cleaner to show the frequency change explicitly than y = sin(2.5*t*t)
f= 1+ t*0.1 # linear up chirp, i.e. frequency goes up , frequency units in Hz (1/sec)
y = np.sin(2* np.pi* f* t)
p.plot(t, y, '-')
p.title('Figure 2: Plotting a wave whose frequency depends on time ')
p.xlabel('Time (s)')
p.ylabel('Y(t)')
#b.2
Fs = 150.0; # sampling rate
Ts = 1.0/Fs; # sampling interval
t = np.arange(0,1,Ts) # time vector
ff = 5; # frequency of the signal
y = np.sin(2*np.pi*ff*t)
n = len(y) # length of the signal
k = np.arange(n) ## ok, the FFT has as many points in frequency space, as the original in time
T = n/Fs ## correct ; T=sampling time, the total frequency range is 1/sample time
frq = k/T # two sided frequency range
frq = frq[:n//2] # one sided frequency range
Y = np.fft.fft(y)/n # fft computing and normalization
Y = Y[:n//2]
# Amplitude vs. Time
p.subplot(434)
p.plot(t,y)
p.title('y(t)') # Amplitude vs Time is commonly said, but strictly not true, the amplitude is unchanging
p.xlabel('Time')
p.ylabel('Amplitude')
#Amplitude Spectrum
p.subplot(435)
p.plot(frq,abs(Y),'r')
p.title('Figure 2a: Amplitude Spectrum')
p.xlabel('Freq (Hz)')
p.ylabel('amplitude spectrum')
#Power Spectrum
p.subplot(436)
p.plot(frq,abs(Y)**2,'r')
p.title('Figure 2b: Power Spectrum')
p.xlabel('Freq (Hz)')
p.ylabel('power spectrum')
#Exercise 3:
#part 1
t = np.linspace(-0.5*np.pi,0.5*np.pi,1000)
# #contaminating our waves with successively increasing white noise
y_1 = np.sin(15*t) + np.random.normal(0,0.1,1000) # no need to get pi involved in this amplitude
y_2 = np.sin(15*t) + np.random.normal(0,0.2,1000)
y_3 = np.sin(15*t) + np.random.normal(0,0.4,1000)
y = y_1 + y_2 + y_3 # superposition of three contaminated waves
#Plotting the figure
p.subplot(437)
p.plot(t,y_1+2,'-',lw=0.3)
p.plot(t,y_2,'-',lw=0.3)
p.plot(t,y_3-2,'-',lw=0.3)
p.plot(t,y-6 ,lw=1,color='black')
p.title('A superposition of three waves contaminated with Gaussian Noise')
p.xlabel('Time (s)')
p.ylabel('Y(t)')
delta = np.pi/1000.0
n = len(y) ## calculate frequency in Hz
# freq = np.fft(n, delta) # Computing the FFT <-- wrong, you don't calculate the FFT from a number, but from a time dep. vector/array
# Freq = np.fftfreq(len(y), delta) #Using Fast Fourier Transformation to #calculate frequencies
# N = len(Freq)
# fr = Freq[1:len(Freq)/2.0]
# A = fft(y)
# XF = A[1:len(A)/2.0]/float(len(A[1:len(A)/2.0]))
# Why not do as before?
k = np.arange(n) ## ok, the FFT has as many points in frequency space, as the original in time
T = n*delta ## total duration of this signal; Fs from exercise 1 does not describe this time axis
frq = k/T # two sided frequency range
frq = frq[:n//2] # one sided frequency range
Y = np.fft.fft(y)/n # fft computing and normalization
Y = Y[:n//2]
# Amplitude spectrum for contaminated waves
p.subplot(438)
p.plot(frq, abs(Y))
p.title('Figure 3a : Amplitude spectrum with Gaussian Noise')
p.xlabel('frequency')
p.ylabel('Amplitude')
# Power spectrum for contaminated waves
p.subplot(439)
p.plot(frq,abs(Y)**2)
p.title('Figure 3b: Power spectrum with Gaussian Noise')
p.xlabel('frequency(cycles/year)')
p.ylabel('Power')
# part 2
p.subplot(4,3,11)
F_v = np.exp(-(np.abs(frq)-2)**2/2*0.5**2) ## this is a Gaussian, plot it separately to see it; play with the values
cleaned_spectrum = Y*F_v #Applying the Gaussian Filter to clean our waves ## multiplication in FreqDomain is convolution in time domain
p.plot(frq,F_v)
p.plot(frq,cleaned_spectrum)
p.subplot(4,3,10)
new_y = np.fft.ifft(cleaned_spectrum) #Computing the inverse FFT of the cleaned spectrum to see the cleaned wave
p.plot(t[:n//2],new_y,'-')
p.title('A superposition of three waves after Noise Filtering')
p.xlabel('Time (s)')
p.ylabel('Y(t)')
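For part 2, a compact sketch of Gaussian filtering in the frequency domain, using the real-input FFT so the filtered spectrum transforms back to a real signal of the original length. It departs from the listing above in a few labelled ways: the wave is sin(16*t) rather than sin(15*t) so that the window holds a whole number of cycles, the noise level is a single assumed value, and the filter is centred on the signal frequency with an assumed width of 0.5 Hz.

```python
import numpy as np

n = 1000
dt = np.pi / n                      # sample spacing in seconds
t = np.arange(n) * dt
rng = np.random.default_rng(0)
clean = 3 * np.sin(16 * t)          # three identical waves added together
noisy = clean + rng.normal(0.0, 0.5, n)

freq = np.fft.rfftfreq(n, dt)       # one-sided frequency axis in Hz
spectrum = np.fft.rfft(noisy)
f0, sigma = 16 / (2 * np.pi), 0.5   # filter centre and width in Hz (assumed)
gauss = np.exp(-(freq - f0) ** 2 / (2 * sigma ** 2))
filtered = np.fft.irfft(spectrum * gauss, n)

def rms(v):
    return np.sqrt(np.mean(v ** 2))

print("rms error before:", rms(noisy - clean))
print("rms error after: ", rms(filtered - clean))
```

The error after filtering is much smaller but not zero: noise that falls inside the filter's passband is indistinguishable from signal, which is one way to answer the "is it 100% clean?" part of the exercise.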

Horizontal and vertical distance between two coordinates on map

I have coordinates between two points on the map. There are tools for calculating distance between them, but I want to find horizontal and vertical distance between the two places. Can someone help please?

Let's say you have points A(x1, y1) and B(x2, y2).
The horizontal and vertical distances between the two are the differences of their coordinates:
x12 = x2 - x1
y12 = y2 - y1
Addition:
The total distance is
d12 = sqrt(y12^2 + x12^2)
= sqrt((y2 - y1)^2 + (x2 - x1)^2)
where sqrt means "Square Root"

Let's say your 2 known points A and B have latitude and longitude latA, longA and latB, longB.
Now you could introduce two additional points C and D with latC = latA, longC = longB, and latD = latB, longD = longA, so the points A, B, C, D form a rectangle on the earth's surface.
Now you can simply use distanceBetween(A, C) and distanceBetween(A, D) to get the required distances.
Copied from: https://stackoverflow.com/a/62857233/5660341
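The rectangle construction can be sketched with a haversine function standing in for distanceBetween. This assumes a spherical Earth of radius 6371 km; the function names are hypothetical, and the "horizontal" leg is the great-circle distance between A and C, which differs slightly from the distance measured along the parallel.

```python
import math

def haversine(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def horizontal_vertical(lat_a, lon_a, lat_b, lon_b):
    horizontal = haversine(lat_a, lon_a, lat_a, lon_b)  # A -> C (latitude of A, longitude of B)
    vertical = haversine(lat_a, lon_a, lat_b, lon_a)    # A -> D (latitude of B, longitude of A)
    return horizontal, vertical

print(horizontal_vertical(52.0, 4.0, 48.0, 11.0))
```

Measuring the horizontal leg at B's latitude instead (B to D) gives a different value, which is the asymmetry the next answer illustrates.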

On the northern hemisphere two positions a and b would look a bit like:
a--+
/ \
/ \
+--------b
So there is a common vertical distance, but a slightly different horizontal distance.
The shortest travel would be horizontally from a (greatest and then vertical to b.
With school math and knowing the zero-height (w.r.t. sea level) ellipse going through both poles you can easily calculate the three distances.
There are sufficient sites, search for longitude, latitude, haversine formula.
There are sufficient simplifications, around, and also one point may be on the northern and the other on the southern hemisphere. And with large values it might be better to horizontally walk left from a to the +. So you have to really dive into the calculation.
I would not dare to give an actual formula here.

Rough estimation of centroid with dft

I would like the coordinates of the centroid, and I have already calculated the DFT (for a different purpose). I've seen some slides that hint at the possibility of getting a rough estimate of the centroid by looking at the first values of the matrix.
The code is based on: http://docs.opencv.org/doc/tutorials/core/discrete_fourier_transform/discrete_fourier_transform.html
cv::dft(complexI, complexI);
// compute the magnitude and switch to logarithmic scale
// => log(1 + sqrt(Re(DFT(I))^2 + Im(DFT(I))^2))
cv::split(complexI, planes); // planes[0] = Re(DFT(I), planes[1] = Im(DFT(I))
double x = (double)planes[0].at<int>(0,0)/INT_MAX;
double y = ABS(((double)planes[1].at<int>(0,0)/INT_MAX));
But every time the y value becomes 0. The x value seems correct though. Am I missing something?

This is more commonly done using moments.
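A minimal sketch of the moments approach, written with NumPy rather than OpenCV so it stands alone: the centroid is the intensity-weighted mean of the pixel coordinates, i.e. (m10/m00, m01/m00). The function name is illustrative. The zeroth moment m00 is the sum of all pixel values, which is also the value of the DFT's (0,0) coefficient.

```python
import numpy as np

def centroid(img):
    """Return (x, y) of the intensity-weighted centroid of a 2-D image."""
    img = np.asarray(img, dtype=float)
    m00 = img.sum()                      # zeroth moment: total intensity
    if m00 == 0:
        raise ValueError("image has no intensity, centroid is undefined")
    ys, xs = np.indices(img.shape)       # row (y) and column (x) index grids
    return (xs * img).sum() / m00, (ys * img).sum() / m00

spot = np.zeros((5, 7))
spot[2, 4] = 1.0                         # single bright pixel at x=4, y=2
print(centroid(spot))
```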
