Turn data table into a matrix - python-2.7

I'm stuck: I can't work out how to create a matrix from the code below. I am not allowed to use numpy or any other imported libraries.
Here's my code. I translated it from Spanish, so apologies if I missed a word or two:
start = float(input("First value of time: "))
incremento = float(input("Increase: "))
final = float(input("Final time: "))
max_height = 0.0
max_time = 0.0
print ("Time\t Height(m)\t Speed(m/s)\t")
time = start  # begin at the first value of time, not the final one
while (time <= final and time <= 48):
    height = -0.12*time**4+12*time**3-380*time**2+4100*time+220
    speed = -0.48*time**3+36*time**2-760*time+4100
    speed /= 3600
    print ("%.2f\t %.2f\t %.2f\t" %(time, height, speed))
    # keep track of the highest point seen so far
    if height > max_height:
        max_height = height
        max_time = time
    time += incremento
print ("Maximum height is %.2f m in time %.2f." %(max_height, max_time ))
I'm supposed to create a matrix from the information printed as a table.

Python doesn't have a builtin matrix, so you'd want to store the information in a list of lists.
Before the while loop add table = [].
Inside the while loop, add table.append([time, height, speed]).
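A minimal sketch of that suggestion, using only built-in lists. The function name build_table and its parameters are made up for illustration, and the loop is wrapped in a function only so it can run without prompting for input:

```python
def build_table(start, step, final, limit=48.0):
    """Return the printed table as a list of rows: [time, height, speed]."""
    table = []          # the "matrix": one inner list per printed row
    time = float(start)
    while time <= final and time <= limit:
        height = -0.12*time**4 + 12*time**3 - 380*time**2 + 4100*time + 220
        speed = (-0.48*time**3 + 36*time**2 - 760*time + 4100) / 3600
        table.append([time, height, speed])
        time += step
    return table


table = build_table(0.0, 12.0, 48.0)
for row in table:
    print("%.2f\t %.2f\t %.2f" % (row[0], row[1], row[2]))
```

An individual cell is then read as table[row][column]; for example, table[0][1] is the height in the first row.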

Component reconstruction for multivariate lagged time series

I am trying to write a multivariate Singular Spectrum Analysis with a Monte Carlo test. To this end I am working on a piece of code that reconstructs the input series from the lagged trajectory matrix and the projection base (ST-PCs) that result from the PCA/SSA decomposition of the input series. The attached code works for a lagged univariate (that is, single) time series, but I am struggling to do this reconstruction for a lagged multivariate time series. I don't quite understand the procedure mathematically and, not surprisingly, I have not managed to program it either. Useful links are included in the docstrings of the accompanying code. Input data should have the shape (time * number of series), so 288x3 means 3 time series of 288 time levels each.
I hope you can help me out!
import numpy as np
def lagged_covariance_matrix(data, M):
    """ Computes the lagged covariance matrix using the Broomhead & King method
    Background: Plaut, G., & Vautard, R. (1994). Spells of low-frequency oscillations and
    weather regimes in the Northern Hemisphere. Journal of the atmospheric sciences, 51(2), 210-236.
    Arguments:
    data : pxn time series, where p denotes the length of the time series and n the number of channels
    M : window length """
    # explicitly 'add' spatial dimension if input is a single time series
    if np.ndim(data) == 1:
        data = np.reshape(data,(len(data),1))
    T = data.shape[0]
    L = data.shape[1]
    N = T - M + 1
    X = np.zeros((T, L, M))
    for i in range(M):
        X[:,:,i] = np.roll(data, -i, axis = 0)
    X = X[:N]
    # X constitutes the trajectory matrix and is a stacked hankel matrix
    X = np.reshape(X, (N, M*L), order = 'C') # https://www.jstatsoft.org/article/viewFile/v067i02/v67i02.pdf
    # choose the smallest projection basis for computation of the covariance matrix
    if M*L >= N:
        return 1/(M*L) * X.dot(X.T), X
    else:
        return 1/N * X.T.dot(X), X
def sort_by_eigenvalues(eigenvalues, PCs):
    """ Sorts the PCs and eigenvalues by descending size of the eigenvalues """
    desc = np.argsort(-eigenvalues)
    return eigenvalues[desc], PCs[:,desc]
def Reconstruction(M, E, X):
    """ Reconstructs the series as the sum of M subseries.
    See: https://en.wikipedia.org/wiki/Singular_spectrum_analysis, 'Basic SSA' &
    the work of Vivien Sainte Fare Garnot on univariate time series (https://github.com/VSainteuf/mcssa)
    Arguments:
    M : window length
    E : eigenvector basis
    X : trajectory matrix """
    time = len(X) + M - 1
    RC = np.zeros((time, M))
    # step 3: grouping
    for i in range(M):
        d = np.zeros(M)
        d[i] = 1
        I = np.diag(d)
        Q = np.flipud(X @ E @ I @ E.T)
        # step 4: diagonal averaging
        for k in range(time):
            RC[k, i] = np.diagonal(Q, offset = -(time - M - k)).mean()
    return RC
#=====================================================================================================
#=====================================================================================================
#=====================================================================================================
# input data
data = None
# number of lags a.k.a. window length
M = 45 # M = 1 means no lag
covmat, X = lagged_covariance_matrix(data, M)
# get the eigenvalues and vectors of the covariance matrix
vals, vecs = np.linalg.eig(covmat)
eig_data, eigvec_data = sort_by_eigenvalues(vals, vecs)
# component reconstruction
recons_data = Reconstruction(M, eigvec_data, X)
The following works but does not make direct use of the projection base (ST-PCs). Hence the original question still stands, but this already helps a great deal and solves the problem for me. This piece of code makes use of the similarity between the ST-PCs projection base and the u and vt matrices obtained from the singular value decomposition of the lagged trajectory matrix. I think it gives back the same answer as one would obtain using the ST-PCs projection base?
def lag_reconstruction(data, X, M, pairs = None):
    """ Reconstructs the series as the sum of M subseries using the lagged trajectory matrix.
    Based on equation 2.9 of Plaut, G., & Vautard, R. (1994). Spells of low-frequency oscillations and weather regimes in the Northern Hemisphere. Journal of Atmospheric Sciences, 51(2), 210-236.
    Inspired by work of R. van Westen and C. Wieners """
    time = data.shape[0] # number of time levels of the original series
    L = data.shape[1] # number of input series
    N = time - M + 1
    u, s, vt = np.linalg.svd(X, full_matrices = False)
    rc = np.zeros((time, L, M))
    for t in range(time):
        counter = 0
        for i in range(M):
            if t-i >= 0 and t-i < N:
                counter += 1
                if pairs:
                    for k in pairs:
                        rc[t,:,i] += u[t-i, k] * s[k] * vt[k, i*L : i*L + L]
                else:
                    for k in range(len(s)):
                        rc[t,:,i] += u[t-i, k] * s[k] * vt[k, i*L : i*L + L]
        rc[t] = rc[t]/counter
    return rc
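Both reconstruction routines above end with the same idea: every time level t is obtained by averaging all trajectory-matrix entries that refer to it (the counter in lag_reconstruction, "step 4: diagonal averaging" in Reconstruction). The following is only a plain-Python sketch of that averaging for a single series, with made-up helper names trajectory_matrix and diagonal_average. It assumes the layout X[i][j] = series[i + j]; for several channels the same averaging would presumably be applied to each channel's block of columns.

```python
def trajectory_matrix(series, M):
    """Lagged (Hankel) matrix with N = len(series) - M + 1 rows and M columns."""
    N = len(series) - M + 1
    return [[series[i + j] for j in range(M)] for i in range(N)]


def diagonal_average(Q):
    """Average the entries Q[i][j] that share the same time index i + j."""
    N, M = len(Q), len(Q[0])
    T = N + M - 1
    sums = [0.0] * T
    counts = [0] * T
    for i in range(N):
        for j in range(M):
            sums[i + j] += Q[i][j]
            counts[i + j] += 1
    return [total / count for total, count in zip(sums, counts)]


series = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
X = trajectory_matrix(series, 3)
recovered = diagonal_average(X)  # averaging the unmodified X returns the series
```

Applied to a rank-reduced version of X (for example one SVD component) instead of X itself, the same averaging yields that component's reconstructed series.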

Using gradient descent to solve a nonlinear system

I have the following code, which uses gradient descent to find the global minimum of y = (x+5)^2:
cur_x = 3 # the algorithm starts at x=3
rate = 0.01 # learning rate
precision = 0.000001 # this tells us when to stop the algorithm
previous_step_size = 1
max_iters = 10000 # maximum number of iterations
iters = 0 # iteration counter
df = lambda x: 2*(x+5) # gradient of our function
while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x # store current x value in prev_x
    cur_x = cur_x - rate * df(prev_x) # grad descent
    previous_step_size = abs(cur_x - prev_x) # change in x
    iters = iters+1 # iteration count
    print("Iteration",iters,"\nX value is",cur_x) # print iterations
print("The local minimum occurs at", cur_x)
The procedure is fairly simple, and among the most intuitive and brief for solving such a problem (at least, that I'm aware of).
I'd now like to apply this to solving a system of nonlinear equations. Namely, I want to use this to solve the Time Difference of Arrival problem in three dimensions. That is, given the coordinates of 4 observers (or, in general, n+1 observers for an n-dimensional solution), the velocity v of some signal, and the time of arrival at each observer, I want to reconstruct the source (determine its coordinates [x,y,z]).
I've already accomplished this using approximation search (see this excellent post on the matter: ), and I'd now like to try doing so with gradient descent (really, just as an interesting exercise). I know that the problem in two dimensions can be described by the following non-linear system:
\sqrt{(x-x_1)^2+(y-y_1)^2} + s(t_2-t_1) = \sqrt{(x-x_2)^2+(y-y_2)^2}
\sqrt{(x-x_2)^2+(y-y_2)^2} + s(t_3-t_2) = \sqrt{(x-x_3)^2+(y-y_3)^2}
\sqrt{(x-x_3)^2+(y-y_3)^2} + s(t_1-t_3) = \sqrt{(x-x_1)^2+(y-y_1)^2}
I know that it can be done, however I cannot determine how.
How might I go about applying this to 3-dimensions, or some nonlinear system in general?
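One common way to reuse the scalar loop above for a system is to turn the system into a single cost function, the sum of squared residuals of the equations, and descend on that. The sketch below is only an illustration under stated assumptions: the function names, the observer layout, the learning rate and the starting point are all made up, and every residual is written relative to the first observer rather than in the cyclic pairs used above. It works for any number of dimensions because points are plain coordinate tuples.

```python
import math


def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cost_gradient(p, observers, times, s):
    """Gradient of sum_i r_i**2 with r_i = (d_i - d_0) - s*(t_i - t_0)."""
    d = [distance(p, o) for o in observers]
    # unit vectors pointing from each observer to p (undefined if p sits on an observer)
    units = [[(a - b) / di for a, b in zip(p, o)] for o, di in zip(observers, d)]
    grad = [0.0] * len(p)
    for i in range(1, len(observers)):
        r = (d[i] - d[0]) - s * (times[i] - times[0])
        for k in range(len(p)):
            grad[k] += 2.0 * r * (units[i][k] - units[0][k])
    return grad


def locate_source(observers, times, s, start,
                  rate=0.02, precision=1e-12, max_iters=20000):
    cur = list(start)
    for _ in range(max_iters):
        grad = cost_gradient(cur, observers, times, s)
        new = [c - rate * g for c, g in zip(cur, grad)]
        step = distance(new, cur)   # same stopping rule as the 1-D loop
        cur = new
        if step < precision:
            break
    return cur


# synthetic 2-D check: arrival times generated from a known source
observers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
source = (3.0, 4.0)
s = 1.0
times = [distance(source, o) / s for o in observers]
estimate = locate_source(observers, times, s, start=(5.0, 5.0))
```

For three dimensions, pass 3-tuples for the observers and the starting point. The cost is not convex in general, so the result can depend on the starting point and the learning rate.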

Trying to fit a sine function to phased light curve

import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model,Parameters
f2= "KELT_N16_lc_006261_V01_west_tfa.dat"
t2="TIMES" # file name
NewData2 = np.loadtxt(t2, dtype=float, unpack=True)
NewData = np.loadtxt(f2,dtype=float, unpack=True, usecols=(1,))
flux = NewData
time= NewData2
new_flux=np.hstack([flux,flux])
# fold
period = 2.0232 # period (must be known already!)
foldTimes = ((time)/ period) # divide by period to convert to phase
foldTimes = foldTimes % 1 # take fractional part of phase only (i.e. discard whole number part)
new_phase=np.hstack([foldTimes+1,foldTimes])
print len(new_flux)
print len(new_phase)
def Wave(x, new_flux,new_phase):
    wave = new_flux*np.sin(new_phase+x)
    return wave
model = Model(Wave)
print "Independent Vars:", model.independent_vars
print "Parameters:",model.param_names
p = Parameters()
p.add_many(('new_flux',13.42, True, None, None, None) )
p.add_many(('new_phase',0,True, None, None, None) )
result=model.fit(new_flux,x=new_phase,params=p,weights= None)
plt.scatter(new_phase,new_flux,marker='o',edgecolors='none',color='blue',s=5.0, label="Period: 2.0232 days")
plt.ylim([13.42,13.54])
plt.xlim(0,2)
plt.gca().invert_yaxis()
plt.title('HD 240121 Light Curve with BJD Correction')
plt.ylabel('KELT Instrumental Magnitude')
plt.xlabel('Phase')
legend = plt.legend(loc='lower right', shadow=True)
plt.scatter(new_phase,result.best_fit,label="One Oscillation Fit", color='red',s=60.0)
plt.savefig('NewEpoch.png')
print result.fit_report()
I am trying to fit a sine function to phased light curve data for a research project. However, I am unsure where I am going wrong, and I believe the problem lies in my parameters. It appears that the fit has an amplitude that is too high and a period that is too long. Any help would be appreciated. Thank you!
This is what the graph looks like now (Attempt at fitting a sine function to my dataset):
A couple of comments/suggestions:
First, it is almost certainly better to replace
p = Parameters()
p.add_many(('new_flux',13.42, True, None, None, None) )
p.add_many(('new_phase',0,True, None, None, None) )
with
p = Parameters()
p.add('new_flux', value=13.42, vary=True)
p.add('new_phase', value=0, vary=True)
Second, your model does not include a DC offset, but your data clearly has one. The offset is approximately 13.4 and the amplitude of the sine wave is approximately 0.05. While you're at it, you probably want to scale the phase as well as shift it, so that the model is
offset + amplitude * sin(scale*x + phase_shift)
You don't necessarily have to vary all of those, but making your model more general will allow you to see how the phase shift and scale are correlated; given the noise level in your data, that might be important.
With the more general model, you can try a few sets of parameter values, using model.eval() to evaluate a model with a set of Parameters. Once you have a better model and reasonable starting points, you should get a reasonable fit.
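To illustrate "try a few sets of parameter values", here is a plain-Python sketch that does not use lmfit at all. It evaluates the general model for a few candidate parameter sets and keeps the one with the smallest sum of squared residuals as a starting point. The helper names, the synthetic data and the candidate values are assumptions for illustration only (in particular, scale = 2*pi assumes one oscillation per unit of phase).

```python
import math


def sine_model(x, offset, amplitude, scale, phase_shift):
    return offset + amplitude * math.sin(scale * x + phase_shift)


def sum_sq_residuals(xs, ys, params):
    return sum((y - sine_model(x, **params)) ** 2 for x, y in zip(xs, ys))


# synthetic stand-in for the phased light curve (phase runs from 0 to 2)
true_params = dict(offset=13.48, amplitude=0.05, scale=2 * math.pi, phase_shift=0.3)
xs = [i / 50.0 for i in range(100)]
ys = [sine_model(x, **true_params) for x in xs]

candidates = [
    dict(offset=13.40, amplitude=0.05, scale=1.0, phase_shift=0.0),
    dict(offset=13.48, amplitude=0.05, scale=2 * math.pi, phase_shift=0.0),
    dict(offset=13.48, amplitude=0.50, scale=2 * math.pi, phase_shift=0.0),
]
best = min(candidates, key=lambda p: sum_sq_residuals(xs, ys, p))
```

The winning candidate would then be handed to the fitting routine as the initial values.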
How could we help you with your uncommented code?
How do we know what is what and what should it do?
What method for fitting are you using?
Where is the data and in what form ?
I would start by computing approximate sine wave parameters. Let's assume you have input data in the form of n points with x,y coordinates and want to fit a sine wave:
y(t) = y0+A*sin(x0+x(t)*f)
where y0 is the y offset, x0 is the phase offset, A is the amplitude and f is the angular frequency.
I would:
1. Compute the average y value
y0 = sum(data[i].y)/n where i={0,1,2,...n-1}
This mean value represents the likely y offset y0 of your sine wave.
2. Compute the average distance to y0
d = sum (|data[i].y-y0|)/n where i={0,1,2,...n-1}
If my memory serves me well, this is related to the effective value of the amplitude, so as a rough estimate:
A = sqrt(2)*d
(Strictly, sqrt(2) is the factor for the RMS deviation; for the mean absolute deviation of a pure sine the factor is pi/2. Either is close enough for a starting estimate.)
3. Find the zero crossings in the dataset
For this the dataset should be sorted by x, so sort it if it is not. Remember the index of the first crossing i0, the index of the last crossing i1 and the number of crossings found j. From these we can estimate the frequency and phase offset:
f=M_PI*double(j-1)/(datax[i1]-datax[i0]);
x0=-datax[i0]*f;
To determine which half of the sine wave we aligned to, just check the sign of the middle point between the first two zero crossings:
i1=i0+((i1-i0)/(j-1));
if (datay[(i0+i1)>>1]<=y0) x0+=M_PI;
Or check for a specific zero crossing pattern instead.
That is all; now we have approximate x0,y0,f,A parameters of the sine wave.
Here some C++ code I tested with (sorry I do not use Python):
//---------------------------------------------------------------------------
#include <math.h>
// input data
const int n=5000;
double datax[n];
double datay[n];
// fitted sin wave
double A,x0,y0,f;
//---------------------------------------------------------------------------
void data_generate() // generate random noisy sine wave
    {
    int i;
    double A=150.0,x0=250.0,y0=200.0,f=0.03,r=20.0;
    double x,y;
    Randomize();
    for (i=0;i<n;i++)
        {
        x=800.0*double(i)/double(n);
        y=y0+A*sin(x0+x*f);
        datax[i]=x+r*Random();
        datay[i]=y+r*Random();
        }
    }
//---------------------------------------------------------------------------
void data_fit() // find raw approximate of x0,y0,f,A
    {
    int i,j,e,i0,i1;
    double x,y,q0,q1;
    // y0 = avg(y)
    for (y0=0.0,i=0;i<n;i++) y0+=datay[i]; y0/=double(n);
    // A = sqrt(2)*avg(|y-y0|)
    for (A=0.0,i=0;i<n;i++) A+=fabs(datay[i]-y0); A/=double(n); A*=sqrt(2.0);
    // bubble sort data by x asc
    for (e=1,j=n;e;j--)
     for (e=0,i=1;i<j;i++)
      if (datax[i-1]>datax[i])
        {
        x=datax[i-1]; datax[i-1]=datax[i]; datax[i]=x;
        y=datay[i-1]; datay[i-1]=datay[i]; datay[i]=y;
        e=1;
        }
    // find zero crossings (rising ones only: from clearly below y0 to clearly above)
    for (i=0,j=0;i<n;)
        {
        // find value below zero
        for (;i<n;i++) if (datay[i]-y0<=-0.75*A) break; e=i;
        // find value above zero
        for (;i<n;i++) if (datay[i]-y0>=+0.75*A) break;
        if (i>=n) break;
        // find point closest to zero
        for (i1=e;e<i;e++)
         if (fabs(datay[i1]-y0)>fabs(datay[e]-y0)) i1=e;
        if (!j) i0=i1; j++;
        }
    // consecutive rising crossings are one full period apart, hence 2*pi
    f=2.0*M_PI*double(j-1)/(datax[i1]-datax[i0]);
    x0=-datax[i0]*f;
    }
//---------------------------------------------------------------------------
And preview:
The dots are generated noisy data and blue curve is fitted sin wave.
On top of this you can run a proper fit to increase precision. It does not matter which method you use to search around the estimated parameters. For example, I would go for:
How approximation search works
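Since the question is about Python, the estimation steps from this answer can be sketched in plain Python as follows. This is a loose port, not the answer's tested code: the function name estimate_sine is made up, the amplitude uses the factor pi/2 (the mean of |sin| over a period is 2/pi) instead of the sqrt(2) used above, and only rising crossings are detected, as in the C++ version.

```python
import math


def estimate_sine(xs, ys):
    """Rough y0, A, f, x0 for y = y0 + A*sin(x0 + x*f); needs >= 2 rising crossings."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])   # sort points by x
    xs = [xs[i] for i in order]
    ys = [ys[i] for i in order]
    n = len(xs)
    y0 = sum(ys) / n
    amp = (math.pi / 2.0) * sum(abs(y - y0) for y in ys) / n
    if amp == 0.0:
        raise ValueError("data is constant")
    crossings = []
    i = 0
    while i < n:
        while i < n and ys[i] - y0 > -0.75 * amp:   # advance to a clearly low point
            i += 1
        low = i
        while i < n and ys[i] - y0 < 0.75 * amp:    # advance to a clearly high point
            i += 1
        if i >= n:
            break
        # sample closest to y0 between the low and the high point
        k = min(range(low, i), key=lambda m: abs(ys[m] - y0))
        crossings.append(xs[k])
    if len(crossings) < 2:
        raise ValueError("need at least two rising crossings")
    f = 2.0 * math.pi * (len(crossings) - 1) / (crossings[-1] - crossings[0])
    x0 = -crossings[0] * f
    return y0, amp, f, x0


# noise-free synthetic data covering exactly four periods
f_true = 2.0 * math.pi * 4 / 800.0
xs = [800.0 * i / 2000 for i in range(2000)]
ys = [200.0 + 150.0 * math.sin(0.5 + f_true * x) for x in xs]
y0, amp, f, x0 = estimate_sine(xs, ys)
```

The returned x0 is only defined up to a multiple of 2*pi. On noisy data the 0.75 threshold may need tuning, as in the original.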

Return progress status when drawing a large NetworkX graph

I have a large graph that I'm drawing, and it is taking a long time to process.
Is it possible to return a status, the current node, or a percentage showing how far the drawing has progressed?
I'm not looking to draw the network incrementally, as all I'm doing is saving it to a high-dpi image.
Here's an example of the code I'm using:
path = nx.shortest_path(G, source=u'1234', target=u'98765')
path_edges = zip(path, path[1:])
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G,pos,nodelist=path,node_color='r')
nx.draw_networkx_edges(G,pos,edgelist=path_edges,edge_color='r',width=10)
plt.axis('equal')
plt.savefig('prototype_map.png', dpi=1000)
plt.show()
I believe the only way to do it is to modify the source code of the draw function so that it prints something like 10%, 20% complete... But when I checked the source code of draw_networkx_nodes and draw_networkx, I realized that this is not a straightforward task: the draw function stores the positions (nodes and edges) in a numpy array and sends it to matplotlib's ax.scatter function, which is a bit hard to manipulate without messing something up. The only thing I can think of is to change:
xy = numpy.asarray([pos[v] for v in nodelist]) # In draw_networkx_nodes function
To
xy = []
count = 0
for v in nodelist:
    xy.append(pos[v])
    count += 1
    # report once, when half of the nodes have been collected
    if count == len(nodelist) // 2:
        print('50% of nodes completed')
print('100% of nodes completed')
xy = numpy.asarray(xy)
The same could be done when draw_networkx_edges is called, to indicate progress in drawing the edges. I am not sure how accurate this will be, because I do not know how much time is spent in the ax.scatter function. I also looked at the source code of the scatter function, but I could not pinpoint a loop or anything else where a progress indication could be printed.
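The counting idea above can be generalized to report at regular intervals instead of only once. This is a plain-Python sketch with a made-up helper name, collect_with_progress; it only covers collecting the positions and, as noted above, says nothing about the time spent inside ax.scatter.

```python
def collect_with_progress(nodelist, pos, steps=10, report=print):
    """Collect pos[v] for every node, reporting progress `steps` times."""
    total = len(nodelist)
    xy = []
    next_mark = 1                      # next progress mark, in units of 1/steps
    for count, v in enumerate(nodelist, start=1):
        xy.append(pos[v])
        # integer comparison avoids floating-point percentages
        while next_mark <= steps and count * steps >= next_mark * total:
            report("%d%% of nodes completed" % (100 * next_mark // steps))
            next_mark += 1
    return xy


# tiny example with made-up positions
pos = dict((node, (float(node), 0.0)) for node in range(20))
messages = []
xy = collect_with_progress(list(range(20)), pos, steps=4, report=messages.append)
```

Passing report=print (the default) writes the messages to the console instead of collecting them in a list.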
Some layout functions accept a pos argument so that the work can be done incrementally. We can use this to split the computation into chunks and show a progress bar using tqdm:
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
from tqdm import tqdm
def plot_graph(g, iterations=50, pos=None, k_numerator=None, figsize=(10, 10)):
    if k_numerator is None:
        k = None
    else:
        k = k_numerator / np.sqrt(g.number_of_nodes())
    with tqdm(total=iterations) as pbar:
        step = 5
        iterations_done = 0
        # run the layout in chunks of `step` iterations, feeding pos back in
        while iterations_done < iterations:
            pos = nx.layout.fruchterman_reingold_layout(
                g, iterations=step, pos=pos, k=k
            )
            iterations_done += step
            pbar.update(step)
    fig = plt.figure(figsize=figsize, dpi=120)
    nx.draw_networkx(
        g,
        pos,
    )
    return fig, pos

How to calculate dice coefficient for measuring accuracy of image segmentation in python

I have an image of land cover and I segmented it using K-means clustering. Now I want to calculate the accuracy of my segmentation algorithm. I read somewhere that the Dice coefficient is a suitable evaluation measure, but I am not sure how to calculate it.
I use Python 2.7
Are there any other effective evaluation methods? Please give a summary or a link to a source. Thank You!
Edits:
I used the following code for measuring the dice similarity for my original and the segmented image but it seems to take hours to calculate:
for i in xrange(0,7672320):
    for j in xrange(0,3):
        dice = np.sum([seg==gt])*2.0/(np.sum(seg)+np.sum(gt)) #seg is the segmented image and gt is the original image. Both are of same size
Please refer to the Dice similarity coefficient article on Wikipedia.
Here is a sample code segment for your reference. Note that you need to replace k with the cluster you are interested in, since you are using k-means.
import numpy as np
k=1
# segmentation
seg = np.zeros((100,100), dtype='int')
seg[30:70, 30:70] = k
# ground truth
gt = np.zeros((100,100), dtype='int')
gt[30:70, 40:80] = k
dice = np.sum(seg[gt==k])*2.0 / (np.sum(seg) + np.sum(gt))
print 'Dice similarity score is {}'.format(dice)
If you are working with opencv you could use the following function:
import cv2
import numpy as np
#load images
y_pred = cv2.imread('predictions/image_001.png')
y_true = cv2.imread('ground_truth/image_001.png')
# Dice similarity function
def dice(pred, true, k = 1):
    intersection = np.sum(pred[true==k]) * 2.0
    dice = intersection / (np.sum(pred) + np.sum(true))
    return dice
dice_score = dice(y_pred, y_true, k = 255) #255 in my case, can be 1
print ("Dice Similarity: {}".format(dice_score))
In case you want to evaluate with this metric within a deep learning model using tensorflow you can use the following:
import tensorflow as tf
def dice_coef(y_true, y_pred):
    # flatten both masks to 1-D float tensors
    y_true_f = tf.reshape(tf.dtypes.cast(y_true, tf.float32), [-1])
    y_pred_f = tf.reshape(tf.dtypes.cast(y_pred, tf.float32), [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    # the +1 terms smooth the ratio and avoid division by zero for empty masks
    return (2. * intersection + 1.) / (tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + 1.)
This is an important clarification if your data has more than 2 classes (that is, it is not just a binary mask of 1 and 0).
If you are using multiple classes, make sure that both the prediction and the ground truth are compared against the class value you want, so that pixels are counted rather than label values being summed. Otherwise you can end up with DSC values greater than 1.
In the code below, this is the extra ==k at the end of each [] expression:
import numpy as np
k=1
# segmentation
seg = np.zeros((100,100), dtype='int')
seg[30:70, 30:70] = k
# ground truth
gt = np.zeros((100,100), dtype='int')
gt[30:70, 40:80] = k
dice = np.sum(seg[gt==k]==k)*2.0 / (np.sum(seg[seg==k]==k) + np.sum(gt[gt==k]==k))
print 'Dice similarity score is {}'.format(dice)
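The same per-class counting can be sketched without numpy, which makes it explicit what is being counted: Dice = 2*|A and B| / (|A| + |B|), where A and B are the sets of pixels labelled k in each image. The function name dice_for_class is made up, and returning 1.0 when neither image contains class k is just one possible convention.

```python
def dice_for_class(seg, gt, k):
    """Dice coefficient for label k; seg and gt are equally sized 2-D lists."""
    seg_count = gt_count = both = 0
    for seg_row, gt_row in zip(seg, gt):
        for s, g in zip(seg_row, gt_row):
            if s == k:
                seg_count += 1
            if g == k:
                gt_count += 1
            if s == k and g == k:
                both += 1
    if seg_count + gt_count == 0:
        return 1.0   # assumed convention: class absent from both images
    return 2.0 * both / (seg_count + gt_count)


# same geometry as the example above, but with label 2 instead of 1
k = 2
seg = [[k if 30 <= r < 70 and 30 <= c < 70 else 0 for c in range(100)]
       for r in range(100)]
gt = [[k if 30 <= r < 70 and 40 <= c < 80 else 0 for c in range(100)]
      for r in range(100)]
score = dice_for_class(seg, gt, k)
```

Because pixels are counted instead of label values being summed, the result does not depend on which integer is used as the label.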