I have a computation graph built with Theano. It goes like this:
import theano
from theano import tensor as T
import numpy as np
W1 = theano.shared( np.random.rand(45,32).astype('float32'), 'W1')
b1 = theano.shared( np.random.rand(32).astype('float32'), 'b1')
W2 = theano.shared( np.random.rand(32,3).astype('float32'), 'W2')
b2 = theano.shared( np.random.rand(3).astype('float32'), 'b2')
input = T.matrix('input')
hidden = T.tanh(T.dot(input, W1)+b1)
output = T.nnet.softmax(T.dot(hidden, W2)+b2)
Conceptually, this is a mapping from a vector to a vector. However, input is declared as a matrix type so that I can pass many vectors through the mapping simultaneously; I'm doing some machine learning and this makes the learning phase more efficient.
The problem is that after the learning phase, I'd like to view the mapping as vector to vector so I can compute:
jac = theano.gradient.jacobian(output, wrt=input)
jacobian complains that input is not TensorType(float32, vector). Is there a way I can change the input tensor type without rebuilding the whole computation graph?
Technically, this is a possible solution:
import theano
from theano import tensor as T
import numpy as np
W1 = theano.shared( np.random.rand(45,32).astype('float32'), 'W1')
b1 = theano.shared( np.random.rand(32).astype('float32'), 'b1')
W2 = theano.shared( np.random.rand(32,3).astype('float32'), 'W2')
b2 = theano.shared( np.random.rand(3).astype('float32'), 'b2')
input = T.vector('input') # it will be reshaped!
hidden = T.tanh(T.dot(input.reshape((-1, 45)), W1)+b1)
output = T.nnet.softmax(T.dot(hidden, W2)+b2)
#Here comes the trick
jac = theano.gradient.jacobian(output.reshape((-1,)), wrt=input).reshape((-1, 45, 3))
In this way, jac.eval({input: np.random.rand(10*45).astype('float32')}).shape will be (100, 45, 3)!
The problem is that it calculates the derivative across the batch index as well: in theory the first 45-element input vector can affect all 10x3 outputs (in a batch of length 10), which is why you get 10x10 = 100 blocks instead of 10.
There are several solutions for that.
You could take the diagonal across the first two axes, but unfortunately Theano does not implement it (numpy does!).
I think it can also be done with a scan, but that is another matter; a rough sketch follows below.
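For completeness, here is a minimal sketch of the scan approach (my own addition, untested against the exact graph above; it assumes the matrix-input graph from the question, i.e. the variables input and output defined there). theano.scan walks the batch index and slices out only the block of the Jacobian that belongs to each sample:
import theano
from theano import tensor as T
# jacobian of the i-th output row wrt the full input has shape (3, batch, 45);
# taking [:, i, :] keeps only the i-th sample's own (3, 45) block
per_sample_jac, _ = theano.scan(
    lambda i: theano.gradient.jacobian(output[i], wrt=input)[:, i, :],
    sequences=T.arange(input.shape[0]))
# per_sample_jac has shape (batch, 3, 45)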
Related
I have a very large 2D numpy array (~5e8 values). I have labeled that array using scipy.ndimage.label I then want to find a random index of the flattened array that contains each label. I can do this with:
import numpy as np
from scipy.ndimage import label
base_array = np.random.randint(0, 5, (100000, 5000))
labeled_array, nlabels = label(base_array)
for label_num in xrange(1, nlabels+1):
    indices = np.where(labeled_array.flat == label_num)[0]
    index = np.random.choice(indices)
But it is slow with an array this large. I have also tried replacing the np.where with:
indices = np.argwhere(labeled_array.flat == label_num).squeeze()
and found it to be slower. I have a suspicion that the boolean masking is the slow part. Is there any way to speed this up, or a better way to do this? I will say that in my real application the array is fairly sparse, with about 25% fill, though I have no experience with scipy's sparse array functions.
Your suspicion that masking separately for each label is expensive is correct, because no matter how you do it the masking will always be O(n).
We can circumvent this by argsorting by label and then randomly picking from each block of equal labels.
Since the labels are an integer range we can get the argsort cheaper than np.argsort by using some sparse matrix machinery available in scipy.
As my machine doesn't have an awful lot of RAM, I had to shrink your example a bit (by a factor of 4). It then runs in about 5 seconds.
import numpy as np
from scipy.ndimage import label
from scipy import sparse
def multi_randint(bins):
    """Draw one random int from each range(bins[i], bins[i+1])."""
    high = np.diff(bins)
    n = high.size
    # rejection sampling to avoid modulo bias
    pick = np.random.randint(0, 1 << 30, (n,))
    reject = np.flatnonzero(pick + (1 << 30) % high >= (1 << 30))
    while reject.size:
        npick = np.random.randint(0, 1 << 30, (reject.size,))
        rejrej = npick + (1 << 30) % high[reject] >= (1 << 30)
        pick[reject] = npick
        reject = reject[rejrej]
    return bins[:-1] + pick % high
# build mock data, note that I had to shrink by 4x b/c memory
base_array = np.random.randint(0, 5, (50000, 2500), dtype=np.int8)
labeled_array, nlabels = label(base_array)
# build auxiliary sparse matrix
h = sparse.csr_matrix(
    (np.ones(labeled_array.size, bool), labeled_array.ravel(),
     np.arange(labeled_array.size + 1, dtype=np.int32)),
    (labeled_array.size, nlabels + 1))
# conversion to csc argsorts the labels (but cheaper than argsort)
h = h.tocsc()
# draw
result = h.indices[multi_randint(h.indptr)]
# check result
assert len(set(labeled_array.ravel()[result])) == nlabels+1
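To make the trick more transparent, here is a tiny self-contained illustration (my own addition, not part of the solution above): each row of the CSR matrix holds a single True entry in the column given by its label, so converting to CSC groups the row positions by label, which is effectively an argsort of the labels in linear time.
import numpy as np
from scipy import sparse
labels = np.array([2, 0, 1, 2, 1, 0])
h = sparse.csr_matrix(
    (np.ones(labels.size, bool), labels, np.arange(labels.size + 1)),
    shape=(labels.size, labels.max() + 1))
h = h.tocsc()
print(h.indices)  # positions grouped by label: [1 5 2 4 0 3]
print(h.indptr)   # block boundaries per label: [0 2 4 6]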
I have a simple Python code which reads two matrices in from a file, where each matrix element contains two undefined variables, A and B. After being read in, A and B are assigned values and the strings are evaluated using eval() (I am aware of the security issues with this, but this code will only ever be used by me). A numpy array is formed from each matrix and the generalised eigenvalue problem is solved using both matrices. The code is as follows:
from __future__ import division
from scipy.linalg import eigh
import numpy as np
Mat_size = 4 ## Define matrix size
## Input the two matrices from each side of the generalised eigenvalue problem
HHfile = "Matrix_4x4HH.mat" ## left
SSfile = "Matrix_4x4SS.mat" ## right
## Read matrix files
f1 = open(HHfile, 'r')
HH_data = f1.read()
f2 = open(SSfile, 'r')
SS_data = f2.read()
# Define dictionary of variables
trans = {'A': 2.1,
         'B': 3,
         }
HHmat = eval(HH_data, trans)
SSmat = eval(SS_data, trans)
## Convert to numpy array
HHmat2 = np.array(HHmat)
SSmat2 = np.array(SSmat)
## Form correct matrix dimensions
shape = ( Mat_size, Mat_size )
HH = HHmat2.reshape( shape )
SS = SSmat2.reshape( shape )
## Solve the generalised eigenvalue problem
eigenValues, eigenVectors = eigh(HH, SS)
print("Eigenvalue:",eigenValues[0])
This works and outputs an answer. However, I wish to vary the parameters A and B to minimise the value it gives me, something like:
def func_min(x):
    A, B = x
    ev, ew = eigh(HH, SS)
    return ev[0]

x0 = np.array([2.1, 3])  # preliminary guess
minimize(func_min, x0)
I am having an issue with the fact that A and B must already be defined for the eval call to work, which prevents the optimisation routine from varying them since they have fixed values. I have tried a few different methods of reading in the files (pandas, csv, np.genfromtxt, etc.) but always seem to be left with the same issue. Is there a better way, or a modification to the current code, to implement this?
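A minimal workaround sketch (my own, assuming HH_data and SS_data hold the matrix strings read above): move the eval inside the objective so the expressions are re-evaluated with fresh values of A and B on every call.
from scipy.optimize import minimize

def func_min(x):
    A, B = x
    trans = {'A': A, 'B': B}
    # re-evaluate the matrix strings for the current A and B
    HH = np.array(eval(HH_data, trans)).reshape(Mat_size, Mat_size)
    SS = np.array(eval(SS_data, trans)).reshape(Mat_size, Mat_size)
    ev, ew = eigh(HH, SS)
    return ev[0]

res = minimize(func_min, np.array([2.1, 3.0]))
print(res.x)  # the A, B that minimise the first eigenvalue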
Here are the inputs from the files. I was unsure of how to include them, but I can use a file upload service if that is better.
Input from Matrix_4x4HH.mat
-.7500000000000000/B**3*A**5/(A+B)**6-4.500000000000000/B**2*A**4/(A+B)**6-12./B*A**3/(A+B)**6-19.50000000000000*A**2/(A+B)**6-118.5000000000000*B*A/(A+B)**6-19.50000000000000*B**2/(A+B)**6-12.*B**3/A/(A+B)**6-4.500000000000000*B**4/A**2/(A+B)**6-.7500000000000000*B**5/A**3/(A+B)**6+2./B**3*A**4/(A+B)**6+13./B**2*A**3/(A+B)**6+36./B*A**2/(A+B)**6+165.*A/(A+B)**6+165.*B/(A+B)**6+36.*B**2/A/(A+B)**6+13.*B**3/A**2/(A+B)**6+2.*B**4/A**3/(A+B)**6,-.3750000000000000/B**3*A**5/(A+B)**6-2.125000000000000/B**2*A**4/(A+B)**6-4.750000000000000/B*A**3/(A+B)**6-23.87500000000000*A**2/(A+B)**6-1.750000000000000*B*A/(A+B)**6-23.87500000000000*B**2/(A+B)**6-4.750000000000000*B**3/A/(A+B)**6-2.125000000000000*B**4/A**2/(A+B)**6-.3750000000000000*B**5/A**3/(A+B)**6-1.500000000000000/B**2*A**3/(A+B)**6-5.500000000000000/B*A**2/(A+B)**6-25.*A/(A+B)**6-25.*B/(A+B)**6-5.500000000000000*B**2/A/(A+B)**6-1.500000000000000*B**3/A**2/(A+B)**6,1.500000000000000/B**3*A**6/(A+B)**7+10.12500000000000/B**2*A**5/(A+B)**7+29.12500000000000/B*A**4/(A+B)**7+45.25000000000000*A**3/(A+B)**7-135.1250000000000*B*A**2/(A+B)**7+298.7500000000000*B**2*A/(A+B)**7+14.87500000000000*B**3/(A+B)**7-5.750000000000000*B**4/A/(A+B)**7-2.375000000000000*B**5/A**2/(A+B)**7-.3750000000000000*B**6/A**3/(A+B)**7-4./B**3*A**5/(A+B)**7-27./B**2*A**4/(A+B)**7-76.50000000000000/B*A**3/(A+B)**7-9.500000000000000*A**2/(A+B)**7-305.*B*A/(A+B)**7-362.*B**2/(A+B)**7-14.50000000000000*B**3/A/(A+B)**7-1.500000000000000*B**4/A**2/(A+B)**7,-.3750000000000000/B**3*A**6/(A+B)**7-2.375000000000000/B**2*A**5/(A+B)**7-5.750000000000000/B*A**4/(A+B)**7+14.87500000000000*A**3/(A+B)**7+298.7500000000000*B*A**2/(A+B)**7-135.1250000000000*B**2*A/(A+B)**7+45.25000000000000*B**3/(A+B)**7+29.12500000000000*B**4/A/(A+B)**7+10.12500000000000*B**5/A**2/(A+B)**7+1.500000000000000*B**6/A**3/(A+B)**7-1.500000000000000/B**2*A**4/(A+B)**7-14.50000000000000/B*A**3/(A+B)**7-362.*A**2/(A+B)**7-305.*B*A/(A+B)**7-9.500000000000000*B**2/(A+B)**7-76.50000000000000*B**3/A/(A+B)**7-27.*B**4/A**2/(A+B)**7-4.*B**5/A**3/(A+B)**7,-.3750000000000000/B**3*A**5/(A+B)**6-2.125000000000000/B**2*A**4/(A+B)**6-4.750000000000000/B*A**3/(A+B)**6-23.87500000000000*A**2/(A+B)**6-1.750000000000000*B*A/(A+B)**6-23.87500000000000*B**2/(A+B)**6-4.750000000000000*B**3/A/(A+B)**6-2.125000000000000*B**4/A**2/(A+B)**6-.3750000000000000*B**5/A**3/(A+B)**6-1.500000000000000/B**2*A**3/(A+B)**6-5.500000000000000/B*A**2/(A+B)**6-25.*A/(A+B)**6-25.*B/(A+B)**6-5.500000000000000*B**2/A/(A+B)**6-1.500000000000000*B**3/A**2/(A+B)**6,-1.500000000000000/B**3*A**5/(A+B)**6-10.50000000000000/B**2*A**4/(A+B)**6-35./B*A**3/(A+B)**6-101.5000000000000*A**2/(A+B)**6-343.*B*A/(A+B)**6-101.5000000000000*B**2/(A+B)**6-35.*B**3/A/(A+B)**6-10.50000000000000*B**4/A**2/(A+B)**6-1.500000000000000*B**5/A**3/(A+B)**6+2./B**3*A**4/(A+B)**6+16./B**2*A**3/(A+B)**6+45./B*A**2/(A+B)**6+201.*A/(A+B)**6+201.*B/(A+B)**6+45.*B**2/A/(A+B)**6+16.*B**3/A**2/(A+B)**6+2.*B**4/A**3/(A+B)**6,.7500000000000000/B**3*A**6/(A+B)**7+5.625000000000000/B**2*A**5/(A+B)**7+18.87500000000000/B*A**4/(A+B)**7+19.62500000000000*A**3/(A+B)**7+170.3750000000000*B*A**2/(A+B)**7+73.87500000000000*B**2*A/(A+B)**7+57.62500000000000*B**3/(A+B)**7+4.875000000000000*B**4/A/(A+B)**7+.3750000000000000*B**5/A**2/(A+B)**7+1.500000000000000/B**2*A**4/(A+B)**7+7.500000000000000/B*A**3/(A+B)**7-1.*A**2/(A+B)**7+39.*B*A/(A+B)**7+47.50000000000000*B**2/(A+B)**7+1.500000000000000*B**3/A/(A+B)**7,.3750000000000000/B**2*A**5/(A+B)**7+4.875000000000000/B*A**4/(A+B)**7+57.62500000000000*A*
*3/(A+B)**7+73.87500000000000*B*A**2/(A+B)**7+170.3750000000000*B**2*A/(A+B)**7+19.62500000000000*B**3/(A+B)**7+18.87500000000000*B**4/A/(A+B)**7+5.625000000000000*B**5/A**2/(A+B)**7+.7500000000000000*B**6/A**3/(A+B)**7+1.500000000000000/B*A**3/(A+B)**7+47.50000000000000*A**2/(A+B)**7+39.*B*A/(A+B)**7-1.*B**2/(A+B)**7+7.500000000000000*B**3/A/(A+B)**7+1.500000000000000*B**4/A**2/(A+B)**7,1.500000000000000/B**3*A**6/(A+B)**7+10.12500000000000/B**2*A**5/(A+B)**7+29.12500000000000/B*A**4/(A+B)**7+45.25000000000000*A**3/(A+B)**7-135.1250000000000*B*A**2/(A+B)**7+298.7500000000000*B**2*A/(A+B)**7+14.87500000000000*B**3/(A+B)**7-5.750000000000000*B**4/A/(A+B)**7-2.375000000000000*B**5/A**2/(A+B)**7-.3750000000000000*B**6/A**3/(A+B)**7-4./B**3*A**5/(A+B)**7-27./B**2*A**4/(A+B)**7-76.50000000000000/B*A**3/(A+B)**7-9.500000000000000*A**2/(A+B)**7-305.*B*A/(A+B)**7-362.*B**2/(A+B)**7-14.50000000000000*B**3/A/(A+B)**7-1.500000000000000*B**4/A**2/(A+B)**7,.7500000000000000/B**3*A**6/(A+B)**7+5.625000000000000/B**2*A**5/(A+B)**7+18.87500000000000/B*A**4/(A+B)**7+19.62500000000000*A**3/(A+B)**7+170.3750000000000*B*A**2/(A+B)**7+73.87500000000000*B**2*A/(A+B)**7+57.62500000000000*B**3/(A+B)**7+4.875000000000000*B**4/A/(A+B)**7+.3750000000000000*B**5/A**2/(A+B)**7+1.500000000000000/B**2*A**4/(A+B)**7+7.500000000000000/B*A**3/(A+B)**7-1.*A**2/(A+B)**7+39.*B*A/(A+B)**7+47.50000000000000*B**2/(A+B)**7+1.500000000000000*B**3/A/(A+B)**7,-5.250000000000000/B**3/(A+B)**8*A**7-40.50000000000000/B**2/(A+B)**8*A**6-138.7500000000000/B/(A+B)**8*A**5-280.7500000000000/(A+B)**8*A**4-629.7500000000000*B/(A+B)**8*A**3+770.2500000000000*B**2/(A+B)**8*A**2-836.7500000000000*B**3/(A+B)**8*A-180.2500000000000*B**4/(A+B)**8-52.*B**5/(A+B)**8/A-12.75000000000000*B**6/(A+B)**8/A**2-1.500000000000000*B**7/(A+B)**8/A**3+14./B**3/(A+B)**8*A**6+107./B**2/(A+B)**8*A**5+355./B/(A+B)**8*A**4+778./(A+B)**8*A**3+284.*B/(A+B)**8*A**2+597.*B**2/(A+B)**8*A+911.*B**3/(A+B)**8+100.*B**4/(A+B)**8/A+20.*B**5/(A+B)**8/A**2+2.*B**6/(A+B)**8/A**3,.7500000000000000/B**3/(A+B)**8*A**7+5.625000000000000/B**2/(A+B)**8*A**6+18.25000000000000/B/(A+B)**8*A**5+52.50000000000000/(A+B)**8*A**4+397.*B/(A+B)**8*A**3-2356.250000000000*B**2/(A+B)**8*A**2+397.*B**3/(A+B)**8*A+52.50000000000000*B**4/(A+B)**8+18.25000000000000*B**5/(A+B)**8/A+5.625000000000000*B**6/(A+B)**8/A**2+.7500000000000000*B**7/(A+B)**8/A**3+1.500000000000000/B**2/(A+B)**8*A**5+10.50000000000000/B/(A+B)**8*A**4-276.5000000000000/(A+B)**8*A**3+1848.500000000000*B/(A+B)**8*A**2+1848.500000000000*B**2/(A+B)**8*A-276.5000000000000*B**3/(A+B)**8+10.50000000000000*B**4/(A+B)**8/A+1.500000000000000*B**5/(A+B)**8/A**2,-.3750000000000000/B**3*A**6/(A+B)**7-2.375000000000000/B**2*A**5/(A+B)**7-5.750000000000000/B*A**4/(A+B)**7+14.87500000000000*A**3/(A+B)**7+298.7500000000000*B*A**2/(A+B)**7-135.1250000000000*B**2*A/(A+B)**7+45.25000000000000*B**3/(A+B)**7+29.12500000000000*B**4/A/(A+B)**7+10.12500000000000*B**5/A**2/(A+B)**7+1.500000000000000*B**6/A**3/(A+B)**7-1.500000000000000/B**2*A**4/(A+B)**7-14.50000000000000/B*A**3/(A+B)**7-362.*A**2/(A+B)**7-305.*B*A/(A+B)**7-9.500000000000000*B**2/(A+B)**7-76.50000000000000*B**3/A/(A+B)**7-27.*B**4/A**2/(A+B)**7-4.*B**5/A**3/(A+B)**7,.3750000000000000/B**2*A**5/(A+B)**7+4.875000000000000/B*A**4/(A+B)**7+57.62500000000000*A**3/(A+B)**7+73.87500000000000*B*A**2/(A+B)**7+170.3750000000000*B**2*A/(A+B)**7+19.62500000000000*B**3/(A+B)**7+18.87500000000000*B**4/A/(A+B)**7+5.625000000000000*B**5/A**2/(A+B)**7+.7500000000000000*B**6/A**3/(A+B)**7+1.500000000000000
/B*A**3/(A+B)**7+47.50000000000000*A**2/(A+B)**7+39.*B*A/(A+B)**7-1.*B**2/(A+B)**7+7.500000000000000*B**3/A/(A+B)**7+1.500000000000000*B**4/A**2/(A+B)**7,.7500000000000000/B**3/(A+B)**8*A**7+5.625000000000000/B**2/(A+B)**8*A**6+18.25000000000000/B/(A+B)**8*A**5+52.50000000000000/(A+B)**8*A**4+397.*B/(A+B)**8*A**3-2356.250000000000*B**2/(A+B)**8*A**2+397.*B**3/(A+B)**8*A+52.50000000000000*B**4/(A+B)**8+18.25000000000000*B**5/(A+B)**8/A+5.625000000000000*B**6/(A+B)**8/A**2+.7500000000000000*B**7/(A+B)**8/A**3+1.500000000000000/B**2/(A+B)**8*A**5+10.50000000000000/B/(A+B)**8*A**4-276.5000000000000/(A+B)**8*A**3+1848.500000000000*B/(A+B)**8*A**2+1848.500000000000*B**2/(A+B)**8*A-276.5000000000000*B**3/(A+B)**8+10.50000000000000*B**4/(A+B)**8/A+1.500000000000000*B**5/(A+B)**8/A**2,-1.500000000000000/B**3/(A+B)**8*A**7-12.75000000000000/B**2/(A+B)**8*A**6-52./B/(A+B)**8*A**5-180.2500000000000/(A+B)**8*A**4-836.7500000000000*B/(A+B)**8*A**3+770.2500000000000*B**2/(A+B)**8*A**2-629.7500000000000*B**3/(A+B)**8*A-280.7500000000000*B**4/(A+B)**8-138.7500000000000*B**5/(A+B)**8/A-40.50000000000000*B**6/(A+B)**8/A**2-5.250000000000000*B**7/(A+B)**8/A**3+2./B**3/(A+B)**8*A**6+20./B**2/(A+B)**8*A**5+100./B/(A+B)**8*A**4+911./(A+B)**8*A**3+597.*B/(A+B)**8*A**2+284.*B**2/(A+B)**8*A+778.*B**3/(A+B)**8+355.*B**4/(A+B)**8/A+107.*B**5/(A+B)**8/A**2+14.*B**6/(A+B)**8/A**3
Input from Matrix_4x4SS.mat
1./B**3*A**3/(A+B)**6+6./B**2*A**2/(A+B)**6+15./B*A/(A+B)**6+84./(A+B)**6+15.*B/A/(A+B)**6+6.*B**2/A**2/(A+B)**6+1.*B**3/A**3/(A+B)**6,-.5000000000000000/B**3*A**3/(A+B)**6-3.500000000000000/B**2*A**2/(A+B)**6-9.500000000000000/B*A/(A+B)**6-53./(A+B)**6-9.500000000000000*B/A/(A+B)**6-3.500000000000000*B**2/A**2/(A+B)**6-.5000000000000000*B**3/A**3/(A+B)**6,-2./B**3*A**4/(A+B)**7-12.50000000000000/B**2*A**3/(A+B)**7-32.50000000000000/B*A**2/(A+B)**7+18.50000000000000*A/(A+B)**7-253.*B/(A+B)**7-17.50000000000000*B**2/A/(A+B)**7-4.500000000000000*B**3/A**2/(A+B)**7-.5000000000000000*B**4/A**3/(A+B)**7,-.5000000000000000/B**3*A**4/(A+B)**7-4.500000000000000/B**2*A**3/(A+B)**7-17.50000000000000/B*A**2/(A+B)**7-253.*A/(A+B)**7+18.50000000000000*B/(A+B)**7-32.50000000000000*B**2/A/(A+B)**7-12.50000000000000*B**3/A**2/(A+B)**7-2.*B**4/A**3/(A+B)**7,-.5000000000000000/B**3*A**3/(A+B)**6-3.500000000000000/B**2*A**2/(A+B)**6-9.500000000000000/B*A/(A+B)**6-53./(A+B)**6-9.500000000000000*B/A/(A+B)**6-3.500000000000000*B**2/A**2/(A+B)**6-.5000000000000000*B**3/A**3/(A+B)**6,2./B**3*A**3/(A+B)**6+14./B**2*A**2/(A+B)**6+38./B*A/(A+B)**6+212./(A+B)**6+38.*B/A/(A+B)**6+14.*B**2/A**2/(A+B)**6+2.*B**3/A**3/(A+B)**6,1./B**3*A**4/(A+B)**7+6.500000000000000/B**2*A**3/(A+B)**7+16.50000000000000/B*A**2/(A+B)**7-19.*A/(A+B)**7+118.*B/(A+B)**7+4.500000000000000*B**2/A/(A+B)**7+.5000000000000000*B**3/A**2/(A+B)**7,.5000000000000000/B**2*A**3/(A+B)**7+4.500000000000000/B*A**2/(A+B)**7+118.*A/(A+B)**7-19.*B/(A+B)**7+16.50000000000000*B**2/A/(A+B)**7+6.500000000000000*B**3/A**2/(A+B)**7+1.*B**4/A**3/(A+B)**7,-2./B**3*A**4/(A+B)**7-12.50000000000000/B**2*A**3/(A+B)**7-32.50000000000000/B*A**2/(A+B)**7+18.50000000000000*A/(A+B)**7-253.*B/(A+B)**7-17.50000000000000*B**2/A/(A+B)**7-4.500000000000000*B**3/A**2/(A+B)**7-.5000000000000000*B**4/A**3/(A+B)**7,1./B**3*A**4/(A+B)**7+6.500000000000000/B**2*A**3/(A+B)**7+16.50000000000000/B*A**2/(A+B)**7-19.*A/(A+B)**7+118.*B/(A+B)**7+4.500000000000000*B**2/A/(A+B)**7+.5000000000000000*B**3/A**2/(A+B)**7,7./B**3/(A+B)**8*A**5+50./B**2/(A+B)**8*A**4+154./B/(A+B)**8*A**3+331./(A+B)**8*A**2-147.*B/(A+B)**8*A+848.*B**2/(A+B)**8+80.*B**3/(A+B)**8/A+19.*B**4/(A+B)**8/A**2+2.*B**5/(A+B)**8/A**3,1./B**3/(A+B)**8*A**5+8.500000000000000/B**2/(A+B)**8*A**4+31./B/(A+B)**8*A**3-152.5000000000000/(A+B)**8*A**2+1568.*B/(A+B)**8*A-152.5000000000000*B**2/(A+B)**8+31.*B**3/(A+B)**8/A+8.500000000000000*B**4/(A+B)**8/A**2+1.*B**5/(A+B)**8/A**3,-.5000000000000000/B**3*A**4/(A+B)**7-4.500000000000000/B**2*A**3/(A+B)**7-17.50000000000000/B*A**2/(A+B)**7-253.*A/(A+B)**7+18.50000000000000*B/(A+B)**7-32.50000000000000*B**2/A/(A+B)**7-12.50000000000000*B**3/A**2/(A+B)**7-2.*B**4/A**3/(A+B)**7,.5000000000000000/B**2*A**3/(A+B)**7+4.500000000000000/B*A**2/(A+B)**7+118.*A/(A+B)**7-19.*B/(A+B)**7+16.50000000000000*B**2/A/(A+B)**7+6.500000000000000*B**3/A**2/(A+B)**7+1.*B**4/A**3/(A+B)**7,1./B**3/(A+B)**8*A**5+8.500000000000000/B**2/(A+B)**8*A**4+31./B/(A+B)**8*A**3-152.5000000000000/(A+B)**8*A**2+1568.*B/(A+B)**8*A-152.5000000000000*B**2/(A+B)**8+31.*B**3/(A+B)**8/A+8.500000000000000*B**4/(A+B)**8/A**2+1.*B**5/(A+B)**8/A**3,2./B**3/(A+B)**8*A**5+19./B**2/(A+B)**8*A**4+80./B/(A+B)**8*A**3+848./(A+B)**8*A**2-147.*B/(A+B)**8*A+331.*B**2/(A+B)**8+154.*B**3/(A+B)**8/A+50.*B**4/(A+B)**8/A**2+7.*B**5/(A+B)**8/A**3
Edit
I guess the important part of this question is how to read in an array of equations from a file into a numpy array and be able to assign values to the variables without the need for eval. As an example of what can usually be done using numpy:
import numpy as np
A = 1
B = 1
arr = np.array([[A+2*B,2],[2,B+6*(B+A)]])
print arr
This gives:
[[ 3 2]
[ 2 13]]
The numpy array contained algebraic terms, and when printed, the specified values of A and B were inserted. Is this possible with equations read from a file into a numpy array? I have looked through the numpy documentation; is the issue to do with the type of the data being read in? That is, I cannot specify the dtype to be int or float, as each array element contains ints, floats and text. I feel there is a very simple aspect of Python I am missing that can do this.
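One way to get exactly this behaviour without eval is to parse the expressions symbolically. Below is a hedged sketch using sympy (my assumption; sympy is not used in the original code): each element is parsed once, and sympy.lambdify then turns the whole matrix into a fast numeric function of A and B.
import sympy as sp

A, B = sp.symbols('A B')
row = "A + 2*B, 2, 2, B + 6*(B + A)"  # e.g. text as it might be read from a file
# splitting on ',' is safe here because the expressions contain no internal commas
exprs = [sp.sympify(s) for s in row.split(',')]
f = sp.lambdify((A, B), sp.Matrix(2, 2, exprs), 'numpy')
print(f(1, 1))  # [[ 3.  2.] [ 2. 13.]] with A=1, B=1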
I am trying to use the package pynfft in Python 2.7 to do the non-uniform fast Fourier transform (NFFT). I have been learning Python for only two months, so I am having some difficulties.
This is my code:
import numpy as np
from pynfft.nfft import NFFT
#loading data, 104 lines
t_diff, x_diff = np.loadtxt('data/analysis/amplitudes.dat', unpack = True)
N = [13,8]
M = 52
#fourier coefficients
f_hat = np.fft.fft(x_diff)/(2*M)
#instantiation
plan = NFFT(N,M)
#precomputation
x = t_diff
plan.x = x
plan.precompute()
# vector of non uniform samples
f = x_diff[0:M]
#execution
plan.f = f
plan.f_hat = f_hat
f = plan.trafo()
I am basically following the instructions I found in the pynfft tutorial (http://pythonhosted.org/pyNFFT/tutorial.html).
I need the nfft because the time intervals in which my data are taken are not constant (I mean, the first measure is taken at t, the second after dt, the third after dt+dt' with dt' different from dt and so on).
The pynfft package wants the vector of the Fourier coefficients ("f_hat") before execution, so I calculated it using numpy.fft, but I am not sure this procedure is correct. Is there another way to do it (maybe with the nfft itself)?
I would also like to calculate the frequencies; I know that numpy.fft has a command for this: is there anything like that for pynfft? I did not find anything in the tutorial.
Thank you for any advice you can give me.
Here is a working example, taken from here:
First we define the function we want to reconstruct, which is the sum of four harmonics:
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(12345)
# %pylab inline --no-import-all  # IPython/Jupyter magic; omit in a plain script
# function we want to reconstruct
k=[1,5,10,30] # modulating coefficients
def myf(x, k):
    return sum(np.sin(x*k0*(2*np.pi)) for k0 in k)
x=np.linspace(-0.5,0.5,1000) # 'continuous' time/spatial domain; -0.5<x<+0.5
y=myf(x,k) # 'true' underlying trigonometric function
fig=plt.figure(1,(20,5))
ax =fig.add_subplot(111)
ax.plot(x,y,'red')
ax.plot(x,y,'r.')
# we should sample at a rate of >2*~max(k)
M=256 # number of nodes
N=128 # number of Fourier coefficients
nodes =np.random.rand(M)-0.5 # non-uniform oversampling
values=myf(nodes,k) # nodes&values will be used below to reconstruct
# original function using the Solver
ax.plot(nodes,values,'bo')
ax.set_xlim(-0.5,+0.5)
Then we initialize and run the Solver:
from pynfft import NFFT, Solver
f = np.empty(M, dtype=np.complex128)
f_hat = np.empty([N,N], dtype=np.complex128)
this_nfft = NFFT(N=[N,N], M=M)
this_nfft.x = np.array([[node_i,0.] for node_i in nodes])
this_nfft.precompute()
this_nfft.f = f
ret2=this_nfft.adjoint()
print this_nfft.M # number of nodes
print this_nfft.N # number of Fourier coefficients
#print this_nfft.x # nodes in [-0.5, 0.5), float typed
this_solver = Solver(this_nfft)
this_solver.y = values            # right-hand side: the samples
#this_solver.f_hat_iter = f_hat   # optional initial guess; default is 0
this_solver.before_loop() # initialize solver internals
while not np.all(this_solver.r_iter < 1e-2):
    this_solver.loop_one_step()
Finally, we display the frequencies:
import matplotlib.pyplot as plt
fig=plt.figure(1,(20,5))
ax =fig.add_subplot(111)
foo = [np.abs(this_solver.f_hat_iter[i][0])**2 for i in range(len(this_solver.f_hat_iter))]
ax.plot(np.abs(np.arange(-N/2,+N/2,1)),foo)
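As an aside on the asker's second question (my addition, not part of the original answer): the NFFT's Fourier coefficients simply correspond to the integer frequencies -N/2, ..., N/2-1, which is the axis the plot above uses; for the uniform FFT the analogous axis comes from np.fft.fftfreq.
freqs = np.arange(-N // 2, N // 2)  # frequencies of the NFFT coefficients
# compare, for the uniform FFT: np.fft.fftfreq(n_samples, d=sample_spacing)
# (n_samples and sample_spacing are placeholders for your own values)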
I can use np.polyfit to fit a line to my scatter plot, as shown below:
a = np.array([1.08,2.05,1.56,0.73,1.1,0.73,0.34,0.73,0.88,2.05])
b=np.array([4.72131259, 6.60937492, 6.41485738, 6.82386894, 6.20293278, 7.22670489, 6.15681295, 5.91595178, 6.43917035, 6.64453907])
m1, b1 = np.polyfit(a, b, 1)
corr1 = a1.plot(a, m1*a+b1, '-', color='black')  # a1 is a pre-existing Axes
a1.scatter(a, b)
Is there any way to fit a line using polyfit, this time taking the errors on my points (shown below) into account?
ae = np.empty(10)
ae.fill(0.15)
be = ae
sca1=a1.errorbar(a, b, ae, be, capsize=0, ls='none', color='black', elinewidth=1)
If you want to compute the fit and plot the error bars (fixed at the given value) over the fitted points, then this code will do the job:
import numpy as np
import matplotlib.pyplot as mp
a = np.array([1.08,2.05,1.56,0.73,1.1,0.73,0.34,0.73,0.88,2.05])
b=np.array([4.72131259, 6.60937492, 6.41485738, 6.82386894, 6.20293278, 7.22670489, 6.15681295, 5.91595178, 6.43917035, 6.64453907])
ae = np.empty(10)
ae.fill(0.15)
be = ae
m1, b1 = np.polyfit(a, b, 1)
mp.figure()
corr1 = mp.errorbar(a, m1*a+b1, ae, be, '-', color='black')
mp.scatter(a, b)
mp.show()
If you want to get the covariance of the fit, and use the standard deviation to set the error bars, instead use the code
import numpy as np
import matplotlib.pyplot as mp
import math
a = np.array([1.08,2.05,1.56,0.73,1.1,0.73,0.34,0.73,0.88,2.05])
b=np.array([4.72131259, 6.60937492, 6.41485738, 6.82386894, 6.20293278, 7.22670489, 6.15681295, 5.91595178, 6.43917035, 6.64453907])
coeff,covar = np.polyfit(a, b, 1,cov=True)
m1= coeff[0]
b1= coeff[1]
xe = math.sqrt(covar[0][0])  # standard deviation of the slope
ye = math.sqrt(covar[1][1])  # standard deviation of the intercept
mp.figure()
corr1 = mp.errorbar(a, m1*a+b1, xe, ye, '-', color='black')
mp.scatter(a, b)
mp.show()
which gives a plot like this:
If you want to do a weighted fit, you can supply a weight vector to polyfit with the syntax
m2, b2 = np.polyfit(a, b, 1,w=weightvector)
According to the documentation, the weight vector should contain 1 over the standard deviation of each data point, for instance:
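A minimal sketch (mine, reusing a, b and the constant errors ae from the examples above):
weightvector = 1.0 / ae                       # 1/sigma for each point (all 0.15 here)
m2, b2 = np.polyfit(a, b, 1, w=weightvector)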
If you want to do a least squares fit weighted by errors in BOTH x and y, I don't think polyfit does this - it will accept a weight vector for one dimension.
To supply errors in both dimensions as weights you would have to use scipy.optimize.leastsq.
The Scipy documentation has a page about doing fits with scipy.optimize.leastsq. The example there fits a power law, but clearly a straight line could be done as well.
For errors in one dimension (Y) here, an example using leastsq is:
import numpy as np
import matplotlib.pyplot as mp
import math
from scipy import optimize
a = np.array([1.08,2.05,1.56,0.73,1.1,0.73,0.34,0.73,0.88,2.05])
b=np.array([4.72131259, 6.60937492, 6.41485738, 6.82386894, 6.20293278, 7.22670489, 6.15681295, 5.91595178, 6.43917035, 6.64453907])
aerr = np.empty(10)
aerr.fill(0.15)
berr=aerr
# fit a straight line with scipy scipy.optimize.leastsq
# define our (line) fitting function
fitfunc = lambda p, x: p[0] + p[1] * x
errfunc = lambda p, x, y, err: (y - fitfunc(p, x)) / err
pinit = [1.0, -1.0]
out = optimize.leastsq(errfunc, pinit,args=(a, b, aerr), full_output=1)
coeff = out[0]
covar = out[1]
print 'coeff', coeff
print 'covar', covar
m1= coeff[1]
b1= coeff[0]
xe = math.sqrt(covar[0][0])
ye = math.sqrt(covar[1][1])
# plot results
mp.figure()
corr2 = mp.errorbar(a, m1*a+b1, xe, ye, '-', color='red')
mp.scatter(a, b)
mp.show()
To take into account errors in both X and Y, you would have to change the definition of errfunc to reflect the specific technique you are using. If a lambda isn't convenient, you can instead define a function that does the same job. I can't comment further without knowing which technique is being used to weight by the X and Y errors; there are several in the literature. One established option is sketched below.
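For instance, orthogonal distance regression weights the errors in both coordinates; here is a hedged sketch using scipy.odr (my addition, reusing a, b, aerr and berr from above):
from scipy import odr

linear = odr.Model(lambda p, x: p[0] + p[1] * x)  # p = [intercept, slope]
data = odr.RealData(a, b, sx=aerr, sy=berr)       # errors in both x and y
out = odr.ODR(data, linear, beta0=[1.0, 1.0]).run()
print(out.beta)     # fitted [intercept, slope]
print(out.sd_beta)  # standard deviations of the parameters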
I am using matplotlib.pyplot to create histograms. I'm not actually interested in the plots of these histograms, but interested in the frequencies and bins (I know I can write my own code to do this, but would prefer to use this package).
I know I can do the following,
import numpy as np
import matplotlib.pyplot as plt
x1 = np.random.normal(1.5, 1.0, 1000)  # e.g. 1000 samples; the size argument is needed to get an array
x2 = np.random.normal(0, 1.0, 1000)
freq, bins, patches = plt.hist([x1, x2], 50, histtype='step')
to create a histogram. All I need is freq[0], freq[1], and bins[0]. The problem occurs when I try to use
freq, bins, patches = plt.hist([x1, x2], 50, histtype='step')
in a function. For example,
def func(x, y, Nbins):
    freq, bins, patches = plt.hist([x, y], Nbins, histtype='step')  # create histogram
    bincenters = 0.5*(bins[1:] + bins[:-1])                         # bin centers
    xf = [float(i) for i in freq[0]]                                # convert counts to float
    yf = [float(i) for i in freq[1]]
    p = [(bincenters[j], 1.0 / (xf[j] + yf[j])) for j in range(Nbins) if (xf[j] + yf[j]) != 0]
    Xt = [j for i, j in p]  # separate pairs formed in p
    Yt = [i for i, j in p]
    Y = np.array(Yt)        # convert to arrays for later fitting
    X = np.array(Xt)
    return X, Y             # return arrays X and Y
When I call func(x1, x2, Nbins) and plot or print X and Y, I do not get the expected curve/values. I suspect it has something to do with plt.hist, since there is a partial histogram in my plot.
I'm not sure I understand your question completely, but here is an example of a very simple home-made histogram (in 1D and 2D), each inside a function and properly called:
import numpy as np
import matplotlib.pyplot as plt
def func2d(x, y, nbins):
    histo, xedges, yedges = np.histogram2d(x, y, nbins)
    plt.plot(x, y, 'wo', alpha=0.3)
    plt.imshow(histo.T,
               extent=[xedges.min(), xedges.max(), yedges.min(), yedges.max()],
               origin='lower',
               interpolation='nearest',
               cmap=plt.cm.hot)
    plt.show()

def func1d(x, nbins):
    histo, bin_edges = np.histogram(x, nbins)
    bin_center = 0.5*(bin_edges[1:] + bin_edges[:-1])
    plt.step(bin_center, histo, where='mid')
    plt.show()
x = np.random.normal(1.5,1.0, (1000,1000))
func1d(x[0],40)
func2d(x[0],x[1],40)
Of course, you may check whether the centering of the data is right, but I think the example shows some useful things about this topic.
My recommendation: try to avoid loops in your code! They kill performance; note that there are no loops in my example. The best practice for numerical problems in Python is to avoid explicit loops, since numpy has a lot of C-implemented functions that do all the hard looping work.
You can use np.histogram2d (for 2D histogram) or np.histogram (for 1D histogram):
hst = np.histogram(A, bins)
hst2d = np.histogram2d(X,Y,bins)
The output is the same counts and bin edges that plt.hist and plt.hist2d return (minus the patches); the only difference is that there is no plot.
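A quick illustration of the equivalence (my own, with made-up data):
import numpy as np
data = np.random.normal(0, 1, 1000)
counts, edges = np.histogram(data, bins=50)  # same as plt.hist's first two return values
H, xedges, yedges = np.histogram2d(data, data, bins=50)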
No.
But you can bypass pyplot:
import matplotlib.figure
import matplotlib.axes

fig = matplotlib.figure.Figure()
ax = matplotlib.axes.Axes(fig, (0, 0, 0, 0))
numeric_results = ax.hist(data)  # `data` is your array of samples
del ax, fig
It won't impact active axes and figures, so it is safe to use even in the middle of plotting something else.
This is because any plt.draw_something() call puts the plot into the current axes, which is a global variable.
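To make the "global variable" point concrete (illustrative only, my addition): every pyplot call implicitly targets plt.gca(), the current axes, while the Axes constructed above is never registered with that machinery.
import matplotlib.pyplot as plt
plt.plot([1, 2, 3])  # draws into plt.gca(), the global current axes
print(plt.gca())     # the axes pyplot is tracking; unrelated to `ax` above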
If you would like to simply compute the histogram (that is, count the number of points in a given bin) and not display it, the np.histogram() function is available.