Accessing specific pairwise distances in a distance matrix (scipy / numpy) - python-2.7

I am using scipy and its cdist function to compute a distance matrix from an array of vectors.
import numpy as np
from scipy.spatial import distance
vectorList = [(0, 10), (4, 8), (9.0, 11.0), (14, 14), (16, 19), (25.5, 17.5), (35, 16)]
#Convert to numpy array
arr = np.array(vectorList)
#Compute the distance matrix and set self-comparisons to NaN
#(d is a float array, so filling the diagonal with None stores NaN)
d = distance.cdist(arr, arr)
np.fill_diagonal(d, None)
Let's say I want to return all the distances that are below a specific threshold (6 for example)
#Find pairs of vectors whose separation distance is < 6 (NaN < 6 is False, so the diagonal is excluded)
id1, id2 = np.nonzero(d<6)
#id1 --> array([0, 1, 1, 2, 2, 3, 3, 4])
#id2 --> array([1, 0, 2, 1, 3, 2, 4, 3])
I now have 2 arrays of indices.
Question: how can I return the distances between these pairs of vectors as an array / list ?
4.47213595499958 #d[0][1]
4.47213595499958 #d[1][0]
5.830951894845301 #d[1][2]
5.830951894845301 #d[2][1]
5.830951894845301 #d[2][3]
5.830951894845301 #d[3][2]
5.385164807134504 #d[3][4]
5.385164807134504 #d[4][3]
d[id1][id2] returns a matrix, not a list, and the only way I have found so far is to iterate over the distance matrix again, which doesn't make sense:
np.array([d[i1][i2] for i1, i2 in zip(id1, id2)])

Use
d[id1, id2]
This is the form the numpy.nonzero example shows (i.e. a[np.nonzero(a > 3)]), which is different from the d[id1][id2] you are using: d[id1] selects whole rows, and the second [id2] then indexes rows of that intermediate result rather than columns.
See arrays.indexing in the numpy documentation for more details on numpy indexing.
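For a quick sanity check, here is a minimal sketch of the difference, using the d, id1 and id2 built above:
#Integer-array indexing pairs id1[i] with id2[i] elementwise and returns a 1-D array
dists = d[id1, id2]
#array([4.47213595, 4.47213595, 5.83095189, 5.83095189, 5.83095189,
#       5.83095189, 5.38516481, 5.38516481])
#By contrast, d[id1] is an 8x7 matrix here, and the second [id2]
#picks 8 rows from it, giving another 8x7 matrix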

Related

Calculate alpha values with torch mean?

I'm trying to calculate the alpha values as explained here.
I have as argument a tensor with shape (1, 512, 14, 14). To calculate the alpha values I need to average over all dimensions except the channel dimension, so the output will have the shape (1, k, 1, 1), which is essentially (k,).
How can I do this in PyTorch?
Thanks!
You could permute the first and second axes to move the channel dimension to dim=0, then flatten all the other dimensions, and lastly take the mean over that new axis:
x.permute(1, 0, 2, 3).flatten(start_dim=1).mean(dim=1)
Here are the shapes, step by step:
>>> x.permute(1, 0, 2, 3).shape
torch.Size([512, 1, 14, 14])
>>> x.permute(1, 0, 2, 3).flatten(start_dim=1).shape
torch.Size([512, 196])
>>> x.permute(1, 0, 2, 3).flatten(start_dim=1).mean(dim=1).shape
torch.Size([512])
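If your PyTorch version supports reducing over a tuple of dimensions (a one-line alternative, assuming a reasonably recent release), you can skip the permute/flatten entirely:
x.mean(dim=(0, 2, 3))  # averages over batch and spatial dims, leaving torch.Size([512])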

How to calculate the outer product of two matrices A and B per rows faster in python (numpy)?

Let's say we have two matrices A and B.
A has the shape (r, k) and B has the shape (r, l).
Now I want to calculate the np.outer product of these two matrices row by row. After the outer products I then want to sum over axis 0, so my result matrix should have the shape (k, l).
E.g.:
Form of A is (4, 2), of B is (4, 3).
import numpy as np
A = np.array([[0, 7], [4, 1], [0, 2], [0, 5]])
B = np.array([[9, 7, 7], [6, 7, 5], [2, 7, 9], [6, 9, 7]])
# This is the first outer product, for the first rows of A and B
print(np.outer(A[0], B[0]))
# [[ 0  0  0]
#  [63 49 49]]
# First possibility: a list comprehension, then a sum over axis 0
sum1 = np.sum([np.outer(x, y) for x, y in zip(A, B)], axis=0)
# Second possibility: the reduce function
from functools import reduce  # built-in in Python 2
sum2 = reduce(lambda acc, xy: acc + np.outer(xy[0], xy[1]), zip(A, B), np.zeros((A.shape[1], B.shape[1])))
# result for sum1 or sum2 looks like this:
# array([[ 24., 28., 20.], [103., 115., 107.]])
I'm asking myself: is there a better or faster solution? Because when I have e.g. two matrices with more than 10,000 rows each, this takes some time.
Using np.outer alone is not the solution, because np.outer(A, B) flattens both inputs and gives a matrix with shape (8, 12), which is not what I want.
Need this for neural networks backpropagation.
You can express the per-row outer product and the sum over rows directly in np.einsum's subscript notation -
np.einsum('rk,rl->kl',A,B)
Or with matrix-multiplication using np.dot -
A.T.dot(B)
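As a quick sanity check (a small sketch using the A, B and sum1 defined above), all three approaches agree:
out = np.einsum('rk,rl->kl', A, B)
assert np.allclose(out, A.T.dot(B))
assert np.allclose(out, sum1)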

How to use block_diag repeatedly

I have a rather simple question but still couldn't make it work.
I want a block-diagonal n^2 x n^2 matrix. The blocks are sparse n x n matrices with just the main diagonal, the first off-diagonals, and the fourth off-diagonals. For the simple case of n=4 this can easily be done:
import numpy as np
from scipy import sparse

n = 4
datanew = np.ones((5, n))
datanew[2] = -2 * datanew[2]
diagsn = [-4, -1, 0, 1, 4]
DD2 = sparse.spdiags(datanew, diagsn, n, n)
new = sparse.block_diag([DD2, DD2, DD2, DD2])
Since this is only useful for small n, is there a better way to use block_diag? I'm thinking of n around 1000.
A simple way of constructing a long list of DD2 matrices is with a list comprehension:
In [128]: sparse.block_diag([DD2 for _ in range(20)]).A
Out[128]:
array([[-2, 1, 0, ..., 0, 0, 0],
[ 1, -2, 1, ..., 0, 0, 0],
[ 0, 1, -2, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., -2, 1, 0],
[ 0, 0, 0, ..., 1, -2, 1],
[ 0, 0, 0, ..., 0, 1, -2]])
In [129]: _.shape
Out[129]: (80, 80)
At least in my version, block_diag wants a list of arrays, not *args:
In [133]: sparse.block_diag(DD2,DD2,DD2,DD2)
...
TypeError: block_diag() takes at most 3 arguments (4 given)
In [134]: sparse.block_diag([DD2,DD2,DD2,DD2])
Out[134]:
<16x16 sparse matrix of type '<type 'numpy.int32'>'
with 40 stored elements in COOrdinate format>
This probably isn't the fastest way to construct such a block diagonal array, but it's a start.
================
Looking at the code for sparse.block_diag I deduce that it does:
In [145]: rows = []
In [146]: for i in range(4):
     ...:     arow = [None]*4
     ...:     arow[i] = DD2
     ...:     rows.append(arow)
     ...:
In [147]: rows
Out[147]:
[[<4x4 sparse matrix of type '<type 'numpy.int32'>'
with 10 stored elements (5 diagonals) in DIAgonal format>,
None,
None,
None],
[None,
<4x4 sparse matrix of type '<type 'numpy.int32'>'
...
None,
<4x4 sparse matrix of type '<type 'numpy.int32'>'
with 10 stored elements (5 diagonals) in DIAgonal format>]]
In other words, rows is a 'matrix' of None with DD2 along the diagonal. block_diag then passes this list of lists to sparse.bmat.
In [148]: sparse.bmat(rows)
Out[148]:
<16x16 sparse matrix of type '<type 'numpy.int32'>'
with 40 stored elements in COOrdinate format>
bmat in turn collects the data, row and col arrays from the COO format of all the input matrices, joins them into master arrays, and builds a new COO matrix from them.
So an alternative is to construct those 3 arrays directly.
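Here is a minimal sketch of that idea for the special case of one square block repeated N times along the diagonal; fast_block_diag is a hypothetical helper name, not part of scipy:
import numpy as np
from scipy import sparse

def fast_block_diag(block, N):
    # Build the COO arrays for N copies of `block` along the diagonal directly
    b = block.tocoo()
    n = b.shape[0]                              # assumes a square block
    shift = np.repeat(np.arange(N) * n, b.nnz)  # row/col offset of each copy
    data = np.tile(b.data, N)
    row = np.tile(b.row, N) + shift
    col = np.tile(b.col, N) + shift
    return sparse.coo_matrix((data, (row, col)), shape=(N * n, N * n))
For example, fast_block_diag(DD2, 1000) builds the 4000x4000 matrix without first creating a 1000-element Python list of blocks.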

Find minimum N elements in theano

I've got a theano function which computes Euclidean distances for two matrices: X (n vectors x k features) and Y (m vectors x k features). The result is an n x m matrix of pairwise distances between each row of X and each row of Y.
import theano
from theano import tensor as T
X, Y = T.dmatrices('X', 'Y')
X_squared_sum = T.sum(X ** 2, axis=1, keepdims=True)
Y_squared_sum = T.sum(Y.T ** 2, axis=0, keepdims=True)
squared_distances = X_squared_sum + Y_squared_sum - 2 * T.dot(X, Y.T)
f_distance = theano.function([X, Y], T.sqrt(squared_distances))
Let's say I change the above function to accept a single vector, an array of vectors, and the number of smallest distances. What I want is a theano function that will find the N smallest distances, similar to below:
import numpy as np
import theano
from theano import tensor as T
X = T.dvector('X')
Y = T.dmatrix('Y')
N = T.iscalar('N')
X_squared_sum = T.dot(X, X)
Y_squared_sum = T.sum(Y.T ** 2, axis=0)
squared_distances = X_squared_sum + Y_squared_sum - 2 * T.dot(X, Y.T)
dist_sorted = T.FIND_N_SMALLEST(T.sqrt(squared_distances), N)  # placeholder for the op I'm looking for
n_closest = theano.function([X, Y, N], dist_sorted)
U = np.array([1, 1, 1, 1])  # X is a dvector, so U must be 1-D
V = np.array([
[ 4, 4, 4, 4],
[ 2, 2, 2, 2],
[ 3, 3, 3, 3],
[ 1, 1, 1, 1]])
n_closest(U, V, 2) # [0.0, 2.0]
I'd like to avoid explicitly sorting all the distances, since the number that I want will generally be much much smaller than the total number of distances.
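As a fallback sketch (assuming a full sort is acceptable despite the preference stated above): theano does provide T.sort, and slicing its output with the symbolic scalar N yields the N smallest distances:
dist_sorted = T.sort(T.sqrt(squared_distances))[:N]
n_closest = theano.function([X, Y, N], dist_sorted)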

A pythonic way to find if a value is between two values in a list

Having a sorted list and some random value, I would like to find in which range the value falls.
The list goes like this: [0, 5, 10, 15, 20]
And the value is, say, 8.
The standard way would be either to go from the start until we hit a value bigger than ours (as in the example below), or to perform a binary search.
grid = [0, 5, 10, 15, 20]
value = 8
result_index = 0
while result_index < len(grid) and grid[result_index] < value:
    result_index += 1
print result_index
I am wondering if there is a more pythonic approach, as this, although short, is a bit of an eyesore.
Thank you for your time!
>>> import bisect
>>> grid = [0, 5, 10, 15, 20]
>>> value = 8
>>> bisect.bisect(grid, value)
2
Edit:
bisect — Array bisection algorithm
for lo, hi in zip(grid, grid[1:]):  # [(0, 5), (5, 10), (10, 15), (15, 20)]
    if lo <= value < hi:  # previously: if value in xrange(lo, hi)
        return lo, hi
raise ValueError("value out of range")
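If the surrounding interval itself is wanted rather than the insertion index, bisect gives it directly (a small sketch, assuming grid[0] <= value < grid[-1]):
i = bisect.bisect(grid, value)  # 2 for value = 8
lo, hi = grid[i - 1], grid[i]   # (5, 10)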