Index numpy arrays columns by another numpy array - python-2.7

I am trying to index a 2d matrix in numpy so that I can get all rows but only particular columns given by another numpy array. For example:
a = [0,1,1,2,0,2,1]
d = [[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]]
For each row of d, I want the element in the column given by the corresponding entry of a. So for the example above I want:
t = [1,2,2,3,1,3,2]
I tried some of the methods given in the numpy documentation but could not get this to work.
I think this is doable in MATLAB without any iteration. Can I do this in Python without looping over something?

This can be done with advanced indexing:
>>> a = numpy.array([0, 1, 1, 2, 0, 2, 1])
>>> d = numpy.array([[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]])
>>> d[numpy.arange(d.shape[0]), a]
array([1, 2, 2, 3, 1, 3, 2])
For arrays a, b, and c where b and c have integer dtype and b.shape == c.shape, advanced indexing d = a[b, c] gives d[i] == a[b[i], c[i]].
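As a small sketch of that rule on less uniform data (the array values below are made up for illustration), the advanced-indexing result can be checked against an explicit loop. np.take_along_axis is shown as an alternative spelling of the same selection:

```python
import numpy as np

# Hypothetical data: 4 rows, 3 columns, with a distinct value in every cell.
d = np.arange(12).reshape(4, 3)
a = np.array([2, 0, 1, 2])  # column to pick from each row

rows = np.arange(d.shape[0])
picked = d[rows, a]  # picked[i] == d[rows[i], a[i]]

# The same selection written as an explicit Python loop, for comparison.
looped = np.array([d[i, a[i]] for i in range(len(a))])

# Equivalent formulation with take_along_axis (needs a 2-D index array).
along = np.take_along_axis(d, a[:, None], axis=1).ravel()

print(picked)  # [ 2  3  7 11]
```

All three results agree; the indexing form avoids the Python-level loop.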

Related

Solving system of equations in sympy with matrix variables

I am looking for a matrix that solves a complicated system of equations; i.e., it would be hard to flatten the equations into vector form. Here is a toy example showing the error that I'm getting:
from sympy import nsolve, symbols, Inverse
from sympy.polys.polymatrix import PolyMatrix
import numpy as np
import itertools as itr
nnodes = 2
nodes = list(range(nnodes))
u_mat = PolyMatrix([symbols(f'u{i}{j}') for i, j in itr.product(nodes, nodes)]).reshape(2, 2)
u_mat_inv = Inverse(u_mat)
equations = [
u_mat_inv[0, 0] - 1,
u_mat_inv[0, 1] - 0,
u_mat_inv[1, 0] - 0,
u_mat_inv[1, 1] - 1
]
s = nsolve(equations, u_mat, np.ones(4))
This raises the following error:
TypeError: X must be a row or a column matrix
Is there a way around this without having to write the equations in vector form?
I think nsolve is getting confused because u_mat is a matrix. Passing list(u_mat) gives the unknowns in the flat form that nsolve expects. The next problem is your initial guess: a matrix of all ones is singular, so its inverse, and hence the system of equations, is undefined at that point.
You can use normal solve here though:
In [24]: solve(equations, list(u_mat))
Out[24]: [(1, 0, 0, 1)]
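If a numerical solution is still wanted, a minimal sketch along these lines may work. It makes two assumptions that are not in the question: a plain sympy Matrix stands in for PolyMatrix, and the starting point is chosen close to the identity so that it is not singular.

```python
from sympy import Matrix, nsolve, symbols

# Assumption: a plain sympy Matrix replaces the PolyMatrix of the question.
u = symbols('u00 u01 u10 u11')
u_mat = Matrix(2, 2, list(u))

# Explicit inverse: adjugate divided by determinant.
u_mat_inv = u_mat.adjugate() / u_mat.det()

equations = [
    u_mat_inv[0, 0] - 1,
    u_mat_inv[0, 1],
    u_mat_inv[1, 0],
    u_mat_inv[1, 1] - 1,
]

# Flat list of unknowns plus a starting point near the identity matrix.
# An all-ones guess would be a singular matrix, where the inverse is undefined.
guess = [1.1, 0.1, 0.1, 0.9]
solution = nsolve(equations, list(u_mat), guess)
print(solution)  # a 4x1 column close to (1, 0, 0, 1)
```

The unknowns come back in the same flat order as list(u_mat), so they can be reshaped into a 2x2 matrix afterwards if needed.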

Comparison of Lists for the 2048 game

def helper(mat):
    for row in mat:
        zero_list = []
        for subrow in row:
            if subrow == 0:
                row.remove(0)
                zero_list.append(0)
        row.extend(zero_list)
    return mat

def merge_left(mat):
    result = mat.copy()
    helper(mat)
    counter = 0
    for i in range(len(mat)):
        current_tile = 0
        for j in range(len(mat)):
            if mat[i][j] == current_tile:
                mat[i][j-1] *= 2
                mat[i][j] = 0
                counter += mat[i][j-1]
            current_tile = mat[i][j]
    helper(mat)
    return result == mat

print(merge_left([[2, 2, 0, 2], [4, 0, 0, 0], [4, 8, 0, 4], [0, 0, 0, 2]]))
Hey guys,
The result I get for merge_left in the above code is True for the test case, even though result is a copy of mat.
How has result been altered in the same way as mat by this code?
I'd understand this if I had written
result = mat instead of result = mat.copy()
Why is this the case? I'm aiming to compare the two states of the input mat: before the code alters it and after.
list.copy() only clones the outer list. The inner lists are still aliases, so modifying one of them modifies result and mat. Here's a minimal reproduction of the problem:
>>> x = [[1, 2]]
>>> y = x.copy()
>>> y[0][0] += 1
>>> y
[[2, 2]]
>>> x
[[2, 2]]
You can use [row[:] for row in mat] to copy each row within the matrix. For a flat list, row[:] and row.copy() are equivalent.
You can also use copy.deepcopy, but it's overkill for this.
Also, row.remove(0) while iterating over row, as with iterating over any list while adding or removing elements from it, is very likely a bug. Consider a redesign or use for subrow in row[:]: at minimum.
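One possible redesign, sketched here with a hypothetical helper named compress_left, builds each shifted row with a list comprehension instead of removing items during iteration, and takes the snapshot with a per-row copy:

```python
def compress_left(row):
    """Return a new row with non-zero tiles moved left and zeros padded right."""
    tiles = [tile for tile in row if tile != 0]
    return tiles + [0] * (len(row) - len(tiles))

mat = [[2, 2, 0, 2], [4, 0, 0, 0], [4, 8, 0, 4], [0, 0, 0, 2]]

# Copy every row so the snapshot shares no inner list with mat.
before = [row[:] for row in mat]

# Rebuild each row in place (slice assignment keeps the same list objects).
for row in mat:
    row[:] = compress_left(row)

print(mat)            # [[2, 2, 2, 0], [4, 0, 0, 0], [4, 8, 4, 0], [2, 0, 0, 0]]
print(before == mat)  # False: the snapshot kept the original layout
```

This only covers the shifting step; the merging of equal tiles is left out of the sketch.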

Numpy - row-wise outer product of two matrices

I have two numpy arrays: A of shape (b, i) and B of shape (b, o). I would like to compute an array R of shape (b, i, o) where every line l of R contains the outer product of row l of A and row l of B. So far what I have is:
import numpy as np
A = np.ones((10, 2))
B = np.ones((10, 6))
R = np.asarray([np.outer(a, b) for a, b in zip(A, B)])
assert R.shape == (10, 2, 6)
I think this method is too slow because of the zip and the final conversion into a numpy array.
Is there a more efficient way to do it?
That is possible with numpy.matmul, which can do multiplication of "matrix stacks". In this case we want to multiply a stack of column vectors with a stack of row vectors. First bring matrix A to shape (b, i, 1) and B to shape (b, 1, o). Then use matmul to perform b times the outer product:
import numpy as np
i, b, o = 3, 4, 5
A = np.ones((b, i))
B = np.ones((b, o))
print(np.matmul(A[:, :, np.newaxis], B[:, np.newaxis, :]).shape) # (4, 3, 5)
An alternative could be to use numpy.einsum, which can directly represent your index notation:
np.einsum('bi,bo->bio', A, B)
Why not simply
A[:, :, None] * B[:, None, :]
Depending on your convention and your dtype, you might need to throw in another np.conj somewhere. Note that np.newaxis is simply None
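As a quick sanity check, the following sketch compares the loop from the question with the three vectorized variants on random real-valued data (the seed and shapes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 2))  # shape (b, i)
B = rng.standard_normal((10, 6))  # shape (b, o)

# Reference result: one np.outer call per pair of rows.
loop = np.asarray([np.outer(a, b) for a, b in zip(A, B)])

via_matmul = np.matmul(A[:, :, np.newaxis], B[:, np.newaxis, :])
via_einsum = np.einsum('bi,bo->bio', A, B)
via_broadcast = A[:, :, None] * B[:, None, :]

print(via_broadcast.shape)              # (10, 2, 6)
print(np.allclose(loop, via_matmul))    # True
print(np.allclose(loop, via_einsum))    # True
print(np.allclose(loop, via_broadcast)) # True
```

Which variant is fastest depends on the shapes and the NumPy build, so it is worth timing them on your own data.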

Method for evaluating the unit vector ( or normalising a vector ) in Python or in the numerical libraries: numpy, scipy [duplicate]

I would like to convert a NumPy array to a unit vector. More specifically, I am looking for an equivalent version of this normalisation function:
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        return v
    return v / norm
This function handles the situation where vector v has the norm value of 0.
Are there any similar functions provided in sklearn or numpy?
If you're using scikit-learn you can use sklearn.preprocessing.normalize:
import numpy as np
from sklearn.preprocessing import normalize
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = normalize(x[:,np.newaxis], axis=0).ravel()
print(np.allclose(norm1, norm2))
# True
I agree that it would be nice if such a function were part of the included libraries. But it isn't, as far as I know. So here is a vectorized version that works along an arbitrary axis.
import numpy as np
def normalized(a, axis=-1, order=2):
    l2 = np.atleast_1d(np.linalg.norm(a, order, axis))
    l2[l2==0] = 1
    return a / np.expand_dims(l2, axis)
A = np.random.randn(3,3,3)
print(normalized(A,0))
print(normalized(A,1))
print(normalized(A,2))
print(normalized(np.arange(3)[:,None]))
print(normalized(np.arange(3)))
This might also work for you
import numpy as np
normalized_v = v / np.sqrt(np.sum(v**2))
but it divides by zero when v has norm 0.
In that case, adding a small constant to the denominator avoids the division by zero.
As proposed in the comments one could also use
v/np.linalg.norm(v)
To avoid zero division I use eps, but that's maybe not great.
def normalize(v):
    norm = np.linalg.norm(v)
    if norm == 0:
        norm = np.finfo(v.dtype).eps
    return v / norm
If you have multidimensional data and want each axis normalized to its max or its sum:
def normalize(_d, to_sum=True, copy=True):
    # _d is an (n x dimension) np array of floats
    d = _d if not copy else np.copy(_d)
    d -= np.min(d, axis=0)
    d /= (np.sum(d, axis=0) if to_sum else np.ptp(d, axis=0))
    return d
This uses NumPy's peak-to-peak function, np.ptp.
a = np.random.random((5, 3))
b = normalize(a, copy=False)
b.sum(axis=0) # array([1., 1., 1.]), each column sums to 1
c = normalize(a, to_sum=False, copy=False)
c.max(axis=0) # array([1., 1., 1.]), the max of each column is 1
If you don't need utmost precision, your function can be reduced to:
v_norm = v / (np.linalg.norm(v) + 1e-16)
You mentioned scikit-learn, so I want to share another solution.
scikit-learn MinMaxScaler
In scikit-learn there is a class called MinMaxScaler that rescales each feature to a value range of your choice.
It also deals with NaN values for us.
NaNs are treated as missing values: disregarded in fit, and maintained
in transform. ... see reference [1]
Code sample
The code is simple:
# Let's say X_train is your input dataframe
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# create a MinMaxScaler object
min_max_scaler = MinMaxScaler()
# feed in a numpy array
X_train_norm = min_max_scaler.fit_transform(X_train.values)
# wrap it up if you need a dataframe
df = pd.DataFrame(X_train_norm)
Reference
[1] sklearn.preprocessing.MinMaxScaler
There is also the function unit_vector() to normalize vectors in the popular transformations module by Christoph Gohlke:
import transformations as trafo
import numpy as np
data = np.array([[1.0, 1.0, 0.0],
                 [1.0, 1.0, 1.0],
                 [1.0, 2.0, 3.0]])
print(trafo.unit_vector(data, axis=1))
If you work with multidimensional arrays, the following fast solution is possible.
Say we have a 2D array that we want to normalize along the last axis, while some rows have zero norm.
import numpy as np
arr = np.array([
    [1, 2, 3],
    [0, 0, 0],
    [5, 6, 7]
], dtype=float)
lengths = np.linalg.norm(arr, axis=-1)
print(lengths) # [ 3.74165739 0. 10.48808848]
arr[lengths > 0] = arr[lengths > 0] / lengths[lengths > 0][:, np.newaxis]
print(arr)
# [[0.26726124 0.53452248 0.80178373]
# [0. 0. 0. ]
# [0.47673129 0.57207755 0.66742381]]
If you want to normalize n dimensional feature vectors stored in a 3D tensor, you could also use PyTorch:
import numpy as np
from torch import FloatTensor
from torch.nn.functional import normalize
vecs = np.random.rand(3, 16, 16, 16)
norm_vecs = normalize(FloatTensor(vecs), dim=0, eps=1e-16).numpy()
If you're working with 3D vectors, you can do this concisely using the toolbelt vg. It's a light layer on top of numpy and it supports single values and stacked vectors.
import numpy as np
import vg
x = np.random.rand(1000)*10
norm1 = x / np.linalg.norm(x)
norm2 = vg.normalize(x)
print(np.all(norm1 == norm2))
# True
I created the library at my last startup, where it was motivated by uses like this: simple ideas which are way too verbose in NumPy.
Without sklearn, using just numpy, you can define a function.
Assuming that the rows are the variables and the columns are the samples (axis=1):
import numpy as np
# Example array
X = np.array([[1,2,3],[4,5,6]])
def stdmtx(X):
    means = X.mean(axis=1)
    stds = X.std(axis=1, ddof=1)
    X = X - means[:, np.newaxis]
    X = X / stds[:, np.newaxis]
    return np.nan_to_num(X)
output:
X
array([[1, 2, 3],
       [4, 5, 6]])
stdmtx(X)
array([[-1.,  0.,  1.],
       [-1.,  0.,  1.]])
For a 2D array, you can use the following one-liner to normalize across rows. To normalize across columns, simply set axis=0.
a / np.linalg.norm(a, axis=1, keepdims=True)
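That one-liner yields NaN for any all-zero row. A hedged sketch of one way around this, using np.divide with its out and where arguments so that zero-norm rows are simply left as zeros (the sample array is made up for illustration):

```python
import numpy as np

a = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

norms = np.linalg.norm(a, axis=1, keepdims=True)  # shape (3, 1)

# Divide only where the norm is non-zero; the other rows keep the zeros from `out`.
unit = np.divide(a, norms, out=np.zeros_like(a), where=norms != 0)

print(unit)
```

Here the first row becomes (0.6, 0.8), the zero row stays zero, and the last row is already a unit vector. Whether a zero row should stay zero or raise an error is a design choice that depends on the application.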
If you want all values in [0; 1] for 1d-array then just use
(a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
Where a is your 1d-array.
An example:
>>> a = np.array([0, 1, 2, 4, 5, 2])
>>> (a - a.min(axis=0)) / (a.max(axis=0) - a.min(axis=0))
array([0. , 0.2, 0.4, 0.8, 1. , 0.4])
Note on this method: it preserves the proportions between values only if the 1d-array consists of 0 and positive numbers and contains at least one 0.
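To illustrate that restriction, here is a small sketch with two made-up arrays; min_max is a hypothetical helper wrapping the same formula:

```python
import numpy as np

def min_max(a):
    # Rescale a 1d-array linearly so that its minimum maps to 0 and its maximum to 1.
    return (a - a.min()) / (a.max() - a.min())

with_zero = np.array([0.0, 1.0, 2.0, 4.0])  # minimum is already 0
no_zero = np.array([1.0, 2.0, 3.0, 5.0])    # minimum is 1

scaled_with_zero = min_max(with_zero)  # values 0, 0.25, 0.5, 1
scaled_no_zero = min_max(no_zero)      # values 0, 0.25, 0.5, 1

# Ratio between the last two elements, before and after scaling.
print(with_zero[3] / with_zero[2], scaled_with_zero[3] / scaled_with_zero[2])  # 2.0 2.0
print(no_zero[3] / no_zero[2], scaled_no_zero[3] / scaled_no_zero[2])
```

In the first array the ratio 4:2 survives the scaling; in the second, the shift by the minimum changes 5:3 into 1:0.5.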
A simple dot product would do the job. No need for any extra package.
x = x/np.sqrt(x.dot(x))
By the way, if the norm of x is zero, it is inherently a zero vector, and cannot be converted to a unit vector (which has norm 1). If you want to catch the case of np.array([0,0,...0]), then use
norm = np.sqrt(x.dot(x))
x = x/norm if norm != 0 else x

Python - Subtracting elements in lists of tuples

I have two lists of tuples.
x = [(A1, B1, C1), (A2, B2, C2),...(AN, BN, CN)]
and
y = [(A1_, B1_, C1_), (A2_, B2_, C2_),...(AN_, BN_, CN_)]
I want to do the following things:
Obtain a new list [(A1, B1, C1 - C1_), (A2, B2, C2 - C2_),...(AN, BN, CN - CN_)]
And from there, create a list that solely consists of [C1 - C1_, C2 - C2_,...]
I'd venture to say that something in Numpy would allow me to do this, but I still have not dug up how to just do an operation on one element in a tuple, so I would appreciate any possible help.
Thanks.
If you start with x and y being a list of tuples, then it is easy to convert them to 2D NumPy arrays:
import numpy as np
x = np.array([(1,2,3), (4,5,6), (7,8,9)])
y = np.array([(10,20,30), (40,50,60), (70,80,90)])
Then to create an array similar to
[(A1, B1, C1 - C1_), (A2, B2, C2 - C2_),...(AN, BN, CN - CN_)]
you could do this:
z = x.copy() # make a copy of array x (x[:] would only create a view)
z[:,2] -= y[:,2] # subtract the 3rd column of y from z
print(z)
yields
[[ 1 2 -27]
[ 4 5 -54]
[ 7 8 -81]]
and to get
[C1 - C1_, C2 - C2_,...]
you could either use z[:, 2] or obtain it directly from x and y using x[:, 2] - y[:, 2]:
[-27 -54 -81]
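One detail worth checking when working on a copy: unlike list slicing, basic slicing of a NumPy array returns a view onto the same data, so an explicit .copy() is needed if x should stay untouched. A short sketch with the same sample arrays (the names view and independent are only illustrative):

```python
import numpy as np

x = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
y = np.array([(10, 20, 30), (40, 50, 60), (70, 80, 90)])

view = x[:]             # basic slicing: shares memory with x
independent = x.copy()  # separate buffer

print(np.shares_memory(view, x))         # True
print(np.shares_memory(independent, x))  # False

# Subtracting on the independent copy leaves x as it was.
independent[:, 2] -= y[:, 2]
print(x[:, 2])            # [3 6 9]
print(independent[:, 2])  # [-27 -54 -81]
```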
I might be misunderstanding your question, but when you say "I still have not dug up how to just do an operation on one element in a tuple" it makes me think you might be storing tuples in a NumPy array. If that's true, then I'd urge you to reconsider the way you are using NumPy:
You see, when you use dtype=object to store Python objects in a NumPy array (such as a tuple), then all operations done on these objects ultimately involve calls to Python functions, rather than the faster C/Fortran compiled functions that NumPy normally calls.
Thus, while you may enjoy NumPy syntax for selecting items in the array, you do not gain any speed advantage over plain Python objects. In fact, it can be slower than using plain Python objects (such as a list of tuples).
For this reason, I would recommend avoiding storing Python objects in NumPy arrays whenever possible, and especially when those objects are numerical, since NumPy's native numeric dtypes serve much better.
Instead of storing 3-tuples in an array, it would be better to add an extra dimension (a so-called "axis") to the NumPy array and store the 3 components along this axis.
Once you do that, the numerical calculation you contemplate is a piece of cake. It could be done with something like:
x[:,2]-y[:,2]
Without numpy:
>>> x = [(1,2,3), (4,5,6), (7,8,9)]
>>> y = [(10,20,30), (40,50,60), (70,80,90)]
>>> [ (a[0], a[1], a[2] - b[2]) for a, b in zip(x, y) ]
[(1, 2, -27), (4, 5, -54), (7, 8, -81)]
>>> [ a[2] - b[2] for a, b in zip(x, y) ]
[-27, -54, -81]