Vectorized version of array calculation - python-2.7

Is there a way of vectorizing the following array calculation (i.e. without using for loops):
for i in range(numCells):
    z[i] = ((i_mask == i)*s_image).sum()/pixel_counts[i]
s_image is an image stored as a 2-dimensional ndarray (I removed the colour dimension here for simplicity). i_mask is also a 2-dimensional array of the same size as s_image, but it contains integers that index a list of 'cells' of length numCells. The result, z, is a 1-dimensional array of length numCells. The purpose of the calculation is to sum all the pixel values where the mask contains the same index and put the results in the z vector. (pixel_counts is also a 1-dimensional array of length numCells.)

As one vectorized approach, you can take advantage of broadcasting and matrix-multiplication, like so -
# Generate a binary array of matches for all elements in i_mask against
# an array of indices going from 0 to numCells-1
matches = i_mask.ravel() == np.arange(numCells)[:,None]

# Do elementwise multiplication against s_image and sum those up for
# each such index going from 0 to numCells-1. This is essentially doing
# matrix multiplication. Finally, elementwise divide by pixel_counts
out = matches.dot(s_image.ravel())/pixel_counts
Alternatively, as another vectorized approach, you can perform that multiplication and summation with np.einsum, which might give a boost to the performance, like so -
out = np.einsum('ij,j->i',matches,s_image.ravel())/pixel_counts
Runtime tests -
Function definitions:
def vectorized_app1(s_image, i_mask, pixel_counts):
    # numCells is read from the enclosing scope, as in the timing session below
    matches = i_mask.ravel() == np.arange(numCells)[:,None]
    return matches.dot(s_image.ravel())/pixel_counts

def vectorized_app2(s_image, i_mask, pixel_counts):
    matches = i_mask.ravel() == np.arange(numCells)[:,None]
    return np.einsum('ij,j->i', matches, s_image.ravel())/pixel_counts

def org_app(s_image, i_mask, pixel_counts):
    z = np.zeros(numCells)
    for i in range(numCells):
        z[i] = ((i_mask == i)*s_image).sum()/pixel_counts[i]
    return z
Timings:
In [7]: # Inputs
...: numCells = 100
...: m,n = 100,100
...: pixel_counts = np.random.rand(numCells)
...: s_image = np.random.rand(m,n)
...: i_mask = np.random.randint(0,numCells,(m,n))
...:
In [8]: %timeit org_app(s_image,i_mask,pixel_counts)
100 loops, best of 3: 8.13 ms per loop
In [9]: %timeit vectorized_app1(s_image,i_mask,pixel_counts)
100 loops, best of 3: 7.76 ms per loop
In [10]: %timeit vectorized_app2(s_image,i_mask,pixel_counts)
100 loops, best of 3: 4.08 ms per loop
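For reference, np.bincount can do the same grouped sum in a single pass, without materializing the (numCells, m*n) matches array; a sketch under the same setup (not one of the timed functions above):

def bincount_app(s_image, i_mask, pixel_counts):
    # np.bincount sums s_image's values grouped by their i_mask label;
    # numCells is read from the enclosing scope, as in the functions above
    return np.bincount(i_mask.ravel(), weights=s_image.ravel(),
                       minlength=numCells) / pixel_counts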

Here is my solution (with all three colours handled). Not sure how efficient this is. Anyone got a better solution?
import numpy as np
import pandas as pd
# Unravel the mask matrix into a 1-d array
i = np.ravel(i_mask)
# Unravel the image into 1-d arrays for
# each colour (RGB)
r = np.ravel(s_image[:,:,0])
g = np.ravel(s_image[:,:,1])
b = np.ravel(s_image[:,:,2])
# prepare a dictionary to create the dataframe
data = {'i' : i, 'r' : r, 'g' : g, 'b' : b}
# create a dataframe
df = pd.DataFrame(data)
# Use pandas pivot table to average the colour
# intensities for each cell index value
pixAvgs = pd.pivot_table(df, values=['r', 'g', 'b'], index='i')
pixAvgs.head()
Output:
            b           g           r
i
-1  26.719482   68.041868  101.603297
 0  75.432432  170.135135  202.486486
 1  92.162162  184.189189  208.270270
 2  71.179487  171.897436  201.846154
 3  76.026316  178.078947  211.605263
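If a plain (numCells, 3) array is wanted afterwards (an assumption on my part), the DataFrame converts back easily; note that pivot_table sorts its columns alphabetically (b, g, r):

# reorder the columns back to RGB order and extract a numpy array
z = pixAvgs[['r', 'g', 'b']].values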

In the end I solved this problem a different way, and it drastically increased the speed. Instead of using i_mask as above (a 2-dimensional array of indices into the 1-d array of output intensities, z), I created a different array, mask1593, of dimensions (numCells x 45). Each row is a list of about 35 to 45 indices into the flattened 256x256 pixel image (0 to 65535), padded to length 45 with zeros.
In [10]: mask1593[0]
Out[10]:
array([14853, 14854, 15107, 15108, 15109, 15110, 15111, 15112, 15363,
15364, 15365, 15366, 15367, 15368, 15619, 15620, 15621, 15622,
15623, 15624, 15875, 15876, 15877, 15878, 15879, 15880, 16131,
16132, 16133, 16134, 16135, 16136, 16388, 16389, 16390, 16391,
16392, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
Then I was able to achieve the same transformation as follows using numpy's advanced indexing:
def convert_image(self, image_array):
    """Convert 256 x 256 RGB image array to 1593 RGB led intensities."""
    global mask1593
    shape = image_array.shape
    img_data = image_array.reshape(shape[0]*shape[1], shape[2])
    return np.mean(img_data[mask1593], axis=1)
And here is the result! A 256x256 pixel colour image transformed into an array of 1593 colours for display on an irregular LED display (image not reproduced here).
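One caveat worth noting: since shorter rows of mask1593 are padded with index 0 (as in the Out[10] above), pixel 0's value gets averaged into those cells. The advanced-indexing pattern itself looks like this on a toy input (illustrative names and values, not the original data):

import numpy as np

# 4 "pixels" with RGB values; two cells of two pixel indices each
img_data = np.arange(12).reshape(4, 3).astype(float)
toy_mask = np.array([[0, 1], [2, 3]])  # each row: pixel indices for one cell

# img_data[toy_mask] has shape (2, 2, 3); averaging over axis=1 gives
# one mean RGB triple per cell
cell_means = np.mean(img_data[toy_mask], axis=1)
print(cell_means)  # [[1.5 2.5 3.5], [7.5 8.5 9.5]]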

Related

skimage.feature.greycomatrix only producing diagonal values

I am attempting to produce a GLCM on a trend-reduced digital elevation model. My current problem is that the output of skimage.feature.greycomatrix(image) only contains values on the diagonal of the matrix.
glcm = greycomatrix(image, distances=[1], levels=100, angles=[0],
                    symmetric=True, normed=True)
The image is quantized prior with the following code:
import numpy as np
from skimage.feature import greycomatrix
def quantize(raster):
    print("\n Quantizing \n")
    raster += (np.abs(np.min(raster)) + 1)
    mean = np.nanmean(raster[raster > 0])
    std = np.nanstd(raster[raster > 0])
    raster[raster == None] = 0   # set all None values to 0
    raster[np.isnan(raster)] = 0
    raster[raster > (mean + 1.5*std)] = 0
    raster[raster < (mean - 1.5*std)] = 0   # high-pass filter
    raster[raster > 0] = raster[raster > 0] - (np.min(raster[raster > 0]) - 1)
    raster[raster > 101] = 0
    raster = np.rint(raster)
    flat = np.ndarray.flatten(raster[raster > 0])
    rng = np.max(flat) - np.min(flat)   # renamed from `range` to avoid shadowing the builtin
    print("\n\nRaster Range: {}\n\n".format(rng))
    raster = raster.astype(np.uint8)
    raster[raster > 101] = 0
    return raster
How would I go about making the GLCM contain values outside of the diagonal (i.e. just the frequencies of the values themselves), and is there something fundamentally wrong with my approach?
If pixel intensities are correlated in an image, the co-occurrence of two similar levels is highly probable, and therefore the nonzero elements of the corresponding GLCM will concentrate around the main diagonal. In contrast, if pixel intensities are uncorrelated the nonzero elements of the GLCM will be spread all over the matrix. The following example makes this apparent:
import numpy as np
from skimage import data
import matplotlib.pyplot as plt
from skimage.feature import greycomatrix
x = data.brick()
y = data.gravel()
mx = greycomatrix(x, distances=[1], levels=256, angles=[0], normed=True)
my = greycomatrix(y, distances=[1], levels=256, angles=[0], normed=True)
fig, ax = plt.subplots(2, 2, figsize=(12, 8))
ax[0, 0].imshow(x, cmap='gray')
ax[0, 1].imshow(mx[:, :, 0, 0])
ax[1, 0].imshow(y, cmap='gray')
ax[1, 1].imshow(my[:, :, 0, 0])
plt.show()
Although I haven't seen your raster image, I'd guess that the intensity changes very smoothly across the image returned by quantize, and hence the GLCM is mostly diagonal.
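To see this numerically on a toy example (my own illustration, not the question's data): identical neighbours land on the diagonal, alternating ones land off it:

import numpy as np
from skimage.feature import greycomatrix

smooth = np.array([[0, 0, 1, 1], [2, 2, 3, 3]], dtype=np.uint8)
noisy = np.array([[0, 3, 0, 3], [3, 0, 3, 0]], dtype=np.uint8)

g_smooth = greycomatrix(smooth, distances=[1], angles=[0], levels=4)
g_noisy = greycomatrix(noisy, distances=[1], angles=[0], levels=4)

print(g_smooth[:, :, 0, 0])  # counts on/near the diagonal
print(g_noisy[:, :, 0, 0])   # counts only at (0, 3) and (3, 0)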

Finding random index with specific value in large numpy array

I have a very large 2D numpy array (~5e8 values). I have labeled that array using scipy.ndimage.label, and I then want to find a random index of the flattened array for each label. I can do this with:
import numpy as np
from scipy.ndimage import label
base_array = np.random.randint(0, 5, (100000, 5000))
labeled_array, nlabels = label(base_array)
for label_num in xrange(1, nlabels+1):
    indices = np.where(labeled_array.flat == label_num)[0]
    index = np.random.choice(indices)
But, it is slow with an array this large. I have also tried replacing the np.where with:
indices = np.argwhere(labeled_array.flat == label).squeeze()
And found it to be slower. I suspect that the boolean masking is the slow part. Is there any way to speed this up, or a better way to do this? I will say that in my real application the array is fairly sparse, with about 25% fill, though I have no experience with scipy's sparse array functions.
Your suspicion that masking separately for each label is expensive is correct, because no matter how you do it, the masking is O(n) for every label.
We can circumvent this by argsorting by label and then randomly picking from each block of equal labels.
Since the labels are an integer range, we can get the argsort cheaper than np.argsort by using some sparse matrix machinery available in scipy.
As my machine doesn't have an awful lot of RAM, I had to shrink your example a bit (by a factor of 4). It then runs in about 5 seconds.
import numpy as np
from scipy.ndimage import label
from scipy import sparse

def multi_randint(bins):
    """Draw one random int from each range(bins[i], bins[i+1])."""
    high = np.diff(bins)
    n = high.size
    pick = np.random.randint(0, 1<<30, (n,))
    # rejection sampling: pick + (1<<30) % high >= (1<<30) is equivalent to
    # pick >= high * ((1<<30) // high), i.e. pick falls in the tail where
    # pick % high would be biased
    reject = np.flatnonzero(pick + (1<<30) % high >= (1<<30))
    while reject.size:
        npick = np.random.randint(0, 1<<30, (reject.size,))
        rejrej = npick + (1<<30) % high[reject] >= (1<<30)
        pick[reject] = npick
        reject = reject[rejrej]
    return bins[:-1] + pick % high

# build mock data; note that I had to shrink by 4x because of memory
base_array = np.random.randint(0, 5, (50000, 2500), dtype=np.int8)
labeled_array, nlabels = label(base_array)

# build auxiliary sparse matrix: row i has a single True entry in
# column labeled_array.flat[i]
h = sparse.csr_matrix(
    (np.ones(labeled_array.size, bool), labeled_array.ravel(),
     np.arange(labeled_array.size+1, dtype=np.int32)),
    (labeled_array.size, nlabels+1))

# conversion to csc argsorts the labels (but cheaper than argsort)
h = h.tocsc()

# draw
result = h.indices[multi_randint(h.indptr)]

# check result: one index per label (including label 0)
assert len(set(labeled_array.ravel()[result])) == nlabels+1
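To see why the CSR-to-CSC conversion groups positions by label, here is a toy illustration (my own example values, not from the answer above):

import numpy as np
from scipy import sparse

labels = np.array([1, 0, 2, 1, 0])
h = sparse.csr_matrix(
    (np.ones(labels.size, bool), labels,
     np.arange(labels.size + 1)),
    (labels.size, labels.max() + 1))
h = h.tocsc()
print(h.indices)  # positions grouped by label: [1 4 0 3 2]
print(h.indptr)   # block boundaries per label: [0 2 4 5]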

Truth error while trying to find all equal minimum values in array, then retrieve indices

I'm trying to find all the minimum values in an array and retrieve their indices.
import numpy as np
a = np.array([[1,2],[1,4]])
minE = np.min(a)
ax,ay = np.unravel_index(minE, a.shape)
only returns minE = 1 and ax, ay = (0, 1).
Can anyone help me out with a way that would also provide the indices of all equal-valued minima (here, the indices of both 1's)?
Were you looking for this:
x = np.array([[1,2,3],[1,4,2]])
np.where(x == np.amin(x))
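For example, with the array above this returns the row and column indices of both 1's:

import numpy as np

x = np.array([[1, 2, 3], [1, 4, 2]])
rows, cols = np.where(x == np.amin(x))
print(zip(rows, cols))  # [(0, 0), (1, 0)] on Python 2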

Access indices from 2D list - Python

I'm trying to build a list of indices from a 2D array, and I'm hitting the following error. Basically I want to find where my data is between two values, and set a 'weights' array to 1.0 at those positions for use in a later calculation.
#data = numpy array of size (141,141)
weights = np.zeros([141,141])
ind = [x for x,y in enumerate(data) if y>40. and y<50.]
weights[ind] = 1.0
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I've tried using np.extract() but that doesn't give indices...
Think I got it to work doing this:
#data = numpy array of size (141,141)
weights = np.zeros([141,141])
ind = ((data > 40.) & (data < 50.)).astype(float)
weights[np.where(ind==1)]=1.0
Thanks to the helpful comment about using numpy's vectorized operations. The third line outputs a (141,141) array of 1's where the conditions are met and 0's where they fail; I then filled my 'weights' array with 1.0 at those locations.
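As an aside, the intermediate float array isn't strictly needed; a boolean mask can index weights directly. A minimal sketch of the same idea:

import numpy as np

# stand-in for the (141, 141) data array from the question
data = np.random.uniform(0, 100, size=(141, 141))
weights = np.zeros_like(data)
weights[(data > 40.) & (data < 50.)] = 1.0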
If you instead need to fill weights with ((value - 40) / 10), then numpy.ma is a better fit:
data = np.random.uniform(0, 100, size=(141, 141))
weights = ((np.ma.masked_outside(data, 40, 50) - 40) / 10).filled(0)

Fastest way to sample sequences from a non-uniform distribution in numpy

I want to generate a non-uniform random sample of sequences of two elements, as produced by numpy.random.choice().
e.g. I have the proportions p=[0.1, 0, 0.3, 0.6, 0] for elements e=[0,1,2,3,4] (elements are identified here by their indices).
I want a sample of 3 two-element sequences drawn from those proportions:
['03', '23', '32']
Here, we first drew 0 by sampling the element of index 0, which represents 10% of the total; we then drew 3 by sampling the element of index 3, which represents 60%. Those two draws together form the sequence '03'.
If you can find an integer N such that x*N is an integer for every element x in p, then you can:
import random

p = [0.1, 0, 0.3, 0.6, 0]
N = 10
nums = []
for i, x in enumerate(p):
    nums.extend([i] * int(x*N))
random.choice(nums)
If N would have to be very large, you can instead take the cumulative sum of the proportions and use bisect to locate a random number between 0 and 1:
import bisect

cp = []
s = 0
for x in p:
    s += x
    cp.append(s)
[bisect.bisect_left(cp, random.random()) for i in range(10)]
If you use numpy:
import numpy as np
cp = np.cumsum(p)
np.searchsorted(cp, np.random.rand(10))
or:
np.random.choice(range(5), 10, p=p)
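And to get the 3 two-element sequences from the question in a single call, size accepts a tuple shape (the output shown is just one possible draw):

pairs = np.random.choice(5, size=(3, 2), p=p)
# pairs might be e.g. array([[0, 3], [2, 3], [3, 2]])
# -- one row per two-element sequence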