Apply function to masked numpy array - python-2.7

I've got an image as a numpy array and a mask for the image.
from scipy.misc import face
img = face(gray=True)
mask = img > 250
How can I apply a function to all masked elements?
def foo(x):
    return int(x*0.5)

For that specific function, a few approaches are possible.
Approach #1 : You can use boolean indexing for in-place setting -
img[mask] = (img[mask]*0.5).astype(int)
Approach #2 : You can also use np.where for a possibly more intuitive solution -
img_out = np.where(mask,(img*0.5).astype(int),img)
np.where has the syntax np.where(mask,A,B): it chooses between two equal-shaped arrays A and B to produce a new array of the same shape. The selection is made element-wise based on mask, which has the same shape as A and B. For every True element in mask we select the corresponding element of A, otherwise that of B. In our case, A is (img*0.5).astype(int) and B is img.
Approach #3 : There's a built-in np.putmask that seems to be the closest for this exact task and could be used to do in-place setting, like so -
np.putmask(img, mask, (img*0.5).astype('uint8'))
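As a quick sanity check, the three approaches can be compared on a small synthetic array. This is a minimal sketch: the 2x3 uint8 array below is a made-up stand-in for the image returned by face(gray=True), and the names out1, out2 and out3 exist only for this illustration.

```python
import numpy as np

# Hypothetical stand-in for the grayscale image (uint8, like face(gray=True))
img = np.array([[10, 251, 255],
                [252, 100, 250]], dtype=np.uint8)
mask = img > 250

# Approach #1: boolean indexing, applied in place on a copy
out1 = img.copy()
out1[mask] = (out1[mask] * 0.5).astype(int)

# Approach #2: np.where builds a new array and leaves img untouched
out2 = np.where(mask, (img * 0.5).astype(int), img)

# Approach #3: np.putmask, applied in place on a copy
out3 = img.copy()
np.putmask(out3, mask, (out3 * 0.5).astype('uint8'))

print(out1)
print(np.array_equal(out1, out2), np.array_equal(out1, out3))
print(out1.dtype, out2.dtype)
```

The in-place variants keep the original uint8 dtype, while np.where returns a new array whose dtype is promoted to the integer type produced by .astype(int).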

determining whether two convex hulls overlap

I'm trying to find an efficient algorithm for determining whether two convex hulls intersect or not. The hulls consist of data points in N-dimensional space, where N is 3 up to 10 or so. One elegant algorithm was suggested here using linprog from scipy, but you have to loop over all points in one hull, and it turns out the algorithm is very slow for low dimensions (I tried it and so did one of the respondents). It seems to me the algorithm could be generalized to answer the question I am posting here, and I found what I think is a solution here. The authors say that the general linear programming problem takes the form Ax + tp >= 1, where the A matrix contains the points of both hulls, t is some constant >= 0, and p = [1,1,1,1...1] (it's equivalent to finding a solution to Ax > 0 for some x). As I am new to linprog() it isn't clear to me whether it can handle problems of this form. If A_ub is defined as on page 1 of the paper, then what is b_ub?
There is a nice explanation of how to do this problem, with an algorithm in R, on this website. My original post referred to the scipy.optimize.linprog library, but this proved to be insufficiently robust. I found that the SCS algorithm in the cvxpy library worked very nicely, and based on this I came up with the following python code:
import numpy as np
import cvxpy
# Determine feasibility of Ax <= b
# cloud1 and cloud2 should be numpy.ndarrays
def clouds_overlap(cloud1, cloud2):
    # build the A matrix
    cloud12 = np.vstack((-cloud1, cloud2))
    vec_ones = np.r_[np.ones((len(cloud1),1)), -np.ones((len(cloud2),1))]
    A = np.r_['1', cloud12, vec_ones]
    # make b vector
    ntot = len(cloud1) + len(cloud2)
    b = -np.ones(ntot)
    # define the x variable and the equation to be solved
    x = cvxpy.Variable(A.shape[1])
    # A @ x is the matrix product (written A*x in older cvxpy versions)
    constraints = [A @ x <= b]
    # since we're only determining feasibility there is no minimization
    # so just set the objective function to a constant
    obj = cvxpy.Minimize(0)
    # SCS was the most accurate/robust of the non-commercial solvers
    # for my application
    problem = cvxpy.Problem(obj, constraints)
    problem.solve(solver=cvxpy.SCS)
    # Any 'inaccurate' status indicates ambiguity, so you can
    # return True or False as you please
    if problem.status == 'infeasible' or problem.status.endswith('inaccurate'):
        return True
    else:
        return False
cube = np.array([[1,1,1],[1,1,-1],[1,-1,1],[1,-1,-1],[-1,1,1],[-1,1,-1],[-1,-1,1],[-1,-1,-1]])
inside = np.array([[0.49,0.0,0.0]])
outside = np.array([[1.01,0,0]])
print("Clouds overlap?", clouds_overlap(cube, inside))
print("Clouds overlap?", clouds_overlap(cube, outside))
# Clouds overlap? True
# Clouds overlap? False
The area of numerical instability is when the two clouds just touch, or are arbitrarily close to touching such that it isn't possible to definitively say whether they overlap or not. That is one of the cases where you will see this algorithm report an 'inaccurate' status. In my code I chose to consider such cases overlapping, but since it is ambiguous you can decide for yourself what to do.
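One way to see why near-touching clouds are numerically hard: the constraints Ax <= b with b = -1 ask for a hyperplane (w, c) such that p.w >= c + 1 for every point p of the first cloud and q.w <= c - 1 for every point q of the second. The norm of w must therefore be at least 2 divided by the gap between the hulls, and it grows without bound as the gap shrinks. The sketch below uses only numpy and a hand-picked certificate for the cube and the outside point from the example. The helper name satisfies_constraints and the values of w and c are assumptions chosen for illustration; they are not part of the cvxpy solution.

```python
import numpy as np

def satisfies_constraints(cloud1, cloud2, w, c):
    """Check whether x = (w, c) satisfies A x <= b, with A and b built
    the same way as in clouds_overlap."""
    cloud12 = np.vstack((-cloud1, cloud2))
    vec_ones = np.r_[np.ones((len(cloud1), 1)), -np.ones((len(cloud2), 1))]
    A = np.hstack((cloud12, vec_ones))
    b = -np.ones(len(cloud1) + len(cloud2))
    x = np.append(w, c)
    return bool(np.all(A @ x <= b))

cube = np.array([[1, 1, 1], [1, 1, -1], [1, -1, 1], [1, -1, -1],
                 [-1, 1, 1], [-1, 1, -1], [-1, -1, 1], [-1, -1, -1]])
inside = np.array([[0.49, 0.0, 0.0]])
outside = np.array([[1.01, 0.0, 0.0]])

# Hand-picked hyperplane (assumption): the gap along the x axis is 0.01,
# so |w| must be at least 2 / 0.01 = 200; 400 leaves some slack.
w = np.array([-400.0, 0.0, 0.0])
c = -402.0

# This x certifies that the cube and the outside point can be separated.
print(satisfies_constraints(cube, outside, w, c))
# The same x fails for the inside point. One failing x proves nothing by
# itself; showing that no x works is the job of the solver.
print(satisfies_constraints(cube, inside, w, c))
```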

How to Use MCMC with a Custom Log-Probability and Solve for a Matrix

The code is in PyMC3, but this is a general problem. I want to find which matrix (combination of variables) gives me the highest probability. Taking the mean of the trace of each element is meaningless because they depend on each other.
Here is a simple case; the code uses a vector rather than a matrix for simplicity. The goal is to find a vector of length 2, where each value is between 0 and 1, so that the sum is 1.
import numpy as np
import theano
import theano.tensor as tt
import pymc3 as mc
# define a theano Op for our likelihood function
class LogLike_Matrix(tt.Op):
    itypes = [tt.dvector] # expects a vector of parameter values when called
    otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)
    def __init__(self, loglike):
        self.likelihood = loglike # the log-p function
    def perform(self, node, inputs, outputs):
        # the method that is used when calling the Op
        theta, = inputs # this will contain my variables
        # call the log-likelihood function
        logl = self.likelihood(theta)
        outputs[0][0] = np.array(logl) # output the log-likelihood
def logLikelihood_Matrix(data):
    """
    We want sum(data) = 1
    """
    p = 1-np.abs(np.sum(data)-1)
    return np.log(p)
logl_matrix = LogLike_Matrix(logLikelihood_Matrix)
# use PyMC3 to sample from the log-likelihood
with mc.Model():
    # Data will be sampled randomly with uniform distribution
    # because the log-p doesn't work on it
    data_matrix = mc.Uniform('data_matrix', shape=2, lower=0.0, upper=1.0)
    # convert data_matrix to a tensor vector
    theta = tt.as_tensor_variable(data_matrix)
    # use a DensityDist (use a lambda function to "call" the Op)
    mc.DensityDist('likelihood_matrix', lambda v: logl_matrix(v), observed={'v': theta})
    trace_matrix = mc.sample(5000, tune=100, discard_tuned_samples=True)
If you only want the highest-likelihood parameter values, then you want the Maximum A Posteriori (MAP) estimate, which can be obtained using pymc3.find_MAP() (see starting.py for method details). If you expect a multimodal posterior, then you will likely need to run this repeatedly with different initializations and select the result with the largest logp value. That only increases the chances of finding the global optimum; it cannot guarantee it.
It should be noted that at high parameter dimensions, the MAP estimate is usually not part of the typical set, i.e., it is not representative of typical parameter values that would lead to the observed data. Michael Betancourt discusses this in A Conceptual Introduction to Hamiltonian Monte Carlo. The fully Bayesian approach is to use posterior predictive distributions, which effectively average over all the high-likelihood parameter configurations rather than using a single point estimate for the parameters.
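The multi-start idea can be sketched without PyMC3. The toy below rests on several assumptions: logp is a made-up bimodal log-density, hill_climb is a hypothetical stand-in for a local optimizer such as the one behind pymc3.find_MAP(), and the starting points are chosen by hand. It only illustrates that different initializations can end in different modes, and that keeping the result with the largest logp picks the best of the modes that were actually found.

```python
import math

def logp(x):
    """Toy bimodal log-density: a minor mode near -2 and a major mode near 3."""
    minor = 0.3 * math.exp(-((x + 2.0) ** 2) / 0.5)
    major = 0.7 * math.exp(-((x - 3.0) ** 2) / 0.5)
    return math.log(minor + major)

def hill_climb(x, step=0.5, tol=1e-6):
    """Very simple local optimizer: move while logp improves, else halve the step."""
    while step > tol:
        if logp(x + step) > logp(x):
            x += step
        elif logp(x - step) > logp(x):
            x -= step
        else:
            step /= 2.0
    return x

# Hand-picked initializations (assumption); each run only finds a local optimum
starts = [-4.0, 0.0, 5.0]
results = [hill_climb(s) for s in starts]

# Keep the run with the largest logp value
best = max(results, key=logp)

for s, r in zip(starts, results):
    print(f"start={s:5.1f} -> x={r:7.4f}, logp={logp(r):8.4f}")
print("best:", round(best, 4))
```

If none of the starting points had fallen in the basin of the major mode, the selected result would still have been the minor mode, which is why restarts improve the odds without giving a guarantee.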

keras custom loss function

I'm new to the Keras framework and I want to implement the following loss function:
Root Mean Squared Logarithmic Error
Here is my code for Keras with the TensorFlow backend:
def loss_function(y_true, y_pred):
    ones = K.ones(shape=K.shape(y_pred).shape)
    y_pred = tf.add(y_pred,ones)
    y_true = tf.add(y_true,ones)
    val = K.sqrt(K.mean(K.sum(K.log(y_pred)-K.log(y_true))))
    return val
But I end up getting the following error:
ValueError: Error when checking input: expected dense_1_input to have shape (None, 16) but got array with shape (1312779, 11)
with the val returned to be 0.
The order of your operations is inverted.
Since "log(true) - log(pred)" can be either negative or positive (the prediction may be a little higher or a little lower than expected), squaring must happen first. (The square is what eliminates the negative signs.)
The mean comes last (outermost), because you first compute the error for each element and only then take the mean of the errors. (The mean function already includes the sum.)
So:
def loss_function(y_true, y_pred):
    y_pred = y_pred + 1
    y_true = y_true + 1
    return K.mean(K.square(K.log(y_pred)-K.log(y_true)))
Please note that this does not include the "root" part. If you want to add it, I'd say that the root should go before the mean (different from the formula in the picture).
I'd use this instead:
    return K.mean(K.sqrt(K.square(K.log(y_pred)-K.log(y_true))))
Make sure that your model ends with an activation that outputs numbers greater than or equal to zero:
Relu is ok
Sigmoid is ok
Softmax is ok
Other activations may have negative values and will bring errors with log:
linear is not ok
tanh is not ok
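To make the order of operations concrete, here is a small framework-free sketch in plain Python; the sample values and the function names are invented for illustration. It computes the standard RMSLE, with the square innermost and the root outermost, next to the variant that takes the root before the mean. Since the square root of a square is the absolute value, that variant is the mean absolute logarithmic error.

```python
import math

def log_errors(y_true, y_pred):
    """Element-wise log(pred + 1) - log(true + 1)."""
    return [math.log1p(p) - math.log1p(t) for t, p in zip(y_true, y_pred)]

def rmsle(y_true, y_pred):
    """Square each error, average, then take the root of the mean."""
    errs = log_errors(y_true, y_pred)
    return math.sqrt(sum(e * e for e in errs) / len(errs))

def mean_abs_log_error(y_true, y_pred):
    """Root applied per element before the mean: sqrt(e**2) == abs(e)."""
    errs = log_errors(y_true, y_pred)
    return sum(math.sqrt(e * e) for e in errs) / len(errs)

# Made-up sample values (assumption)
y_true = [0.0, 9.0, 99.0]
y_pred = [1.0, 9.0, 9.0]

errs = log_errors(y_true, y_pred)
print("signed errors:", errs)
print("sum of signed errors:", sum(errs))
print("RMSLE:", rmsle(y_true, y_pred))
print("mean abs log error:", mean_abs_log_error(y_true, y_pred))
```

Because the signed errors have mixed signs, summing them without squaring lets them cancel and can produce a negative value, for which the square root is undefined.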

method to circumvent if-else checking

I want to implement a program that converts a wavelength to (r,g,b) using the algorithm in this paper (p5-6): http://www.scientificbulletin.upb.ro/rev_docs_arhiva/full49129.pdf. The straightforward way is to check the value of the wavelength with if-else, like
if wavelength>380 and wavelength<410:
    # do something
elif wavelength<440:
    # do something
elif wavelength<490:
    # do something, and so on
Are there any clever methods to avoid the if-else statements so that I can speed up the code? More specifically, suppose I store the wavelengths in a list or a numpy array: is it possible to have some sort of 'vectorized' method to generate the (r,g,b) values?
Yes, there are. If you have your wavelengths in a numpy array you can use boolean masks instead of the if ... elif ... clauses.
For your second question about the vectorized operation ... I think you want something like this:
import numpy as np
wavelengths = np.array([1,2,3])
conversion = np.array([-0.41,0,0.6]).reshape(3,1) # R, G, B Parts
# Reshape is needed to get a 3x3 result
wavelengths * conversion
array([[-0.41, -0.82, -1.23],
       [ 0.  ,  0.  ,  0.  ],
       [ 0.6 ,  1.2 ,  1.8 ]])
The given formulas are a bit more complicated than this example but StackOverflow isn't about writing you the code. I think with the example you should be able to implement these formulas.
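As a rough sketch of the boolean-mask idea, np.select can stand in for an if/elif chain. The band edges (380, 410, 440, 490) come from the question; the per-band expressions below are placeholders invented for illustration and are not the formulas from the paper. The function names are hypothetical as well.

```python
import numpy as np

def band_value_scalar(w):
    """Reference if/elif version for one wavelength (placeholder formulas)."""
    if 380 < w < 410:
        return (w - 380) / 30
    elif 410 <= w < 440:
        return 1.0
    elif 440 <= w < 490:
        return (490 - w) / 50
    return 0.0

def band_value_vectorized(w):
    """Same logic on a whole array: one boolean mask per band."""
    w = np.asarray(w, dtype=float)
    conditions = [
        (w > 380) & (w < 410),
        (w >= 410) & (w < 440),
        (w >= 440) & (w < 490),
    ]
    choices = [
        (w - 380) / 30,
        np.ones_like(w),
        (490 - w) / 50,
    ]
    # np.select uses the first matching condition, like an if/elif chain;
    # elements matching no condition get the default value
    return np.select(conditions, choices, default=0.0)

wavelengths = np.array([350.0, 395.0, 410.0, 425.0, 460.0, 600.0])
print(band_value_vectorized(wavelengths))
```

Unlike the if/elif chain, every expression in choices is evaluated for all elements before the masks pick among them, so each expression must be safe to compute over the whole array.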

Theano get unique values in a tensor

I have a tensor which I convert into a vector by flattening. Now I want to remove the duplicate values in this vector. How can I do this? What is the equivalent of numpy.unique() in Theano?
x1 = T.itensor3('x1')
y1 = T.flatten(x1)
#z1 = T.unique() How do I do this?
For example, my tensor may be: [1,1,2,3,3,4,4,5,1,3,4]
and I want : [1,2,3,4,5]
EDIT: this is now available in Theano: http://deeplearning.net/software/theano/library/tensor/extra_ops.html#theano.tensor.extra_ops.Unique
This question was also asked on the theano-user mailing list. The conclusion is that this is one of the NumPy functions that isn't wrapped in Theano. Since the gradient isn't needed here, it can be wrapped quickly. Here is an example that expects the output to have the same type as the input.
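import numpy
import theano
from theano.compile.ops import as_op
# the flattened tensor y1 is an int32 vector, and numpy.unique returns a 1-D array
@as_op(itypes=[theano.tensor.ivector],
       otypes=[theano.tensor.ivector])
def numpy_unique(a):
    return numpy.unique(a)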
More doc about as_op is available here: http://deeplearning.net/software/theano/tutorial/extending_theano.html#as-op-example
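For reference, this is what the wrapped NumPy call does on the example from the question. It is a plain-NumPy sketch with no Theano involved; the 1x1x11 shape used to mimic a 3-D tensor is an arbitrary assumption.

```python
import numpy as np

# Values from the question, arranged as a 3-D int32 array (shape chosen arbitrarily)
x1 = np.array([1, 1, 2, 3, 3, 4, 4, 5, 1, 3, 4], dtype=np.int32).reshape(1, 1, 11)
y1 = x1.flatten()    # NumPy counterpart of T.flatten(x1)
z1 = np.unique(y1)   # sorted unique values, always 1-D
print(z1)            # [1 2 3 4 5]
```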