I want to implement a program that converts a wavelength to (r,g,b) using the algorithm in this paper (p5-6): http://www.scientificbulletin.upb.ro/rev_docs_arhiva/full49129.pdf. The obvious approach checks the value of the wavelength with if-else, like:
if wavelength>380 and wavelength<410:
    pass  # do something
elif wavelength<440:
    pass  # do something
elif wavelength<490:
    pass  # do something, and so on
Is there some clever method to avoid the if-else statements so that I can speed up the code? More specifically, if I store the wavelengths in a list or a numpy array, is it possible to use some sort of 'vectorized' method to generate the (r,g,b) values?
Yes, there is. If you have your wavelengths in a numpy array you can use boolean masks instead of the if ... elif ... clauses.
For your second question about the vectorized operation ... I think you want something like this:
import numpy as np
wavelengths = np.array([1,2,3])
conversion = np.array([-0.41,0,0.6]).reshape(3,1) # R, G, B parts
wavelengths * conversion
# The reshape to (3,1) is needed to get a 3x3 result
array([[-0.41, -0.82, -1.23],
       [ 0.  ,  0.  ,  0.  ],
       [ 0.6 ,  1.2 ,  1.8 ]])
The given formulas are a bit more complicated than this example but StackOverflow isn't about writing you the code. I think with the example you should be able to implement these formulas.
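As a minimal sketch of the boolean-mask idea, the snippet below computes one channel for a whole array at once. The band edges 380, 410 and 440 come from the question; the two ramp formulas and the function name red_channel are placeholders for illustration, not the formulas from the paper.

```python
import numpy as np

def red_channel(wavelengths):
    """Piecewise 'red' value per wavelength (placeholder formulas, not the paper's)."""
    w = np.asarray(wavelengths, dtype=float)
    r = np.zeros_like(w)                 # wavelengths outside every band stay 0
    band1 = (w > 380) & (w < 410)        # replaces the first `if`
    band2 = (w >= 410) & (w < 440)       # replaces the first `elif`
    r[band1] = (w[band1] - 380) / 30.0   # placeholder: ramp up across band 1
    r[band2] = (440 - w[band2]) / 30.0   # placeholder: ramp down across band 2
    return r

red = red_channel([395, 425, 500])
```

Each mask selects the elements that a given branch would have handled, so the loop and the if-else chain both disappear. Green and blue would follow the same pattern with their own masks and formulas.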
I'm trying to find an efficient algorithm for determining whether two convex hulls intersect or not. The hulls consist of data points in N-dimensional space, where N is 3 up to 10 or so. One elegant algorithm was suggested here using linprog from scipy, but you have to loop over all points in one hull, and it turns out the algorithm is very slow for low dimensions (I tried it and so did one of the respondents). It seems to me the algorithm could be generalized to answer the question I am posting here, and I found what I think is a solution here. The authors say that the general linear programming problem takes the form Ax + tp >= 1, where the A matrix contains the points of both hulls, t is some constant >= 0, and p = [1,1,1,1...1] (it's equivalent to finding a solution to Ax > 0 for some x). As I am new to linprog() it isn't clear to me whether it can handle problems of this form. If A_ub is defined as on page 1 of the paper, then what is b_ub?
There is a nice explanation of how to do this problem, with an algorithm in R, on this website. My original post referred to the scipy.optimize.linprog library, but this proved to be insufficiently robust. I found that the SCS algorithm in the cvxpy library worked very nicely, and based on this I came up with the following python code:
import numpy as np
import cvxpy

# Determine feasibility of Ax <= b
# cloud1 and cloud2 should be numpy.ndarrays
def clouds_overlap(cloud1, cloud2):
    # build the A matrix
    cloud12 = np.vstack((-cloud1, cloud2))
    vec_ones = np.r_[np.ones((len(cloud1),1)), -np.ones((len(cloud2),1))]
    A = np.r_['1', cloud12, vec_ones]

    # make b vector
    ntot = len(cloud1) + len(cloud2)
    b = -np.ones(ntot)

    # define the x variable and the equation to be solved
    x = cvxpy.Variable(A.shape[1])
    # '@' is matrix multiplication (older cvxpy releases used '*')
    constraints = [A @ x <= b]

    # since we're only determining feasibility there is no minimization
    # so just set the objective function to a constant
    obj = cvxpy.Minimize(0)

    # SCS was the most accurate/robust of the non-commercial solvers
    # for my application
    problem = cvxpy.Problem(obj, constraints)
    problem.solve(solver=cvxpy.SCS)

    # A feasible x describes a separating hyperplane, so 'infeasible' means overlap.
    # Any 'inaccurate' status indicates ambiguity, so you can
    # return True or False as you please
    return problem.status == 'infeasible' or problem.status.endswith('inaccurate')

cube = np.array([[1,1,1],[1,1,-1],[1,-1,1],[1,-1,-1],[-1,1,1],[-1,1,-1],[-1,-1,1],[-1,-1,-1]])
inside = np.array([[0.49,0.0,0.0]])
outside = np.array([[1.01,0,0]])
print("Clouds overlap?", clouds_overlap(cube, inside))
print("Clouds overlap?", clouds_overlap(cube, outside))
# Clouds overlap? True
# Clouds overlap? False
The area of numerical instability is when the two clouds just touch, or are arbitrarily close to touching such that it isn't possible to definitively say whether they overlap or not. That is one of the cases where you will see this algorithm report an 'inaccurate' status. In my code I chose to consider such cases overlapping, but since it is ambiguous you can decide for yourself what to do.
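To see what the A matrix and b vector encode, here is a small numpy-only sketch that builds the same system for two one-dimensional clouds and checks a hand-picked candidate x = (w, c). The helper name build_system and the candidate values are assumptions for illustration; no solver is involved, so this only verifies a given hyperplane rather than searching for one.

```python
import numpy as np

def build_system(cloud1, cloud2):
    """Build (A, b) so that A @ x <= b asks for a hyperplane w.z = c
    with w.p - c >= 1 on cloud1 and w.q - c <= -1 on cloud2."""
    cloud1 = np.asarray(cloud1, dtype=float)
    cloud2 = np.asarray(cloud2, dtype=float)
    A = np.vstack([
        np.hstack([-cloud1, np.ones((len(cloud1), 1))]),   # rows for cloud1
        np.hstack([cloud2, -np.ones((len(cloud2), 1))]),   # rows for cloud2
    ])
    b = -np.ones(len(cloud1) + len(cloud2))
    return A, b

# Two clearly separated 1-D clouds: {0, 1} and {3, 4}
A, b = build_system([[0.0], [1.0]], [[3.0], [4.0]])
x = np.array([-1.0, -2.0])                 # hand-picked: w = -1, c = -2
separated = bool(np.all(A @ x <= b))       # True means this x separates the clouds
```

If some x satisfies every row, the clouds are separated by a margin, which is why clouds_overlap treats an infeasible problem as an overlap.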
The code is in PyMC3, but this is a general problem. I want to find which matrix (combination of variables) gives me the highest probability. Taking the mean of the trace of each element is meaningless because they depend on each other.
Here is a simple case; the code uses a vector rather than a matrix for simplicity. The goal is to find a vector of length 2, where each value is between 0 and 1, so that the sum is 1.
import numpy as np
import theano
import theano.tensor as tt
import pymc3 as mc

# define a theano Op for our likelihood function
class LogLike_Matrix(tt.Op):
    itypes = [tt.dvector] # expects a vector of parameter values when called
    otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)

    def __init__(self, loglike):
        self.likelihood = loglike # the log-p function

    def perform(self, node, inputs, outputs):
        # the method that is used when calling the Op
        theta, = inputs # this will contain my variables
        # call the log-likelihood function
        logl = self.likelihood(theta)
        outputs[0][0] = np.array(logl) # output the log-likelihood

def logLikelihood_Matrix(data):
    """
    We want sum(data) = 1
    """
    p = 1-np.abs(np.sum(data)-1)
    return np.log(p)

logl_matrix = LogLike_Matrix(logLikelihood_Matrix)

# use PyMC3 to sample from the log-likelihood
with mc.Model():
    # Data will be sampled randomly with uniform distribution
    # because the log-p doesn't work on it
    data_matrix = mc.Uniform('data_matrix', shape=(2), lower=0.0, upper=1.0)
    # convert the parameters to a tensor vector
    theta = tt.as_tensor_variable(data_matrix)
    # use a DensityDist (use a lambda function to "call" the Op)
    mc.DensityDist('likelihood_matrix', lambda v: logl_matrix(v), observed={'v': theta})
    trace_matrix = mc.sample(5000, tune=100, discard_tuned_samples=True)
If you only want the highest likelihood parameter values, then you want the Maximum A Posteriori (MAP) estimate, which can be obtained using pymc3.find_MAP() (see starting.py for method details). If you expect a multimodal posterior, then you will likely need to run this repeatedly with different initializations and select the result with the largest logp value. That increases the chances of finding the global optimum but cannot guarantee it.
It should be noted that at high parameter dimensions, the MAP estimate is usually not part of the typical set, i.e., it is not representative of typical parameter values that would lead to the observed data. Michael Betancourt discusses this in A Conceptual Introduction to Hamiltonian Monte Carlo. The fully Bayesian approach is to use posterior predictive distributions, which effectively averages over all the high-likelihood parameter configurations rather than using a single point estimate for parameters.
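The multi-start idea can be sketched without PyMC3. The example below is only a stand-in: it uses scipy.optimize.minimize on a made-up one-dimensional bimodal log-density rather than pymc3.find_MAP() on a real model, and the names logp and multistart_mode are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def logp(x):
    """Toy unnormalized log-density: a weaker mode near -2, a stronger one near +2."""
    x = float(np.ravel(x)[0])
    return np.logaddexp(np.log(0.3) - 2.0 * (x + 2.0) ** 2,
                        np.log(0.7) - 2.0 * (x - 2.0) ** 2)

def multistart_mode(starts):
    """Run a local optimizer from each start and keep the highest-logp result."""
    results = [minimize(lambda x: -logp(x), x0=[s]) for s in starts]
    best = max(results, key=lambda r: -r.fun)
    return float(best.x[0]), float(-best.fun)

# A start at -3.0 typically settles in the weaker mode; the other starts find the stronger one.
best_x, best_logp = multistart_mode([-3.0, 0.5, 3.0])
```

Keeping the run with the largest logp picks the stronger mode here, but only because one of the starting points happened to lie in its basin of attraction.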
I have an array of team names from NCAA, along with statistics associated with them. The school names are often shortened or left out entirely, but there is usually a common element in all variations of the name (like Alabama Crimson Tide vs Crimson Tide). These names are all contained in an array in no particular order. I would like to be able to take all variations of a team name by fuzzy matching them and rename all variants to one name. I'm working in python 2.7 and I have a numpy array with all of the data. Any help would be appreciated, as I have never used fuzzy matching before.
I have considered fuzzy matching through a for-loop, which would (despite being unbelievably slow) compare each element in the column of the array to every other element, but I'm not really sure how to build it.
Currently, my array looks like this:
{Names , info1, info2, info 3}
The array is a few thousand rows long, so I'm trying to make the program as efficient as possible.
The Levenshtein edit distance is the most common way to perform fuzzy matching of strings. It is available in the python-Levenshtein package. Another popular measure is the Jaro-Winkler distance, also available in the same package.
Assuming a simple numpy array:
import numpy as np
import Levenshtein as lv
ar = np.array([
'string'
, 'stum'
, 'Such'
, 'Say'
, 'nay'
, 'powder'
, 'hiden'
, 'parrot'
, 'ming'
])
We define helpers to give us indexes of Levenshtein and Jaro distances, between a string we have and all strings in the array.
def levenshtein(dist, string):
    return map(lambda x: x<dist, map(lambda x: lv.distance(string, x), ar))

def jaro(dist, string):
    return map(lambda x: x<dist, map(lambda x: lv.jaro_winkler(string, x), ar))
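Now, note that Levenshtein distance is an integer value counted in number of characters, whilst the Jaro-Winkler value is a floating point number between 0 and 1. The value returned by lv.jaro_winkler is a similarity (1 means identical), so the jaro helper selects the strings whose similarity falls below the threshold. Let's test this using np.where: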
print ar[np.where(levenshtein(3, 'str'))]
print ar[np.where(levenshtein(5, 'str'))]
print ar[np.where(jaro(0.00000001, 'str'))]
print ar[np.where(jaro(0.9, 'str'))]
And we get:
['stum']
['string' 'stum' 'Such' 'Say' 'nay' 'ming']
['Such' 'Say' 'nay' 'powder' 'hiden' 'ming']
['string' 'stum' 'Such' 'Say' 'nay' 'powder' 'hiden' 'parrot' 'ming']
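For the original goal of renaming every variant to one name, a minimal sketch follows. It swaps in the standard library's difflib (a different similarity measure from Levenshtein or Jaro-Winkler) so that it runs without extra packages, and it assumes you can supply a list of canonical names; the canonicalize helper, the example lists and the 0.6 cutoff are all assumptions.

```python
import difflib

def canonicalize(names, canonical, cutoff=0.6):
    """Map each name to its closest canonical name; keep it unchanged if nothing is close."""
    result = []
    for name in names:
        matches = difflib.get_close_matches(name, canonical, n=1, cutoff=cutoff)
        result.append(matches[0] if matches else name)
    return result

canonical = ["Alabama Crimson Tide", "Duke Blue Devils"]   # hypothetical reference list
variants = ["Crimson Tide", "Alabama Crimson Tide", "Blue Devils",
            "Duke Blue Devils", "Unknown"]
normalized = canonicalize(variants, canonical)
```

Matching each row against a short canonical list costs far fewer comparisons than matching every row against every other row, which matters for an array with a few thousand rows. The cutoff needs tuning on real data, since short nicknames can match the wrong school.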
I've got an image as a numpy array and a mask for the image.
from scipy.misc import face
img = face(gray=True)
mask = img > 250
How can I apply a function to all masked elements?
def foo(x):
    return int(x*0.5)
For that specific function, a few approaches are possible.
Approach #1 : You can use boolean indexing for in-place setting -
img[mask] = (img[mask]*0.5).astype(int)
Approach #2 : You can also use np.where for a possibly more intuitive solution -
img_out = np.where(mask,(img*0.5).astype(int),img)
np.where has the syntax np.where(mask,A,B): it chooses between two equally shaped arrays A and B to produce a new array of the same shape. The selection is based on the elements of mask, which has the same shape as A and B: for every True element in mask we select from A, otherwise from B. In our case, A is (img*0.5).astype(int) and B is img.
Approach #3 : There's a built-in np.putmask that seems to be the closest for this exact task and could be used to do in-place setting, like so -
np.putmask(img, mask, (img*0.5).astype('uint8'))
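As a quick check that the three approaches agree, here is a small sketch on a hand-made 2x2 uint8 array instead of the face image. The array values and the variable names out_index, out_where and out_put are made up for illustration.

```python
import numpy as np

img = np.array([[10, 252], [255, 100]], dtype=np.uint8)
mask = img > 250

# Approach #1: boolean indexing on a copy, so the original stays intact
out_index = img.copy()
out_index[mask] = (out_index[mask] * 0.5).astype(np.uint8)

# Approach #2: np.where builds a new array
out_where = np.where(mask, (img * 0.5).astype(np.uint8), img)

# Approach #3: np.putmask writes into a copy in place
out_put = img.copy()
np.putmask(out_put, mask, (img * 0.5).astype(np.uint8))
```

All three leave the unmasked pixels untouched and halve the masked ones, truncating 127.5 to 127 just as int() does in foo.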
map(-30, -89.75, 89.75, 0, 360)
I'm looking for something like this where:
-30 is the input value.
-89.75 to 89.75 is the range of possible input values
0 - 360 is the final range to be mapped to
I was told there is a way to do this using http://ruby-doc.org/core-1.9.3/Enumerable.html#method-i-map, however it's not readily apparent!
If I'm understanding correctly, I think you just want to uniformly map one range onto another. So, we just need to calculate how far through the input range it is, and return that fraction of the output range.
def map_range(input, in_low, in_high, out_low, out_high)
  # map onto [0,1] using input range; to_f avoids integer division
  frac = (input - in_low).to_f / (in_high - in_low)
  # map onto output range
  frac * (out_high - out_low) + out_low
end
Also, I should note that map has a bit of a different meaning in ruby, and a more appropriate description would probably be transform.
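For comparison with the Python snippets elsewhere on this page, the same linear mapping can be sketched in Python and checked against the numbers from the question. This is a direct port for illustration, not part of the Ruby answer; the parameter name value replaces input to avoid shadowing a Python builtin.

```python
def map_range(value, in_low, in_high, out_low, out_high):
    """Linearly map value from [in_low, in_high] onto [out_low, out_high]."""
    # fraction of the way through the input range; float() avoids integer division
    frac = (value - in_low) / float(in_high - in_low)
    # the same fraction of the output range
    return frac * (out_high - out_low) + out_low

mapped = map_range(-30, -89.75, 89.75, 0, 360)   # roughly 119.83
```

The midpoint of the input range lands on the midpoint of the output range, which is an easy sanity check for any implementation.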