I'm new to Keras framework and I want to implement the following loss function of
Root Mean Squared Logarithmic Error
Here is my code for Keras with tensorflow backend
def loss_function(y_true, y_pred):
ones = K.ones(shape=K.shape(y_pred).shape)
y_pred = tf.add(y_pred,ones)
y_true = tf.add(y_true,ones)
val = K.sqrt(K.mean(K.sum(K.log(y_pred)-K.log(y_true))))
return val
But I end up getting the following error:
ValueError: Error when checking input: expected dense_1_input to have shape (None, 16) but got array with shape (1312779, 11)
with the val returned to be 0.
The order of your operations is inverted.
Since "log(true) - log(pred)" can be either negative or positive (the result may be a little higher or a little lower than the expected), the square is the first thing that must happen. (The square is responsible for eliminating the negative signs).
And the mean is the last one (the most external), because you want first to compute the error for each element, and only after that you get the mean of the error. (The mean function already carries the sum function in it).
So:
def loss_function(y_true, y_pred):
y_pred = y_pred + 1
y_true = y_true + 1
return K.mean(K.square(K.log(y_pred)-K.log(y_true)))
Please note that this does not carry the "root" part. If you want to add it, I'd say that the root should go before the mean (different from the formula in the picture)
I'd use this instead:
return K.mean(K.sqrt(K.square(K.log(y_pred)-K.log(y_true))))
Make sure that your model ends with an activation that outputs numbers greater or equal to zero:
Relu is ok
Sigmoid is ok
Softmax is ok
Other activations may have negative values and will bring errors with log:
linear is not ok
tanh is not ok
Related
I'm trying to find an efficient algorithm for determining whether two convex hulls intersect or not. The hulls consist of data points in N-dimensional space, where N is 3 up to 10 or so. One elegant algorithm was suggested here using linprog from scipy, but you have to loop over all points in one hull, and it turns out the algorithm is very slow for low dimensions (I tried it and so did one of the respondents). It seems to me the algorithm could be generalized to answer the question I am posting here, and I found what I think is a solution here. The authors say that the general linear programming problem takes the form Ax + tp >= 1, where the A matrix contains the points of both hulls, t is some constant >= 0, and p = [1,1,1,1...1] (it's equivalent to finding a solution to Ax > 0 for some x). As I am new to linprog() it isn't clear to me whether it can handle problems of this form. If A_ub is defined as on page 1 of the paper, then what is b_ub?
There is a nice explanation of how to do this problem, with an algorithm in R, on this website. My original post referred to the scipy.optimize.linprog library, but this proved to be insufficiently robust. I found that the SCS algorithm in the cvxpy library worked very nicely, and based on this I came up with the following python code:
import numpy as np
import cvxpy as cvxpy
# Determine feasibility of Ax <= b
# cloud1 and cloud2 should be numpy.ndarrays
def clouds_overlap(cloud1, cloud2):
# build the A matrix
cloud12 = np.vstack((-cloud1, cloud2))
vec_ones = np.r_[np.ones((len(cloud1),1)), -np.ones((len(cloud2),1))]
A = np.r_['1', cloud12, vec_ones]
# make b vector
ntot = len(cloud1) + len(cloud2)
b = -np.ones(ntot)
# define the x variable and the equation to be solved
x = cvxpy.Variable(A.shape[1])
constraints = [A*x <= b]
# since we're only determining feasibility there is no minimization
# so just set the objective function to a constant
obj = cvxpy.Minimize(0)
# SCS was the most accurate/robust of the non-commercial solvers
# for my application
problem = cvxpy.Problem(obj, constraints)
problem.solve(solver=cvxpy.SCS)
# Any 'inaccurate' status indicates ambiguity, so you can
# return True or False as you please
if problem.status == 'infeasible' or problem.status.endswith('inaccurate'):
return True
else:
return False
cube = np.array([[1,1,1],[1,1,-1],[1,-1,1],[1,-1,-1],[-1,1,1],[-1,1,-1],[-1,-1,1],[-1,-1,-1]])
inside = np.array([[0.49,0.0,0.0]])
outside = np.array([[1.01,0,0]])
print("Clouds overlap?", clouds_overlap(cube, inside))
print("Clouds overlap?", clouds_overlap(cube, outside))
# Clouds overlap? True
# Clouds overlap? False
The area of numerical instability is when the two clouds just touch, or are arbitrarily close to touching such that it isn't possible to definitively say whether they overlap or not. That is one of the cases where you will see this algorithm report an 'inaccurate' status. In my code I chose to consider such cases overlapping, but since it is ambiguous you can decide for yourself what to do.
The code is in PyMC3, but this is a general problem. I want to find which matrix (combination of variables) gives me the highest probability. Taking the mean of the trace of each element is meaningless because they depend on each other.
Here is a simple case; the code uses a vector rather than a matrix for simplicity. The goal is to find a vector of length 2, where the each value is between 0 and 1, so that the sum is 1.
import numpy as np
import theano
import theano.tensor as tt
import pymc3 as mc
# define a theano Op for our likelihood function
class LogLike_Matrix(tt.Op):
itypes = [tt.dvector] # expects a vector of parameter values when called
otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)
def __init__(self, loglike):
self.likelihood = loglike # the log-p function
def perform(self, node, inputs, outputs):
# the method that is used when calling the Op
theta, = inputs # this will contain my variables
# call the log-likelihood function
logl = self.likelihood(theta)
outputs[0][0] = np.array(logl) # output the log-likelihood
def logLikelihood_Matrix(data):
"""
We want sum(data) = 1
"""
p = 1-np.abs(np.sum(data)-1)
return np.log(p)
logl_matrix = LogLike_Matrix(logLikelihood_Matrix)
# use PyMC3 to sampler from log-likelihood
with mc.Model():
"""
Data will be sampled randomly with uniform distribution
because the log-p doesn't work on it
"""
data_matrix = mc.Uniform('data_matrix', shape=(2), lower=0.0, upper=1.0)
# convert m and c to a tensor vector
theta = tt.as_tensor_variable(data_matrix)
# use a DensityDist (use a lamdba function to "call" the Op)
mc.DensityDist('likelihood_matrix', lambda v: logl_matrix(v), observed={'v': theta})
trace_matrix = mc.sample(5000, tune=100, discard_tuned_samples=True)
If you only want the highest likelihood parameter values, then you want the Maximum A Posteriori (MAP) estimate, which can be obtained using pymc3.find_MAP() (see starting.py for method details). If you expect a multimodal posterior, then you will likely need to run this repeatedly with different initializations and select the one that obtains the largest logp value, but that still only increases the chances of finding the global optimum, though cannot guarantee it.
It should be noted that at high parameter dimensions, the MAP estimate is usually not part of the typical set, i.e., it is not representative of typical parameter values that would lead to the observed data. Michael Betancourt discusses this in A Conceptual Introduction to Hamiltonian Monte Carlo. The fully Bayesian approach is to use posterior predictive distributions, which effectively averages over all the high-likelihood parameter configurations rather than using a single point estimate for parameters.
I have been trying to implement the Stupid Backoff language model (the description is available here, though I believe the details are not relevant to the question).
The thing is, the code's working and producing the result that is expected, but works slower than I expected. I figured out the part that was slowing down everything is here (and NOT in the training part):
def compute_score(self, sentence):
length = len(sentence)
assert length <= self.n
if length == 1:
word = tuple(sentence)
return float(self.ngrams[length][word]) / self.total_words
else:
words = tuple(sentence[::-1])
count = self.ngrams[length][words]
if count == 0:
return self.alpha * self.compute_score(sentence[1:])
else:
return float(count) / self.ngrams[length - 1][words[:-1]]
def score(self, sentence):
""" Takes a list of strings as argument and returns the log-probability of the
sentence using your language model. Use whatever data you computed in train() here.
"""
output = 0.0
length = len(sentence)
for idx in range(length):
if idx < self.n - 1:
current_score = self.compute_score(sentence[:idx+1])
else:
current_score = self.compute_score(sentence[idx-self.n+1:idx+1])
output += math.log(current_score)
return output
self.ngrams is a nested dictionary that has n entries. Each of these entries is a dictionary of form (word_i, word_i-1, word_i-2.... word_i-n) : the count of this combination.
self.alpha is a constant that defines the penalty for going n-1.
self.n is the maximum length of that tuple that the program is looking for in the dictionary self.ngrams. It is set to 3 (though setting it to 2 or even 1 doesn't anything). It's weird because the Unigram and Bigram models work just fine in fractions of a second.
The answer that I am looking for is not a refactored version of my own code, but rather a tip which part of it is the most computationally expensive (so that I could figure out myself how to rewrite it and get the most educational profit from solving this problem).
Please, be patient, I am but a beginner (two months into the world of programming). Thanks.
UPD:
I timed the running time with the same data using time.time():
Unigram = 1.9
Bigram = 3.2
Stupid Backoff (n=2) = 15.3
Stupid Backoff (n=3) = 21.6
(It's on some bigger data than originally because of time.time's bad precision.)
If the sentence is very long, most of the code that's actually running is here:
def score(self, sentence):
for idx in range(len(sentence)): # should use xrange in Python 2!
self.compute_score(sentence[idx-self.n+1:idx+1])
def compute_score(self, sentence):
words = tuple(sentence[::-1])
count = self.ngrams[len(sentence)][words]
if count == 0:
self.compute_score(sentence[1:])
else:
self.ngrams[len(sentence) - 1][words[:-1]]
That's not meant to be working code--it just removes the unimportant parts.
The flow in the critical path is therefore:
For each word in the sentence:
Call compute_score() on that word plus the following 2. This creates a new list of length 3. You could avoid that with itertools.islice().
Construct a 3-tuple with the words reversed. This creates a new tuple. You could avoid that by passing the -1 step argument when making the slice outside this function.
Look up in self.ngrams, a nested dict, with the first key being a number (might be faster if this level were a list; there are only three keys anyway?), and the second being the tuple just created.
Recurse with the first word removed, i.e. make a new tuple (sentence[2], sentence[1]), or
Do another lookup in self.ngrams, implicitly creating another new tuple (words[:-1]).
In summary, I think the biggest problem you have is the repeated and nested creation and destruction of lists and tuples.
I have a file of binary values. The section I am looking at is 4 byte int with the values in the pattern of MW1, MVAR1, MW2, MVAR2,...
I read the values in with
temp = array.array("f")
temp.fromfile(file, length *2)
mw_mvar = temp.tolist()
I then calculate the magnitude like this.
mag = [0] * length
for x in range(0,length * 2, 2):
a = mw_mvar[x]
b = mw_mvar[x + 1]
mag[(x / 2)] = sqrt(a*a + b*b)
The calculations (not the read) are doubling the total length of my script. I know there is (theoretically) a way to do this faster because am mimicking a script that ultimately calls fortran (pyd to call function dlls in fortran i think) which is able to do this calculation with negligible affect on run time.
This is the best i can come up with. any suggestions for improvements?
I have also tried math.pow(), **.5, **2 with no differences.
with no luck improving the calculations, I went around the problem. I realised that I only needed 1% of those calculated values so I created a class to calculate them on demand. It was important (to me) that the resulting code act similar to as if it were a list of calculated values. A lot of the remainder of the process uses the values and different versions of the data are pre-calculated. The class means i don't need a set of procedures for each version of data
class mag:
def __init__(self,mw_mvar):
self._mw_mvar = mw_mvar
#_sgn = sgn
def __len__(self):
return len(self._mw_mvar/2)
def __getitem__(self, item):
return sqrt(self._mw_mvar[2*item] ** 2 + self._mw_mvar[2*item+1] ** 2)
ps this could also be done in a function and take both versions. i would have had to make more changes to the overall script.
function (a,b,x):
if b[x]==0:
return a[x]
else:
return sqrt(a[x]**2 + b[x]**2)
Is is possible in Python to have a generator that yields values in a loop be alternatively called as a normal function where the final value of that loop is returned? I tried setting a flag as argument and then choose to yield or return depending on that flag. But the mere existence of the keyword yield in a function transforms it automatically into a generator and Python complains that there was a return statement in a generator.
Here an example of such a function:
def function(generator=True):
a = 0
for i in range(10):
a = i
if generator:
yield a
if not generator:
return a
Such a function would be useful for me when in some cases I just need the final result (eg. using it as residual function for optimization) while in other cases I need the incremental results after each iteration (for example using a differential model for a robot, updating the robot's pose with each new velocity command). For now I am having two functions where one has the yield and the other has the return. So is it possible to combine those two?
You'll still have to iterate over it to get its single value, but you can use yield again instead of return to accomplish this.
>>> def function(generator=True):
... a = 0
... for i in range(10):
... a = i
... if generator:
... yield a
... if not generator:
... yield a
...
>>> a = function()
>>> print(*a)
0 1 2 3 4 5 6 7 8 9
>>> a = function(0)
>>> print(a)
<generator object function at 0x0000000001603240>
>>> print(*a)
9
>>> a = function(0)
Note that having return inside a generator is a SyntaxError in Python 2, but not Python 3. Replacing the return with yield produces the same result in 2 and 3.
it's still a generator, even calling return.
I'd to not mix both generator/regular function anyway.
You can wrap something on top of the iterator, in case you need to loop through the results anyway.
A simple code that could do what you want:
last = None
for last in function(): pass
Now last holds the value you want and you can use it on your code.
The return inside a generator was added to Python 3.3
And it's equivalent to StopIteration(value)
return expr in a generator causes StopIteration(expr) to be raised
upon exit from the generator.
In a generator, the statement
return value is semantically equivalent to
raise StopIteration(value) except that, as currently, the exception
cannot be caught by except clauses within the returning generator.