Two-sided moving average in python - python-2.7

Hi, I have some data and I want to compute the centered (two-sided) moving average.
I understand how easily this can be done with the numpy.convolve function, and I wonder if there is an equally easy way to do it when the average needs to be two-sided.
The one-sided moving average usually works in the following way if the window contains three entries, N = 3:
import numpy
list = [3, 4, 7, 8, 9, 10]
N = 3
window = numpy.repeat(1., N)/N
moving_avg = numpy.convolve(list, window, 'valid')
moving_avg = array([ 4.66666667, 6.33333333, 8. , 9. ])
Now what I am aiming to get is the average that is centered, so that if N = 3, the intervals over which the mean is taken are: [[3, 4, 7], [4, 7, 8], [7, 8, 9], [8, 9, 10]]. This is also tricky if N is an even number. Is there a tool to compute this? I'd prefer to do it either by writing a function or using numpy.

Like the commenters, I'm also confused about what you're trying to accomplish that's different from what you demonstrated.
In any case, I did want to offer a solution that lets you write your own convolution operations using Numba's @stencil decorator:
import numpy as np
from numba import stencil

@stencil
def ma(a):
    return (a[-1] + a[0] + a[1]) / 3

data = np.array([3, 4, 7, 8, 9, 10])
print(ma(data))
[0. 4.66666667 6.33333333 8. 9. 0.]
Not sure if that's exactly what you're looking for, but the stencil operator is great. The variable you pass it represents a given element, and any indexing you use is relative to that element. As you can see, it was pretty easy to make a 3-element window to calculate a moving average.
Hopefully this gives you what you need.
Using a Large Neighborhood
You can pass a neighborhood parameter to the stencil decorator; its bounds are inclusive. Let's make a neighborhood of 9:
@stencil(neighborhood=((-4, 4),))
def ma(a):
    cumul = 0
    for i in range(-4, 5):
        cumul += a[i]
    return cumul / 9
You can shift the window backward or forward by using (-8, 0) or (0, 8) as the neighborhood and changing the loop range to match.
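For instance, a trailing (backward-looking) 9-element window would look roughly like this (my own sketch, reusing the stencil import from above):

@stencil(neighborhood=((-8, 0),))
def trailing_ma(a):
    cumul = 0
    for i in range(-8, 1):
        cumul += a[i]
    return cumul / 9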
Setting N Neighborhood
Not sure if this is the best way, but I accomplished it with a wrapper:
import numpy as np
import numba as nb

def wrapper(data, N):
    @nb.stencil(neighborhood=((int(-(N - 1) / 2), int((N - 1) / 2)),))
    def ma(a):
        cumul = 0
        for i in np.arange(int(-(N - 1) / 2), int((N - 1) / 2) + 1):
            cumul += a[i]
        return cumul / N
    return ma(data)
Again, indexing is weird, so you'll have to play with it to get the desired effect.
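For instance, calling the wrapper with N = 3 on the data from the question should reproduce the centred averages, with cells whose window falls outside the array left at zero (a usage sketch under that assumption):

data = np.array([3., 4., 7., 8., 9., 10.])
print(wrapper(data, 3))
# roughly: [0.  4.66666667  6.33333333  8.  9.  0.]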

Related

Knapsack using dynamic programming

There is a common algorithm for solving the knapsack problem using dynamic programming, but it does not work for W=750000000 because the allocation fails with bad_alloc. Any ideas how to solve this problem for my value of W?
int n = this->items.size();
std::vector<std::vector<uint64_t>> dps(this->W + 1, std::vector<uint64_t>(n + 1, 0));
for (int j = 1; j <= n; j++)
    for (int k = 1; k <= this->W; k++) {
        if (this->items[j - 1]->wts <= k)
            dps[k][j] = std::max(dps[k][j - 1], dps[k - this->items[j - 1]->wts][j - 1] + this->items[j - 1]->cost);
        else
            dps[k][j] = dps[k][j - 1];
    }
First of all, you can use only one dimension to solve the knapsack problem. This will reduce your memory from dp[W][n] (n*W space) to dp[W] (W space). You can look here: 0/1 Knapsack Dynamic Programming Optimazion, from 2D matrix to 1D matrix
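Sketched in Python for brevity (my own illustration of that 1D optimization, not the original poster's code):

def knapsack_1d(W, items):
    # items is a list of (weight, cost) pairs; dp[k] holds the best cost for capacity k
    dp = [0] * (W + 1)
    for wt, cost in items:
        # iterate capacities downwards so each item is used at most once
        for k in range(W, wt - 1, -1):
            dp[k] = max(dp[k], dp[k - wt] + cost)
    return dp[W]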
But even if you use only dp[W], your W is really high and that may still be too much memory. If your items are big, you can use an approach that reduces the number of possible weights. First, realize that you don't need all positions up to W, only those that can actually be formed as a sum of the item weights.
For example:
W = 500
weights = [100, 200, 400]
You will never use position dp[473] of your matrix, because the items can occupy only positions p = [0, 100, 200, 300, 400, 500]. It is easy to see that this problem is the same as when:
W = 5
weights = [1,2,4]
Another more complicated example:
W = 20
weights = [5, 7, 8]
Using the same approach as before, you don't need all weights from 0 to 20, because the items can only fill the positions
p = [0, 5, 7, 8, 5 + 7, 5 + 8, 7 + 8, 5 + 7 + 8]
p = [0, 5, 7, 8, 12, 13, 15, 20]
and you can reduce your matrix from dp[20] to dp[size of p] = dp[8].
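The set of fillable positions can be computed directly (a sketch of my own; note that this set can itself grow large in the worst case):

def reachable_positions(W, weights):
    # all subset sums of the item weights that do not exceed W
    sums = {0}
    for w in weights:
        sums |= {s + w for s in sums if s + w <= W}
    return sorted(sums)

print(reachable_positions(20, [5, 7, 8]))   # [0, 5, 7, 8, 12, 13, 15, 20]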
You do not show n, but even if we assume it is 1, let's see how much data you are trying to allocate. It would be:
W * 64 * 2 // here we don't consider the overhead of the vectors
This comes out to be:
750000000 * 64 * 2 bits = ~11.18 GiB
I am guessing this is more space than your program is allowed. You are going to need a new approach. Perhaps try to handle the problem as multiple blocks: consider the first and second half separately, then swap.

How to get accurate predictions from a neural network

I created the neural network below for the truth table of the 3-input logic AND gate, but the prediction for the input [1, 1, 0] is not correct. The output should be 0, but it predicts about 0.9, i.e. approximately 1. So what I need to know is how to make the output prediction more accurate. Please guide me.
import numpy as np

class NeuralNetwork():
    def __init__(self):
        self.X = np.array([[0, 0, 0],
                           [0, 0, 1],
                           [0, 1, 0],
                           [0, 1, 1],
                           [1, 0, 0],
                           [1, 0, 1],
                           [1, 1, 1]])
        self.y = np.array([[0],
                           [0],
                           [0],
                           [0],
                           [0],
                           [0],
                           [1]])
        np.random.seed(1)
        # randomly initialize our weights with mean 0
        self.syn0 = 2 * np.random.random((3, 4)) - 1
        self.syn1 = 2 * np.random.random((4, 1)) - 1

    def nonlin(self, x, deriv=False):
        if deriv == True:
            return x * (1 - x)
        return 1 / (1 + np.exp(-x))

    def train(self, steps):
        for j in xrange(steps):
            # Feed forward through layers 0, 1, and 2
            l0 = self.X
            l1 = self.nonlin(np.dot(l0, self.syn0))
            l2 = self.nonlin(np.dot(l1, self.syn1))
            # how much did we miss the target value?
            l2_error = self.y - l2
            if (j % 10000) == 0:
                print "Error:" + str(np.mean(np.abs(l2_error)))
            # in what direction is the target value?
            # were we really sure? if so, don't change too much.
            l2_delta = l2_error * self.nonlin(l2, deriv=True)
            # how much did each l1 value contribute to the l2 error (according to the weights)?
            l1_error = l2_delta.dot(self.syn1.T)
            # in what direction is the target l1?
            # were we really sure? if so, don't change too much.
            l1_delta = l1_error * self.nonlin(l1, deriv=True)
            self.syn1 += l1.T.dot(l2_delta)
            self.syn0 += l0.T.dot(l1_delta)
        print("Output after training:")
        print(l2)

    def predict(self, newInput):
        # Multiply the input with weights and find its sigmoid activation for all layers
        layer0 = newInput
        print("predict -> layer 0 : " + str(layer0))
        layer1 = self.nonlin(np.dot(layer0, self.syn0))
        print("predict -> layer 1 : " + str(layer1))
        layer2 = self.nonlin(np.dot(layer1, self.syn1))
        print("predicted output is : " + str(layer2))

if __name__ == '__main__':
    ann = NeuralNetwork()
    ann.train(100000)
    ann.predict([1, 1, 0])
Output:
Error:0.48402933124
Error:0.00603525276229
Error:0.00407346660344
Error:0.00325224335386
Error:0.00277628698655
Error:0.00245737222701
Error:0.00222508289674
Error:0.00204641406194
Error:0.00190360175536
Error:0.00178613765229
Output after training:
[[ 1.36893057e-04]
[ 5.80758383e-05]
[ 1.19857670e-03]
[ 1.85443483e-03]
[ 2.13949603e-03]
[ 2.19360982e-03]
[ 9.95769492e-01]]
predict -> layer 0 : [1, 1, 0]
predict -> layer 1 : [ 0.00998162 0.91479567 0.00690524 0.05241988]
predicted output is : [ 0.99515547]
Actually, it does produce correct output -- the model is ambiguous. Your input data fits A*B; the value of the third input never affects the given output, so your model has no way to know that it's supposed to matter in case 110. In terms of pure information theory, you don't have the input to force the result you want.
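A quick check (my own sketch, not part of the original answer) confirms that A AND B already reproduces every training label, so nothing in the data distinguishes it from the 3-input AND:

import numpy as np

X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
              [1, 0, 0], [1, 0, 1], [1, 1, 1]])
y = np.array([0, 0, 0, 0, 0, 0, 1])

# A AND B matches all seven training rows, just like A AND B AND C does
print(np.array_equal(X[:, 0] & X[:, 1], y))  # True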
Seems like this happens for every input you leave out of the AND gate's truth table. For example, try replacing the [0, 1, 1] input with [1, 1, 0] and then predicting [0, 1, 1]: it predicts a final value close to 1. I tried including biases and a learning rate, but nothing seemed to work.
Like Prune mentioned, it might be because the backpropagation network is not able to work with the incomplete model.
To train your network to the fullest and get optimal weights, provide all the possible inputs, i.e. all 8 inputs to the AND gate. Then you will always get the correct predictions because you already trained the network on those inputs, which admittedly may defeat the purpose of prediction in this case. Maybe predictions on such a small dataset just do not work that well.
This is just my guess, because almost all the networks I have used for predictions had fairly big datasets.
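As a rough sketch of that suggestion (my own illustration, reusing the NeuralNetwork class from the question), the training arrays can be replaced with the full truth table before training:

ann = NeuralNetwork()
# full 8-row truth table of the 3-input AND gate
ann.X = np.array([[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1],
                  [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
ann.y = np.array([[0], [0], [0], [0], [0], [0], [0], [1]])
ann.train(100000)
ann.predict([1, 1, 0])  # the 110 case is now part of the training data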

Tensorflow maxpooling in conv2d filter instead of atrous_conv2d

I want to perform a convolution over a big patch of the image, but I don't want to have too many variables. One solution could be to use the atrous_conv2d function, but I would prefer to apply a max_pool on the patch first and then a regular conv2d. How can I do this?
I have to keep the same image size between input and output. Here is the code with the atrous_conv2d function:
x = tf.placeholder('float', shape=[None, size_x * size_y])
image = tf.reshape(x, [-1, size_x, size_y, 1])
W = weight_variable([9, 9, 1, n])
conv = tf.nn.atrous_conv2d(image, W, 10, padding='SAME')
If I understand correctly, the patch size of the atrous_conv2d convolution is (9*10 x 9*10), but it only acts on pixels at 10-pixel intervals and needs only 9x9xn variables.
I would prefer to take the same patch size, apply a max_pool on it, and then a conventional conv2d on the (9x9) patch resulting from the max_pool. In the end it would use the same number of variables, but it could give smoother results. The code could look like this:
x = tf.placeholder('float', shape=[None, size_x * size_y])
image = tf.reshape(x, [-1, size_x, size_y, 1])
W = weight_variable([9, 9, 1, n])

def maxp(patch):
    return tf.reduce_sum(tf.nn.max_pool(patch, ksize=[1, 10, 10, 1],
                                        strides=[1, 10, 10, 1], padding='SAME') * W)

conv = conv_func(image, maxp, patch_size=[1, 9 * 10, 9 * 10, 1], strides=[1, 1, 1, 1])
where conv_func would take the input, a function and a patch size, and apply that function to each patch.
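One rough way to approximate this (a sketch of my own, assuming the TF 1.x API and that size_x and size_y are multiples of 10) is to downsample with max_pool, apply an ordinary 9x9 conv2d, and resize the result back to the input resolution so the input and output sizes still match:

pooled = tf.nn.max_pool(image, ksize=[1, 10, 10, 1],
                        strides=[1, 10, 10, 1], padding='SAME')
conv = tf.nn.conv2d(pooled, W, strides=[1, 1, 1, 1], padding='SAME')
upsampled = tf.image.resize_images(conv, [size_x, size_y])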

Advanced comparison of lists

I am currently making a program for a school project which is supposed to help farmers overfertilize less (this is kind of a big deal in Denmark). The way the program works is that you enter some information about your fields (content of NPK, field size, type of soil and other things), and then I can compare each field's nutrient content to the recommended amount. From that I can create the theoretical ideal composition of fertilizer for the field.
This much I have been able to do, but here is the hard part.
I have a long list of fertilizers that are available in Denmark, and I want my program to compare 10 of them to my theoretical ideal composition, and then automatically pick the one that fits best.
I literally have no idea how to do this!
The way I format my fertilizer compositions is in lists like this
>>>print(idealfertilizercomp)
[43.15177154944473, 3.9661554732945534, 43.62771020624008, 4.230565838180857, 5.023796932839768]
Each number represent one element in percent. An example could be the first number, 43.15177154944473, which is the amount of potassium I want in my fertilizer in percent.
TL;DR:
How do I make a program or function that can compare one list of numbers to a handful of other lists, and then pick the one that fits best?
So, while I had dinner I actually came up with a way to compare multiple lists in proportion to another:
def numeric(x):
    if x >= 0:
        return x
    else:
        return -x

def comparelists(x, y):
    z1 = numeric(x[0] - y[0])
    z2 = numeric(x[1] - y[1])
    z3 = numeric(x[2] - y[2])
    z4 = numeric(x[3] - y[3])
    z5 = numeric(x[4] - y[4])
    zt = z1 + z2 + z3 + z4 + z5
    return zt

def compare2inproportion(x, y, z):
    n1 = comparelists(x, y)
    n2 = comparelists(x, z)
    print(n1)
    print(n2)
    if n1 < n2:
        print(y, "is closer to", x, "than", z)
    elif n1 > n2:
        print(z, "is closer to", x, "than", y)
    else:
        print("Both", y, "and", z, "are equally close to", x)

idealfertilizer = [1, 2, 3, 4, 5]
fertilizer1 = [2, 3, 4, 5, 6]
fertilizer2 = [5, 4, 3, 2, 1]

compare2inproportion(idealfertilizer, fertilizer1, fertilizer2)
This is just a basic version that compares two lists, but it's really easy to expand upon. The output looks like this:
[2, 3, 4, 5, 6] is closer to [1, 2, 3, 4, 5] than [5, 4, 3, 2, 1]
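For what it's worth, a compact way to extend this to any number of candidate lists is the built-in min with a key function (a sketch of my own):

def distance(ideal, candidate):
    # sum of absolute differences between the two compositions
    return sum(abs(a - b) for a, b in zip(ideal, candidate))

fertilizers = [fertilizer1, fertilizer2]  # add as many candidates as you like
best = min(fertilizers, key=lambda f: distance(idealfertilizer, f))
print(best, "fits", idealfertilizer, "best")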
Sorry for taking your time, and thanks for the help.

How do I dereference in python? (Image Processing with openCV)

I've been looking all over the internet for a simple thinning algorithm and I stumbled across this: Thinning algorithm. The problem is, I do not have much experience with the dereference operator. Also, my project is in Python, which handles this situation differently. So I have a few questions:
1: What is this bit of code doing?
void myThinningInit(CvMat **kpw, CvMat **kpb)
{
    // kernels for cvFilter2D
    // the binary image is convolved with the kpw kernels (which match white) and with the
    // inverted kpb kernels (which match black), and the two results are then ANDed together
    for (int i = 0; i < 8; i++) {
        *(kpw + i) = cvCreateMat(3, 3, CV_8UC1);
        *(kpb + i) = cvCreateMat(3, 3, CV_8UC1);
        cvSet(*(kpw + i), cvRealScalar(0), NULL);
        cvSet(*(kpb + i), cvRealScalar(0), NULL);
    }.....
And 2: How can I translate this kernel creation into Python?
He ends up making 8 kernels but I have no idea what their matrix form looks like.
I don't understand what "* (kpw + i)" or "* (kpb + i)" does in the grand scheme of the program.
3) Can I just make the kernels and store them in a list? If so, how could I do that?
UPDATE:
k = [1, 2, 3, 5, 6, 7, 8]
kpw = []
kpb = []
for i in k:
    kpw.append [i] = cv.CreateMat (3, 3, cv.CV_8UC1)
    kpb.append [i] = cv.CreateMat (3, 3, cv.CV_8UC1)
    cv.cvSet (kpw [i], cv.RealScalar (0), cv.NULL)
    cv.cvSet (kpb [i], cv.RealScalar (0), cv.NULL)
At first I just had kpw[i] and it was throwing an error. After a quick Google search I found that you needed to index the array first, and the way they did that was through append. I tried this bit of code in order to get 8 base kernels of 3x3 in size, but I received this error:
Traceback (most recent call last):
File "/home/krtzer/Documents/python_scripts/thinning.py", line 14, in
kpw.append [i] = cv.CreateMat (3, 3, cv.CV_8UC1)
TypeError: 'builtin_function_or_method' object does not support item assignment
Does this mean I cannot have matrices in lists?
That dereference is just creating a Matrix, without initialising its data. The data is manually set to zero by those lines like cvSet (* (kpw + i), cvRealScalar (0), NULL).
In Python, you can do the same thing in one hit with numpy.zeros and then use cv.fromarray. Alternatively, use x = cv.CreateMat(3, 3, cv.CV_8UC1) and then cv.Set(x, 0.).
Edit - made a (pretty big) mistake in this answer, will explain
Looks like an array of CvMats in both kpw and kpb.
Suppose I made a list of arrays kpw = [] in Python.
The *(kpw + i) = ... is just like saying kpw[i] = ....
It looks like the other code is initialising the list of kernels to 3x3 matrices of 0, so you could do:
import numpy as np

# make lists of 8 3x3 matrices of 0.
kpw = []
kpb = []
for i in xrange(8):
    kpw.append(np.zeros((3, 3)))
    kpb.append(np.zeros((3, 3)))
Note: I previously had:
kpw = [np.zeros((3,3))] * 8
kpb = [np.zeros((3,3))] * 8
which is wrong! It produces 8 references to the same matrix within kpw, so modifying kpw[0] will also modify all the other kpw[i]!
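A quick demonstration of that aliasing pitfall (my own sketch):

import numpy as np

kpw = [np.zeros((3, 3))] * 8   # 8 references to one and the same array
kpw[0][0, 0] = 1
print(kpw[7][0, 0])            # prints 1.0 -- every "copy" changed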
Then the cvSet2D(*(kpb+0), 0, 0, cvRealScalar(0)); can be translated to:
kpb[0][0, 0] = 0
because *(kpb+0) grabs the matrix in kpb[0], the 0,0 means element 0,0 of the matrix, and 0 is the value.
So: every time you see *(kpb+i), just substitute kpb[i] and you should be fine translating that code.
I made a new one in python. Thinning(Python)