I've been playing around with more efficient data structures, parallel processing and a few other things, and I've made good progress getting a script down from ~60 seconds to ~9 seconds.
The one thing I can't for the life of me get my head around though is writing a loop in Rcpp. Specifically, a loop that calculates line-by-line depending on previous-line results and updates the data as it goes.
I was wondering if someone could convert my code into Rcpp, so that I can reverse-engineer it and figure out how it's done using an example I'm very familiar with.
It's a loop that calculates the result of 3 variables at each line. Line 1 has to be calculated separately, and then line 2 onwards calculates based on values from the current and previous lines.
This example code is just 6 lines long but my original code is many thousands:
temp <- matrix(c(0, 0, 0, 2.211, 2.345, 0, 0.8978, 1.0452, 1.1524, 0.4154,
                 0.7102, 0.8576, 0, 0, 0, 1.7956, 1.6348, 0,
                 rep(NA, 18)), ncol=6, nrow=6)
const1 <- 0.938

for (p in 1:nrow(temp)) {
  if (p==1) {
    temp[p, 4] <- max(min(temp[p, 2],
                          temp[p, 1]),
                      0)
    temp[p, 5] <- max(temp[p, 3] + (0 - const1),
                      0)
    temp[p, 6] <- temp[p, 1] - temp[p, 4] - temp[p, 5]
  }
  if (p>1) {
    temp[p, 4] <- max(min(temp[p, 2],
                          temp[p, 1] + temp[p-1, 6]),
                      0)
    temp[p, 5] <- max(temp[p, 3] + (temp[p-1, 6] - const1),
                      0)
    temp[p, 6] <- temp[p-1, 6] + temp[p, 1] - temp[p, 4] - temp[p, 5]
  }
}
Thanks in advance, hopefully this takes someone with Rcpp skills just a minute or two!
Here is the equivalent Rcpp code:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix getResult(NumericMatrix x, double const1){
  for (int p = 0; p < x.nrow(); p++){
    if (p == 0){
      x(p, 3) = std::max(std::min(x(p, 1), x(p, 0)), 0.0);
      x(p, 4) = std::max(x(p, 2) + (0.0 - const1), 0.0);
      x(p, 5) = x(p, 0) - x(p, 3) - x(p, 4);
    }
    if (p > 0){
      x(p, 3) = std::max(std::min(x(p, 1), x(p, 0) + x(p - 1, 5)), 0.0);
      x(p, 4) = std::max(x(p, 2) + (x(p - 1, 5) - const1), 0.0);
      x(p, 5) = x(p - 1, 5) + x(p, 0) - x(p, 3) - x(p, 4);
    }
  }
  return x;
}
A few notes:
Save this in a file and run Rcpp::sourceCpp("myCode.cpp") in your session to compile it and make getResult() available within the session.
We use NumericMatrix here to represent the matrix.
You'll see that we call std::max and std::min. Both arguments to these functions must have the same type, i.e. in max(x, y), x and y must both be double (or both be int). NumericMatrix entries are double, so the other argument must be a double too; hence the change from 0 (an int in C++) to 0.0 (a double).
In C++, indexing starts from 0 instead of 1, so R code like temp[1, 4] becomes x(0, 3).
Have a look at http://adv-r.had.co.nz/Rcpp.html for more information to support your development
Update: if x is a list of vectors (one vector per row), here's an approach:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List getResult(List x, double const1){
  // `res` is a second handle to the same list as x (Rcpp copies are shallow, not deep)
  Rcpp::List res(x);
  for (int p = 0; p < x.size(); p++){
    // `curr` wraps res[p]; if that element is already a double vector no copy is made,
    // so writing to curr modifies the list (and the R object passed in) in place
    Rcpp::NumericVector curr(res[p]);
    if (p == 0){
      curr(3) = std::max(std::min(curr(1), curr(0)), 0.0);
      curr(4) = std::max(curr(2) + (0.0 - const1), 0.0);
      curr(5) = curr(0) - curr(3) - curr(4);
    }
    if (p > 0){
      // `prev` wraps the previous row, res[p-1]
      Rcpp::NumericVector prev(res[p-1]);
      curr(3) = std::max(std::min(curr(1), curr(0) + prev(5)), 0.0);
      curr(4) = std::max(curr(2) + (prev(5) - const1), 0.0);
      curr(5) = prev(5) + curr(0) - curr(3) - curr(4);
    }
  }
  return x;
}
So I tried both of jav's answers and did a little bit of reading. It looks to me like lists are an R kind of thing, and Rcpp seems to prefer simple vectors, matrices and the like.
So I decided to pass my vectors from the list directly into the Rcpp script. The whole thing works wonders. My ~70 second script which I got down to about ~5 seconds with parallel processing is now running in 0.3 seconds. So Rcpp is pretty awesome at this as I had read.
Here's the code I went with:
(temp is a list; columns 1 to 3 of each temp[[i]] are the 3 input vectors that feed into the calculation of the 3 output vectors, and constant is defined earlier in the code.)
R code that calls the script:
temp <- AWBMgetResult(zero=temp[[i]][, 1],
                      one=temp[[i]][, 2],
                      two=temp[[i]][, 3],
                      const1=constant, rows=(as.double(rowslength)))
Output is a matrix with 3 columns calculated by the following Rcpp script:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix AWBMgetResult(NumericVector zero, NumericVector one,
                            NumericVector two, double const1, double rows){
  // create numericmatrix x
  Rcpp::NumericMatrix x(rows, 3);
  // compute loop
  for (int p = 0; p < rows; p++){
    if (p == 0){
      x(p, 0) = std::max(std::min(one(p), zero(p)), 0.0);
      x(p, 1) = std::max(two(p) + (0.0 - const1), 0.0);
      x(p, 2) = zero(p) - x(p, 0) - x(p, 1);
    }
    else{
      x(p, 0) = std::max(std::min(one(p), zero(p) + x(p - 1, 2)), 0.0);
      x(p, 1) = std::max(two(p) + (x(p - 1, 2) - const1), 0.0);
      x(p, 2) = x(p - 1, 2) + zero(p) - x(p, 0) - x(p, 1);
    }
  }
  return x;
}
I went with if/else in the Rcpp code because I couldn't work out how to write two loops in a row, one for p in 1:1 and one for p in 2:rowslength. It doesn't seem to matter from a speed point of view, though. I assumed an if/else would still be quicker than separate if (p == 1) and if (p > 1) checks, since those test the value of p twice on every row.
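As a side note, the first row does not actually need its own branch: if the previous row's third output is taken to be 0, the general formulas reduce exactly to the row-1 formulas (min(one, zero + 0), two + (0 - const1), 0 + zero - ...). Below is a minimal sketch of that idea, written as a plain Python reference of the same recurrence that can be used to cross-check the Rcpp output on the 6-row example; `run_recurrence` is a hypothetical name. In the Rcpp function the equivalent would be to declare `double prev = 0.0;` before the loop, use it in place of `x(p - 1, 2)`, and set `prev = x(p, 2);` at the end of each iteration.

```python
def run_recurrence(zero, one, two, const1):
    """Reference version of the row-by-row loop from the question.

    Returns one (col4, col5, col6) tuple per row. The previous row's col6
    starts at 0.0, which makes row 1 use the same formulas as every other
    row, so no branch on the row index is needed.
    """
    out = []
    prev = 0.0  # col6 of the previous row; 0 before the first row
    for c1, c2, c3 in zip(zero, one, two):
        col4 = max(min(c2, c1 + prev), 0.0)
        col5 = max(c3 + (prev - const1), 0.0)
        col6 = prev + c1 - col4 - col5
        out.append((col4, col5, col6))
        prev = col6
    return out


# Input columns 1-3 of the 6x6 example matrix from the question
zero = [0, 0, 0, 2.211, 2.345, 0]
one = [0.8978, 1.0452, 1.1524, 0.4154, 0.7102, 0.8576]
two = [0, 0, 0, 1.7956, 1.6348, 0]

for row in run_recurrence(zero, one, two, 0.938):
    print(row)
```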
I am trying to implement a Euclidean algorithm (to solve Bézout's identity) for 2 polynomials over GF(2^8).
I currently have this code for my different operations:
class ReedSolomon:
    gfSize = 256
    genPoly = 285
    log = [0]*gfSize
    antilog = [0]*gfSize

    def _genLogAntilogArrays(self):
        self.antilog[0] = 1
        self.log[0] = 0
        self.antilog[255] = 1
        for i in range(1,255):
            self.antilog[i] = self.antilog[i-1] << 1
            if self.antilog[i] >= self.gfSize:
                self.antilog[i] = self.antilog[i] ^ self.genPoly
            self.log[self.antilog[i]] = i

    def __init__(self):
        self._genLogAntilogArrays()

    def _galPolynomialDivision(self,dividend, divisor):
        result = dividend.copy()
        for i in range(len(dividend) - (len(divisor)-1)):
            coef = result[i]
            if coef != 0:
                for j in range(1, len(divisor)):
                    if divisor[j] != 0:
                        result[i + j] ^= self._galMult(divisor[j], coef) # equivalent to result[i + j] += -divisor[j] * coef, because in a GF(2) field addition <=> subtraction <=> XOR
        remainderIndex = -(len(divisor)-1)
        return result[:remainderIndex], result[remainderIndex:]

    def _galMultiplicationPolynomiale(self, x,y):
        result = [0]*(len(x)+len(y)-1)
        for i in range(len(x)):
            for j in range(len(y)):
                result[i+j] ^= self._galMult(x[i],y[j])
        return result

    def _galMult(self,x,y):
        if ((x == 0) or (y == 0)):
            val = 0
        else:
            val = self.antilog[(self.log[x] + self.log[y])%255]
        return val

    def _galPolynomialAddition(self, a, b):
        polSum = [0] * max(len(a), len(b))
        for index in range(0, len(a)):
            polSum[index + len(polSum) - len(a)] = a[index]
        for index in range(0, len(b)):
            polSum[index + len(polSum) - len(b)] ^= b[index]
        return (polSum)
And here is my euclidean algorithm :
def _galEuclideanAlgorithm(self,a,b):
    r0 = a.copy()
    r1 = b.copy()
    u0 = [1]
    u1 = [0]
    v0 = [0]
    v1 = [1]

    while max(r1) != 0:
        print(r1)
        q,r = self._galPolynomialDivision(r0,r1)
        r0 = self._galPolynomialAddition(self._galMultiplicationPolynomiale(q,r1),r)
        r1,r0 = self._galPolynomialAddition(r0,self._galMultiplicationPolynomiale(q,r1)),r1.copy()
        u1,u0 = self._galPolynomialAddition(u0,self._galMultiplicationPolynomiale(q,u1)),u1.copy()
        v1,v0 = self._galPolynomialAddition(v0,self._galMultiplicationPolynomiale(q,v1)),v1.copy()

    return r1,u1,v1
I don't understand why my algorithm loops forever. Here is the remainder output from my test:
rs = ReedSolomon()
a = [1,15,7,8,0,11]
b = [1,0,0,0,0,0,0]
print(rs._galEuclideanAlgorithm(b,a))
#Console output
'''
[1, 15, 7, 8, 0, 11]
[0, 0, 82, 37, 120, 11, 105]
[1, 15, 7, 8, 0, 11]
[0, 0, 82, 37, 120, 11, 105]
[1, 15, 7, 8, 0, 11]
[0, 0, 82, 37, 120, 11, 105]
[1, 15, 7, 8, 0, 11]
'''
I know it might seem like I'm throwing some code just expecting an answer, but I'm genuinely searching for the error.
Thanks in advance !
I created a Python package called galois that does this. galois extends NumPy arrays to operate over Galois fields. The code is written in Python but JIT compiled with Numba for speed. In addition to array arithmetic, it also supports polynomials over Galois fields. ...And Reed-Solomon codes are implemented too :)
The Extended Euclidean Algorithm, which solves the Bézout identity for two polynomials over GF(2^8), would be implemented this way. Below is an abbreviated chunk of source code. You can see my full source code here.
def poly_egcd(a, b):
    field = a.field
    zero = Poly.Zero(field)
    one = Poly.One(field)

    r2, r1 = a, b
    s2, s1 = one, zero
    t2, t1 = zero, one

    while r1 != zero:
        q = r2 / r1
        r2, r1 = r1, r2 - q*r1
        s2, s1 = s1, s2 - q*s1
        t2, t1 = t1, t2 - q*t1

    # Make the GCD polynomial monic
    c = r2.coeffs[0]  # The leading coefficient
    if c > 1:
        r2 /= c
        s2 /= c
        t2 /= c

    return r2, s2, t2
And here is a complete example using the galois library and the polynomials from your example. (I'm assuming the highest-degree coefficient is first?)
In [1]: import galois
In [2]: GF = galois.GF(2**8)
In [3]: print(GF.properties)
GF(2^8):
characteristic: 2
degree: 8
order: 256
irreducible_poly: x^8 + x^4 + x^3 + x^2 + 1
is_primitive_poly: True
primitive_element: x
In [4]: a = galois.Poly([1,15,7,8,0,11], field=GF); a
Out[4]: Poly(x^5 + 15x^4 + 7x^3 + 8x^2 + 11, GF(2^8))
In [5]: b = galois.Poly([1,0,0,0,0,0,0], field=GF); b
Out[5]: Poly(x^6, GF(2^8))
In [6]: d, s, t = galois.poly_egcd(a, b); d, s, t
Out[6]:
(Poly(1, GF(2^8)),
Poly(78x^5 + 7x^4 + 247x^3 + 74x^2 + 152, GF(2^8)),
Poly(78x^4 + 186x^3 + 45x^2 + x + 70, GF(2^8)))
In [7]: a*s + b*t == d
Out[7]: True
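If you would rather keep your own implementation, two things in the question's loop seem to explain the oscillation. First, the remainder that comes out of the reconstruct-then-add step keeps its leading zeros ([0, 0, 82, 37, 120, 11, 105]), so on the next pass its length overstates its degree, the division makes no progress, and the two values are just swapped back and forth. Second, _galPolynomialDivision never divides by the divisor's leading coefficient, so it is only correct for monic divisors, while Euclidean remainders are generally not monic (the one above starts with 82). Below is a minimal pure-Python sketch with both issues fixed, under the question's conventions (highest-degree coefficient first, generator polynomial 285); the function names are illustrative only and are not part of galois. Its output can be compared with the galois result above.

```python
GEN_POLY = 285  # x^8 + x^4 + x^3 + x^2 + 1, the question's genPoly

# Log/antilog tables for the primitive element alpha = 2
EXP = [0] * 255
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:  # degree 8 reached: reduce modulo the generator polynomial
        _v ^= GEN_POLY


def gf_mul(x, y):
    if x == 0 or y == 0:
        return 0
    return EXP[(LOG[x] + LOG[y]) % 255]


def gf_inv(x):
    if x == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return EXP[(255 - LOG[x]) % 255]


def strip(p):
    """Drop leading zero coefficients; the zero polynomial becomes [0]."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:] or [0]


def poly_add(a, b):
    """Add (= subtract) two polynomials: XOR, aligned on the constant term."""
    n = max(len(a), len(b))
    a = [0] * (n - len(a)) + a
    b = [0] * (n - len(b)) + b
    return strip([x ^ y for x, y in zip(a, b)])


def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] ^= gf_mul(x, y)
    return strip(out)


def poly_divmod(dividend, divisor):
    """Long division; unlike the question's version, the divisor need not be monic."""
    dividend, divisor = strip(dividend), strip(divisor)
    if divisor == [0]:
        raise ZeroDivisionError("polynomial division by zero")
    if len(dividend) < len(divisor):
        return [0], dividend
    out = list(dividend)
    inv_lead = gf_inv(divisor[0])
    split = len(dividend) - len(divisor) + 1
    for i in range(split):
        coef = gf_mul(out[i], inv_lead)  # next quotient coefficient
        out[i] = coef
        for j in range(1, len(divisor)):
            out[i + j] ^= gf_mul(divisor[j], coef)
    return strip(out[:split]), strip(out[split:])


def poly_egcd(a, b):
    """Return (d, s, t) with a*s + b*t == d, where d = gcd(a, b) is monic."""
    r0, r1 = strip(a), strip(b)
    s0, s1 = [1], [0]
    t0, t1 = [0], [1]
    while r1 != [0]:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r  # r is stripped, so its degree really decreases
        s0, s1 = s1, poly_add(s0, poly_mul(q, s1))
        t0, t1 = t1, poly_add(t0, poly_mul(q, t1))
    inv = gf_inv(r0[0])  # scale everything so the gcd is monic
    return tuple([gf_mul(c, inv) for c in p] for p in (r0, s0, t0))


d, s, t = poly_egcd([1, 15, 7, 8, 0, 11], [1, 0, 0, 0, 0, 0, 0])
print(d, s, t)
```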
I am attempting to implement a perceptron. I have loaded a 100x2 array of values between 0 and 100. Each item in the array has a label of either -1 or 1.
I believe the perceptron is working; however, I cannot plot the decision boundary as shown here: plot decision boundary matplotlib
When I run my code I only see a single color background. I would expect to see two colors, one color for each label in my data set (-1 and 1).
[Image: my current output. I expect to see two background colors, one for each label (-1 or 1).]
[Image: an example of what I hope to see, from the sklearn documentation]
import numpy as np
from matplotlib import pyplot as plt

def generate_data():
    #generate a dataset that is linearly separable
    group_1 = np.random.randint(50, 100, size=(50,2))
    group_1_labels = np.full((50,1), 1)
    group_2 = np.random.randint(0, 49, size =(50,2))
    group_2_labels = np.full((50,1), -1)
    #add a bias value of -1
    bias = np.full((50,1), -1)
    #add labels, upper right quadrant are 1, lower left are -1
    group_1_with_bias = np.hstack((group_1, bias))
    group_2_with_bias = np.hstack((group_2, bias))
    group_1_labeled = np.hstack((group_1_with_bias, group_1_labels))
    group_2_labeled = np.hstack((group_2_with_bias, group_2_labels))
    #merge our labeled data and shuffle!
    merged_data = np.vstack((group_1_labeled, group_2_labeled))
    np.random.shuffle(merged_data)
    return merged_data

data = generate_data()
#load data, strip labels, add a -1 bias value
X = data[:, :3]
#create labels matrix
l = np.ravel(data[:, 3:])

def perceptron_sgd(X, l, c, epochs):
    #initialize weights
    w = np.zeros(3)
    errors = []
    for epoch in range(epochs):
        total_error = 0
        for i, x in enumerate(X):
            if (np.dot(x, w) * l[i]) <= 0:
                total_error += (np.dot(x, w) * l[i])
                w = w + c * (x * l[i])
        errors.append(total_error * -1)
        print("epoch " + str(epoch) + ": " + str(w))
    return w, errors

def classify(X, l, w):
    z = np.dot(X, w)
    print(z)
    z[z <= 0] = -1
    z[z > 0] = 1
    #return a matrix of predicted labels
    return z

w, errors = perceptron_sgd(X, l, .001, 36)

# X - some data in 2dimensional np.array
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .2), np.arange(y_min, y_max, .2))

# here "model" is your model's prediction (classification) function
Z = classify(np.c_[xx.ravel(), yy.ravel()], l, w[:-1]) #strip the bias from weights

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis('off')

#Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=l, cmap=plt.cm.Paired)
I got it to work.
Standardize your X:
from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(X[:, :-1])
X_trans = np.column_stack((scaler.transform(X[:, :-1]), X[:, -1]))
Use a better initialization than zeros:
#initialize weights
r = np.sqrt(2)
w = np.random.uniform(-r, r, (3,))
Add the learned bias during prediction. Every training row ends in a constant -1, so the bias enters the score as -w[-1]:
z = np.dot(X, w[:-1]) - w[-1]
Standardize during prediction as well (using the scaler fitted on the training inputs):
Z = classify(scaler.transform(np.c_[xx.ravel(), yy.ravel()]),
             l, w)  # classify() applies the bias itself, so pass the full weight vector
In general, it is a good idea to standardize the inputs.
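A simple way to avoid sign mistakes with the bias is to give the grid points exactly the same layout as the training rows, i.e. append the constant -1 column and take the dot product with the full weight vector. A minimal sketch of that idea; `add_bias_column` and `predict` are hypothetical helper names, and the weights in the example are hand-picked (decision boundary x1 + x2 = 100), not learned:

```python
import numpy as np


def add_bias_column(points):
    """Append the constant -1 feature used during training: [x1, x2] -> [x1, x2, -1]."""
    points = np.asarray(points, dtype=float)
    return np.column_stack((points, -np.ones(len(points))))


def predict(points, w):
    """Perceptron prediction with the bias handled exactly as in training."""
    z = add_bias_column(points) @ w
    return np.where(z > 0, 1, -1)


# Hand-picked weights: score = x1 + x2 - 100
w = np.array([1.0, 1.0, 100.0])
grid = np.array([[0, 0], [20, 30], [90, 80], [100, 100]])
print(predict(grid, w))  # -> [-1 -1  1  1]
```

For the mesh grid this becomes predict(np.c_[xx.ravel(), yy.ravel()], w).reshape(xx.shape), with scaler.transform applied to the grid first if the weights were learned on standardized inputs.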
Entire code:
import numpy as np
from matplotlib import pyplot as plt
# %matplotlib inline  (IPython magic: uncomment when running inside a Jupyter notebook)

def generate_data():
    #generate a dataset that is linearly separable
    group_1 = np.random.randint(50, 100, size=(50,2))
    group_1_labels = np.full((50,1), 1)
    group_2 = np.random.randint(0, 49, size =(50,2))
    group_2_labels = np.full((50,1), -1)
    #add a bias value of -1
    bias = np.full((50,1), -1)
    #add labels, upper right quadrant are 1, lower left are -1
    group_1_with_bias = np.hstack((group_1, bias))
    group_2_with_bias = np.hstack((group_2, bias))
    group_1_labeled = np.hstack((group_1_with_bias, group_1_labels))
    group_2_labeled = np.hstack((group_2_with_bias, group_2_labels))
    #merge our labeled data and shuffle!
    merged_data = np.vstack((group_1_labeled, group_2_labeled))
    np.random.shuffle(merged_data)
    return merged_data

data = generate_data()
#load data, strip labels, add a -1 bias value
X = data[:, :3]
#create labels matrix
l = np.ravel(data[:, 3:])

from sklearn import preprocessing
scaler = preprocessing.StandardScaler().fit(X[:, :-1])
X_trans = np.column_stack((scaler.transform(X[:, :-1]), X[:, -1]))

def perceptron_sgd(X, l, c, epochs):
    #initialize weights
    r = np.sqrt(2)
    w = np.random.uniform(-r, r, (3,))
    errors = []
    for epoch in range(epochs):
        total_error = 0
        for i, x in enumerate(X):
            if (np.dot(x, w) * l[i]) <= 0:
                total_error += (np.dot(x, w) * l[i])
                w = w + c * (x * l[i])
        errors.append(total_error * -1)
        print("epoch " + str(epoch) + ": " + str(w))
    return w, errors

def classify(X, l, w):
    # training rows end in a constant -1, so the bias contributes -w[-1]
    z = np.dot(X, w[:-1]) - w[-1]
    print(z)
    z[z <= 0] = -1
    z[z > 0] = 1
    #return a matrix of predicted labels
    return z

w, errors = perceptron_sgd(X_trans, l, .01, 25)

# X - some data in 2dimensional np.array
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, .1), np.arange(y_min, y_max, .1))

# standardize the grid with the training scaler; classify() applies the bias itself
Z = classify(scaler.transform(np.c_[xx.ravel(), yy.ravel()]), l, w)

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.4)
#plt.axis('off')

#Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=l, cmap=plt.cm.Paired)
plt.show()
So I wanted to see if I could make fractal flames using matplotlib, and figured a good test would be the Sierpinski triangle. I modified a working version I had that simply performed the chaos game, normalizing the x range from [-2, 2] to [0, 400] and the y range from [0, 2] to [0, 200]. I also truncated the x and y coordinates to 2 decimal places and multiplied by 100 so that the coordinates could be put into a matrix that I could apply a colormap to. Here's the code I'm working on right now (please forgive the messiness):
import numpy as np
import matplotlib.pyplot as plt
import math
import random

def f(x, y, n):
    N = np.array([[x, y]])
    M = np.array([[1/2.0, 0], [0, 1/2.0]])
    b = np.array([[.5], [0]])
    b2 = np.array([[0], [.5]])

    if n == 0:
        return np.dot(M, N.T)
    elif n == 1:
        return np.dot(M, N.T) + 2*b
    elif n == 2:
        return np.dot(M, N.T) + 2*b2
    elif n == 3:
        return np.dot(M, N.T) - 2*b

def norm_x(n, minX_1, maxX_1, minX_2, maxX_2):
    rng = maxX_1 - minX_1
    n = (n - minX_1) / rng
    rng_2 = maxX_2 - minX_2
    n = (n * rng_2) + minX_2
    return n

def norm_y(n, minY_1, maxY_1, minY_2, maxY_2):
    rng = maxY_1 - minY_1
    n = (n - minY_1) / rng
    rng_2 = maxY_2 - minY_2
    n = (n * rng_2) + minY_2
    return n

# Plot ranges
x_min, x_max = -2.0, 2.0
y_min, y_max = 0, 2.0

# Even intervals for points to compute orbits of
x_range = np.arange(x_min, x_max, (x_max - x_min) / 400.0)
y_range = np.arange(y_min, y_max, (y_max - y_min) / 200.0)

mat = np.zeros((len(x_range) + 1, len(y_range) + 1))

random.seed()
x = 1
y = 1

for i in range(0, 100000):
    n = random.randint(0, 3)
    V = f(x, y, n)
    x = V.item(0)
    y = V.item(1)
    mat[norm_x(x, -2, 2, 0, 400), norm_y(y, 0, 2, 0, 200)] += 50

plt.xlabel('x0')
plt.ylabel('y')
fig = plt.figure(figsize=(10,10))
plt.imshow(mat, cmap="spectral", extent=[-2,2, 0, 2])
plt.show()
The mathematics seem solid here, so I suspect something weird is going on with how I'm placing values into the 'mat' matrix and how those values correspond to the colormap.
If I understood your problem correctly, you need to transpose your matrix using its .T attribute. So just replace
fig = plt.figure(figsize=(10,10))
plt.imshow(mat, cmap="spectral", extent=[-2,2, 0, 2])
plt.show()
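with
fig = plt.figure(figsize=(10,10))
ax = plt.gca()
# "spectral" was removed in newer Matplotlib releases; "nipy_spectral" is its replacement
ax.imshow(mat.T, cmap="nipy_spectral", extent=[-2,2, 0, 2], origin="lower")
plt.show()
The argument origin="lower" tells imshow to put the first row of the (transposed) matrix, i.e. the smallest y values, at the bottom of the figure.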
Hope it helps.
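A related detail: norm_x and norm_y return floats, and recent NumPy versions reject floating-point indices with an IndexError. Below is a minimal sketch that converts each point to integer cell indices and stores it as mat[row = y, column = x], so that imshow(..., origin="lower") shows the correct orientation without a transpose. It assumes the same plot ranges and the same four maps as f() in the question (rewritten as p -> p/2 + offset); the helper names are hypothetical, each visit adds 1 instead of 50, and numpy.random.default_rng needs NumPy 1.17 or later.

```python
import numpy as np

# The four maps from f() in the question, written as p -> p/2 + offset
OFFSETS = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])


def to_index(v, lo, hi, n_cells):
    """Map v in [lo, hi] to an integer cell index in [0, n_cells] (clipped)."""
    idx = int(round((v - lo) / (hi - lo) * n_cells))
    return min(max(idx, 0), n_cells)


def chaos_game_histogram(n_iter=100_000, seed=None):
    """Count visits per cell: rows cover y in [0, 2], columns cover x in [-2, 2]."""
    rng = np.random.default_rng(seed)
    mat = np.zeros((201, 401))
    x, y = 1.0, 1.0
    for _ in range(n_iter):
        dx, dy = OFFSETS[rng.integers(0, 4)]  # upper bound is exclusive: picks 0..3
        x, y = x / 2 + dx, y / 2 + dy
        mat[to_index(y, 0.0, 2.0, 200), to_index(x, -2.0, 2.0, 400)] += 1
    return mat


def plot(mat):
    import matplotlib.pyplot as plt  # imported here; only needed for plotting

    plt.figure(figsize=(10, 5))
    plt.imshow(mat, cmap="nipy_spectral", extent=[-2, 2, 0, 2], origin="lower")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()
```

Calling plot(chaos_game_histogram()) then draws the histogram directly, with row 0 (y = 0) at the bottom.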
I'm trying to calculate the vertices of a cuboid given its centre (which is a Vector3) and the lengths of the sides along the x, y and z axes. I found the following on math.stackexchange.com: https://math.stackexchange.com/questions/107778/simplest-equation-for-drawing-a-cube-based-on-its-center-and-or-other-vertices which says I can use the following formulae:
The constructor for the World class is:
World::World(Vector3 o, float d1, float d2, float d3) : origin(o)
{
    // If we consider an edge length to be d, we need to find r such that
    // 2r = d in order to calculate the positions of each vertex in the world.
    float r1 = d1 / 2,
          r2 = d2 / 2,
          r3 = d3 / 2;

    for (int i = 0; i < 8; i++)
    {
        /* Sets up the vertices of the cube.
         *
         * @see http://bit.ly/1cc2RPG
         */
        float x = o.getX() + (std::pow(-1, i&1) * r1),
              y = o.getY() + (std::pow(-1, i&2) * r2),
              z = o.getZ() + (std::pow(-1, i&4) * r3);

        points[i] = Vector3(x, y, z);
        std::cout << points[i] << "\n";
    }
}
And I am passing the following arguments to the constructor:
Vector3 o(0, 0, 0);
World w(o, 100.f, 100.f, 100.f);
The coordinates being output for all 8 vertices are:
(50, 50, 50)
(-50, 50, 50)
(50, 50, 50)
(-50, 50, 50)
(50, 50, 50)
(-50, 50, 50)
(50, 50, 50)
(-50, 50, 50)
Which cannot be correct. Any guidance would be very much appreciated!
The problem lies in the bitwise & inside your pow calls.
In the y and z components, i&2 evaluates to either 0 or 2, and i&4 to either 0 or 4. Since (-1)^2 = (-1)^4 = 1, the sign of these components is always positive. You could use (i&2) != 0 or (i&2) >> 1 for the y component instead, and likewise for the z component.
Change this:
float x = o.getX() + (std::pow(-1, i&1) * r1),
y = o.getY() + (std::pow(-1, i&2) * r2),
z = o.getZ() + (std::pow(-1, i&4) * r3);
To this:
float x = o.getX() + (std::pow(-1, (i ) & 1) * r1), // pow(-1, 0) == 1, pow(-1, 1) == -1
y = o.getY() + (std::pow(-1, (i >> 1) & 1) * r2), // pow(-1, 0) == 1, pow(-1, 1) == -1
z = o.getZ() + (std::pow(-1, (i >> 2) & 1) * r3); // pow(-1, 0) == 1, pow(-1, 1) == -1
Or even to this:
float x = o.getX() + (std::pow(-1, (i )) * r1), // pow(-1, {0, 2, 4, 6}) == 1, pow(-1, {1, 3, 5, 7}) == -1
y = o.getY() + (std::pow(-1, (i >> 1)) * r2), // pow(-1, {0, 2}) == 1, pow(-1, {1, 3}) == -1
z = o.getZ() + (std::pow(-1, (i >> 2)) * r3); // pow(-1, 0) == 1, pow(-1, 1) == -1
The problem is that, as written, the masked values do tell you whether each length needs to be negated, but the bit is not in the ones place (it is worth 2 or 4 instead of 1), so raising -1 to that power does not give the sign you want.
Rewriting the code as above will fix this; however, it would be more readable, and more robust in general, to unroll the loop and write out explicitly whether each component is added or subtracted, without using pow at all.
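The pow-free idea can be sketched as below: bit k of i is tested (not used as an exponent) to decide the sign along axis k. It is written in Python for brevity; the same bit tests carry over directly to C++, e.g. `(i & 2) ? -r2 : r2`. `cuboid_vertices` is a hypothetical helper, not part of the World class.

```python
def cuboid_vertices(center, d1, d2, d3):
    """Return the 8 vertices of an axis-aligned cuboid around `center`.

    Bit 0 of i selects the sign along x, bit 1 along y and bit 2 along z.
    """
    cx, cy, cz = center
    r1, r2, r3 = d1 / 2, d2 / 2, d3 / 2
    vertices = []
    for i in range(8):
        sx = -1 if i & 1 else 1
        sy = -1 if i & 2 else 1  # test the bit instead of using it as an exponent
        sz = -1 if i & 4 else 1
        vertices.append((cx + sx * r1, cy + sy * r2, cz + sz * r3))
    return vertices


for v in cuboid_vertices((0, 0, 0), 100.0, 100.0, 100.0):
    print(v)
```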