selecting vector based on index in R vs rcpp - c++

EDIT
After doing some more reading around C++, I managed to make my question more concise. Basically, I have two vectors:
my_vec <- c(270, 291, 330, 378)
x <- 1:109
The four elements of my_vec correspond to the 1st, 22nd, 61st and 109th elements of vector x. I want to write Rcpp code that sums x from the 1st to the 22nd value, from the 23rd to the 61st value, and from the 62nd to the 109th value, as shown below in R:
sum(x[1:22]) # 253
sum(x[23:61]) # 1638
sum(x[62:109]) # 4104
I wrote the following Rcpp code to do this:
library(Rcpp)
cppFunction('
List sum_calc(NumericVector x,
              NumericVector my_vec){
  int n = my_vec.size();
  NumericVector sum_vec (n-1);
  double sum;
  int start_counter;
  int counter = 0;
  int start;
  int end;
  for(int i=0; i<(n-1); ++i){
    start = my_vec[i];
    end = my_vec[i + 1];
    double x_sum = 0;
    start_counter = start;
    while(start_counter <= end){
      x_sum += x[counter];
      counter += 1;
      start_counter += 1;
    };
    sum_vec[i] = x_sum;
  };
  return Rcpp::List::create(Rcpp::Named("sum_vec") = sum_vec);
}
')
sum_calc(x, my_vec)
$sum_vec
[1] 253 1700 4042
However, only the first result matches the expected output; the other two don't. I think this is an indexing problem, but I don't know how to go about it. I have read some C++ material online but still cannot solve this. Where am I going wrong?
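A possible explanation, worked out from the code above rather than taken from an answer: the condition start_counter <= end visits end - start + 1 values, so the second segment reads 40 elements (x[22] to x[61], i.e. the values 23 to 62, which sum to 1700) instead of 39, and the third segment reads 49 elements starting at x[62], two of which lie past the end of x. Below is a minimal plain-C++ sketch (std::vector instead of Rcpp::NumericVector, so it compiles without R) of one way to reproduce the three R sums. It assumes that my_vec[k] - my_vec[0] is the 0-based position of each boundary in x, as the numbers in the question imply.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical plain-C++ version of sum_calc.
// Assumption: my_vec[k] - my_vec[0] is the 0-based position in x of the k-th
// boundary. The first segment includes both of its boundaries; every later
// segment starts just after the previous boundary. This reproduces
// sum(x[1:22]), sum(x[23:61]) and sum(x[62:109]) from the question.
std::vector<double> sum_calc(const std::vector<double>& x,
                             const std::vector<double>& my_vec) {
    std::vector<double> sums;
    if (my_vec.size() < 2) return sums;
    const double base = my_vec[0];
    for (std::size_t i = 0; i + 1 < my_vec.size(); ++i) {
        // 0-based first and last (inclusive) positions of segment i
        const std::size_t first =
            (i == 0) ? 0 : static_cast<std::size_t>(my_vec[i] - base) + 1;
        const std::size_t last = static_cast<std::size_t>(my_vec[i + 1] - base);
        if (last >= x.size()) throw std::out_of_range("boundary past end of x");
        // std::accumulate works on the half-open range [first, last + 1)
        sums.push_back(std::accumulate(
            x.begin() + static_cast<std::ptrdiff_t>(first),
            x.begin() + static_cast<std::ptrdiff_t>(last) + 1, 0.0));
    }
    return sums;
}
```

Called as sum_calc(x, {270, 291, 330, 378}) with x holding 1 to 109, it returns {253, 1638, 4104}. The same first/last arithmetic could be moved into the Rcpp function in place of the while loop.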


Is there a limit on working with matrix in R with Rcpp?

I was trying to develop a program in R to estimate a Spearman correlation with Rcpp. It works, but only with matrices of fewer than roughly 45,000 to 50,000 vectors, and I don't know why it only works up to that dimension. I suppose there is a limit on that type of object; maybe it would work if I used a data.frame instead? I would really appreciate any insight.
Here is my code. I've been trying to cap the integer I call "denominador", which exceeds the maximum integer value. Maybe you could help me.
cppFunction('double spearman(NumericMatrix x){
  int nrow = x.nrow(), ncol = x.ncol();
  int nrow1 = nrow - 1;
  double out = 0;
  double cont = 0;
  double cont1 = 0;
  double r = 0;
  int denominador = ncol*(pow(ncol,2.0)-1);
  for(int i = 0; i < nrow1; i++){
    // Use every pair of rows (vectors): the first with each later one, and so on
    for(int j = i +1; j < nrow; j++){
      cont1 = 0;
      for(int t = 0; t < ncol; t++){
        cont = pow(x(i,t)-x(j,t), 2.0);
        cont1 += cont;
      }
      // Add this pair's correlation, scaled so that out ends up as the mean over all pairs
      r = 2*(1-6*(cont1/denominador))/(nrow*nrow1);
      out += r;
    }
  }
  return out;
}')
To repeat more succinctly:
You can have more than 2^31-1 elements in a vector.
Matrices are vectors with dim attributes.
You can have more than 2^31-1 elements in a matrix (ie n times k)
Your row and column indices are still limited to 2^31 - 1.
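To illustrate the last two points: R stores a matrix column by column, so element (i, j) lives at position j * nrow + i of the underlying vector. The sketch below (plain C++; linear_index is a hypothetical helper, not an R or Rcpp function) shows that both indices can fit in a 32-bit int while their combined position does not, which is why the position needs a 64-bit type.

```cpp
#include <cstdint>

// Column-major position of element (i, j) (both 0-based) in a matrix with
// nrow rows, the storage order R uses for matrices. Each argument fits in a
// 32-bit int, but j * nrow + i may not, so the product is formed in 64 bits.
std::int64_t linear_index(std::int32_t i, std::int32_t j, std::int32_t nrow) {
    return static_cast<std::int64_t>(j) * nrow + i;
}
```

For a 50000 x 50000 matrix, the last element sits at position 2,499,999,999, which is above 2^31 - 1 = 2,147,483,647.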
Example of a big vector:
R> n <- .Machine$integer.max + 100
R> tmpVec <- 1:n
R> length(tmpVec)
[1] 2147483747
R> newVec <- sqrt(tmpVec)
R>
A couple of caveats
Before we get started, I'm assuming:
R >= 3.0.0
Long vectors, which allow up to 2^52 elements, are supported from this version on.
Rcpp > 0.12.0
This includes the patch where thirdwing replaced instances of int and size_t with R_xlen_t and R_xlength. See the release post for more details.
Constructing a large NumericMatrix
I think you may be running into a memory allocation issue, as the following works on my 32 GB machine:
Rcpp::cppFunction("NumericMatrix make_matrix(){
  NumericMatrix m(50000, 50000);
  return m;
}")
m = make_matrix()
object.size(m)
## 20000000200 bytes # about 20 GB (18.6 GiB)
Running:
# Creates an 18.6 GiB (20 GB) matrix!
m = matrix(0, ncol = 50000, nrow = 50000)
Rcpp::cppFunction("void get_length(NumericMatrix m){
  Rcout << m.nrow() << ' ' << m.ncol();
}")
get_length(m)
## 50000 50000
object.size(m)
## 20000000200 bytes # about 20 GB (18.6 GiB)
Matrix Bounds
In theory, the total number of elements in a matrix is bounded by (2^31 - 1)^2 = 4,611,686,014,132,420,609, per the R documentation:
Arrays (including matrices) can be based on long vectors provided each of their dimensions is at most 2^31 - 1: thus there are no 1-dimensional long arrays.
See the "Long Vectors" page in the R documentation.
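A quick compile-time check of the arithmetic above, using 64-bit unsigned integers. As an editorial aside, it also compares the result with the 2^52 long-vector limit mentioned in this answer's caveats, which suggests that in practice the 2^52 cap is reached long before the per-dimension bound.

```cpp
#include <cstdint>

// Bounds from the text, evaluated in 64-bit unsigned arithmetic at compile time.
constexpr std::uint64_t kMaxDim = 2147483647ULL;              // 2^31 - 1 per dimension
constexpr std::uint64_t kMaxMatrixCells = kMaxDim * kMaxDim;  // (2^31 - 1)^2
constexpr std::uint64_t kMaxLongVector = 1ULL << 52;          // 2^52 long-vector limit

static_assert(kMaxMatrixCells == 4611686014132420609ULL,
              "(2^31 - 1)^2 matches the figure quoted above");
// The per-dimension bound is roughly 1000 times larger than 2^52, so the
// long-vector limit is the one a real matrix would run into first.
static_assert(kMaxMatrixCells > kMaxLongVector, "2^52 is the tighter limit");
```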
Now, trying to fit 2^31 rows into a matrix fails:
m = matrix(nrow=2^31, ncol=1)
Error in matrix(nrow = 2^31, ncol = 1) :
invalid 'nrow' value (too large or NA)
In addition: Warning message:
In matrix(nrow = 2^31, ncol = 1) :
NAs introduced by coercion to integer range
The limit that both R and Rcpp enforce on the number of rows or columns is:
.Machine$integer.max
## 2147483647
Note that 2^31 exceeds this limit by exactly one:
2^31 = 2,147,483,648 > 2,147,483,647 = .Machine$integer.max
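The same .Machine$integer.max ceiling also applies to int arithmetic inside the question's C++ code. One plausible culprit, offered as a hypothesis rather than a confirmed diagnosis: nrow*nrow1 in the last division is computed in int, and 46342 * 46341 = 2,147,534,622 already exceeds 2,147,483,647, which falls inside the reported 45,000 to 50,000 range. Likewise, denominador = ncol*(ncol^2 - 1) no longer fits in an int once ncol exceeds 1290. Below is a minimal sketch that keeps these products in double; it uses plain std::vector rows instead of Rcpp::NumericMatrix, spearman_mean is a hypothetical name, and it assumes each row already holds ranks, as the question's formula does.

```cpp
#include <cstddef>
#include <vector>

// Mean of the pairwise Spearman coefficients over all pairs of rows, using the
// question's formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
// Assumptions: every row has the same length n >= 2 and already holds ranks.
// All products of sizes are done in double, so none of them can overflow
// a 32-bit int the way nrow*nrow1 and denominador can in the original.
double spearman_mean(const std::vector<std::vector<double>>& rows) {
    const std::size_t nrow = rows.size();
    if (nrow < 2) return 0.0;
    const double n = static_cast<double>(rows[0].size());
    const double denominador = n * (n * n - 1.0);                        // n(n^2 - 1)
    const double npairs = static_cast<double>(nrow) * (nrow - 1) / 2.0;  // pair count
    double out = 0.0;
    for (std::size_t i = 0; i + 1 < nrow; ++i) {
        for (std::size_t j = i + 1; j < nrow; ++j) {
            double sum_d2 = 0.0;
            for (std::size_t t = 0; t < rows[i].size(); ++t) {
                const double d = rows[i][t] - rows[j][t];
                sum_d2 += d * d;
            }
            // Add this pair's share of the mean
            out += (1.0 - 6.0 * sum_d2 / denominador) / npairs;
        }
    }
    return out;
}
```

In the Rcpp function itself, the equivalent change would be declaring denominador as double and dividing by (double)nrow * nrow1.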
Maximum Amount of Elements in a Vector
However, the documented limit for a plain atomic vector is 2^52 elements (even though a 64-bit length type could in principle go much higher, in the ballpark of 2^64 - 1). The following example shows that a vector of 2^32 elements can be created by concatenating two vectors of 2^31 elements each:
v = numeric(2^31)
length(v)
## [1] 2147483648
object.size(v)
## 17179869224 bytes # about 17.2 GB (16 GiB)
v2 = c(v,v)
length(v2)
## [1] 4294967296
object.size(v2)
## 34359738408 bytes # about 34.4 GB (32 GiB)
Suggestions
Use bigmemory via Rcpp
Maintain your own stack of vectors.

Reorganizing a vector in c++

I'd like to preface this question by saying that I am very inexperienced at coding, so the solution may be much easier than what I have been trying. I have a vector 'phas', defined as vector<float> phase;, that has 7987200 elements, and I want to rearrange it into 133120 vectors of 60 elements each (called line2, defined as vector<long double> line2;). Each vector of 60 should then be placed one after the other in a vector of vectors 'RFlines2', defined as vector< vector<long double> > RFlines2; with RFlines2.resize(7987200);.
I want to fill each of the 60-element vectors with elements of 'phas' spaced 128 apart. For example, the first vector of 60 elements would be filled with phas[0], phas[128], phas[256], ..., phas[7552]; the second would be filled with phas[1], phas[129], phas[257], ..., phas[7553]; and so on. My current code is as follows:
for(int x = 0; x<133120; x++){
  if((x == 128 || x == 7680+128 || x == (7680*a)+128)){
    x = 7680*a;
    a = a + 1;
  }
  int j = x;
  for(int i = 0; i<60;i++){
    line2.push_back(i);
    line2[i] = phas[j];
    j = j + 128;
  }
  cout<<"This is x: "<<x<<endl;
  RFlines2[x] = line2;
  line2.clear();
}
However, after 128 iterations of the outer loop (128 vectors of 60 have been created and 7680 elements of phas have been used), I need x to jump to 7680 to avoid putting already-used elements of phas into the next vector of 60: when x = 128, the first element of the next vector would be phas[128], which was already used as the 2nd element of the first vector. Then, after another 128 iterations, x needs to jump to 15,360, and so on. The code above is my latest attempt, but when I try to run the FFT (with FFTW) on each vector of 60 in RFlines2 as follows:
int c = 0;
for(int x = 0; x < 133120; x++){
  //cout<<x<<endl;
  fftw_plan p2;
  inter = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * W);
  outter = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * W);
  /* cast elements in line to type fftw_complex */
  for (int i = 0; i <60; i++) {
    //cout<<i<<endl;
    //inter[i][0] = phas[i];
    //inter[x][0] = zlines[x];
    inter[i][0] = RFlines2[x][i];
    inter[i][1] = 0;
  }
  p2 = fftw_plan_dft_1d(60, inter, outter, FFTW_FORWARD, FFTW_ESTIMATE);
  fftw_execute(p2);
  //inter[x][0].clear();
  for(int u = 0; u<60;u++){
    if(u == 0){
      cout<<' '<<outter[0][0]<<' '<<c++<<endl;
    }
  }
  fftw_free(inter);
  fftw_free(outter);
  fftw_destroy_plan((p2));
}
the program crashes after displaying outter[0][0] 128 times. Any ideas how to fix this? Also, let me know if anything I said doesn't make sense and I'll try to clarify. Thanks in advance!
-Mike
I don't know why your code crashes, because I can't see the whole code here. But I'm going to suggest a way to scatter your data and manage your vectors.
(There is an important caveat, though: you should not be using vectors (at least not vectors of vectors) for this task; you are better off using a 1D vector and managing the 2D indexing yourself. But this is a performance concern and does not affect correctness.)
This is how I suggest you fill your RFLines2: (I have not tried this code, so it may not work.)
// first, build the memory for RFLines2...
vector<vector<long double>> RFLines2 (133120, vector<long double>(60));
// assuming a "phase" vector...
for (unsigned i = 0; i < 7987200; ++i)
{
    unsigned const row = (i / (128 * 60)) * 128 + (i % (128 * 60)) % 128;
    unsigned const col = (i % (128 * 60)) / 128;
    RFLines2[row][col] = phase[i];
}
You won't need the line2 intermediate this way.
The rest of the code "should" work. (BTW, I don't understand the inner for loop on u at all. What were you trying to do there?)
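Two hedged additions. First, a plausible cause of the crash, inferred from the question's code: whenever x reaches 128, the if block immediately moves it elsewhere, so RFlines2[128] is never assigned and stays an empty vector; the FFT loop then reads RFlines2[128][i], which is out of bounds and would match a crash after 128 lines of output. Second, a minimal sketch of the caveat above: the same scatter into one flat 1D vector, where row r occupies positions r*60 to r*60 + 59. scatter_rows is a hypothetical name, and the sketch assumes the input length is a multiple of 128 * 60 = 7680, as 7987200 is.

```cpp
#include <cstddef>
#include <vector>

// Flat (1D) version of the answer's scatter: output row `row` occupies
// out[row * len] .. out[row * len + len - 1]. With stride = 128 and len = 60,
// row 0 gets phase[0], phase[128], ..., phase[7552] and row 1 gets
// phase[1], phase[129], ..., phase[7553].
// Assumption: phase.size() is a multiple of stride * len.
std::vector<long double> scatter_rows(const std::vector<float>& phase,
                                      std::size_t stride = 128,
                                      std::size_t len = 60) {
    const std::size_t block = stride * len;  // samples per group of `stride` rows
    std::vector<long double> out(phase.size());
    for (std::size_t i = 0; i < phase.size(); ++i) {
        // Same formula as the answer; (i % block) % stride == i % stride
        // because block is a multiple of stride.
        const std::size_t row = (i / block) * stride + i % stride;
        const std::size_t col = (i % block) / stride;
        out[row * len + col] = phase[i];
    }
    return out;
}
```

Element (row, col) is then out[row * 60 + col], and the 60 values of one row are contiguous, so they can be copied straight into the fftw_complex input array.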

C++ Nested for loops

I am writing a program that modifies data in a csv file.
In the csv file, the COLUMNS are organized as follows:
X-coordinate, Y-coordinate, Z-coordinate, info, X, Y, Z, info, X, Y, Z, info...
The first X-coordinate is in column 4 and the next one is 4 columns later, in column 8. For Y, it's column 5, then column 9, and so on. Since I saved the data in a deque, the first ones correspond to data[row#][3] for x and data[row#][4] for y.
for(int k=0; k<618; k++) { //all rows 618
  for(int l=3; l<96; l=l+4) { //x columns
    for(int m=4; m<97; m=m+4) { //y columns
      data[k][l] = (data[k][l] )*(data[k][2]) + (data[k][m])*(data[k][1]);
    }
  }
}
In the loop, I want to replace every x value (column l) in each row k with the value from this equation:
x' = x* cos(theta) + y* sin(theta)
The values of cos(theta) and sin(theta) are in columns 3 and 2 of every row (hence data[k][2] and data[k][1]).
Unfortunately, after testing this with several cout statements, I noticed it does not behave as desired.
DESIRED BEHAVIOR OF LOOP:
1st time through the loop: the calculation is done for row 1, with x = the value in column 4 and y = the value in column 5.
At the end of the iteration, the loop restarts with k, l, and m updated to 2, 9, 10.
The calculation in the loop is executed for these new values, and so on.
The main issue is that k, l and m are not all being updated as desired after the data[k][l] line. What could be causing this?
Thank you.
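You do not understand nested loops. The innermost loop runs to completion for every single iteration of the loop around it, so l and m never advance in lockstep: each x value is updated once for every y column in the row, not once with its own y.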
What you intend is something like this:
for(int k=0; k<618; k++) { //all rows 618
  for(int n=0; n<24; ++n) { //groups
    int l = 4*n + 3; // x column of group n
    int m = 4*n + 4; // y column of group n
    data[k][l] = (data[k][l] )*(data[k][2]) + (data[k][m])*(data[k][1]);
  }
}
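A self-contained sketch of the corrected loop. It assumes data is a std::deque<std::vector<double>> (the question only says the data was saved in a deque), that data[k][1] holds sin(theta) and data[k][2] holds cos(theta) as the question's code implies, and rotate_x is a hypothetical function name.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Replace every X value with x' = x*cos(theta) + y*sin(theta), row by row.
// Assumed layout: row[1] = sin(theta), row[2] = cos(theta), and group n has
// X at index 4*n + 3 and Y at index 4*n + 4 (0-based).
void rotate_x(std::deque<std::vector<double>>& data, std::size_t num_groups) {
    for (auto& row : data) {
        const double sin_t = row[1];
        const double cos_t = row[2];
        for (std::size_t n = 0; n < num_groups; ++n) {
            const std::size_t l = 4 * n + 3;  // X column of group n
            const std::size_t m = 4 * n + 4;  // Y column of group n
            row[l] = row[l] * cos_t + row[m] * sin_t;  // each X updated exactly once
        }
    }
}
```

For the question's file, that would be rotate_x(data, 24) over the 618 rows.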

Find a centroid of a dataset

If I have a random data set, let's say:
X Y
1.2 16
5.7 0.256
128.54 6.879
0 2.87
6.78 0
2.98 3.7
... ...
x' y'
How can I find the centroid coordinates of this data set?
P.S. Here is what I tried, but I got wrong results:
const int K = 4; // number of clusters; must be declared before the arrays that use it
float Dim1[K];
float Dim2[K];
float centroidD1[K];
float centroidD2[K];
int counter[K];
for(int i = 0; i < K ; i++)
{
  Dim1[i] = 0;
  Dim2[i] = 0;
  counter[i] = 0;
  for(int j = 0; j < hash["Cluster"].size(); j++)
  {
    if(hash["Cluster"].value(j) == i+1)
    {
      Dim1[i] += hash["Dim_1"].value(j);
      Dim2[i] += hash["Dim_2"].value(j);
      counter[i]++;
    }
  }
}
for(int l = 0; l < K; l++)
{
  centroidD1[l] = Dim1[l] / counter[l];
  centroidD2[l] = Dim2[l] / counter[l];
}
I suspect I chose the wrong algorithm, since the results are wrong.
Calculating a sum and dividing by N is not a good idea if you have a large data set. As your floating-point accumulator grows, adding a new point eventually stops having any effect because of the difference in magnitude. An incremental formula might work better, see: https://math.stackexchange.com/questions/106700/incremental-averageing
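A minimal sketch of the incremental formula from the linked answer, applied to a centroid: mean_n = mean_{n-1} + (value_n - mean_{n-1}) / n. Point and incremental_centroid are hypothetical names, and double is used instead of the question's float; the running value stays on the scale of the data instead of growing like a plain sum.

```cpp
#include <cstddef>
#include <vector>

struct Point { double x; double y; };

// Centroid via the running mean: after the n-th point,
// c = c + (p - c) / n, which equals the mean of the first n points.
Point incremental_centroid(const std::vector<Point>& pts) {
    Point c{0.0, 0.0};
    std::size_t n = 0;
    for (const Point& p : pts) {
        ++n;
        c.x += (p.x - c.x) / static_cast<double>(n);
        c.y += (p.y - c.y) / static_cast<double>(n);
    }
    return c;  // stays {0, 0} for an empty input
}
```

For the six sample points listed in the question, it gives approximately (24.2, 4.9508). For clustered data, the same update would be applied per cluster.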
If the issue is too large a data set, you can verify the basic functioning of your code on a smaller data set whose result you have checked by hand, for example just 1 or 10 data points.

Generating incomplete iterated function systems

I am doing this assignment for fun.
http://groups.csail.mit.edu/graphics/classes/6.837/F04/assignments/assignment0/
There are sample outputs on the site if you want to see how it is supposed to look. It involves iterated function systems, whose algorithm according to the assignment is:
for "lots" of random points (x_0, y_0)
    for k = 0 to num_iters
        pick a random transform f_i
        (x_{k+1}, y_{k+1}) = f_i(x_k, y_k)
    display a dot at (x_k, y_k)
I am running into trouble with my implementation, which is:
void IFS::render(Image& img, int numPoints, int numIterations){
  Vec3f color(0,1,0);
  float x,y;
  float u,v;
  Vec2f myVector;
  for(int i = 0; i < numPoints; i++){
    x = (float)(rand()%img.Width())/img.Width();
    y = (float)(rand()%img.Height())/img.Height();
    myVector.Set(x,y);
    for(int j = 0; j < numIterations;j++){
      float randomPercent = (float)(rand()%100)/100;
      for(int k = 0; k < num_transforms; k++){
        if(randomPercent < range[k]){
          matrices[k].Transform(myVector);
        }
      }
    }
    u = myVector.x()*img.Width();
    v = myVector.y()*img.Height();
    img.SetPixel(u,v,color);
  }
}
This is how I read the input matrices and their probabilities to set up the random transform choice:
fscanf(input,"%d",&num_transforms);
matrices = new Matrix[num_transforms];
probablility = new float[num_transforms];
range = new float[num_transforms+1];
for (int i = 0; i < num_transforms; i++) {
  fscanf (input,"%f",&probablility[i]);
  matrices[i].Read3x3(input);
  if(i == 0) range[i] = probablility[i];
  else range[i] = probablility[i] + range[i-1];
}
My output shows only the beginnings of a Sierpinski triangle (1000 points, 1000 iterations).
My dragon is better, but still needs some work (1000 points, 1000 iterations).
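Separately from the random-number bias discussed in the answer, the transform choice in render() may also be worth a look: because range is cumulative, randomPercent < range[k] holds for every k from the first match onward, so the inner loop applies several transforms per iteration rather than one. A minimal sketch of picking exactly one index; pick_transform is a hypothetical helper, and it assumes range is non-empty with its last entry at (or near) 1.

```cpp
#include <cstddef>
#include <vector>

// Return the index of exactly one transform, given cumulative probabilities
// range[0..n-1] (range[k] = p[0] + ... + p[k], as built in the question).
// Assumption: range is non-empty.
std::size_t pick_transform(const std::vector<float>& range, float r) {
    for (std::size_t k = 0; k < range.size(); ++k) {
        if (r < range[k]) return k;   // first match only; later k are skipped
    }
    return range.size() - 1;          // r at or above the total (e.g. rounding)
}
```

With range copied into a std::vector<float> (or the function adapted to take a pointer and a count), matrices[pick_transform(range, randomPercent)].Transform(myVector) would replace the inner k loop.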
If RAND_MAX were 4 and the picture width 3, an evenly distributed sequence like [0,1,2,3,4] from rand() would be mapped to [0,1,2,0,1] by your modulo code, i.e. some numbers would occur more often. You need to discard the numbers at or above the highest multiple of the target range that does not exceed RAND_MAX, i.e. values >= ((RAND_MAX / 3) * 3). Just check for this limit and call rand() again.
Since you have to fix that error in several places, consider writing a utility function. Then, reduce the scope of your variables. The u,v declaration makes it hard to see that these two are just used in three lines of code. Declare them as "unsigned const u = ..." to make this clear and additionally get the compiler to check that you don't accidentally modify them afterwards.
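A minimal sketch of the utility function suggested above, built on std::rand(); uniform_below is a hypothetical name. It discards draws at or above (RAND_MAX / n) * n, so each remainder 0..n-1 comes from the same number of raw values, and it assumes 0 < n <= RAND_MAX.

```cpp
#include <cstdlib>

// Unbiased integer in [0, n) built on std::rand(), by rejection:
// limit = (RAND_MAX / n) * n is the largest multiple of n that is <= RAND_MAX,
// and every accepted draw r < limit maps to r % n with equal probability.
// Assumption: 0 < n <= RAND_MAX.
int uniform_below(int n) {
    const int limit = (RAND_MAX / n) * n;
    int r;
    do {
        r = std::rand();
    } while (r >= limit);   // redraw the uneven tail instead of folding it in
    return r % n;
}
```

In render(), rand()%img.Width() would then become uniform_below(img.Width()). In C++11 and later, std::uniform_int_distribution from <random> performs this range reduction for you and is usually the simpler choice.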