Fast access to Rcpp::List elements - c++

I have a data set that I really want to work with as a 3D array. Rather than try to get an R array into a RcppArmadillo Cube (I'm not sure that would work), I'm sending in a list of matrices. My problem, however, is that the matrices in the list are large, and I want to be able to loop over the third dimension in the middle of loops over rows or columns. With medium-sized matrices (a list of 20 matrices of size 50,000x5), flattening the list into one long array gets me my result in less than a second.
I'd prefer to avoid copying the data so that I can handle larger matrices. But using as<NumericMatrix>(list_obj[t]) inside a loop over the rows makes the function take at least several minutes. Below is an example of my code using as<>, which is incredibly slow. dat is the list passed into the function, and steps is an int passed into the function.
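#include <Rcpp.h>
using namespace Rcpp;

// Hypothetical name and signature; the question shows only the function body.
// [[Rcpp::export]]
NumericVector futureRatio(List dat, int steps) {
    int T = dat.size();
    int N = as<NumericMatrix>(dat[0]).nrow();
    int M = as<NumericMatrix>(dat[0]).ncol();
    // Temp vals
    double top, bot;
    // Output vector
    NumericVector out(M);
    // Loop through each signal
    for (int j = 0; j < M; j++) {
        // Reset numerator and denominator
        top = 0;
        bot = 0;
        // Loop through each time dimension
        for (int tm = 0; tm < (T - steps); tm++) {
            // Loop through each row
            for (int i = 0; i < N; i++) {
                // Check if entry is positive (as<> is re-evaluated on every iteration)
                if (as<NumericMatrix>(dat[tm])(i, j) > 0) {
                    // Increment denominator
                    bot += 1.0;
                    // Compute future product: 1 if the next `steps` entries are all nonzero
                    double prod = 1.0;
                    for (int k = 1; k <= steps; k++) {
                        if (as<NumericMatrix>(dat[tm + k])(i, j) == 0) {
                            prod = 0.0;
                            break;
                        }
                    }
                    // Add it to the numerator
                    top += prod;
                }
            }
        }
        out(j) = top / bot;
    }
    return out;
}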
Is there a fast way to do this without flattening the matrix and requiring a full copy of the potentially large data?
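Below is a minimal sketch of the usual remedy. It is written in plain C++ so it runs without R. Each list element is looked up once, before the loops, and a pointer to its data is kept for use in the hot loop.

The sketch rests on these assumptions:
- Each matrix is an N x M std::vector<double> stored column-major, which is the layout R uses.
- The name futureRatio is hypothetical.
- top is meant to accumulate one count per positive entry whose next `steps` values are all nonzero.

In Rcpp, the analogous step would be to fill a std::vector<NumericMatrix> by calling as<NumericMatrix>(dat[t]) once per element. As far as I know, for list elements that are already double matrices this wraps the existing R memory instead of copying it. The conversion cost is then paid T times rather than inside the innermost loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each matrix is N x M, stored column-major: element (i, j) is at i + j * N.
using ColMajor = std::vector<double>;

// Hypothetical C++ analogue of the question's loop. The per-element lookup
// (the expensive as<NumericMatrix>() call in the Rcpp version) happens once,
// before the loops, instead of inside them.
std::vector<double> futureRatio(const std::vector<ColMajor>& dat,
                                int N, int M, int steps) {
    const int T = static_cast<int>(dat.size());
    assert(steps >= 0 && steps < T);

    // Resolve every "list element" to a raw pointer exactly once.
    std::vector<const double*> mats;
    mats.reserve(dat.size());
    for (const ColMajor& m : dat) {
        assert(static_cast<int>(m.size()) == N * M);
        mats.push_back(m.data());
    }

    std::vector<double> out(M);
    for (int j = 0; j < M; j++) {
        double top = 0.0, bot = 0.0;
        for (int tm = 0; tm < T - steps; tm++) {
            for (int i = 0; i < N; i++) {
                const std::size_t idx = static_cast<std::size_t>(i) +
                                        static_cast<std::size_t>(j) * N;
                if (mats[tm][idx] > 0) {
                    bot += 1.0;
                    // 1 if the next `steps` slices are all nonzero at (i, j), else 0
                    double prod = 1.0;
                    for (int k = 1; k <= steps; k++) {
                        if (mats[tm + k][idx] == 0) { prod = 0.0; break; }
                    }
                    top += prod;
                }
            }
        }
        out[j] = top / bot;  // NaN if column j has no positive entries
    }
    return out;
}
```

No data is copied here: the pointers refer to the existing buffers. Only the lookups are hoisted out of the loops.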

Related

Multiplying Matrices with two for loops in C++ [duplicate]

I came up with this algorithm for matrix multiplication. I read somewhere that matrix multiplication has a time complexity of O(n^2).
But I think this algorithm is O(n^3).
I don't know how to calculate the time complexity of nested loops, so please correct me.
for i=1 to n
    for j=1 to n
        c[i][j]=0
        for k=1 to n
            c[i][j] = c[i][j]+a[i][k]*b[k][j]
Using linear algebra, there exist algorithms that achieve better complexity than the naive O(n^3). Strassen's algorithm achieves a complexity of O(n^2.807) by reducing the number of multiplications required for each 2x2 sub-matrix from 8 to 7.
The Coppersmith-Winograd algorithm and its later refinements push the exponent lower still (one refinement reaches O(n^2.3737)). However, their constant factors are so large that they are not used in practice, and even Strassen only pays off for fairly large matrices. In practice, it is easier and faster to use parallel algorithms for matrix multiplication.
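To make the 8-to-7 claim concrete, here is a sketch of a single Strassen step on a 2x2 product. Scalars stand in for the (n/2) x (n/2) blocks of the real algorithm. Applying the step recursively gives the recurrence T(n) = 7T(n/2) + O(n^2), which solves to O(n^(log2 7)), about O(n^2.807). The function names are illustrative.

```cpp
#include <array>
#include <cassert>

using Mat2 = std::array<std::array<long long, 2>, 2>;

// One Strassen step: 7 multiplications (m1..m7) instead of 8.
// In the full algorithm each entry is a block and this is applied recursively.
Mat2 strassen2x2(const Mat2& A, const Mat2& B) {
    const long long m1 = (A[0][0] + A[1][1]) * (B[0][0] + B[1][1]);
    const long long m2 = (A[1][0] + A[1][1]) * B[0][0];
    const long long m3 = A[0][0] * (B[0][1] - B[1][1]);
    const long long m4 = A[1][1] * (B[1][0] - B[0][0]);
    const long long m5 = (A[0][0] + A[0][1]) * B[1][1];
    const long long m6 = (A[1][0] - A[0][0]) * (B[0][0] + B[0][1]);
    const long long m7 = (A[0][1] - A[1][1]) * (B[1][0] + B[1][1]);
    Mat2 C{};
    C[0][0] = m1 + m4 - m5 + m7;
    C[0][1] = m3 + m5;
    C[1][0] = m2 + m4;
    C[1][1] = m1 - m2 + m3 + m6;
    return C;
}

// Naive 2x2 product (8 multiplications), for comparison.
Mat2 naive2x2(const Mat2& A, const Mat2& B) {
    Mat2 C{};
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}
```

The saving of one multiplication is paid for with extra additions. That is why Strassen only helps once the blocks are large enough for multiplications to dominate.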
The naive algorithm, which is what you've got once you correct it as noted in comments, is O(n^3).
There do exist algorithms that reduce this somewhat, but you're not likely to find an O(n^2) implementation. I believe the question of the most efficient implementation is still open.
See the Wikipedia article on matrix multiplication for more information.
The standard way of multiplying an m-by-n matrix by an n-by-p matrix has complexity O(mnp). If all of those are "n" to you, it's O(n^3), not O(n^2). EDIT: it will not be O(n^2) in the general case. But there are faster algorithms for particular types of matrices -- if you know more you may be able to do better.
The naive matrix multiplication uses three nested for loops, each of which runs n times. The innermost statement therefore executes n * n * n times, so the algorithm is O(n^3).
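The loop-counting argument can be checked directly. This sketch runs the naive algorithm and counts how often the innermost statement executes; the function name multiplyCounting is made up for illustration. Each of the three loops runs n times, so the count is n * n * n.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Naive n x n product that also counts how many times the innermost
// statement runs. Three nested loops of n iterations each => n * n * n.
Matrix multiplyCounting(const Matrix& a, const Matrix& b, std::size_t& innerSteps) {
    const std::size_t n = a.size();
    Matrix c(n, std::vector<double>(n, 0.0));
    innerSteps = 0;
    for (std::size_t i = 0; i < n; i++)
        for (std::size_t j = 0; j < n; j++)
            for (std::size_t k = 0; k < n; k++) {
                c[i][j] += a[i][k] * b[k][j];
                ++innerSteps;
            }
    return c;
}
```

For n = 4 the counter ends at 64, and for n = 10 it ends at 1000. Doubling n multiplies the work by 8, which is the signature of O(n^3).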
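I recently had a matrix multiplication problem in a college assignment, and this is how I solved it using only two for loops. Note that the inner loop is restarted with j = -1 for every column of B, so it still performs ra*ca*cb multiply-adds, which is O(n^3) for square matrices.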
import java.util.Scanner;
public class q10 {
public static int[][] multiplyMatrices(int[][] A, int[][] B) {
int ra = A.length; // rows in A
int ca = A[0].length; // columns in A
int rb = B.length; // rows in B
int cb = B[0].length; // columns in B
// if the number of columns of A is not equal to the number of rows of B,
// the two matrices cannot be multiplied.
if (ca != rb) {
System.out.println("Incorrect order, multiplication cannot be performed");
return A;
} else {
// AB is the product of A and B; it will have as many rows as A
// and as many columns as B
int[][] AB = new int[ra][cb];
int k = 0; // column number of matrix B, while multiplying
int entry; // = AB[i][k], the value being accumulated for row i, column k
for (int i = 0; i < A.length; i++) {
entry = 0;
k = 0;
for (int j = 0; j < A[i].length; j++) {
// to evaluate a new Aij, clear the earlier entry
if (j == 0) {
entry = 0;
}
int currA = A[i][j]; // number selected in matrix A
int currB = B[j][k]; // number selected in matrix B
entry += currA * currB; // adding to the current entry
// if we are done with all the columns for this entry,
// reset the loop for next one.
if (j + 1 == ca) {
j = -1;
// put the evaluated value at its position
AB[i][k] = entry;
// increase the column number of matrix B as we are done with this one
k++;
}
// if this row is done break this loop,
// move to next row.
if (k == cb) {
j = A[i].length;
}
}
}
return AB;
}
}
@SuppressWarnings({ "resource" })
public static void main(String[] args) {
Scanner ip = new Scanner(System.in);
System.out.println("Input order of first matrix (r x c):");
int ra = ip.nextInt();
int ca = ip.nextInt();
System.out.println("Input order of second matrix (r x c):");
int rb = ip.nextInt();
int cb = ip.nextInt();
int[][] A = new int[ra][ca];
int[][] B = new int[rb][cb];
System.out.println("Enter values in first matrix:");
for (int i = 0; i < ra; i++) {
for (int j = 0; j < ca; j++) {
A[i][j] = ip.nextInt();
}
}
System.out.println("Enter values in second matrix:");
for (int i = 0; i < rb; i++) {
for (int j = 0; j < cb; j++) {
B[i][j] = ip.nextInt();
}
}
int[][] AB = multiplyMatrices(A, B);
System.out.println("The product of first and second matrix is:");
for (int i = 0; i < AB.length; i++) {
for (int j = 0; j < AB[i].length; j++) {
System.out.print(AB[i][j] + " ");
}
System.out.println();
}
}
}

How to access a vector inside a vector?

So I have a vector of vectors of type double. I basically need to be able to set 360 numbers to cosY and put those 360 numbers into cosineY[0], then get another 360 numbers that are calculated with a different a, and put them into cosineY[1]. Technically my vector is going to be cosineY[a]. I then need to be able to take out just the cosY for an a that I specify...
My code looks like this:
for (int a = 0; a < 8; a++)
{
    for (int n = 0; n <= 360; n++)
    {
        cosY[n] = cos(a*vectorOfY[n]);
    }
    cosineY.push_back(cosY);
}
which I hope is the correct way of actually setting it.
But then I need to take the cosY for an a that I specify and calculate another 360-element vector, which will be stored in another vector of vectors.
Right now I've got:
for (int a = 0; a < 8; a++)
{
    for (int n = 0; n <= 360; n++)
    {
        cosProductPt[n] = (vectorOfY[n]*cosY[n]);
    }
    CosProductY.push_back(cosProductPt);
}
The vectorOfY is basically the amplitude of an input wave. What I am doing is creating a cosine wave at different frequencies (a). I then calculate the product of the input and the cosine wave at each frequency. I need to be able to access these 360 points for each frequency later in the program. Right now I also need to calculate the sum of all elements in cosProductPt for every frequency (stored in CosProductY) and store it in a vector dotProductCos[a].
I've been trying to work it out but I don't know how to access all the elements in a vector of vectors to add them. I've been trying to do this for the whole day without any results. Right now I know so little that I don't even know how I would display or access a vector inside a vector, but I need to use that access point for the addition.
Thank you for your help.
for (int a = 0; a < 8; a++)
{
    for (int n = 0; n < 360; n++) // note: traded <= for <. I think you had an off-by-one
                                  // error here.
    {
        cosY[n] = cos(a*vectorOfY[n]);
    }
    cosineY.push_back(cosY);
}
This is sound as long as cosY has been pre-allocated to hold at least 360 elements. You could write
std::vector<std::vector<double>> cosineY;
std::vector<double> cosY(360); // strongly consider replacing the 360 with a well-named
                               // constant
for (int a = 0; a < 8; a++) // same with that 8
{
    for (int n = 0; n < 360; n++)
    {
        cosY[n] = cos(a*vectorOfY[n]);
    }
    cosineY.push_back(cosY);
}
for example, but this hangs on to cosY longer than you need to and could cause problems later, so I'd probably scope cosY by throwing the above code into a function.
#include <cmath>
#include <vector>

constexpr int NumDegrees = 360; // number of samples per wave
constexpr int NumVectors = 8;   // number of frequencies

std::vector<std::vector<double>> buildStageOne(const std::vector<double> &vectorOfY)
{
    std::vector<std::vector<double>> cosineY;
    std::vector<double> cosY(NumDegrees);
    for (int a = 0; a < NumVectors; a++)
    {
        for (int n = 0; n < NumDegrees; n++)
        {
            cosY[n] = std::cos(a*vectorOfY[n]); // take radians into account if needed.
        }
        cosineY.push_back(cosY);
    }
    return cosineY;
}
Returning the vector by value may look horrible, but compilers will typically apply copy elision (named return value optimization) to eliminate the copy. Since C++11, a returned local vector is also moved rather than copied even when elision does not happen.
Then I'd do almost the exact same thing for the second step.
std::vector<std::vector<double>> buildStageTwo(const std::vector<double> &vectorOfY,
                                               const std::vector<std::vector<double>> &cosineY)
{
    std::vector<std::vector<double>> CosProductY;
    std::vector<double> cosProductPt(NumDegrees);
    for (int a = 0; a < NumVectors; a++)
    {
        for (int n = 0; n < NumDegrees; n++)
        {
            cosProductPt[n] = (vectorOfY[n]*cosineY[a][n]);
        }
        CosProductY.push_back(cosProductPt);
    }
    return CosProductY;
}
But we can make a couple of optimizations:
std::vector<std::vector<double>> buildStageTwo(const std::vector<double> &vectorOfY,
                                               const std::vector<std::vector<double>> &cosineY)
{
    std::vector<std::vector<double>> CosProductY;
    std::vector<double> cosProductPt(NumDegrees);
    for (int a = 0; a < NumVectors; a++)
    {
        // why risk constantly looking up cosineY[a]? grab it once and cache it
        const std::vector<double> & cosY = cosineY[a]; // note the reference
        for (int n = 0; n < NumDegrees; n++)
        {
            cosProductPt[n] = (vectorOfY[n]*cosY[n]);
        }
        CosProductY.push_back(cosProductPt);
    }
    return CosProductY;
}
And the next is kind of an extension of the first:
std::vector<std::vector<double>> buildStageTwo(const std::vector<double> &vectorOfY,
                                               const std::vector<std::vector<double>> &cosineY)
{
    std::vector<std::vector<double>> CosProductY;
    std::vector<double> cosProductPt(NumDegrees);
    for (const std::vector<double> & cosY: cosineY) // range-based for: no index a needed
    {
        for (int n = 0; n < NumDegrees; n++)
        {
            cosProductPt[n] = (vectorOfY[n]*cosY[n]);
        }
        CosProductY.push_back(cosProductPt);
    }
    return CosProductY;
}
We could do the same range-based for trick for the for (int n = 0; n < NumDegrees; n++), but since we are iterating multiple arrays here it's not all that helpful.
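The question also asks how to add up all the elements of each inner vector and store the result in dotProductCos[a]. A single element of a vector of vectors is read as CosProductY[a][n]. To add all elements of CosProductY[a], one option is std::accumulate from <numeric>. Here is a sketch, with the helper name sumEachRow being hypothetical:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Sums each inner vector of a vector of vectors. cosProductY[a][n] is the
// n-th sample of the product wave for frequency a, so entry a of the result
// is the sum over all n (the dot product of the input with that cosine wave).
std::vector<double> sumEachRow(const std::vector<std::vector<double>>& cosProductY) {
    std::vector<double> dotProductCos;
    dotProductCos.reserve(cosProductY.size());
    for (const std::vector<double>& row : cosProductY) {
        // The initial value must be 0.0 (a double); with a plain 0 the
        // accumulation would be done in int and truncated.
        dotProductCos.push_back(std::accumulate(row.begin(), row.end(), 0.0));
    }
    return dotProductCos;
}
```

Usage would be along the lines of `std::vector<double> dotProductCos = sumEachRow(buildStageTwo(vectorOfY, cosineY));`, after which dotProductCos[a] holds the sum for frequency a.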

Spatial Locality for 3D array's?

Why is A[k][i][j] better for spatial locality in a 3D array (where i, j, k are row, column, depth)? (CMU lecture, around the 55-minute mark)
I think the OP's question
why is A[k][i][j] better for spatial locality in a 3D array? ( where i,j,k are row, col, depth)
comes from a misunderstanding of the exercise given as an example of spatial locality, where the reader is asked to
permute the loops so that the function ... has good spatial locality
and this code is given:
int sum_array_3d(int a[M][N][N])
{
int i, j, k, sum = 0;
for (i = 0; i < M; i++)
for (j = 0; j < N; j++)
for (k = 0; k < N; k++)
sum += a[k][i][j];
return sum;
}
My interpretation of this task is that the students are asked to either rewrite the inner statement as sum += a[i][j][k]; or change the order of the loops:
int sum_array_3d(int a[M][N][N])
{
int i, j, k, sum = 0;
for (k = 0; k < M; k++) // <-- those are reordered
for (i = 0; i < N; i++)
for (j = 0; j < N; j++)
sum += a[k][i][j]; // <-- this is maintained, verbatim
return sum;
}
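Actually, the exercise code as given is also wrong. The first index of a has extent M, but k, which is used as that index, runs over 0..N-1. Meanwhile, i runs over 0..M-1 but indexes a dimension of extent N. Unless M == N, you'll read out of bounds or miss elements.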
The goal is to have your loop iteratively access physically-adjacent locations in memory by manipulating the order of the loops.
Whenever your program reads a value, the CPU requests it from the cache controller. If it's not in the cache, that value and its neighbours in the same cache line are retrieved from memory and stored in the cache.
If you then read the next element, it should (usually) already be in the cache, so there's no slow round-trip out to the next cache or host RAM.
If your loop is walking all over the place rather than taking advantage of spatial locality, then you run the risk of suffering far more cache misses, which makes things slow.
In short: getting stuff from the cache is fast, getting it from RAM is slow, and ordering your loops so that they touch adjacent locations helps keep the cache happy.
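Here is a sketch of the difference, assuming the array is stored as one contiguous row-major buffer; the Array3D helper is hypothetical, not from the lecture. Both functions compute the same sum, and only the order in which memory is touched differs. No timings are claimed here, but on arrays much larger than the cache the stride-1 version is usually the faster one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 3D array a[M][N][N] stored contiguously in row-major (C) order:
// element (k, i, j) lives at (k * N + i) * N + j, so j is the fastest-moving
// index and consecutive j values are adjacent in memory.
struct Array3D {
    int M, N;
    std::vector<int> data;
    Array3D(int m, int n) : M(m), N(n), data(static_cast<std::size_t>(m) * n * n) {}
    int& at(int k, int i, int j) {
        return data[(static_cast<std::size_t>(k) * N + i) * N + j];
    }
};

// Good spatial locality: the innermost loop walks j, i.e. stride-1 accesses.
long long sumStride1(Array3D& a) {
    long long sum = 0;
    for (int k = 0; k < a.M; k++)
        for (int i = 0; i < a.N; i++)
            for (int j = 0; j < a.N; j++)
                sum += a.at(k, i, j);
    return sum;
}

// Same result, poor locality: the innermost loop walks k, jumping
// N * N elements between consecutive reads.
long long sumStrideNN(Array3D& a) {
    long long sum = 0;
    for (int j = 0; j < a.N; j++)
        for (int i = 0; i < a.N; i++)
            for (int k = 0; k < a.M; k++)
                sum += a.at(k, i, j);
    return sum;
}
```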
In graphics, we typically do this:
int a[M*N*N];
int sum = 0;
for(int offset=0; offset < M*N*N; ++offset)
{
    //int y = offset / cols;
    //int x = offset % cols;
    sum += a[offset];
}
If you need an element by its X, Y coordinates, just use
offset = Y * cols + X;
int val = a[offset];
or for 3D
offset = Z*N*N + Y*N + X;
or
offset = Z * rows * cols + Y * cols + X;
... and skip all the multidimensional array silliness.
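The offset formulas above can be wrapped in small helpers; the names offset2D, offset3D, and coords3D are illustrative. The inverse mapping shows where the commented-out y/x lines in the loop come from. The outer index is the offset divided by the size of what it skips over, and the last index is the offset modulo the row length.

```cpp
#include <cassert>
#include <cstddef>

// 2D, rows x cols, row-major: element (Y, X) is at Y * cols + X.
inline std::size_t offset2D(std::size_t Y, std::size_t X, std::size_t cols) {
    return Y * cols + X;
}

// 3D, depth x rows x cols, row-major: element (Z, Y, X) is at
// Z * rows * cols + Y * cols + X.
inline std::size_t offset3D(std::size_t Z, std::size_t Y, std::size_t X,
                            std::size_t rows, std::size_t cols) {
    return Z * rows * cols + Y * cols + X;
}

// Recover (Z, Y, X) from a flat offset: the inverse of offset3D.
struct Coord3 { std::size_t Z, Y, X; };
inline Coord3 coords3D(std::size_t offset, std::size_t rows, std::size_t cols) {
    return { offset / (rows * cols), (offset / cols) % rows, offset % cols };
}
```

For example, with rows = 3 and cols = 4, element (1, 2, 3) is at offset 12 + 8 + 3 = 23.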
Personally, I'd just do this:
int *p = &a[0][0][0]; // int* p = a; would not compile: a decays to int (*)[N][N], not int*
//... array gets populated somehow
for(int i=0;i<M*N*N;++i)
{
sum += p[i];
}
... but that assumes the array is a single contiguous block (a true multidimensional array), not an array of pointers or an array of arrays of pointers.

compute error from linear system - extract every column

I want to compute the error of the linear least squares method.
I have matrices A, B and X (AX = B).
Sizes are: A (N x N), B (N x NRHS), X (N x NRHS), where NRHS is the number of right-hand sides.
The error is computed as sqrt(sum(B-AX)).
But I must take into account every column of B and X in order to do the subtraction.
I must subtract B[i] - A[..]X[i] for every column i of B and X.
I can't figure out how to do this, i.e. how to extract each column. I think I can't find the right indices for the B and X matrices, because I must traverse the whole A matrix but only each column of B and X.
I am doing something like this (using column major order):
int N=128;
int NRHS =1;
int Asize=N*N;
int Bsize=N*NRHS;
int Xsize=N*NRHS;
A=(double*)malloc(Asize*sizeof(double));
B=(double*)malloc(Bsize*sizeof(double));
X=(double*)malloc(Xsize*sizeof(double));
...
for(int i = 0; i < N; i++)
{
for (int j=0;j<NRHS; j++){
diff[i+j*N] = fabs(B[i+j*N] - A[i+j*N]*X[i+j*N]);
abs_error=sqrt(sums(diff,N));
}
}
I thought of adding some statement using the modulo operator, but I couldn't figure it out.
sums is just a function which gives the sum of an array where the second argument is the number of elements.
You could first compute the matrix product of A and X using loops.
Then you could write another two loops to compute the difference (B - AX). This would simplify your problem.
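Here is a sketch of the whole computation in one function, assuming the column-major layout from the question. For each column j of B and X, entry i of A X is the dot product of row i of A with column j of X. That is why the whole of A is traversed once per column. residualNorms is a hypothetical name. The sum of squares (the Euclidean norm) is also an assumption: the question's formula sqrt(sum(B-AX)) does not square the entries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Column-major storage: element (row i, column j) of an N-row matrix is at i + j * N.
// A is N x N; B and X are N x NRHS. Returns one residual norm per right-hand side:
// norms[j] = sqrt( sum_i (B(i, j) - (A X)(i, j))^2 )   (Euclidean norm assumed)
std::vector<double> residualNorms(const std::vector<double>& A,
                                  const std::vector<double>& B,
                                  const std::vector<double>& X,
                                  std::size_t N, std::size_t NRHS) {
    assert(A.size() == N * N && B.size() == N * NRHS && X.size() == N * NRHS);
    std::vector<double> norms(NRHS, 0.0);
    for (std::size_t j = 0; j < NRHS; j++) {        // each column of B and X
        double sumSq = 0.0;
        for (std::size_t i = 0; i < N; i++) {        // each row
            // (A X)(i, j) = sum_k A(i, k) * X(k, j)
            double ax = 0.0;
            for (std::size_t k = 0; k < N; k++)
                ax += A[i + k * N] * X[k + j * N];
            const double r = B[i + j * N] - ax;
            sumSq += r * r;
        }
        norms[j] = std::sqrt(sumSq);
    }
    return norms;
}
```

Note that A[i + j*N] * X[i + j*N] in the question multiplies matching elements, which is not a matrix product. The k loop above is what performs the product.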
Edit
After you compute the product of A and X, assuming that you store the product in a variable named AX, the following code will give you the difference between corresponding elements.
differenceMatrix = (double*)malloc(Bsize*sizeof(double));
for(int i = 0; i < N; i++)
{
for (int j = 0; j < NRHS; j++){
differenceMatrix[i+j*N] = fabs(B[i+j*N] - AX[i+j*N]);
}
}
Each column of the differenceMatrix contains the difference between corresponding elements.
Edit
To obtain the sum of the differences in each column:
double sumOfDifferencePerColumn;
for(int j = 0; j < NRHS; j++)
{
    sumOfDifferencePerColumn = 0.0;
    for (int i = 0; i < N; i++){
        sumOfDifferencePerColumn += ( fabs(B[i+j*N] - AX[i+j*N]) );
    }
    // add code to take the square root of, or otherwise use, the sum for column j
}

Algorithm for smoothing

I wrote this code to smooth a curve.
It takes the 5 points next to a point, adds them, and averages them.
/* Smoothing */
void smoothing(vector<Point2D> &a)
{
//How many neighbours to smooth
int NO_OF_NEIGHBOURS=10;
vector<Point2D> tmp=a;
for(int i=0;i<a.size();i++)
{
if(i+NO_OF_NEIGHBOURS+1<a.size())
{
for(int j=1;j<NO_OF_NEIGHBOURS;j++)
{
a.at(i).x+=a.at(i+j).x;
a.at(i).y+=a.at(i+j).y;
}
a.at(i).x/=NO_OF_NEIGHBOURS;
a.at(i).y/=NO_OF_NEIGHBOURS;
}
else
{
for(int j=1;j<NO_OF_NEIGHBOURS;j++)
{
a.at(i).x+=tmp.at(i-j).x;
a.at(i).y+=tmp.at(i-j).y;
}
a.at(i).x/=NO_OF_NEIGHBOURS;
a.at(i).y/=NO_OF_NEIGHBOURS;
}
}
}
But I get very high values for each point instead of values similar to the previous point. The shape is scaled up a lot. What is going wrong in this algorithm?
What it looks like you have here is a bass-ackwards implementation of a finite impulse response (FIR) filter that implements a boxcar window function. Thinking about the problem in terms of DSP, you need to filter your incoming vector with NO_OF_NEIGHBOURS equal FIR coefficients that each have a value of 1/NO_OF_NEIGHBOURS. It is normally best to use an established algorithm rather than reinvent the wheel.
Here is a pretty scruffy implementation, hammered out quickly, that filters doubles. You can easily modify it to filter your data type. The demo filters a few cycles of a rising saw function (0, .25, .5, .75, 1) just for demonstration purposes. It compiles, so you can play with it.
#include <iostream>
#include <vector>
using namespace std;
class boxFIR
{
int numCoeffs; //MUST be > 0
vector<double> b; //Filter coefficients
vector<double> m; //Filter memories
public:
boxFIR(int _numCoeffs) :
numCoeffs(_numCoeffs)
{
if (numCoeffs<1)
numCoeffs = 1; //Must be > 0 or bad stuff happens
double val = 1./numCoeffs;
for (int ii=0; ii<numCoeffs; ++ii) {
b.push_back(val);
m.push_back(0.);
}
}
void filter(vector<double> &a)
{
double output;
for (int nn=0; nn<a.size(); ++nn)
{
//Apply smoothing filter to signal
output = 0;
m[0] = a[nn];
for (int ii=0; ii<numCoeffs; ++ii) {
output+=b[ii]*m[ii];
}
//Reshuffle memories
for (int ii = numCoeffs-1; ii!=0; --ii) {
m[ii] = m[ii-1];
}
a[nn] = output;
}
}
};
int main(int argc, const char * argv[])
{
boxFIR box(1); //If this is 1, then no filtering happens, use bigger ints for more smoothing
//Make a rising saw function for demo
vector<double> a;
a.push_back(0.); a.push_back(0.25); a.push_back(0.5); a.push_back(0.75); a.push_back(1.);
a.push_back(0.); a.push_back(0.25); a.push_back(0.5); a.push_back(0.75); a.push_back(1.);
a.push_back(0.); a.push_back(0.25); a.push_back(0.5); a.push_back(0.75); a.push_back(1.);
a.push_back(0.); a.push_back(0.25); a.push_back(0.5); a.push_back(0.75); a.push_back(1.);
box.filter(a);
for (int nn=0; nn<a.size(); ++nn)
{
cout << a[nn] << endl;
}
}
Up the number of filter coefficients using this line to see a progressively more smoothed output. With just 1 filter coefficient, there is no smoothing.
boxFIR box(1);
The code is flexible enough that you can even change the window shape if you like. Do this by modifying the coefficients defined in the constructor.
Note: this will give slightly different output from your implementation, because this is a causal filter (it depends only on the current and previous samples). Your implementation is not causal, as it looks ahead in time at future samples to make the average. That is why you need the conditional statements for when you are near the end of your vector. If you want output like the one your filter is attempting, run your vector through this algorithm in reverse (this works fine as long as the window function is symmetrical). That way you get similar output without the nasty conditional part of the algorithm.
In the following block:
for(int j=0;j<NO_OF_NEIGHBOURS;j++)
{
a.at(i).x=a.at(i).x+a.at(i+j).x;
a.at(i).y=a.at(i).y+a.at(i+j).y;
}
for each neighbour, you add its x and y to a.at(i)'s x and y, but when j == 0 that neighbour is a.at(i) itself.
If I understand correctly, it should be something like this:
for(int j=0;j<NO_OF_NEIGHBOURS;j++)
{
    a.at(i).x += a.at(i+j+1).x;
    a.at(i).y += a.at(i+j+1).y;
}
Filtering is good for 'memory' smoothing. This is the reverse pass for learnvst's answer, to prevent phase distortion:
for (int i = static_cast<int>(a.size()); i > 0; --i)
{
    // Apply smoothing filter to signal
    double output = 0;
    m[numCoeffs - 1] = a[i - 1];
    for (int j = numCoeffs; j > 0; --j)
        output += b[j - 1] * m[j - 1];
    // Reshuffle memories (stop before the last slot so m[j + 1] stays in range)
    for (int j = 0; j + 1 < numCoeffs; ++j)
        m[j] = m[j + 1];
    a[i - 1] = output;
}
More about zero-phase FIR filtering in MATLAB (filtfilt): http://www.mathworks.com/help/signal/ref/filtfilt.html
The current value of the point is used twice: once because you use +=, and once when j == 0. So you are building the sum of, e.g., 6 points but only dividing by 5. This problem is in both the IF and ELSE cases. Also, you should check that the vector is long enough; otherwise your ELSE case will read at negative indices.
The following is not a problem in itself, just a thought: have you considered using an algorithm that only touches every point twice? You can store a temporary x-y value, initialized to be identical to the first point. As you visit each point, you add the new point in and subtract the oldest point once it is more than NEIGHBOURS back. You keep this "running sum" updated for every point and store the running sum divided by NEIGHBOURS into the new point.
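Here is a sketch of the running-sum idea for a plain vector of doubles; to smooth Point2D, apply it to the x and y values separately. Instead of initializing from the first point, this version divides by the number of points actually in the window near the start. The function name and that start-up choice are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trailing moving average using a running sum: each input value is added once
// and subtracted once, so the cost is O(n) regardless of the window size.
// Near the start, fewer than `window` values are available, so we divide by
// the number of values actually in the window instead of by `window`.
std::vector<double> movingAverage(const std::vector<double>& in, std::size_t window) {
    assert(window > 0);
    std::vector<double> out(in.size());
    double runningSum = 0.0;
    for (std::size_t i = 0; i < in.size(); i++) {
        runningSum += in[i];                  // add the newest point
        if (i >= window)
            runningSum -= in[i - window];     // drop the point that left the window
        const std::size_t count = (i + 1 < window) ? i + 1 : window;
        out[i] = runningSum / static_cast<double>(count);
    }
    return out;
}
```

For very long signals, floating-point error can slowly accumulate in the running sum. Recomputing it from scratch every so often is one way to bound that error.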
You are adding the point to itself when you only want the neighbouring points; just offset the index by 1:
for(int j=0;j<NO_OF_NEIGHBOURS;j++)
{
    a.at(i).x += a.at(i+j+1).x;
    a.at(i).y += a.at(i+j+1).y;
}
This works fine for me:
for (int i = 0; i < lenInput; i++)
{
    float x = 0;
    for (int j = -neighbours; j <= neighbours; j++)
    {
        // out-of-range neighbours are replaced by the current sample
        x += input[(i + j < 0) || (i + j >= lenInput) ? i : i + j];
    }
    output[i] = x / (neighbours * 2 + 1);
}