I have a 6*6 matrix called m1, and I want to use a DO loop in SAS to create matrices such that m2=m1*m1; m3=m2*m1; m4=m3*m1 ... mi=m(i-1)*m1.
Here is what I wrote:
proc iml;
use a;
read all into cat(m,1);
do i=2 to 10;
j=i-1;
cat(m,i)=cat(m,j)*cat(m,1);
print cat(m,i);
end;
quit;
It doesn't work, probably because cat(m,1) is not a valid matrix name. How can I use a DO loop for this? Thank you very much for your time and help!
cat() is not going to work. It is a character function that concatenates strings; it does not create a matrix named by its string output.
Why not just use the matrix power operator?
m2 = m1**2;
m3 = m1**3;
Unless you have big matrices, the time saved by iterating the calculation instead of just using the power operator is next to nothing.
For many iterative algorithms, you want to perform some computation on EACH matrix, but you don't need all the matrices at the same time. For example, if you wanted to know the determinant of a, a**2, a**3, etc., you would write
result = j(10,1); /* store the 10 results */
m = I(nrow(a)); /* identity matrix */
do i = 1 to 10;
m = m*a; /* m is a**i during the i_th iteration */
result[i] = det(m);
end;
print result;
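The same "keep only the current power" pattern can be sketched in C++ for readers outside SAS/IML. This is a minimal sketch, assuming a small dense n x n matrix stored row-major in a std::vector<double>; the helper names matmul, det and power_dets are hypothetical, and det uses plain Gaussian elimination with partial pivoting rather than anything SAS-specific.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Dense n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Return the product a * b of two n x n matrices.
Matrix matmul(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) c[i * n + j] += aik * b[k * n + j];
        }
    return c;
}

// Determinant by Gaussian elimination with partial pivoting (m is a copy).
double det(Matrix m, std::size_t n) {
    double d = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t pivot = col;  // row with the largest entry in this column
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(m[r * n + col]) > std::fabs(m[pivot * n + col])) pivot = r;
        if (m[pivot * n + col] == 0.0) return 0.0;  // singular matrix
        if (pivot != col) {
            for (std::size_t j = 0; j < n; ++j) std::swap(m[col * n + j], m[pivot * n + j]);
            d = -d;  // a row swap flips the sign of the determinant
        }
        d *= m[col * n + col];
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = m[r * n + col] / m[col * n + col];
            for (std::size_t j = col; j < n; ++j) m[r * n + j] -= f * m[col * n + j];
        }
    }
    return d;
}

// Determinants of a, a^2, ..., a^k; only the current power is kept in memory.
std::vector<double> power_dets(const Matrix& a, std::size_t n, std::size_t k) {
    std::vector<double> result(k);  // store the k results
    Matrix m(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) m[i * n + i] = 1.0;  // identity matrix
    for (std::size_t i = 0; i < k; ++i) {
        m = matmul(m, a, n);  // m is a^(i+1) during this iteration
        result[i] = det(m, n);
    }
    return result;
}
```

For example, power_dets(a, 3, 10) returns the 10 determinants; since det(a^i) = det(a)^i, this is also a handy sanity check for the loop.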
If you actually need all 10 matrices at the same time as separately named matrices (this is rare), you can use the VALSET and VALUE functions as explained in this article: Indirect assignment: How to create and use matrices named x1, x2,..., xn
As an aside, you might also be interested in the trick of packing matrices into an array by flattening them. It is sometimes a useful technique when you need to return k matrices from a function module and k is a parameter to the module.
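To illustrate the flattening idea outside IML, here is a minimal C++ sketch, assuming row-major n x n matrices; pack_powers and packed_at are hypothetical helper names, not SAS/IML functions. All k powers of a live in one flat vector, and matrix p occupies a fixed block of n*n elements, so the caller only needs n and p to find any element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a^1, ..., a^k for an n x n row-major matrix a, packed into one flat
// vector of length k*n*n. Matrix p (1-based) occupies [(p-1)*n*n, p*n*n).
std::vector<double> pack_powers(const std::vector<double>& a, std::size_t n, std::size_t k) {
    assert(a.size() == n * n);
    std::vector<double> packed(k * n * n, 0.0);
    if (k == 0 || n == 0) return packed;
    std::copy(a.begin(), a.end(), packed.begin());  // first block holds a^1
    for (std::size_t p = 1; p < k; ++p) {
        const double* prev = &packed[(p - 1) * n * n];  // a^p
        double* cur = &packed[p * n * n];               // a^(p+1) = a^p * a
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t l = 0; l < n; ++l)
                for (std::size_t j = 0; j < n; ++j)
                    cur[i * n + j] += prev[i * n + l] * a[l * n + j];
    }
    return packed;
}

// Element (i, j) of the p-th packed matrix (p is 1-based, i and j are 0-based).
double packed_at(const std::vector<double>& packed, std::size_t n, std::size_t p,
                 std::size_t i, std::size_t j) {
    return packed[(p - 1) * n * n + i * n + j];
}
```

The design choice is the same as in IML: the number of matrices k can be a runtime parameter, yet the function still returns a single object.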
I am using the R bigmemory package and Rcpp to handle big matrices (1 to 10 million columns x 1000 rows). Once I have read an integer matrix consisting of 0, 2 and NA into a file-backed bigmemory matrix in R, I would like to modify all the NA values through C++, imputing either the mean value of each column or an arbitrary value (I show the latter here).
Below is the Rcpp function I have written, which does not work. My hope was that calling BigNA(mybigmatrix@address) from within R would find the elements of the matrix that are NA and modify their values directly in the backing file.
I think the problem might be in the evaluation of std::isnan(mat[j][i]). I checked this by writing an alternative function that counts the NA values with an accumulator, and indeed it did not count any NA. But once this is solved, I am also not sure whether the expression mat[j][i] = 1 would modify the value in the backing file. Coming from R, writing these statements feels intuitive to me, but they might be wrong.
Any help/suggestion would be very much appreciated.
#include <stdio.h>
#include <Rcpp.h>
#include <bigmemory/MatrixAccessor.hpp>
#include <numeric>
// [[Rcpp::depends(BH, bigmemory)]]
// [[Rcpp::depends(Rcpp)]]
// [[Rcpp::export]]
void BigNA(SEXP pBigMat) {
/*
* Imputation of "NA" values for "1" in a big 0, 2 NA matrix.
*/
// Create the external bigmatrix pointer and initialize the matrix accessor
Rcpp::XPtr<BigMatrix> xpMat(pBigMat);
MatrixAccessor<int> mat = (*xpMat);
// Iterate over the elements of the matrix and, when NA is found, substitute "1"
for(int i=0; i< xpMat->ncol(); i++){
for(int j=0; j< xpMat->nrow(); j++){
if(std::isnan(mat[j][i])){
mat[j][i] = 1;
}
}
}
}
The problem stems from the difference between NA in R and NaN in C++.
MatrixAccessor<int> gives you an accessor for values of type int. Any value in R can be NA, but an int in C++ is never NaN. An optimizing compiler could completely ignore std::isnan(x) where x is of type int, as in your case.
To fix this, you could either:
Use MatrixAccessor<float> (or double). This implies actually storing a different data type.
Check what value you actually get for NA elements. I think you will find it is INT_MIN in C++ (-2147483648). Replace isnan(x) with x == INT_MIN (INT_MIN is defined in <climits>).
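To make the INT_MIN check concrete, here is a minimal, self-contained C++ sketch. It deliberately uses a plain std::vector<int> in column-major order instead of bigmemory's MatrixAccessor, so it only illustrates the logic; the function names impute_constant and impute_column_mean are hypothetical, and rounding the column mean to the nearest int is an assumption made because the matrix holds ints.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <cstddef>
#include <vector>

// R stores NA_integer_ as INT_MIN, so an int view of an R integer matrix
// sees NA as INT_MIN, never as NaN.
constexpr int kNaInteger = INT_MIN;

// Replace every NA with an arbitrary value (the "1" from the question).
void impute_constant(std::vector<int>& data, int value) {
    for (int& x : data)
        if (x == kNaInteger) x = value;
}

// Replace NAs in each column of a column-major nrow x ncol matrix with the
// mean of that column's non-NA values, rounded to the nearest int.
// Columns that are entirely NA are left unchanged.
void impute_column_mean(std::vector<int>& data, std::size_t nrow, std::size_t ncol) {
    assert(data.size() == nrow * ncol);
    for (std::size_t j = 0; j < ncol; ++j) {
        int* col = data.data() + j * nrow;  // column j is contiguous
        double sum = 0.0;
        std::size_t count = 0;
        for (std::size_t i = 0; i < nrow; ++i)
            if (col[i] != kNaInteger) { sum += col[i]; ++count; }
        if (count == 0) continue;
        const int mean = static_cast<int>(std::lround(sum / count));
        for (std::size_t i = 0; i < nrow; ++i)
            if (col[i] == kNaInteger) col[i] = mean;
    }
}
```

Inside BigNA, the equivalent change would be testing mat[j][i] == INT_MIN instead of calling std::isnan.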
Related: Extracting a column with NA's from a bigmemory object in Rcpp
The bigmemory package has some functions to check for NAs.
Just add the header with #include <bigmemory/isna.hpp> and replace std::isnan(mat[j][i]) with isna(mat[j][i]).
Is there an efficient approach to retain only those rows of an Armadillo sparse matrix whose values sum to at least some level across the columns of the matrix? For instance, I would want to retain the ith row if the sum of its values is >= C, where C is some chosen value. Armadillo's documentation says that only contiguous submatrix views are allowed with sparse matrices, so I am guessing this is not easily achieved by subsetting. Is there an alternative to simply looping through the elements and creating a new sparse matrix with new locations, values and colPtr settings that match the desired condition? Thanks!
It may well be that the fastest-executing solution is the one you propose. If you want to take advantage of high-level Armadillo functionality (i.e. faster to code but perhaps slower to run), you can build a std::vector of "bad" row IDs and then use shed_row(id). Take care with the indexing when shedding rows; this is handled here by always shedding from the bottom of the matrix upward.
// rowind, colptr, values (CSC arrays), n_rows and n_cols are assumed to be defined already
arma::sp_mat mat(rowind, colptr, values, n_rows, n_cols);
const double threshold_value = 0.01 * arma::accu(mat); // 1% of the sum of all elements
std::vector<arma::uword> bad_ids; // The rows that we want to shed
const arma::mat row_sums(arma::sum(mat, 1)); // Row sums (dim = 1), as a dense n_rows x 1 matrix
// Iterate over rows in reverse order, so bad_ids is ordered from bottom to top.
for (arma::uword row_id = mat.n_rows; row_id-- > 0; ) {
if (row_sums(row_id) < threshold_value) {
bad_ids.push_back(row_id);
}
}
// Shed the bad rows from the bottom of the matrix up, so the remaining indices stay valid.
for (const auto &bad_id : bad_ids) {
mat.shed_row(bad_id);
}
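For comparison, the "plain loop" approach from the question can be sketched directly on compressed sparse column (CSC) arrays with only the standard library. This is a minimal sketch, assuming the same rowind/colptr/values layout that the arma::sp_mat constructor above takes; the Csc struct and filter_rows_by_sum are hypothetical names. It makes two passes: one to compute row sums and renumber the kept rows, one to copy the surviving entries and rebuild colptr.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sparse matrix in compressed sparse column (CSC) form.
struct Csc {
    std::size_t n_rows = 0, n_cols = 0;
    std::vector<std::size_t> rowind;  // row index of each stored value
    std::vector<std::size_t> colptr;  // size n_cols + 1; column j is [colptr[j], colptr[j+1])
    std::vector<double> values;
};

// Keep only rows whose sum is >= threshold; kept rows are renumbered 0, 1, ...
Csc filter_rows_by_sum(const Csc& m, double threshold) {
    assert(m.colptr.size() == m.n_cols + 1);
    // Pass 1: row sums from the stored values.
    std::vector<double> row_sum(m.n_rows, 0.0);
    for (std::size_t k = 0; k < m.values.size(); ++k) row_sum[m.rowind[k]] += m.values[k];

    // Map each old row to its new index, or to m.n_rows if the row is dropped.
    std::vector<std::size_t> new_row(m.n_rows, m.n_rows);
    Csc out;
    for (std::size_t r = 0; r < m.n_rows; ++r)
        if (row_sum[r] >= threshold) new_row[r] = out.n_rows++;
    out.n_cols = m.n_cols;

    // Pass 2: copy surviving entries column by column, rebuilding colptr.
    out.colptr.assign(m.n_cols + 1, 0);
    for (std::size_t j = 0; j < m.n_cols; ++j) {
        for (std::size_t k = m.colptr[j]; k < m.colptr[j + 1]; ++k) {
            const std::size_t r = new_row[m.rowind[k]];
            if (r == m.n_rows) continue;  // this row was dropped
            out.rowind.push_back(r);
            out.values.push_back(m.values[k]);
        }
        out.colptr[j + 1] = out.values.size();
    }
    return out;
}
```

Because the old-to-new row mapping is monotonic, row indices within each column stay sorted, so the output arrays should be usable to build a new sparse matrix; whether this beats repeated shed_row calls has not been measured here.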
I have a matrix x with two columns (C1, C2). I want to keep the first column (C1) fixed and add 10 columns to x, each with values C2+m, where m is a random integer. The final matrix should be:
C1, C2+m, C2+m, C2+m...C2+m;
CODE:
proc iml;
use nonpar;
read all var{treat response} into x;
do i=1 to 10;
call randseed(123);
call randgen(u, "Uniform");
Max = 300; Min = 68;
m = min + floor( (1+Max-Min)*u );
x = x[,1]||x[,2]+m;
end;
quit;
Can someone help me fix this?
Thanks
A couple of things should lead you in the right direction.
First, pre-create your full destination matrix; don't concatenate repeatedly. So, once you read the dataset into x, make another matrix x_new that has the same number of rows as x but 11 columns. The J function will do this for you.
Second, you can generate all of your random numbers at once, but you first have to define the size of the matrix to fill, again using J. This assumes you want a new random integer for each of the 10 columns AND each of the rows; if you want one per row only, or just one m in total, you need to do this differently, and you need to clarify which. If you just want one row of 10 m's, you can generate that first (a u with 1 row and 10 columns) and then expand it to the full number of rows of x using matrix multiplication.
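The "one row of 10 m's, expanded to every row" variant, together with pre-allocating the destination, can be sketched in C++ as follows. This is a minimal sketch under assumptions: the function name add_random_columns is hypothetical, std::mt19937 with std::uniform_int_distribution stands in for SAS RANDGEN (so the numbers will not match a SAS run), and the broadcast that IML would do with a column of ones times a row vector is written here as a plain loop.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Build an nrow x 11 row-major matrix: column 0 copies c1, and columns 1..10
// hold c2 + m_k, where m_k is one random integer in [lo, hi] per column k.
std::vector<double> add_random_columns(const std::vector<double>& c1,
                                       const std::vector<double>& c2,
                                       int lo, int hi, unsigned seed) {
    assert(c1.size() == c2.size());
    const std::size_t nrow = c1.size(), ncol = 11;
    std::vector<double> out(nrow * ncol);  // destination allocated once, up front

    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(lo, hi);  // inclusive on both ends
    std::vector<int> m(ncol - 1);
    for (int& mk : m) mk = dist(gen);  // one row of 10 random offsets

    for (std::size_t i = 0; i < nrow; ++i) {
        out[i * ncol] = c1[i];
        for (std::size_t k = 1; k < ncol; ++k)
            out[i * ncol + k] = c2[i] + m[k - 1];  // same offset down each column
    }
    return out;
}
```

If a fresh m is wanted for every cell instead, the only change is drawing dist(gen) inside the inner loop.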
Here's a simplified example using SASHELP.CLASS showing these two concepts at work.
proc iml;
use sashelp.class;
read all var {age weight} into x;
x_new = j(nrow(x),11); *making the new matrix with lots of columns;
x_new[,1] = x[,1]; *copy in column 1;
call randseed(123);
u = j(nrow(x),10); *make the to be filled random matrix;
call randgen(u,'Uniform',68,300); *the min/max parameters can go here;
u = floor(u+0.5); *as Rick noted in comments, needed to get the max value properly;
x_new[,2:11] = u[,1:10] + x[,2]; *populate x_new here;
print x_new;
quit;
I have a large, dense matrix A, and I aim to find the solution to the linear system Ax=b using an iterative method (my plan was to use MATLAB's built-in GMRES). For more than 10,000 rows, A is too large for my computer to store in memory, but I know that the entries of A are constructed from two known vectors x and y of length N, and the entries satisfy:
A(i,j) = .5*((x[i]-x[j])^2 + (y[i]-y[j])^2) * log((x[i]-x[j])^2 + (y[i]-y[j])^2).
MATLAB's GMRES command accepts as input a function handle that computes the matrix-vector product A*x, which allows me to handle larger matrices than I can store in memory. To write the matrix-vector product function, I first tried it in MATLAB, going row by row and using some vectorization while avoiding forming the entire array A (since it would be too large). Unfortunately, this was fairly slow in my GMRES application. My plan was to write a MEX file for MATLAB, which is in C and ideally should be significantly faster than the MATLAB code. I'm rather new to C, so this went rather poorly, and my naive attempt in C was slower than my partially vectorized attempt in MATLAB.
#include <math.h>
#include "mex.h"
void Aproduct(double *x, double *ctrs_x, double *ctrs_y, double *b, mwSize n)
{
mwSize i;
mwSize j;
double val;
for (i=0; i<n; i++) {
for (j=0; j<i; j++) {
val = pow(ctrs_x[i]-ctrs_x[j],2)+pow(ctrs_y[i]-ctrs_y[j],2);
b[i] = b[i] + .5* val * log(val) * x[j];
}
for (j=i+1; j<n; j++) {
val = pow(ctrs_x[i]-ctrs_x[j],2)+pow(ctrs_y[i]-ctrs_y[j],2);
b[i] = b[i] + .5* val * log(val) * x[j];
}
}
}
The above is the computational portion of the code for the MATLAB MEX file (which is slightly modified C, if I understand correctly). Note that I skip the case i=j, since there val is 0 and the term becomes 0*log(0), which should be interpreted as 0 for my purposes, so I just skip it.
Is there a more efficient or faster way to write this? When I call this C function via the MEX file in MATLAB, it is quite slow, slower even than the MATLAB method I used. This surprises me, since I expected C code to be much faster than MATLAB.
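One observation about the loop above: A(i,j) = A(j,i), so every log is evaluated twice, and pow(d,2) is typically slower than d*d. A minimal C++ sketch of a matrix-free product that exploits both points is below. It is an illustration, not a benchmarked replacement: the name a_times_x is hypothetical, it assumes all points are distinct (so the off-diagonal r2 is never 0), and wiring it into a mexFunction is not shown.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Matrix-free product b = A * x for A(i,j) = 0.5 * r2 * log(r2), with
// r2 = (cx[i]-cx[j])^2 + (cy[i]-cy[j])^2 and A(i,i) = 0 (the 0*log(0) case).
// A is symmetric, so each pair i < j is visited once and its contribution is
// added to both b[i] and b[j]. Assumes distinct points.
std::vector<double> a_times_x(const std::vector<double>& cx,
                              const std::vector<double>& cy,
                              const std::vector<double>& x) {
    const std::size_t n = x.size();
    assert(cx.size() == n && cy.size() == n);
    std::vector<double> b(n, 0.0);  // explicitly zero-initialized result
    for (std::size_t i = 0; i < n; ++i) {
        double bi = 0.0;  // local accumulator for row i
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dx = cx[i] - cx[j];
            const double dy = cy[i] - cy[j];
            const double r2 = dx * dx + dy * dy;  // multiplication instead of pow()
            const double aij = 0.5 * r2 * std::log(r2);
            bi += aij * x[j];    // upper-triangle term for row i
            b[j] += aij * x[i];  // mirrored lower-triangle term for row j
        }
        b[i] += bi;
    }
    return b;
}
```

This roughly halves the number of log calls, which likely dominate the cost; the overall work is still O(n^2) per product.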
The alternative, partially vectorized MATLAB method I am comparing it with is:
function Ax = Aprod(x,ctrs)
n = length(x);
Ax = zeros(n,1);
for j=1:(n-3)
v = .5*((ctrs(j,1)-ctrs(:,1)).^2+(ctrs(j,2)-ctrs(:,2)).^2).*log((ctrs(j,1)-ctrs(:,1)).^2+(ctrs(j,2)-ctrs(:,2)).^2);
v(j)=0;
Ax(j) = dot(v,x(1:n-3));
end
(The n-3 is because there are actually 3 extra components, but they are dealt with separately, so I excluded that code.) This is partly vectorized and needs only one for loop, so it makes some sense that it is faster. However, I was hoping I could go even faster with C and a MEX file.
Any suggestions or help would be greatly appreciated! Thanks!
EDIT: I should be clearer. I am open to any faster method that helps me use GMRES to solve this system, which requires a faster way of computing the matrix-vector product without explicitly loading the array into memory. Thanks!
If you have Parallel Computing Toolbox and MATLAB Distributed Computing Server, you can solve large dense linear systems directly using backslash. (If you don't have a cluster available, you could use Amazon EC2 machines.) For example: http://www.mathworks.co.uk/help/distcomp/examples/benchmarking-a-b.html
I am working on a binary linear program problem.
I am not really familiar with any programming language (I have only learned Java and C++ for a few months), but I may have to use a computer anyway, since the problem is quite complicated.
The first step is to declare variables m_ij for every entry of a matrix M (at least 8 x 8).
Then I assign the value of each matrix element to the corresponding variable.
The next step is to generate another set of variables, x_ij1, x_ij2, x_ij3, x_ij4 and x_ij5, whenever the value of m_ij is not 0.
Each x_ijk variable is either 0 or 1, and I do not have to assign values to the x_ijk variables.
Probably the simplest way to do it is to declare and assign a value to each variable, e.g.
int m_11 = 5, m_12 = 2, m_13 = 0, ... m_1n = 1;
int m_21 = 3, m_22 = 1, m_23 = 2, ... m_2n = 3;
and then pick the variables whose values are not 0 and declare x_ij1 ~ x_ij5 accordingly.
But this might be too much work, especially since I am going to consider many different matrices for this problem.
Is there any way to do this automatically?
I know a little bit of Java and C++, and I am considering using the lp_solve package in C++ (to solve the binary integer linear program), but I am willing to use any other language or program if I can do this easily.
I am sure there must be some way to do this (probably using loops, I guess?), and it is a very simple task, but I just don't know how because I do not have much programming experience.
Someone in my cohort wrote a program that generates a random matrix satisfying a condition we need, so it would be ideal if I could use that matrix as my input, but any way to do this would be okay for now.
Say, if there is a way to do it with MS Excel, like putting the matrix entries into the cells of an Excel file, importing it into C++, and automatically generating variables and assigning values to them, that would simplify the task a great deal!
Matlab indeed seems very suitable for the task. Though the example offered by @Dr_Sam will create the matrices on the fly, I would recommend initializing them before you assign the values. This way your code still ends up with the right variable even if something with the same name already existed in the workspace, and your variable will always have the expected size.
Assuming you want to define a square 8x8 matrix:
m = zeros(8)
Now, in general, if you want to initialize a three-dimensional matrix of size imax x jmax x kmax:
imax = 8;
jmax = 8;
kmax = 5;
x = zeros(imax,jmax,kmax);
Now assigning to or reading from these matrices is very easy. Note that the length and width of m have been chosen to match the first two dimensions of x:
m(3,4) = 4; %Assign a value
myvalue = m(3,4) %read the value
m(:,1) = 1:8; %Assign the values 1 through 8 to the first column
x(2,4,5) = 12; %Assign a single value to the three-dimensional matrix
x(:,:,2) = m+1; %Assign the entire matrix plus one to one of the planes of x
In C++ you could use a std::vector of vectors, like
std::vector<std::vector<int>> matrix;
You don't need to use separate variables for the matrix values, why would you when you have the matrix?
I don't understand why you need to keep every value along with whether it evaluates to true or false. Instead, just put the coordinates where your condition evaluates to true directly into a std::vector:
std::vector<std::pair<int, int>> true_values;
for (std::size_t i = 0; i < matrix.size(); i++)
{
for (std::size_t j = 0; j < matrix[i].size(); j++)
{
// some_condition_for_this_matrix_value is a placeholder for your own test
if (some_condition_for_this_matrix_value(matrix[i][j], i, j))
true_values.emplace_back(i, j);
}
}
Now you have a vector of all matrix coordinates where your condition is true.
If you really want to keep both true and false values, you could use a std::map with a std::pair containing the matrix coordinates as the key and a bool as the value (a std::unordered_map would need a custom hash, because std::pair has no std::hash specialization):
// Create a type alias, as this type will be used multiple times
typedef std::map<std::pair<int, int>, bool> bool_map_type;
bool_map_type bool_map;
Insert into this map all the values from the matrix, with the matrix coordinates as the key and true or false as the map value, depending on whatever condition you have.
To keep only the true entries, erase the false ones. std::remove_if cannot be used on a std::map (its keys are const), so use an erase loop instead (or std::erase_if in C++20):
for (auto it = bool_map.begin(); it != bool_map.end(); )
{
if (!it->second)
it = bool_map.erase(it); // erase returns an iterator to the next entry
else
++it;
}
Now you have a map containing only the entries whose value is true. Iterate over this map to get the matrix coordinates.
Of course, I may totally have misunderstood your problem, in which case you of course are free to disregard this answer. :)
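To tie this back to the original question (reading the matrix from a CSV file exported from Excel, then creating x_ij1 to x_ij5 for every nonzero m_ij), here is a minimal C++ sketch. The names read_csv_matrix, BinaryVar and make_binary_vars are hypothetical; it assumes a CSV with integer cells, no header row and one matrix row per line. Passing the resulting variables to lp_solve or another solver is not shown, since that depends on the solver's own API.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read a matrix of ints from comma-separated text, e.g. a sheet saved from
// Excel as CSV (one matrix row per line, no header row assumed).
std::vector<std::vector<int>> read_csv_matrix(std::istream& in) {
    std::vector<std::vector<int>> matrix;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::vector<int> row;
        std::stringstream ss(line);
        std::string cell;
        while (std::getline(ss, cell, ',')) row.push_back(std::stoi(cell));
        // Every row must have the same number of columns.
        assert(matrix.empty() || row.size() == matrix.front().size());
        matrix.push_back(row);
    }
    return matrix;
}

// One binary decision variable x_ijk (1-based i, j, k, matching the m_ij naming).
struct BinaryVar {
    int i, j, k;
};

// Create x_ij1 .. x_ijK for every nonzero entry m_ij. The position of each
// variable in the returned vector can serve as its column index in a solver.
std::vector<BinaryVar> make_binary_vars(const std::vector<std::vector<int>>& m, int k_max = 5) {
    std::vector<BinaryVar> vars;
    for (std::size_t i = 0; i < m.size(); ++i)
        for (std::size_t j = 0; j < m[i].size(); ++j)
            if (m[i][j] != 0)
                for (int k = 1; k <= k_max; ++k)
                    vars.push_back({static_cast<int>(i + 1), static_cast<int>(j + 1), k});
    return vars;
}
```

Usage would be along the lines of opening the CSV with std::ifstream, calling read_csv_matrix on it, then make_binary_vars on the result; nothing has to be declared by hand, whatever the matrix size.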
I know both C++ and Matlab (not Python), and in your case I would really go for Matlab, because it's much easier to use when you start programming (but don't forget to come back to C++ when you run into Matlab's limitations).
In Matlab, you can define matrices very easily: just type the name of the matrix and the index you want to set:
m(1,1) = 1
m(2,2) = 1
gives you a 2x2 identity matrix (indices start at 1 in Matlab, and unassigned entries are 0 by default). You can also define 3D matrices the same way:
x(1,2,3) = 2
Importing from Excel is possible: if you save your Excel file in CSV format, you can use the function dlmread to read it into Matlab. Later, you could also try implementing your algorithm directly in Matlab.
Finally, if you want to solve your binary integer program, Matlab already has a built-in function called bintprog that can solve it for you.
Hope it helps!