Populate a matrix in PROC IML (SAS)

I have an x matrix with two columns (c1, c2). I want to keep the first column (c1) fixed and add 10 columns to x, each with values c2+m, where m is a random integer. In the end the matrix should be:
c1, c2+m, c2+m, c2+m, ..., c2+m;
CODE:
proc iml;
use nonpar;
read all var{treat response} into x;
do i=1 to 10;
call randseed(123);
call randgen(u, "Uniform");
Max = 300; Min = 68;
m = min + floor( (1+Max-Min)*u );
x = x[,1]||x[,2]+m;
end;
quit;
Can someone help me fix this?
Thanks

A couple of things should lead you in the right direction.
First, pre-create your full destination matrix instead of concatenating repeatedly. Once you read the data set into x, make another matrix x_new with the same number of rows as x but 11 columns. The J function does this for you.
Second, you can generate all of your random numbers at once, but you first have to define the size of the matrix to fill, again using J. This assumes you want a new random integer for each of the 10 columns AND each of the rows. If you want one m per row, or just one m in total, you need to do this differently, so please clarify which you mean. If you want a single row of 10 m's, generate a u with 1 row and 10 columns first, then expand it to the full number of rows of x with matrix multiplication (for example, j(nrow(x),1)*u).
Here's a simplified example using SASHELP.CLASS showing these two concepts at work.
proc iml;
use sashelp.class;
read all var {age weight} into x;
x_new = j(nrow(x),11); *making the new matrix with lots of columns;
x_new[,1] = x[,1]; *copy in column 1;
call randseed(123);
u = j(nrow(x),10); *make the to be filled random matrix;
call randgen(u,'Uniform',68,300); *the min/max parameters can go here;
u = floor(u+0.5); *as Rick noted in comments, needed to get the max value properly;
x_new[,2:11] = u[,1:10] + x[,2]; *populate x_new here;
print x_new;
quit;
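For readers who want the same logic outside SAS/IML, here is a rough C++ sketch of the two ideas above: size the destination once, then fill columns 2 to 11 with c2 plus a random integer. It is not IML. The helper name `add_shifted_columns`, the nested `std::vector` layout and the `std::mt19937` / `std::uniform_int_distribution` generator are assumptions. The 68 to 300 range comes from the question. Like the IML example, it draws a new m for every row and every column.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: x holds n rows of {c1, c2}. Returns an n x (1 + k)
// matrix {c1, c2+m_1, ..., c2+m_k}, where every cell gets its own random
// integer m drawn uniformly from [lo, hi].
std::vector<std::vector<int>> add_shifted_columns(
    const std::vector<std::vector<int>>& x, std::size_t k,
    int lo, int hi, unsigned seed) {
    std::mt19937 gen(seed);                          // fixed seed, like randseed(123)
    std::uniform_int_distribution<int> dist(lo, hi); // inclusive on both ends

    // Pre-size the destination once instead of concatenating in a loop.
    std::vector<std::vector<int>> out(x.size(), std::vector<int>(k + 1));
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i][0] = x[i][0];                         // copy column 1 unchanged
        for (std::size_t j = 1; j <= k; ++j)
            out[i][j] = x[i][1] + dist(gen);         // c2 + m
    }
    return out;
}
```

If you only want one m per column (the same 10 values on every row), draw the 10 values once before the row loop and reuse them. That is the counterpart of building a 1 x 10 u and expanding it to all rows.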

Related

Sum function in SAS

I am trying to create a small function to do the following:
c = (a/3)*(1/((1+x)*2)+1/((1+x)*4)+1/((1+x)*8))+1/((1+x)*8)
The inputs are a and x. As you can see, the multiplier in each denominator is a power of 2 (up to 8).
My main difficulty is the repeated sum. I know that SAS has a SUM function, but I am wondering how to use it here. I thought the general term could be 1/((1+x)*n*2), where n is a number.
Help is welcome. Thanks.
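You can code a loop that accumulates the pieces. Something like:
data want;
a = 2;
x = 7;
steps = 3; /* denominators (1+x)*2, (1+x)*4, (1+x)*8 */
S = 0;
do index = 1 to steps;
S = sum(S, 1/((1+x)*2**index)); /* parenthesize the whole denominator */
end;
c = a/3 * S + 1/((1+x)*2**steps); /* the last term sits outside the a/3 factor */
run;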

How to drop rows of an SpMat<unsigned int> in Armadillo based on a condition on row totals?

Is there an efficient approach to only retain rows of an Armadillo sparse matrix that sum up to at least some level of total count across columns of the matrix? For instance, I would want to retain the ith row, if the sum of its values is >=C, where C is some chosen value. Armadillo's documentation says that only contiguous submatrix views are allowed with sparse matrices. So I am guessing this is not easily obtainable by sub-setting. Is there an alternative to plainly looping through elements and creating a new sparse matrix with new locations, values and colPtr settings that match the desired condition? Thanks!
It may well be that the fastest-executing solution is the one you propose. If you want to take advantage of high-level Armadillo functionality (i.e. faster to code but perhaps slower to run), you can build a std::vector of "bad" row ids and then call shed_row(id) for each. Take care with the indexing when shedding rows. Here this is handled by always shedding from the bottom of the matrix up.
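// rowind, colptr, values, n_rows, n_cols: your CSC data (assumed to be defined already)
arma::sp_mat mat(rowind, colptr, values, n_rows, n_cols);
const double threshold_value = 0.01 * arma::accu(mat); // C: here 1% of the sum of all elements
std::vector<arma::uword> bad_ids; // The rows that we want to shed
const arma::sp_mat row_sums = arma::sum(mat, 1); // Row sums (dim = 1); arma::sum(mat) gives column sums
// Iterate over rows in reverse order, so bad_ids is sorted from the bottom up.
for (arma::uword row_id = mat.n_rows; row_id-- > 0; ) {
if (row_sums(row_id, 0) < threshold_value) {
bad_ids.push_back(row_id);
}
}
// Shed the bad rows from the bottom of the matrix up, so each removal
// never shifts the index of a row that is still waiting to be removed.
for (const auto &bad_id : bad_ids) {
mat.shed_row(bad_id);
}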
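The plain-loop approach from the question can also be written without Armadillo, working directly on compressed sparse column (CSC) arrays. One pass computes the row sums, and a second pass copies the surviving nonzeros and rebuilds colptr. The `Csc` struct and the function name below are hypothetical. The layout (rowind, colptr, values) mirrors the arrays passed to the `arma::sp_mat` constructor, so the result could be copied into Armadillo vectors and handed back to it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSC container: the nonzeros are stored column by column, and
// entries colptr[j] .. colptr[j+1]-1 belong to column j.
struct Csc {
    std::size_t n_rows = 0, n_cols = 0;
    std::vector<std::size_t> rowind;
    std::vector<std::size_t> colptr;   // size n_cols + 1
    std::vector<unsigned int> values;
};

// Keep only rows whose sum is >= threshold; surviving rows are renumbered 0..kept-1.
Csc keep_rows_with_sum_at_least(const Csc& m, unsigned long long threshold) {
    // Pass 1: row sums in one sweep over the nonzeros.
    std::vector<unsigned long long> row_sum(m.n_rows, 0);
    for (std::size_t k = 0; k < m.values.size(); ++k)
        row_sum[m.rowind[k]] += m.values[k];

    // Map old row index -> new row index, or npos if the row is dropped.
    const std::size_t npos = static_cast<std::size_t>(-1);
    std::vector<std::size_t> new_index(m.n_rows, npos);
    std::size_t kept = 0;
    for (std::size_t i = 0; i < m.n_rows; ++i)
        if (row_sum[i] >= threshold) new_index[i] = kept++;

    // Pass 2: copy surviving nonzeros column by column, rebuilding colptr.
    // new_index is increasing, so row order within each column is preserved.
    Csc out;
    out.n_rows = kept;
    out.n_cols = m.n_cols;
    out.colptr.assign(m.n_cols + 1, 0);
    for (std::size_t j = 0; j < m.n_cols; ++j) {
        for (std::size_t k = m.colptr[j]; k < m.colptr[j + 1]; ++k) {
            const std::size_t r = new_index[m.rowind[k]];
            if (r == npos) continue;
            out.rowind.push_back(r);
            out.values.push_back(m.values[k]);
        }
        out.colptr[j + 1] = out.values.size();
    }
    return out;
}
```

This touches each nonzero twice and never shifts existing storage. That is the usual reason it can beat repeated shed_row calls when many rows are dropped.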

Do Loop for Matrix Multiplication in SAS

I have a 6x6 matrix called m1, and I want to use a DO loop in SAS to create matrices such that m2=m1*m1; m3=m2*m1; m4=m3*m1; ... mi=m(i-1)*m1.
Here is what I wrote:
proc iml;
use a;
read all into cat(m,1);
do i=2 to 10;
j=i-1;
cat(m,i)=cat(m,j)*cat(m,1);
print cat(m,i);
end;
quit;
It doesn't work, probably because cat(m,1) is not correct. How can I use a DO loop for this? Thank you very much for your time and help!
CAT() is not going to work. It is a character function: it returns a string and does not create a matrix named by that string.
Why not just use the matrix power operator?
m2 = m1**2;
m3 = m1**3;
Unless you have big matrices, the time saved by iterating the calculation instead of just using the power operator is next to zero.
For many iterative algorithms, you want to perform some computation on EACH matrix, but you don't need all the matrices at the same time. For example, if you wanted to know the determinant of a, a**2, a**3, etc., you could write
/* assumes the 6x6 matrix has been read into a, e.g. use a; read all into a; */
result = j(10,1); /* store the 10 results */
m = I(nrow(a)); /* identity matrix */
do i = 1 to 10;
m = m*a; /* m is a**i during the i_th iteration */
result[i] = det(m);
end;
print result;
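Outside IML, the same pattern looks like the C++ sketch below: keep only the current power, update it once per iteration, and record one statistic each time. The helper names and the Gaussian-elimination determinant are assumptions made for the sketch. In IML, the built-in DET does that part for you.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Dense product of two square matrices.
Matrix multiply(const Matrix& a, const Matrix& b) {
    const std::size_t n = a.size();
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Determinant by Gaussian elimination with partial pivoting (works on a copy).
double det(Matrix m) {
    const std::size_t n = m.size();
    double d = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t piv = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(m[r][col]) > std::fabs(m[piv][col])) piv = r;
        if (m[piv][col] == 0.0) return 0.0;             // singular
        if (piv != col) { std::swap(m[piv], m[col]); d = -d; }
        d *= m[col][col];
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = m[r][col] / m[col][col];
            for (std::size_t j = col; j < n; ++j) m[r][j] -= f * m[col][j];
        }
    }
    return d;
}

// det(a), det(a^2), ..., det(a^k); only the current power is kept in memory.
std::vector<double> det_of_powers(const Matrix& a, int k) {
    const std::size_t n = a.size();
    Matrix m(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) m[i][i] = 1.0;  // identity, like I(nrow(a))
    std::vector<double> result;
    for (int i = 1; i <= k; ++i) {
        m = multiply(m, a);                             // m is a^i on iteration i
        result.push_back(det(m));
    }
    return result;
}
```

Because det(a^i) = det(a)^i, the output is easy to sanity-check. For example, a matrix with determinant 3 should give 3, 9, 27, and so on.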
If you actually need all 10 matrices at the same time as separate matrices (this is rare), you can use the VALSET call and the VALUE function as explained in this article: Indirect assignment: How to create and use matrices named x1, x2,..., xn
As an aside, you might also be interested in the trick of packing matrices into an array by flattening them. It is sometimes a useful technique when you need to return k matrices from a function module and k is a parameter to the module.
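The flattening trick can be sketched in C++ as k n x n matrices stored back to back in one flat vector, with element (i, j) of matrix p at offset (p*n + i)*n + j. In IML terms, that would be one flattened, row-major matrix per row of a k x n² matrix. This mapping is an assumption about how one might apply the linked trick, not a quote of it. The `PackedMatrices` type and the `powers` producer are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical container: k matrices, each n x n, in one flat vector.
// Element (i, j) of matrix p lives at offset (p*n + i)*n + j.
struct PackedMatrices {
    std::size_t k, n;
    std::vector<double> data;   // size k * n * n

    PackedMatrices(std::size_t k_, std::size_t n_)
        : k(k_), n(n_), data(k_ * n_ * n_, 0.0) {}

    double& at(std::size_t p, std::size_t i, std::size_t j) {
        if (p >= k || i >= n || j >= n) throw std::out_of_range("PackedMatrices::at");
        return data[(p * n + i) * n + j];
    }
};

// Example producer: a, a^2, ..., a^k returned as a single object,
// where k is a run-time parameter. Slot p holds a^(p+1).
PackedMatrices powers(const std::vector<std::vector<double>>& a, std::size_t k) {
    const std::size_t n = a.size();
    PackedMatrices out(k, n);
    for (std::size_t p = 0; p < k; ++p)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j) {
                if (p == 0) { out.at(0, i, j) = a[i][j]; continue; }
                double s = 0.0;   // (a^(p+1))[i][j] = sum_l (a^p)[i][l] * a[l][j]
                for (std::size_t l = 0; l < n; ++l) s += out.at(p - 1, i, l) * a[l][j];
                out.at(p, i, j) = s;
            }
    return out;
}
```

The caller unpacks matrix p by reading the n*n consecutive values that start at offset p*n*n.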

How to output a random set of observations from a SAS data set

I have a DATA step that generates random numbers from a uniform distribution. How do I output only the rows at those indices? Basically, I want to select a random set of rows from a SAS data set.
data Unif(keep=u x k n m);
call streaminit(123);
a = -1; b = 1;
Min = 1; Max = 28000000;
do i = 1 to &NObs;
u = rand("Uniform"); /* U[0,1] */
x = a + (b-a)*u; /* U[a,b] */
k = ceil( Max*u ); /* uniform integer in 1..Max */
n = floor( (1+Max)*u ); /* uniform integer in 0..Max */
m = min + floor((1+Max-Min)*u); /* uniform integer in Min..Max */
output;
end;
keep k
run;
*not sure about this part;
data final;
set final;
where obs in (k);
run;
The best way to do this is to use PROC SURVEYSELECT.
proc surveyselect data=final out=selected seed=123 n=10;
run;
Or something along those lines depending on how you want to run it - the documentation has a lot of detail on the various options for how to perform the sampling.
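If you want to do it in the DATA step, you need to run the code from Unif inside the second DATA step in some fashion. I don't entirely follow what it's trying to do. If that's a form of k/n sampling, search for 'SAS k/n sampling' and you'll find lots of examples, since it's a common question. The general approach is:
data final_selected;
set final nobs=total; /* total = number of rows in FINAL */
retain _need 10; /* rows still to select (10 is an example) */
if _n_ = 1 then call streaminit(123);
/* k/n sampling: keep this row with probability (rows still needed)/(rows left) */
_pick = (rand("Uniform") < _need / (total - _n_ + 1));
_need = _need - _pick;
if _pick; *subsetting if;
drop _need _pick;
run;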
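The k/n idea (often called selection sampling) is easy to check outside SAS. Walk the rows once and keep row i with probability (rows still needed) / (rows not yet seen). This always yields exactly k distinct rows, in their original order. The C++ sketch below is illustrative. The function name and the use of `std::mt19937` are assumptions, and it assumes k <= total.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Selection sampling ("k/n sampling"): one pass over `total` rows, returning
// exactly k distinct row indices in increasing order (assumes k <= total).
std::vector<std::size_t> select_k_of_n(std::size_t k, std::size_t total, unsigned seed) {
    std::mt19937 gen(seed);
    std::vector<std::size_t> chosen;
    std::size_t needed = k;
    for (std::size_t i = 0; i < total && needed > 0; ++i) {
        const std::size_t remaining = total - i;          // rows i .. total-1
        // Integer draw in [0, remaining-1]: P(draw < needed) = needed / remaining.
        std::uniform_int_distribution<std::size_t> pick(0, remaining - 1);
        if (pick(gen) < needed) {
            chosen.push_back(i);
            --needed;
        }
    }
    return chosen;
}
```

When the rows left equal the rows still needed, the probability becomes 1. That is why the sample size is exact rather than only correct on average.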

Ranking within rows of a data set [SAS]

Suppose I've got a data set with n rows and p columns such that each entry in the data set contains a real number. I am looking for a way to rank the p columns within each row. The output of this ranking should be a length-p vector of ranks that accounts for ties.
So, let's say my data set has 5 columns. The first row could be something like row 1 = {10, 13, 3, 3, -4}. I'd like to perform some operations on this row and in the end get back the result row 1 ranks = {3, 4, 2, 2, 1}. The second row could be something like row 2 = {8, 3, -6, 5, 2} and the result on this row should be row 2 ranks = {5, 3, 1, 4, 2}.
Is this functionality implemented in SAS? I've written code that doesn't account for ties, but ties occur often enough that correcting the incorrectly ranked rows would take an unreasonable amount of time.
Interesting question; here is one possible solution:
data have;
p1=10; p2=13; p3=3; p4=3; p5=-4; output;
p1=8; p2=3; p3=-6; p4=5; p5=2; output;
run;
data want;
set have;
array p(*) p1-p5;
array c(*) c1-c5;
array r(*) r1-r5;
/* Copy vector to temp array and sort */
do i=1 to dim(p);
c(i) = p(i);
end;
call sortn(of c(*));
/* Search new sorted array for the original position */
do i=1 to dim(c);
if i = 1 then rank=1;
else if c(i) ne c(i-1) then rank + 1;
do j=1 to dim(p);
if p(j) = c(i) then do;
r(j) = rank;
end;
end;
end;
/* PUT statement to see result in log */
put +3 p(*)
/ +3 c(*)
/ +3 r(*);
drop i j rank c1-c5;
run;
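The data step above produces "dense" ranks: equal values share a rank, and the next distinct value gets the next integer. A compact C++ sketch of the same idea sorts a copy, removes duplicates, and looks each value up. The function name is hypothetical, and the sketch assumes there are no missing values (NaN).

```cpp
#include <algorithm>
#include <vector>

// Dense ranks within one row, e.g. {10, 13, 3, 3, -4} -> {3, 4, 2, 2, 1}.
std::vector<int> dense_rank(const std::vector<double>& row) {
    std::vector<double> sorted(row);
    std::sort(sorted.begin(), sorted.end());
    sorted.erase(std::unique(sorted.begin(), sorted.end()), sorted.end());  // distinct values
    std::vector<int> ranks;
    ranks.reserve(row.size());
    for (double v : row) {
        // rank = 1 + number of distinct values smaller than v
        auto pos = std::lower_bound(sorted.begin(), sorted.end(), v);
        ranks.push_back(static_cast<int>(pos - sorted.begin()) + 1);
    }
    return ranks;
}
```

Binary search over the distinct values replaces the nested search loop, so each row costs O(p log p) instead of O(p²).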
Sounds to me like you'll need several arrays to do this.
Array 1: Array to store the ranks
Array 2: Array to sort the values
Array 3: The original un-altered data
I don't have time right now to write the code, but something like this would do a lot of the heavy lifting:
http://support.sas.com/kb/24/754.html
Might as well add this, even though the OP said he doesn't use IML, in case others find it useful when searching. IML is really the easiest way to solve this problem, since it's fundamentally a vector/matrix problem...
proc iml;
p={10 13 3 3 -4, 5 6 5 2 3};
r=j(2,5,.);
print p r;
do i = 1 to nrow(p);
r[i,]=ranktie(p[i,]);
end;
print p r;
quit;
It does treat ties slightly differently from what the OP asked for, and thus would need some work to match the requested output exactly. In general, though, 1, 2.5, 2.5, 4, 5 [or 1, 2, 2, 4, 5] is probably what you really want, not 1, 2, 2, 3, 4: when 2 and 3 tie, 4 and 5 should stay 4 and 5, not move up to 3 and 4.
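The "mean" tie rule (1, 2.5, 2.5, 4, 5) can be sketched in C++ as follows. Sort the positions by value, then give every member of a tie group the average of the positions it occupies. As far as I know, this matches the default tie handling of both RANKTIE and PROC RANK (TIES=MEAN). The function name is hypothetical, and the sketch assumes no missing values.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// "Mean" ranks: tied values get the average of the positions they occupy,
// e.g. {10, 13, 3, 3, -4} -> {4, 5, 2.5, 2.5, 1}; later values keep their
// positional ranks instead of moving up.
std::vector<double> mean_rank(const std::vector<double>& row) {
    const std::size_t n = row.size();
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);             // 0, 1, ..., n-1
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return row[a] < row[b]; });
    std::vector<double> ranks(n);
    for (std::size_t start = 0; start < n;) {
        std::size_t end = start;                           // [start, end) is one tie group
        while (end < n && row[order[end]] == row[order[start]]) ++end;
        // average of the 1-based positions start+1 .. end
        const double avg = (static_cast<double>(start + 1) + static_cast<double>(end)) / 2.0;
        for (std::size_t t = start; t < end; ++t) ranks[order[t]] = avg;
        start = end;
    }
    return ranks;
}
```

Swapping the average for the lowest position in the group (start + 1) gives the 1, 2, 2, 4, 5 variant.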
Just for fun, given that the OP wants a new data set with ranks, here's the PROC RANK method. It is probably not faster than a data step, but it is perhaps simpler and easier to reuse in multiple situations. It also has the added advantage that you can't really make a coding mistake (without it actually crashing).
data have;
input id x1-x5;
datalines;
1 10 13 3 3 -4
2 5 6 5 2 3
;;;;
run;
proc transpose data=have out=temp;
by id;
var x1-x5;
run;
proc rank data=temp out=temprank;
var col1;
by id;
run;
proc transpose data=temprank out=want(drop=_name_ _label_);
by id;
var col1;
id _name_;
run;