How to select the 5 minimum values with SAS PROC IML?

I would like to know whether it's possible to select the 5 minimum or maximum values by row with IML.
This is my code:
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(matrix[i,]);
End;
Quit ;
So I would like to obtain a matrix test whose first column contains the minimum value of each row, whose second column contains the minimum of the row excluding that first value, and so on.
If you have any ideas, I'd be glad to hear them. :)
Even if it's not with IML (but with SAS: Base, SQL, ...).
For example:
Data test; input x1-x8 ; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to select my 5 minimum values into another table:
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4

Read the article "Compute the kth smallest data value in SAS"
Define the modules as in the article. Then use the following:
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.

You can use CALL SORT in PROC IML to sort a column. Because you want each column sorted independently, rather than the whole matrix reordered by one column, extract each column, sort it, and write it back to the matrix.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
tmp = have[,i];
call sort(tmp,1);
have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
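Since the question also allows a non-IML approach, here is a minimal DATA step sketch using the Base SAS SMALLEST function, which returns the k-th smallest nonmissing value of its arguments. It assumes the example data with eight variables x1-x8. The output names y1-y5 follow the question; the data set name want and the index k are chosen only for illustration.

```sas
/* Example data from the question: 8 values per row */
data test;
  input x1-x8;
  datalines;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;
run;

/* For each row, store the 1st to 5th smallest values in y1-y5 */
data want;
  set test;
  array x[8] x1-x8;
  array y[5] y1-y5;
  do k = 1 to dim(y);
    y[k] = smallest(k, of x[*]);  /* k-th smallest nonmissing value in the row */
  end;
  keep y1-y5;
run;

proc print data=want noobs;
run;
```

For the two example rows this should give 1 2 3 4 6 and -2 1 2 3 4. Replacing SMALLEST with LARGEST would return the five largest values instead.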

Related

how to multiply each row with each row of another matrix elementwise in sas?

I have a row matrix (vector) A and another square matrix B. How can I multiply each row of matrix B with the row matrix A in SAS using proc iml or otherwise?
Let's say
a = {1 2 3}
b =
{2 3 4
1 5 3
5 9 10}
My output c would be:
{2 6 12
1 10 9
5 18 30}
Thanks!
Use the element-wise multiplication operator, # in IML:
proc iml;
a = {1 2 3};
b = {2 3 4,
1 5 3,
5 9 10};
c = a#b;
print c;
quit;
There are of course non-IML solutions, or twenty, though IML, as Dom notes, is probably easiest. Here are two.
First, get them onto one dataset, where the a dataset is on every row (with some other variable names) - see below. Then, either just do the math (use arrays) or use PROC MEANS or similar to use the a dataset as weights.
data a;
input w_x w_y w_z;
datalines;
1 2 3
;;;;
run;
data b;
input x y z;
id=_n_;
datalines;
2 3 4
1 5 3
5 9 10
;;;;
run;
data b_a;
if _n_=1 then set a;
set b;
*you could just multiply things here if you wanted;
run;
proc means data=b_a;
class id;
types id;
var x/weight=w_x;
var y/weight=w_y;
var z/weight=w_z;
output out=want sum=;
run;
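The "just do the math" option mentioned above can be sketched with arrays in the same DATA step that combines a and b. The data set name want_arrays, the output variables c_x, c_y, c_z, and the index k are assumptions made for this example only.

```sas
/* Weights (the row vector a) and the matrix b, as in the answer above */
data a;
  input w_x w_y w_z;
  datalines;
1 2 3
;
run;

data b;
  input x y z;
  datalines;
2 3 4
1 5 3
5 9 10
;
run;

/* Multiply each row of b element-wise by the single row of a */
data want_arrays;
  if _n_ = 1 then set a;   /* variables read by SET are retained on every row */
  set b;
  array v[3] x y z;
  array w[3] w_x w_y w_z;
  array c[3] c_x c_y c_z;
  do k = 1 to dim(v);
    c[k] = v[k] * w[k];
  end;
  drop k w_x w_y w_z;
run;
```

With the example data, c_x, c_y, c_z should hold 2 6 12, then 1 10 9, then 5 18 30, matching the desired output in the question.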

SAS reverse count within ID group

I am used to creating count variables within a group where the count goes upwards +1 at each time using :
data objective ;
set eg ;
count + 1 ;
by id age ;
if first.age then count = 1 ;
run ;
However, I would like to do the reverse, i.e. the first value of age in each id group gets a count of 10 and each subsequent line gets a count 1 lower than the preceding line:
data eg ;
input id age desire ;
cards;
1 5 10
1 4 9
1 3 8
1 2 7
1 1 6
2 10 10
2 9 9
2 8 8
2 7 7
2 6 6
2 5 5
2 4 4
2 3 3
2 2 2
2 1 1
3 7 10
3 6 9
3 5 8
3 4 7
3 3 6
3 2 5
3 1 4
;
run;
data objective ;
set eg ;
count - 1 ;
by id age ;
if first.age_ar then count = 10 ;
run ;
Is there a way to do this? The statement count - 1 is not recognised.
You can add -1 without using retain as follows:
data objective;
set eg;
count + -1;
by id descending age;
if first.id then count = 10;
run;
Try this (see comments in code for explanation):
data objective ;
retain count 10; /* retain the last value of count across observations; the initial value 10 is optional */
set eg ;
count=count - 1 ; /* "count - 1" alone does not work, but count=count-1 does once count is retained */
by id age notsorted; /* notsorted because age is in descending order */
if first.id then count = 10 ; /* not sure why you had age_ar here; it should be id to get your desired output */
run ;

Cumulative sum by row

For the following data set
data example;
input x y;
cards;
1 8
2 7
3 6
4 5
;
run;
I would like to create a new data set that has the following column added:
sum
10
9
7
4
10 comes from summing the first column (1+2+3+4), 9 comes from (2+3+4), and so on.
Consider a correlated aggregate subquery in proc sql:
proc sql;
create table cumsum as
(select sup.x, sup.y,
(select sum(sub.x) from example as sub where sub.x >= sup.x) as sum
from example as sup);
quit;
x y sum
1 8 10
2 7 9
3 6 7
4 5 4
Alternatively, you can compute the cumulative sum in a data step. Since you need a decreasing cumulative sum, sort in descending order beforehand and restore the original order afterwards:
proc sort data=example out=example;
by descending x;
run;
data cumsum2;
set example;
sum + x; /* the sum statement initializes sum to 0 and retains it across rows */
run;
proc sort data=cumsum2 out=cumsum2;
by x;
run;
x y sum
1 8 10
2 7 9
3 6 7
4 5 4
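As a variation on the data step approach, here is a sketch that avoids both sorts by reading the data twice in one step: a first pass accumulates the grand total of x, and a second pass subtracts each x as it goes. This is a different technique from the answer above; the names cumsum3, total, and done are assumptions for this example. Note that it sums from the current row to the last row in the data set's stored order, which matches the desired result here only because x is already in ascending order.

```sas
data example;
  input x y;
  datalines;
1 8
2 7
3 6
4 5
;
run;

data cumsum3;
  /* First pass (runs once): accumulate the total of x over all rows */
  if _n_ = 1 then do until (done);
    set example(keep=x) end=done;
    total + x;
  end;
  /* Second pass: one row per iteration of the data step */
  set example;
  sum = total;          /* sum of x from this row through the last row */
  total = total - x;    /* remove this row's x before the next row */
  drop total;
run;
```

For the example data this should produce the sum column 10, 9, 7, 4.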

Inputting a row whose number is in a variable

I have an array t that specifies numbers of rows that I want to read from file.txt. So my code should look like this:
data a;
do i = 1 to dim(t);
infile "C:\sas\file.txt" firstobs = t(i) obs = t(i);
input x1-x10;
output;
end;
run;
Of course, this approach (firstobs) works only if the row number is a constant. How can I do this using an array (which is also read from the same file, from the first row)?
For example if the file.txt looked like this:
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
Then I want the output to be:
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6
Here's an answer similar to Tom's, but which does not attempt to read in off-path data. This may be superior for cases where your skipped rows have data which are not formatted in the same manner as your on-path data. It uses Tom's parmcards and structure so you can more easily see the differences.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover end=eof;
array rows (&maxrows) _temporary_;
do i=1 by 1 until (rows(i)=.); *read in first line, just like Toms answer;
input rows(i) @;
drop i;
end;
input ; * stop inputting on the first line;
* Here you may need to use CALL SORTN to sort row array if it is not already sorted;
_currow = 2; * current row indicator;
do _i = 1 to dim(rows); * iterate through row array;
if rows[_i]=. then leave; * leave if row array is empty;
do while (_currow lt rows[_i] and not eof); * skip rows not in row array;
input;
_currow = _currow + 1;
end;
input x1-x&ncol; * now you know you are on a desired row, so input it;
output; * and output it;
_currow = _currow + 1;
end;
run;
As I noted above, you may have to use CALL SORTN if the array is not already sorted (i.e., if the missings are not at the end or the numbers are out of order).
Sounds like the first row contains the list of rows to keep. It would probably be easier to read that from a separate file, but you could make it work with a single file. You did not mention how to know the number of columns of data or the maximum number of row numbers that could be in the first row. For now let's assume that you can set these numbers in macro variables.
Let's get your example data into a file:
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
Now let's read it into a dataset.
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover ;
array rows (&maxrows) _temporary_;
if _n_=1 then do i=1 by 1 until (rows(i)=.);
input rows(i) @;
drop i;
end;
else do;
input x1-x&ncol;
if whichn(_n_,of rows(*)) then output;
end;
run;
If the other rows of the file have invalid data such that the INPUT statement would cause errors you can skip trying to read the data from those rows with a minor modification in the ELSE block.
else do;
input @;
if whichn(_n_,of rows(*)) then do;
input x1-x&ncol;
output;
end;
end;
If you find that you frequently want to not read lots of records at the end of the file you could add this line to the end of the data step to stop when you have read past the last line you want.
if _n_ > max(of rows(*)) then stop;
If your file is structured (i.e. same delimiter, one continuous 'row' of input data), then the approach below should work. I'm sure you can tweak it to be a bit more efficient, but I put some comments in to explain what each section is doing. I also suggest reading through the infile documentation for an explanation of the _infile_ automatic variable and other ways to manipulate the input data buffer. Also, if your input data file itself needs to be split up into individual rows, you will need to adjust for that.
filename in_data 'C:\sas\file.txt';
data out_data (keep=x1-x10);
infile in_data;
input fn;
/*get the number of vars based on delimiter*/
count = count(strip(_infile_), ' ') + 1;
/*iterate through vars*/
do i =1 to count;
/*set new value to current var*/
rec = scan(strip(_infile_), i, ' ');
/*set array values to new value*/
array obs(10) x1-x10;
do j=1 to dim(obs);
obs(j) = rec;
end;
/*output to dataset*/
output out_data;
end;
run;
Input
2 4 6 7 8 9 10 11 2 3
Output
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
Hope that this helps.
OK, I figured it out. Assuming I know the number of columns (10) and number of rows (10) I can get what I wanted using the following code:
data a;
w=1;
infile "C:\sas\file.txt" n=10;
input #w x1-x10;
array x(*) x1-x10;
array t(10) _temporary_;
do i=1 to 10;
if(x(i)^=.) then t(i)=x(i);
else leave;
end;
do j=1 to i-1;
w=t(j);
input #w x1-x10;
output;
end;
stop;
run;
What is left is to do the same without knowing numbers of rows and columns. This way I only read the rows I'm interested in as opposed to reading all rows and only outputting the ones I need.
It would probably be a much easier program to maintain if you just read the whole matrix into a dataset and then used the row numbers to pick the data you want. Your file would probably need to have hundreds of thousands of observations for the time saved to be worth the programming effort of avoiding a full read.
Here is one way using the POINT= option of the SET statement to select the rows.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
data rows;
infile tempdata obs=1 ;
input row @@;
row=row-1;
run;
proc import datafile="%sysfunc(pathname(tempdata))" dbms=dlm out=full replace;
getnames=no;
delimiter=' ';
datarow=2;
run;
data want ;
set rows ;
pointer=row ;
set full point=pointer ;
run;
proc print; run;

Aggregating Using Proc SQL

Suppose I've a dataset in the form:
A B C
1 3 5
1 4 8
1 3 3
2 2 2
2 7 6
2 3 3
3 4 4
3 4 7
3 2 8
Now, I want to take weighted average of each segment of A and then add them up over A. For example in A var for 1, I want to take the weighted avg as (3*5+4*8+3*3)/(3+4+3). And then add up to get 5.6. Same with other 2 segments of A. So, finally the table looks like the following:
A B C D
1 3 7 5.6
2 6 6 7
3 5 9 8.2
Thank you.
Just to provide an alternative approach, you can use the WEIGHT= option on the VAR statement in PROC SUMMARY to achieve the same result. The only thing I'm not clear on from your example final table is where the values of columns B & C come from (I've left these out of my solution below).
proc summary data=test nway;
class a;
var c / weight=b;
output out=agg2 (drop=_:) mean=d;
run;
You can find the solution below. I am curious about your result, though. For A=2, the weighted average should be (2*2+7*6+3*3)/(2+7+3), about 4.58. Why do you have 7 here?
data test;
input a b c ;
datalines;
1 3 5
1 4 8
1 3 3
2 2 2
2 7 6
2 3 3
3 4 4
3 4 7
3 2 8
;
run;
proc sql;
create table agg as
select a, b, c, sum(b*c)/sum(b) as d from test
group by a;
quit;
proc sort data=agg nodupkey;
by a d;
run;
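If only one row per value of a is needed, a sketch that drops b and c from the select list lets PROC SQL return the grouped result directly, with no remerge and no follow-up sort. The table name agg_by_a is an assumption for this example; the data are the same as above, entered compactly with a trailing @@.

```sas
data test;
  input a b c @@;   /* @@ holds the line so several observations can share it */
  datalines;
1 3 5  1 4 8  1 3 3
2 2 2  2 7 6  2 3 3
3 4 4  3 4 7  3 2 8
;
run;

proc sql;
  create table agg_by_a as
  select a,
         sum(b*c) / sum(b) as d   /* weighted average of c with weights b */
  from test
  group by a;
quit;
```

For these data d should be 5.6 for a=1, about 4.58 for a=2, and 6 for a=3.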