SAS: Combining multiple variables into one by choosing the maximum value

I want to combine multiple variables into one by choosing the maximum value, turning this:
id v1 v2 v3 v4 v5 v6
1 1 2 5 3 1 1
2 4 2 3 5 1
3 3 2 2 1 3
4 2 1 2 5 7
5 6 7 1 2 1 7
into this, where n1=max(v1,v2), n2=v3, n3=max(v4,v5,v6):
id n1 n2 n3
1 2 5 3
2 4 3 5
3 3 2 3
4 2 2 7
5 7 1 7
How do I do this in SAS? (It's easy in Excel and relatively intuitive in R, but I can't figure it out in SAS. Please help!)
Thank you for your time!

The MAX function is your friend.
data want;
set have;
n1 = max(of v1 v2);
n2 = v3;
n3 = max(of v4 v5 v6);
run;
Arrays and variable lists also work (for example, n3 = max(of v4-v6);).
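As a sketch of the array form mentioned above: the array name grp3 is only an illustration, and the small HAVE data set uses just the two rows of the question's table that list all six values.

```sas
/* Two complete rows from the question's example data */
data have;
  input id v1-v6;
  cards;
1 1 2 5 3 1 1
5 6 7 1 2 1 7
;
run;

data want;
  set have;
  array grp3[*] v4-v6;      /* hypothetical array over the variables feeding n3 */
  n1 = max(of v1-v2);       /* numbered variable list */
  n2 = v3;
  n3 = max(of grp3[*]);     /* all elements of the array */
  keep id n1 n2 n3;
run;
```

For these two rows the result should be n1=2, n2=5, n3=3 for id 1 and n1=7, n2=1, n3=7 for id 5, matching the desired output in the question. MAX ignores missing arguments, so rows with a blank value still return the largest nonmissing value.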

I agree that the MAX function is what you want, but I would code it differently.
data want;
set have;
n1 = max(v1, v2);
n2 = v3;
n3 = max(v4, v5, v6);
run;
Alternatively:
data want;
set have;
n1 = max(v1, v2);
n2 = v3;
n3 = max(of v4-v6);
run;

Subset data by group by proportion in SAS

From this data, I need to draw a sample in which each group makes up a certain percentage.
For example,
Obs Group Score
1 A 1
2 A 2
3 B 1
4 B 1
5 C 3
6 C 1
7 C 1
8 A 1
9 A 3
10 A 1
11 A 2
12 B 3
13 C 2
I need a sample of 10 observations.
The sample must include all groups, and a score of 1 takes higher priority.
Each group is given a certain percentage.
Let's say 50% for A, 20% for B and 30% for C.
I tried using proc surveyselect but it failed: the number of alloc values is not the same as the number of strata.
proc surveyselect data=example out=test sampsize=10;
strata group score/alloc=(0.5 0.2 0.3);
run;
I don't know proc surveyselect very well, so here is a data step version.
data have;
input Obs Group$ Score;
cards;
1 A 1
2 A 2
3 B 1
4 B 1
5 C 3
6 C 1
7 C 1
8 A 1
9 A 3
10 A 1
11 A 2
12 B 3
13 C 2
;
run;
proc sort;
by Group Score;
run;
data want;
array _Dist_[3]$ _temporary_('A','B','C');
array _Upper_[3] _temporary_(5,2,3);
array _Count_[3] _temporary_;
do i = 1 to rec;
set have nobs=rec point=i;
do j = 1 to dim(_Dist_);
_Count_[j] + (Group=_Dist_[j]);
if _Count_[j] <= _Upper_[j] and Group = _Dist_[j] then output;
end;
end;
stop;
drop j;
run;

SAS reverse count within ID group

I am used to creating count variables within a group, where the count goes up by 1 on each line, using:
data objective ;
set eg ;
count + 1 ;
by id age ;
if first.age then count = 1 ;
run ;
However, I would like to do the reverse, i.e. the first value of age in each id group gets a value of 10 and each subsequent line gets a value one less than the preceding line:
data eg ;
input id age desire ;
cards;
1 5 10
1 4 9
1 3 8
1 2 7
1 1 6
2 10 10
2 9 9
2 8 8
2 7 7
2 6 6
2 5 5
2 4 4
2 3 3
2 2 2
2 1 1
3 7 10
3 6 9
3 5 8
3 4 7
3 3 6
3 2 5
3 1 4
;
run;
data objective ;
set eg ;
count - 1 ;
by id age ;
if first.age_ar then count = 10 ;
run ;
Is there a way to do this? The statement count - 1 is not recognised.
You can add -1 without using retain as follows:
data objective;
set eg;
count + -1;
by id descending age;
if first.id then count = 10;
run;
Try this (see comments in code for explanation):
data objective ;
retain count 10; /* retain the last value of count across observations; the initial value 10 is optional */
set eg ;
count = count - 1 ; /* "count - 1" alone is not a valid statement, but count = count - 1 works with count as a retained variable */
by id age notsorted; /* notsorted because age is in descending order */
if first.id then count = 10 ; /* not sure why you had age_ar here; it should be id to get your desired output */
run ;

Inputting a row which number is in a variable

I have an array t that specifies numbers of rows that I want to read from file.txt. So my code should look like this:
data a;
do i = 1 to dim(t);
infile "C:\sas\file.txt" firstobs = t(i) obs = t(i);
input x1-x10;
output;
end;
run;
Of course this solution (firstobs) works only if the row number is a constant. How can I do this using an array (which is also read from the same file, from its first row)?
For example if the file.txt looked like this:
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
Then I want the output to be:
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6
Here's an answer similar to Tom's, but which does not attempt to read in off-path data. This may be superior for cases where your skipped rows have data which are not formatted in the same manner as your on-path data. It uses Tom's parmcards and structure so you can more easily see the differences.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover end=eof;
array rows (&maxrows) _temporary_;
do i=1 by 1 until (rows(i)=.); *read in first line, just like Toms answer;
input rows(i) @;
drop i;
end;
input ; * stop inputting on the first line;
* Here you may need to use CALL SORTN to sort row array if it is not already sorted;
_currow = 2; * current row indicator;
do _i = 1 to dim(rows); * iterate through row array;
if rows[_i]=. then leave; * leave if row array is empty;
do while (_currow lt rows[_i] and not eof); * skip rows not in row array;
input;
_currow = _currow + 1;
end;
input x1-x&ncol; * now you know you are on a desired row, so input it;
output; * and output it;
_currow = _currow + 1;
end;
run;
As I noted above, you may have to use CALL SORTN if the array is not already sorted (i.e., if the missings are not at the end or the numbers are out of order).
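One thing to watch for, sketched below with a made-up five-element row list (the names r1-r5, _i and row are only for illustration): CALL SORTN sorts in ascending order and SAS treats missing values as smaller than any number, so after sorting the missings end up at the front of the array, not the end. A loop over the sorted array would therefore need to skip missing entries rather than LEAVE on the first one.

```sas
data _null_;
  /* hypothetical row list: out of order, with missing slots in between */
  array r[5] r1-r5 (6 . 2 4 .);
  call sortn(of r[*]);              /* ascending sort; missings come first */
  put r1= r2= r3= r4= r5=;          /* log: r1=. r2=. r3=2 r4=4 r5=6 */
  do _i = 1 to dim(r);
    if r[_i] = . then continue;     /* skip missings instead of leaving the loop */
    row = r[_i];
    put 'would read row ' row;
  end;
run;
```

This sketch uses ordinary (non-temporary) array variables to keep the log output easy to inspect; whether to apply the same change to the rows array in the step above depends on how your row list is laid out.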
Sounds like the first row contains the list of rows to keep. It would probably be easier to read that from a separate file, but you could make it work with a single file. You did not mention how to know the number of columns of data or the maximum number of row numbers that could be in the first row. For now let's assume that you can set these numbers in macro variables.
Let's get your example data into a file:
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6 . . . . . . .
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
Now let's read it into a dataset.
%let ncol=9 ;
%let maxrows=1000;
data want ;
infile tempdata truncover ;
array rows (&maxrows) _temporary_;
if _n_=1 then do i=1 by 1 until (rows(i)=.);
input rows(i) @;
drop i;
end;
else do;
input x1-x&ncol;
if whichn(_n_,of rows(*)) then output;
end;
run;
If the other rows of the file have invalid data such that the INPUT statement would cause errors you can skip trying to read the data from those rows with a minor modification in the ELSE block.
else do;
input @;
if whichn(_n_,of rows(*)) then do;
input x1-x&ncol;
output;
end;
end;
If there are often many records after the last row you want, you could add this line to the end of the data step to stop once you have read past the last line you want.
if _n_ > max(of rows(*)) then stop;
If your file is structured (i.e. same delimiter, one continuous 'row' of input data) then the approach below should work. I'm sure you can tweak it to be a bit more efficient, but I put some comments in to explain what each section is doing. I also suggest reading through the infile documentation for an explanation of the _infile_ automatic variable and other ways to manipulate the input data buffer. Also, if your input data file itself needs to be split up into individual rows, you will need to adjust for that.
filename in_data 'C:\sas\file.txt';
data out_data (keep=x1-x10);
infile in_data;
input fn;
/*get the number of vars based on delimiter*/
count = count(strip(_infile_), ' ') + 1;
/*iterate through vars*/
do i =1 to count;
/*set new value to current var*/
rec = scan(strip(_infile_), i, ' ');
/*set array values to new value*/
array obs(10) x1-x10;
do j=1 to dim(obs);
obs(j) = rec;
end;
/*output to dataset*/
output out_data;
end;
run;
Input
2 4 6 7 8 9 10 11 2 3
Output
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
10 10 10 10 10 10 10 10 10 10
11 11 11 11 11 11 11 11 11 11
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
Hope that this helps.
OK, I figured it out. Assuming I know the number of columns (10) and number of rows (10) I can get what I wanted using the following code:
data a;
w=1;
infile "C:\sas\file.txt" n=10;
input #w x1-x10;
array x(*) x1-x10;
array t(10) _temporary_;
do i=1 to 10;
if(x(i)^=.) then t(i)=x(i);
else leave;
end;
do j=1 to i-1;
w=t(j);
input #w x1-x10;
output;
end;
stop;
run;
What is left is to do the same without knowing numbers of rows and columns. This way I only read the rows I'm interested in as opposed to reading all rows and only outputting the ones I need.
It would probably be a lot easier program to maintain if you just read the whole matrix into a dataset and then used the row numbers to pick the data you want. Your file would probably need to have hundreds of thousands of observations for the time saved to be worth the programming effort to avoid reading the full file.
Here is one way using the POINT= option of the SET statement to select the rows.
options parmcards=tempdata ;
filename tempdata temp;
parmcards;
2 4 6
2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6
;
data rows;
infile tempdata obs=1 ;
input row @@;
row=row-1;
run;
proc import datafile="%sysfunc(pathname(tempdata))" dbms=dlm out=full replace;
getnames=no;
delimiter=' ';
datarow=2;
run;
data want ;
set rows ;
pointer=row ;
set full point=pointer ;
run;
proc print; run;

How to select the 5 minimum values with SAS Proc IML?

I would like to know if it's possible to select the 5 minimum or maximum values by rows with IML ?
This is my code :
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(matrix[i,]);
End;
Quit ;
So I would like to obtain a matrix test whose first column contains the minimum value of each row, whose second column contains the minimum value of the row excluding the first one, and so on.
Any ideas are welcome! :)
Even if it's not with IML (but with other SAS tools: Base SAS, SQL, ...).
So for example :
Data test; input x1-x8 ; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to select my 5 minimum values in another table :
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4
Read the article "Compute the kth smallest data value in SAS" and define the modules as shown there. Then use the following:
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.
You can use call sort() in PROC IML to sort a column. Because you want to separate the columns and not sort the whole matrix, extract the column, sort it, and then update the original.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
tmp = have[,i];
call sort(tmp,1);
have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
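Since the question also allows Base SAS, here is a minimal DATA step sketch of the same idea using the SMALLEST function, which returns the k-th smallest nonmissing value of its arguments. The data set and variable names (test, want, x1-x8, y1-y5, k) follow the question's example or are chosen for illustration.

```sas
data test;
  input x1-x8;
  cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;
run;

data want;
  set test;
  array x[*] x1-x8;
  array y[5] y1-y5;
  do k = 1 to dim(y);
    y[k] = smallest(k, of x[*]);   /* k-th smallest nonmissing value in the row */
  end;
  keep y1-y5;
run;

proc print data=want; run;
```

For the two example rows this should give 1 2 3 4 6 and -2 1 2 3 4. The LARGEST function works the same way if you want the five maximum values instead.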

Counting values to get a matrix in Stata

I have a variable age, 13 variables x1 to x13, and 802 observations in a Stata dataset. age has values ranging 1 to 9. x1 to x13 have values ranging 1 to 13.
I want to know how to count the number of 1 .. 13 in x1 to x13 according to different values of age. For example, for age 1, in x1 to x13, count the number of 1,2,3,4,...13.
I first change x1 to x13 as a matrix by using
mkmat x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13, matrix (a)
Then, I want to count using the following loop:
gen count = 0
quietly forval i = 1/802 {
quietly forval j = 1/13 {
replace count = count + inrange(a[r`i', x`j'], 0, 1), if age==1
}
}
I failed.
I am still somewhat uncertain as to what you like to achieve. But if I am understanding you correctly, here is one way to do it.
First, a simple data that has age ranging from one to three, and four variables x1-x4, each with values of integers ranging between 5 and 7.
clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end
Then we create three count variables (n5, n6 and n7) that counts the number of 5s, 6s, and 7s for each subject across x1-x4.
forval i=5/7 {
egen n`i'=anycount(x1 x2 x3 x4),v(`i')
}
Below is what the data look like now. To explain, the first "1" under n5 indicates that there is only one "5" for the subject across x1-x4.
+----------------------------------------+
| age x1 x2 x3 x4 n5 n6 n7 |
|----------------------------------------|
1. | 1 5 6 6 6 1 3 0 |
2. | 1 7 5 6 5 2 1 1 |
3. | 2 5 7 6 6 1 2 1 |
4. | 3 5 6 7 7 1 1 2 |
5. | 3 7 6 6 6 0 3 1 |
+----------------------------------------+
It sounds to me like your ultimate goal is to have sums calculated separately for each value in age. Assuming this is true, let's create a 3x3 matrix to store such results.
mat A=J(3,3,.) // age (1-3) and values (5-7)
mat rown A=age1 age2 age3
mat coln A=value5 value6 value7
forval i=5/7 {
forval j=1/3 {
qui su n`i' if age==`j'
loca k=`i'-4 // the first column for value5
mat A[`j',`k']=r(sum)
}
}
The matrix looks like this. To explain, the first "3" under value5 indicates that for all children aged 1, the value 5 appears a total of three times across x1-x4.
A[3,3]
value5 value6 value7
age1 3 4 1
age2 1 2 1
age3 1 4 3
With Aspen's example, you could do this:
gen id = _n
reshape long x, i(id)
tab age x
Note that your sample code doesn't loop over different ages and there is an incorrect comma in the count command. I won't try to fix the code, as there are many more direct methods, one of which is above. tabulate has an option to save the table as a matrix.
Here is another solution closer to the original idea. Warning: code not tested.
matrix count = J(9, 13, 0)
forval i = 1/9 {
forval j = 1/13 {
forval J = 1/13 {
qui count if age == `i' & x`J' == `j'
matrix count[`i', `j'] = count[`i', `j'] + r(N)
}
}
}