How to add two columns with specified condition in SAS? - sas

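For each row, I want to add the values of the two columns whose names are given in the first two columns (x and y). For example, if x is a and y is c, the result should be a + c.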
data data1;
set data;
column1 = x;
column2 = y;
total = column1 + column2;
run;
input
x y a b c d e
-- -- -- -- -- -- --
a c 1 45 32 7 45
b a 22 45 55 33 55
d e 56 78 66 44 12
c d 33 45 44 56 77
Output
x y a b c d e Output
-- -- -- -- -- -- -- ------
a c 1 45 32 7 45 33
b a 22 45 55 33 55 67
d e 56 78 66 44 12 56
c d 33 45 44 56 77 100

Here's a simpler solution. Create an array of the possible numbers to add, loop through them and check if the variable name is the same as the value in x or y, using the vname function.
data have;
input x $ y $ a b c d e;
datalines;
a c 1 45 32 7 45
b a 22 45 55 33 55
d e 56 78 66 44 12
c d 33 45 44 56 77
;
run;
data want;
set have;
array vars{*} a--e;
output=0;
do i = 1 to dim(vars);
if x=vname(vars{i}) then output+vars{i};
if y=vname(vars{i}) then output+vars{i};
end;
drop i;
run;

I'm sure there's a cleaner way of doing this, but this works:
data data;
input
x $ y $ a b c d e;
datalines;
a c 1 45 32 7 45
b a 22 45 55 33 55
d e 56 78 66 44 12
c d 33 45 44 56 77
;
run;
data data1;
set data;
count = _n_;
run;
proc sql noprint;
select count into: countlist
separated by " " from data1;
quit;
proc sql noprint;
select x into: xlist
separated by " " from data1;
quit;
proc sql noprint;
select y into: ylist
separated by " " from data1;
quit;
%macro add;
data data2 (drop = count);
set data1;
%do i = 1 %to %sysfunc(countw(&countlist.));
%let this_x = %scan(&xlist., &i.);
%let this_y = %scan(&ylist., &i.);
if &i. = count then total = &this_x. + &this_y.;
%end;
run;
%mend add;
%add;

Related

plotting a vertical line exactly in the middle and at specific value of X in SAS

For the x, y data below, how can I plot a vertical line exactly in the middle of the plot? (Please note that X should not be sorted; the plot should keep the data as it is.)
Also, how can I plot a vertical line at x=5 (a distance along X from X=0) when the X values in the data below are taken as 0, 1, 2, 3, and so forth?
data sample;
infile cards truncover expandtabs;
input X Y;
cards;
29 21
18 23
28 24
16 26
3 27
18 29
2 33
3 37
26 39
2 42
25 47
9 54
13 57
17 58
29 60
5 63
23 66
4 69
3 72
17 73
7 73
12 72
8 69
20 66
12 63
8 60
28 58
3 57
18 54
11 47
21 42
8 39
1 37
16 29
3 27
17 22
3 19
6 17
19 14
18 10
;
run;
I tried:
proc sort data=sample ;
by x;
run;
proc sgplot data=sample;
needle x=x y=y;
run;
data Trapezoidal;
set sample end=last;
dif_x=dif(x);
mean_y=mean(lag(y),y);
integral + (dif_x*mean_y);
if last then putlog 'area under curve is ' integral;
run;
Vertical lines are plotted with refline. Determine what the 'middle' is however you wish (using proc means or proc sql or similar), get it into a variable in your dataset or a macro variable, and use refline in proc sgplot to produce the line.
The same applies to your specific values of x, except that you don't need to compute anything first to get the value to plot at. Add refline 5 / axis=x; or similar to your plots to get them.
You could also use band plots if you're trying to highlight certain areas.
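As a minimal sketch of the refline approach, assuming the sample dataset from the question and taking "middle" to mean the midpoint of the observed X range (one possible definition, not necessarily the one the asker has in mind), the value can be stored in a macro variable and passed to refline. The macro variable name midx is chosen here purely for illustration.

```sas
/* Assumption: "middle" = midpoint between the smallest and largest X. */
proc sql noprint;
  select (min(x) + max(x)) / 2 into :midx trimmed
  from sample;
quit;

proc sgplot data=sample;
  needle x=x y=y;
  /* Vertical line at the computed midpoint */
  refline &midx / axis=x lineattrs=(color=red);
  /* Vertical line at the fixed value x=5 */
  refline 5 / axis=x lineattrs=(pattern=dash);
run;
```

If the middle should be defined differently (for example the median of X, or the middle of the observation order), only the PROC SQL step needs to change.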

Alternative coding than lag function?

I would like a more efficient way to perform this imputation. I want the value of the variable CID to be copied onto the rows where CID is missing, for instance having CID=1818 also reported for days 1, 14, 28 and 42.
The program I wrote works fine, but I would like to know if there is a simpler way to do this. Note that RETAIN can't be used here.
DATA test;
infile cards dlm='' dsd ;
input cid $ @6 days $ @9 CH @13 CL ;
cards;
1818 -2 117 46
     1  107 45
     14 97  46
     28 104 46
     42 106 44
5684 -2 100 62
     1  58  78
     14 87  46
     28 102 45
     42 155 41
;
RUN;
options mprint mlogic symbolgen;
%macro lag(var,num);
%do i=2 %to &num.;
sub&i.=lag&i.(&var);
if cid=' ' then cid=sub&i.;
/*drop sub&i.;*/
%end;
%mend lag;
data test_1 ;set test;
sub=lag(cid);
if cid=' ' then cid=sub;
%lag(cid,5);
run;
Why can't Retain be used?
data want;
set test;
retain _cid;
if not missing(cid) then _cid=cid;
else cid=_cid;
drop _cid;
run;

SAS PROC GLM in a loop

I tried to run PROC GLM in a loop, because I have many models (different combinations of dependent and independent variables) and it is very time consuming to run them one by one. But the log reports an error saying that only one MODEL statement is allowed in PROC GLM. Is there a solution for this?
My code looks like this:
data old;
input year A1 A2 A3 A4 B C D;
datalines;
2000 22 22 30 37 4 13 14
2000 37 29 31 38 6 16 12
2000 42 29 34 37 3 15 15
2000 28 28 27 35 10 13 15
2000 33 22 37 40 9 12 15
2000 22 29 26 40 3 11 15
2000 37 20 32 40 6 12 13
2001 44 22 33 35 7 20 12
2001 33 20 26 40 6 13 15
2001 32 30 37 35 1 12 13
2001 44 25 31 39 4 20 14
2001 25 30 37 38 4 20 10
2001 43 21 35 38 6 11 10
2001 25 23 34 37 5 17 11
2001 45 30 35 37 1 13 14
2001 48 24 36 39 2 13 15
2001 25 24 35 40 9 16 11
2002 38 26 33 35 2 14 10
2002 29 24 35 36 1 16 13
2002 34 28 32 35 9 16 11
2002 45 26 29 35 9 19 10
2002 26 22 38 35 1 14 12
2002 20 26 26 39 8 17 10
2002 33 20 35 37 9 18 12
;
run;
%macro regression (in, YLIST,XLIST);
%local NYLIST;
%let NYLIST=%sysfunc(countw(&YLIST));
ods tagsets.ExcelXP path='D:\REG' file='Regression.xls';
Proc GLM data=&in; class year;
%do i=1 %to &NYLIST;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2)/ solution;
%end;
%do i=2 %to &NYLIST;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
Model %scan(&YLIST,&i)=%scan(&XLIST,1) %scan(&XLIST,2) %scan(&XLIST,3)/ solution;
%end;
run;
ods tagsets.excelxp close;
%mend regression;
options mprint;
%regression
(in=old
,YLIST= A1 A2 A3 A4
,XLIST= B C D);
/*potential solutions*/
%macro regression(data,y,x1,x2,x3);
proc glm data=&data;
class year;
model &y=&x1 &x2 &x3/solution;
run;
%mend regression;
%macro sql (mydataset,y,x1,x2,x3);
proc sql noprint;
select cats('%regression(&mydataset,',&y,',',&x1,',',&x2,',',&x3,')')
into :calllist separated by ' ' from &mydataset;
quit;
&calllist.;
%mend sql;
%sql
(mydataset=old
,y=A1
,X1=B
,X2=C
,X3=D);
You should do this in two steps. One is a macro that contains one instance of PROC GLM:
%macro regression(data,y,x1,x2,x3);
proc glm data=&data;
class year;
model &y = &x1 &x2 &x3 / solution;
run;
%mend regression;
And then call that macro from something else, either a macro with the looping elements, or better, from a dataset that contains your y/x1/x2/x3 as columns (one row per model statement) using call execute or proc sql select into methods. For example, with a data set modeldata containing the y/x values for your model:
proc sql noprint;
select cats('%regression(mydataset,',y,',',x1,',',x2,',',x3,')') into :calllist separated by ' ' from modeldata;
quit;
&calllist.;
Or similar.
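The call execute route mentioned above might look like the following sketch. It assumes the %regression macro from this answer is already compiled and that the dataset old from the question exists. The contents of modeldata are hypothetical and only illustrate the one-row-per-model layout.

```sas
/* Hypothetical model list: one row per MODEL statement to run. */
data modeldata;
  length y x1 x2 x3 $32;
  input y $ x1 $ x2 $ x3 $;
  datalines;
A1 B C D
A2 B C D
;
run;

/* Queue one %regression call per row; %nrstr delays macro execution
   until after this data step has finished. */
data _null_;
  set modeldata;
  call execute(cats('%nrstr(%regression)(old,', y, ',', x1, ',', x2, ',', x3, ')'));
run;
```

Compared with the select into approach, this avoids the length limit on a single macro variable when the list of models is long.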

Selecting groups of records using FIRST.var

My dataset is:
ID AGE
1 65
1 66
1 67
1 68
1 69
1 70
1 71
2 70
2 71
2 72
3 68
3 69
3 70
[...]
My (basic) question is: what is the most direct way to obtain a dataset containing only the IDs whose records start with 65 <= AGE <= 68? (In the above example I would like to get the first 7 rows and the last three.) Thanks!
Just to have another method...
proc sql;
delete from input_dataset I where not exists
(select 1 from input_dataset D where I.id=D.id having 65 le min(age) le 68);
quit;
If you want to create a new dataset, the same basic query would work as part of a SELECT, reversing the NOT.
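A sketch of that SELECT variant, assuming the same input_dataset table that the DELETE above operates on. The output name want is chosen here for illustration.

```sas
proc sql;
  create table want as
  select *
  from input_dataset I
  /* Keep a row only if the minimum age for its ID is between 65 and 68 */
  where exists
    (select 1 from input_dataset D
     where I.id=D.id
     having 65 le min(age) le 68)
  order by id, age;
quit;
```

Unlike the DELETE, this leaves input_dataset untouched. The ORDER BY is there because PROC SQL does not guarantee row order otherwise.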
data input_dataset;
input ID AGE;
cards;
1 65
1 66
1 67
1 68
1 69
1 70
1 71
2 70
2 71
2 72
;
run;
proc sort data=input_dataset out=sorted;
by ID;
run;
data work.first_age65to68;
set sorted;
retain keepit 0;
by ID;
if first.ID then do;
if AGE ge 65 and AGE le 68 then keepit=1;
else keepit=0;
end;
if keepit;
drop keepit;
run;

With SAS and PROC TABULATE, how can I display a percentage for one subset of values and not ALL subsets?

If I have, say, grades (A,B,C,D,E,F), and want to display the percentage and N of kids who got an A or B out of all kids, and ONLY that info, is it possible with PROC TABULATE?
I tried using multi-label formats, but with those I have to include all the other values in the format or they just show up in the PROC TABULATE results. I'm looking for something like:
Kid Group    Total N    A or B N    A or B %
Group 1      100        25          25%
Group 2      100        10          10%
I can't think of any way to do it with PROC TABULATE offhand. It might be easiest to just do it manually.
data grades;
input name $ 1-15 gender $ 16-16 grade 19-20;
datalines;
Anderson M 75
Aziz F 67
Bayer M 77
Burke F 63
Chung M 85
Cohen F 89
Drew F 49
Dubos M 41
Elliott F 85
Hazelton M 55
Hinton M 85
Hung F 98
Jacob F 64
Janeway F 51
Judson F 89
Litowski M 85
Malloy M 79
Meyer F 85
Nichols M 58
Oliver F 41
Park F 77
Patel M 73
Randleman F 46
Robinson M 64
Shien M 55
Simonson M 62
Smith N M 71
Smith R M 79
Sullivan M 77
Swift M 63
Wolfson F 79
Wong F 89
Zabriski M 89
;
run;
proc sort data=grades;by gender;run;
data _null_;
set grades end=last;
by gender;
if _n_=1 then do;
put @1 "gender" @8 "total" @16 "A_or_B" @27 "pct";
end;
if first.gender then do;
A_or_B=0;
total=0;
end;
total+1;
if grade ge 80 and grade le 100 then A_or_B+1;
if last.gender then do;
pct=A_or_B/total;
put @1 gender @8 total @16 A_or_B @24 pct nlpct.;
end;
run;
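Subset the raw dataset in PROC TABULATE with a WHERE statement,
e.g. where grade in ('A', 'B'); Note that this filters the rows before anything is counted, so a total N computed in the same step will also cover only the A and B rows.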