Need a SAS macro program to avoid repetition - sas

I have a dataset where some SAS DATA step logic is needed to populate columns that are missing or that must be derived from existing columns.
The dataset looks roughly like this:
mpi v1 v2 v3......v9 v10 v11.....v50
001 a 1.324
002 c 0.876
003 f 11.9
004 r 5.7
005 b 3.3
. . .
. . .
n t 0.4
I actually developed the program below:
/*a*/
IF v2 IN ('a') AND 0 <= v11 <= 2 THEN DO;
v13 = 1;
v14 =20;
END;
IF v2 IN ('a') AND 2 < v11 <= 3.1 THEN DO;
v13 = 2;
v14 =40;
END;
IF v2 IN ('a') AND 3.1 < v11<= 5.3 THEN DO;
v13 = 3;
v14 =60;
END;
IF v2 IN ('a') AND 5.3 < v11 <= 11.5 THEN DO;
v13 = 4;
v14 =80;
END;
IF v2 IN ('a') AND v11 > 11.5 THEN DO;
v13 = 5;
v14 =100;
END;
I now need the same logic to populate v13 and v14 when v2 is c, f, t, r, and so on, but with different v11 bounds for each category; the v13 and v14 values stay the same across categories.
I would like to use a SAS macro to avoid repeating the code. Can you help?

The best way to do this is to put each v2 category, its v11 bounds, and the resulting v13 and v14 values in a lookup dataset, and then merge or otherwise join that onto your dataset.
Doing that is a little more complicated when you match on a range rather than a single value, but it is by no means impossible.
Let's say you have a lookup dataset with v2, v11min, v11max, v13, and v14.
data mergeon;
input v2 $ v11min v11max v13 v14;
datalines;
a 0 2 1 20
a 2 3.1 2 40
a 3.1 5.3 3 60
a 5.3 11.5 4 80
a 11.5 9999 5 100
c 0 4 1 20
c 4 8.1 2 40
c 8.1 9.6 3 60
c 9.6 13.5 4 80
c 13.5 9999 5 100
;;;;
run;
data have;
input mpi v2 $ v11 v13 v14;
datalines;
1 a 2 0 0
2 a 4 0 0
3 c 1 0 0
4 c 7 0 0
5 c 9 0 0
6 a 22 0 0
7 a 10 0 0
;;;;
run;
proc sql;
create table want as
select H.mpi, H.v2, H.v11, coalesce(M.v13,H.v13) as v13, coalesce(M.v14,H.v14) as v14
from have H
left join mergeon M
on H.v2=M.v2
and M.v11min < H.v11 <= M.v11max
;
quit;
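COALESCE returns the first nonmissing value, so it keeps the H.v13 value only when M.v13 is missing (that is, when the join finds no matching record in the mergeon table).
Note that the join condition is v11min < v11 <= v11max, so a v11 of exactly 0 does not match the first range; adjust the comparison or the lower bound if 0 should be included, as it is in your original code.
If you aren't comfortable with SQL, you can also use a few other options; a hash table is probably the easiest, though you may also be able to use an update statement (not as familiar with those myself).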
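For the hash table option, here is a minimal sketch, assuming the same have and mergeon datasets as above. The names want_hash, m_v13, m_v14 and rc are only illustrative. It loads every range for a v2 value into the hash and walks through them until one contains v11:

```sas
data want_hash;
    if _n_ = 1 then do;
        /* define the lookup variables in the PDV without reading any rows */
        if 0 then set mergeon(rename=(v13=m_v13 v14=m_v14));
        /* multidata: "y" keeps every range stored under the same v2 key */
        declare hash h(dataset: "mergeon(rename=(v13=m_v13 v14=m_v14))", multidata: "y");
        h.defineKey("v2");
        h.defineData("v11min", "v11max", "m_v13", "m_v14");
        h.defineDone();
    end;
    set have;
    /* step through the ranges for this v2 until one contains v11 */
    rc = h.find();
    do while (rc = 0);
        if v11min < v11 <= v11max then do;
            v13 = m_v13;
            v14 = m_v14;
            leave;
        end;
        rc = h.find_next();
    end;
    /* rows with no matching range keep their original v13 and v14 */
    drop rc v11min v11max m_v13 m_v14;
run;
```

Unlike the SQL join, this keeps the rows of have in their original order and needs no sorting.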

Related

Subset data by group by proportion in SAS

In this data, I need to take a subset in which each group makes up a certain percentage.
For example,
Obs Group Score
1 A 1
2 A 2
3 B 1
4 B 1
5 C 3
6 C 1
7 C 1
8 A 1
9 A 3
10 A 1
11 A 2
12 B 3
13 C 2
I need a subset of 10 observations.
The sample must include all groups, and a score of 1 takes higher priority.
Each group is given a certain percentage.
Let's say 50% for A, 20% for B and 30% for C.
I tried PROC SURVEYSELECT but it failed: the number of ALLOC values is not the same as the number of strata.
proc surveyselect data=example out=test sampsize=10;
strata group score/alloc=(0.5 0.2 0.3);
run;
I don't know PROC SURVEYSELECT very well, so here is a DATA step version.
data have;
input Obs Group$ Score;
cards;
1 A 1
2 A 2
3 B 1
4 B 1
5 C 3
6 C 1
7 C 1
8 A 1
9 A 3
10 A 1
11 A 2
12 B 3
13 C 2
;
run;
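proc sort data=have;
by Group Score;
run;
data want;
/* groups and the maximum rows to keep per group: 50%, 20%, 30% of 10 */
array _Dist_[3]$ _temporary_('A','B','C');
array _Upper_[3] _temporary_(5,2,3);
array _Count_[3] _temporary_;
/* read rows in sorted order, so lower scores come first within each group */
do i = 1 to rec;
set have nobs=rec point=i;
do j = 1 to dim(_Dist_);
/* count rows seen for the matching group and output until its quota is full */
_Count_[j] + (Group=_Dist_[j]);
if _Count_[j] <= _Upper_[j] and Group = _Dist_[j] then output;
end;
end;
stop;
drop j;
run;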

scatter plot in sas does not show

I am new to SAS and I'm trying to make a scatter plot of X versus the residual, but when I run the code this error appears:
ERROR: Procedure SQPLOT not found.
This is my code:
data EC
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sqplot data = EC;
scatter x = x y=residual;
run;
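Could you help me find what is wrong?
There is no procedure named SQPLOT. You probably want to use SGPLOT. Also note that the DATA statement needs a terminating semicolon, and the Y variable should be e, since the dataset has no variable named residual.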
data EC;
input x e;
datalines;
2 3.2
3 2.9
4 -1.7
5 -2.0
6 -2.3
7 -1.2
8 -0.9
9 0.8
10 0.7
11 0.5
;
run;
proc sgplot data=EC;
scatter x = x y=e;
run;
The log shows a similar ERROR: message when your code tries to use a procedure that is not licensed (or not installed).

SAS Function to calculate percentage for row for two stratifications

I have a dataset that looks like this
data test;
input id1$ id2$ score1 score2 score3 total;
datalines;
A D 9 36 6 51
A D 9 8 6 23
A E 5 3 2 10
B D 5 3 3 11
B E 7 4 7 18
B E 5 3 3 11
C D 8 7 9 24
C E 8 52 6 66
C D 4 5 3 12
;
run;
I want to add a column that gives each row's total as a fraction of the summed total within its id1 and id2 combination.
For example, take id1 = A. Within A there are two id2 values, D and E. There are two D rows and one E row. The two D totals are 51 and 23, which sum to 74. The single E total is 10. The new column would hold .69 (51/74), .31 (23/74), and 1 (10/10) in rows 1, 2, and 3 respectively.
I need to perform this calculation for the rest of the id1 values and their corresponding id2 values. When complete, the table should look like this:
id1 id2 score1 score2 score3 total percent_of_total
A D 9 36 6 51 0.689189189
A D 9 8 6 23 0.310810811
A E 5 3 2 10 1
B D 5 3 3 11 1
B E 7 4 7 18 0.620689655
B E 5 3 3 11 0.379310345
C D 8 7 9 24 0.666666667
C E 8 52 6 66 1
C D 4 5 3 12 0.333333333
I realize a loop might be able to solve the problem I've given, but I'm dealing with EIGHT levels of stratification, with as many as 98 sublevels within those levels. A loop is not practical. I'm thinking of something along the lines of PROC SUMMARY, but I'm not too familiar with the procedure.
Thank you.
It is easy to do with a data step. Make sure the records are sorted.
You can find the grand total for the ID1*ID2 combination and then use it to calculate the percentage.
proc sort data=test;
by id1 id2;
run;
data want ;
do until (last.id2);
set test ;
by id1 id2 ;
grand = sum(grand,total);
end;
do until (last.id2);
set test ;
by id1 id2 ;
percent_of_total = total/grand ;
output;
end;
run;
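If you would rather not sort or read the data twice in a DATA step, the same idea can be sketched in PROC SQL, assuming the test dataset above. The names want_sql, g and grand are only illustrative. The subquery sums total per id1 and id2 combination, and the join brings that sum back onto every row:

```sas
proc sql;
    create table want_sql as
    select t.*,
           t.total / g.grand as percent_of_total
    from test as t
    inner join
        /* one row per id1*id2 combination holding the summed total */
        (select id1, id2, sum(total) as grand
         from test
         group by id1, id2) as g
    on t.id1 = g.id1 and t.id2 = g.id2
    order by t.id1, t.id2;
quit;
```

With more stratification levels, list the extra variables in both the GROUP BY and the ON clause. The order of rows within a group is not guaranteed in the SQL version.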

SAS find in order variables

I need to find the ids whose grades appear in the order A-B-C. Here is an example table:
id term grade subj num
10 2002 D 332 1
10 2002 A 333 2
11 2005 C 232 1
11 2005 A 232 2
11 2005 B 232 3
11 2005 C 232 4
15 2010 A 130 1
15 2010 B 130 2
15 2010 C 130 3
20 2000 B 500 1
20 2000 A 500 2
20 2000 C 500 3
What I need from this table is ids 11 and 15.
The output should look like this:
id term subj
11 2005 232
15 2010 130
So I need to list the ids whose grade was 'A', then changed to 'B', and then changed to 'C'.
The sequence can start at any num. It doesn't have to start from 1; it could start at 1, 2, 3, etc. But the grades must be in the order A, then B, then C.
I don't need to see id 20, because its grades are not in that order when sorted by num.
If all you are looking for is a simple 'A'-'B'-'C' sequence, then the LAG() function is sufficient. That is what I show in the example below. If you are looking for more sequences (e.g. 'A'-'B', 'B'-'C', 'A'-'B'-'C'-'D'), a slightly more complex solution is needed. If so, I'll edit the answer accordingly.
Below is a test program showing the implementation:
DATA d1;
INPUT
id :8.
term :8.
grade :$2.
subj :8.
num :8.
;
DATALINES;
10 2002 D 332 1
10 2002 A 333 2
11 2005 C 232 1
11 2005 A 232 2
11 2005 B 232 3
11 2005 C 232 4
15 2010 A 130 1
15 2010 B 130 2
15 2010 C 130 3
;
RUN;
DATA d2 (
KEEP = id term subj
);
SET d1;
grade_previous_1 = LAG1(grade);
grade_previous_2 = LAG2(grade);
IF (grade = 'C' AND grade_previous_1 = 'B' AND grade_previous_2 = 'A');
RUN;
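Note that the LAG functions should be called on their own lines and stored in variables, as shown above. Each LAG call keeps its own queue, which is updated only when that call executes, so a LAG folded into an IF condition may not run on every row and can then return stale values. That is, don't say: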
IF (grade = 'C' AND LAG1(grade) = 'B' AND LAG2(grade) = 'A');
That actually works in this example but in general it's better to get into the habit of calling LAG() outside of IF conditions and storing results in temporary variables.
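One thing the simple version does not check is whether the three consecutive rows belong to the same id. Here is a minimal sketch of a guard against that, assuming d1 is sorted by id and num. The names d2_by_id, id_previous_1 and id_previous_2 are only illustrative:

```sas
DATA d2_by_id (KEEP = id term subj);
    SET d1;
    /* call every LAG unconditionally so each queue is updated on every row */
    grade_previous_1 = LAG1(grade);
    grade_previous_2 = LAG2(grade);
    id_previous_1 = LAG1(id);
    id_previous_2 = LAG2(id);
    /* keep the row only when the A-B-C run stays within a single id */
    IF grade = 'C' AND grade_previous_1 = 'B' AND grade_previous_2 = 'A'
       AND id = id_previous_1 AND id = id_previous_2;
RUN;
```

With the d1 test data above this keeps ids 11 and 15, the same as the simpler version, but it will not report a sequence that starts in one id and finishes in the next.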

SAS Translate cell count to specific values

I have a data set that has a person's name and how many times they scored a 1-10. For example, Bob scored 7 1s, 8 2s, and 7 4s, but did not receive any other scores.
Name 1 2 3 4 5 6 7 8 9 10
Bob 7 8 7 0 0 0 0 0 0 0
Hal 9 3 1 0 0 0 0 0 0 0
I want a data set that has a row for Bob that looks like this
Bob 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 4 4 4 4 4 4 4
Hal 1 1 1 1 1 1 1 1 1 2 2 2 3
I'm doing this in SAS by the way.
I know I can write a macro to create variables named score1, score2, ..., scoreN.
I am having trouble populating the cells. Any help would be appreciated. Thanks.
Changing the structure of a dataset like this is sometimes easier to do with PROC TRANSPOSE:
data have;
input Name $ v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
datalines;
Bob 7 8 7 0 0 0 0 0 0 0
;
run;
/*convert original wide dataset into long one*/
proc transpose data=have out=have_long;
var v:;
by Name;
run;
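data want;
set have_long;
/* strip the leading 'v' from the variable name and convert the rest to a number */
new_var = input(substr(_NAME_, 2), 8.);
/* write one row per occurrence of the score; zero counts write nothing */
do i=1 to COL1;
output;
end;
drop _NAME_ COL1 i;
run;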
/*convert back to wide dataset*/
proc transpose data=want out=want(drop=_NAME_);
var new_var;
by Name;
run;