SAS find in order variables - sas

I need to find the ids whose grades occur in the order A-B-C. See the table below for an example:
id term grade subj num
10 2002 D 332 1
10 2002 A 333 2
11 2005 C 232 1
11 2005 A 232 2
11 2005 B 232 3
11 2005 C 232 4
15 2010 A 130 1
15 2010 B 130 2
15 2010 C 130 3
20 2000 B 500 1
20 2000 A 500 2
20 2000 C 500 3
What I need from this table is ids 11 and 15.
The output should look like this:
id term subj
11 2005 232
15 2010 130
So I need to list the ids that had a grade of 'A' which then changed to 'B' and then to 'C'.
num gives the order. The sequence doesn't have to start at num 1; it could start at 1, 2, 3, etc. But the grades must be in the order A, then B, then C.
I don't need to see id 20 because, in num order, its grades are not in that order.

If all you are looking for is a simple 'A'-'B'-'C' sequence, then the LAG() function is sufficient. That is what I show in the example below. If you are looking for more sequences (e.g. 'A'-'B', 'B'-'C', 'A'-'B'-'C'-'D'), a slightly more complex solution is needed. If so, I'll edit the answer accordingly.
Below is a test program showing the implementation:
DATA d1;
INPUT
id :8.
term :8.
grade :$2.
subj :8.
num :8.
;
DATALINES;
10 2002 D 332 1
10 2002 A 333 2
11 2005 C 232 1
11 2005 A 232 2
11 2005 B 232 3
11 2005 C 232 4
15 2010 A 130 1
15 2010 B 130 2
15 2010 C 130 3
;
RUN;
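DATA d2 (
    KEEP = id term subj
);
    SET d1;
    /* Call LAG unconditionally so each queue is updated on every observation */
    grade_previous_1 = LAG1(grade);
    grade_previous_2 = LAG2(grade);
    id_previous_2 = LAG2(id);
    /* Require the three rows to share an id so a sequence cannot span two ids */
    IF (grade = 'C' AND grade_previous_1 = 'B' AND grade_previous_2 = 'A' AND id = id_previous_2);
RUN;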
Note that the LAG functions should be evaluated in their own statements and stored in variables, as shown above. Each LAG call keeps its own queue, and that queue is updated only when the call actually executes. If a LAG call sits inside a condition and is skipped on some observations, it returns the value from the last time it ran, not the value from the previous observation. That is, don't say:
IF (grade = 'C' AND LAG1(grade) = 'B' AND LAG2(grade) = 'A');
That actually works in this example, but in general it's better to get into the habit of calling LAG() outside of IF conditions and storing the results in temporary variables.
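To make the LAG queue behavior concrete, here is a minimal sketch on a made-up three-row dataset. The names demo, x, lag_always and lag_cond are illustrative and not part of the question. It relies on the documented behavior that each LAG call has its own queue, which is updated only when that call executes.

```sas
/* Hypothetical demo data: x takes the values 5, 1, 6 */
DATA demo;
    INPUT x;
    /* Executed on every observation: returns the previous row's x */
    lag_always = LAG(x);
    /* Executed only when x > 2: its queue skips the row where x = 1 */
    IF x > 2 THEN lag_cond = LAG(x);
    DATALINES;
5
1
6
;
RUN;

PROC PRINT DATA = demo;
RUN;
/* lag_always is ., 5, 1 (the true previous values)      */
/* lag_cond   is ., ., 5 (on row 3 it returns 5, from    */
/* row 1, because the call did not run on row 2)         */
```

On the third row the conditional call returns 5 even though the previous row's x was 1, which is the kind of stale value the advice above is meant to avoid.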

Related

SAS_Add value for specific rows

I want to assign a value to some specific rows. I think showing it by example would be better. I have the following dataset:
Date Value
01/01/2001 10
02/01/2001 20
03/01/2001 35
04/01/2001 15
05/01/2001 25
06/01/2001 35
07/01/2001 20
08/01/2001 45
09/01/2001 35
My result should be:
Date Value Spec.Value
01/01/2001 10 1
02/01/2001 20 1
03/01/2001 35 1
04/01/2001 15 2
05/01/2001 25 2
06/01/2001 35 2
07/01/2001 20 3
08/01/2001 45 3
09/01/2001 35 3
As you can see, my condition value is 35, and it occurs three times. I need to group my data using this condition value: each group ends with a row whose value is 35.
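data want;
    set have;
    retain specvalue 1;  /* group counter: starts at 1 and persists across rows */
    /* LAG runs on every row here, so it always returns the previous row's value */
    if lag(value) = 35 then do;
        specvalue + 1;   /* start a new group after each row whose value is 35 */
    end;
run;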

SAS IF then statement

Hello, for whatever reason my IF-THEN statement will not work in this code. What I am trying to do is this: if the salary is less than or equal to 30,000, set a new variable income equal to 'low'. Here is what I have so far.
data newdd2;
input subject group$ year salary : comma7. @@;
IF (salary <= 30,000) THEN income = 'low';
datalines;
1 A 2 53,900 2 B 2 37,400 3 A 1 49,500
4 C 2 43,900 5 B 3 38,400 6 A 3 39,500
7 A 3 53,600 8 B 2 37,700 9 C 1 49,900
10 C 2 43,300 11 B 3 57,400 12 B 3 39,500
13 B 1 33,900 14 A 2 41,400 15 C 2 49,500
16 C 1 43,900 17 B 1 39,400 18 A 3 39,900
19 A 2 53,600 20 A 2 37,700 21 C 3 42,900
22 C 2 43,300 23 B 1 57,400 24 C 3 69,500
25 C 2 33,900 26 A 2 35,300 27 A 2 47,500
28 C 2 43,900 29 B 3 38,400 30 A 1 32,500
31 A 3 53,600 32 B 2 37,700 33 C 1 41,600
34 C 2 43,300 35 B 3 57,400 36 B 3 39,500
37 B 2 33,900 38 A 2 41,400 39 C 2 79,500
40 C 1 43,900 41 C 1 29,500 42 A 3 39,900
43 A 2 53,600 44 A 2 37,500 45 C 3 42,900
46 C 2 43,300 47 B 1 47,400 48 C 3 59,500
run;
The error I keep getting is "The work dataset may be incomplete". I am sure that my code is correct, and I've tried a number of things, but no success yet. Thanks in advance.
You cannot use a comma in a numeric literal. Write the number without the thousands separator:
IF (salary <= 30000) THEN income = 'low';
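For context, here is a minimal sketch of the whole step with that fix applied, using three records copied from the question's data. The dataset name newdd2_sketch and the LENGTH statement are assumptions added for illustration; the LENGTH is optional and only there so that longer labels could be assigned later without being truncated to three characters.

```sas
data newdd2_sketch;
    length income $4;  /* assumption: leave room for labels longer than 'low' */
    /* comma7. reads values such as 53,900; the trailing @@ holds the line
       so that several records can be read from one data line */
    input subject group $ year salary : comma7. @@;
    /* numeric literal written without a thousands separator */
    if salary <= 30000 then income = 'low';
    datalines;
1 A 2 53,900 2 B 2 37,400 41 C 1 29,500
;
run;
```

The comma is fine in the raw data because the COMMA7. informat strips it on input; it is only the literal inside the program that must be written as 30000.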

SAS Function to calculate percentage for row for two stratifications

I have a dataset that looks like this
data test;
input id1$ id2$ score1 score2 score3 total;
datalines;
A D 9 36 6 51
A D 9 8 6 23
A E 5 3 2 10
B D 5 3 3 11
B E 7 4 7 18
B E 5 3 3 11
C D 8 7 9 24
C E 8 52 6 66
C D 4 5 3 12
;
run;
I want to add a column that gives each row's total as a fraction of the summed total within its id1 and id2 combination.
What I mean is this: take id1 = A. Within A there are two id2 values, D and E. There are two rows with D and one with E. The two total values for D are 51 and 23, which sum to 74. The one total value for E is 10, which sums to 10. The column I'd like to create would hold the values 0.69 (51/74), 0.31 (23/74), and 1 (10/10) in rows 1, 2, and 3 respectively.
I need to perform this calculation for the rest of the id1 values and their corresponding id2 values. When complete, I want a table that looks like this:
id1 id2 score1 score2 score3 total percent_of_total
A D 9 36 6 51 0.689189189
A D 9 8 6 23 0.310810811
A E 5 3 2 10 1
B D 5 3 3 11 1
B E 7 4 7 18 0.620689655
B E 5 3 3 11 0.379310345
C D 8 7 9 24 0.666666667
C E 8 52 6 66 1
C D 4 5 3 12 0.333333333
I realize a loop might be able to solve the problem I've given, but I'm dealing with EIGHT levels of stratification, with as many as 98 sublevels within those levels. A loop is not practical. I'm thinking something along the lines of PROC SUMMARY but I'm not too familiar with the function.
Thank you.
It is easy to do with a data step. Make sure the records are sorted.
You can find the grand total for the ID1*ID2 combination and then use it to calculate the percentage.
proc sort data=test;
by id1 id2;
run;
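data want ;
    /* First pass over the BY group: accumulate the group total */
    do until (last.id2);
        set test ;
        by id1 id2 ;
        grand = sum(grand,total);
    end;
    /* Second pass over the same BY group: compute each row's share and write it out */
    do until (last.id2);
        set test ;
        by id1 id2 ;
        percent_of_total = total/grand ;
        output;
    end;
run;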
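As an alternative to the double loop, the same ratio can be sketched with PROC SQL. This is a different technique from the data step above: PROC SQL remerges the group sum back onto every row and writes a note about the remerge to the log. The table name want_sql is an assumption, and the row order of the result is not guaranteed unless an ORDER BY clause is added.

```sas
proc sql;
    create table want_sql as
    select *,
           /* sum(total) is computed per id1*id2 group and remerged onto each row */
           total / sum(total) as percent_of_total
    from test
    group by id1, id2;
quit;
```

With more levels of stratification, the additional grouping variables would simply be appended to the GROUP BY list; no prior sort is required.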

performing chi squared test in SAS using PROC FREQ

Our university is forcing us to perform the old school chi square test using PROC FREQ (I am aware of the options with proc univariate).
I have generated one theoretical exponential distribution with Beta=15 (and written down the values laboriously), and I've generated 10000 random variables which have an exponential distribution, with beta=15.
I try to first enter the frequencies of my random variables (in each interval) via the datalines command:
data expofaktiska;
input number count;
datalines;
1 2910
2 2040
3 1400
4 1020
5 732
6 531
7 377
8 305
9 210
10 144
11 106
12 66
13 40
14 45
15 29
16 16
17 12
18 8
19 8
20 3
21 2
22 0
23 1
24 2
25 0
26 2
;
run;
This seems to work.
I then try to compare these values to the theoretical values, using the chi square test in proc freq (the one we are supposed to use)
As follows:
proc freq data=expofaktiska;
weight count;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;
I get the following error:
ERROR: The number of TESTP values does not equal the number of levels. For the table of number,
there are 24 levels and 26 TESTP values.
This may be because two intervals contain 0 observations. I don't really see a way around this.
Also, I don't get the chi-square test in the results viewer, nor the "test probability"; I only get the frequency/cumulative frequency of the random variables.
What am I doing wrong? Do both theoretical/actual distributions need to have the same form (probability/frequencies?)
We are using SAS 9.4
Thanks in advance!
/Magnus
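You need the ZEROS option on the WEIGHT statement. By default PROC FREQ ignores observations with a zero weight, so the two intervals with a count of 0 are dropped and only 24 levels remain; ZEROS keeps them in the table.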
data expofaktiska;
input number count;
datalines;
1 2910
2 2040
3 1400
4 1020
5 732
6 531
7 377
8 305
9 210
10 144
11 106
12 66
13 40
14 45
15 29
16 16
17 12
18 8
19 8
20 3
21 2
22 0
23 1
24 2
25 0
26 2
;
run;
proc freq data=expofaktiska;
weight count / zeros;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;

Cumulative sum in multiple columns in SAS

I have been searching for a solution for a while, but I couldn't find a similar SAS question in the communities. So here is my question: I have a big SAS table, let's say with 2 class variables and 26 variables:
A B Var1 Var2 ... Var25 Var26
-----------------------------
1 1 10 20 ... 35 30
1 2 12 24 ... 32 45
1 3 20 23 ... 24 68
2 1 13 29 ... 22 57
2 2 32 43 ... 33 65
2 3 11 76 ... 32 45
...................
...................
I need to calculate the cumulative sum of all 26 variables along class B within each value of A: for A=1 it accumulates through B=1,2,3, and for A=2 it accumulates through B=1,2,3. The resulting table would look like this:
A B Cum1 Cum2 ... Cum25 Cum26
-----------------------------
1 1 10 20 ... 35 30
1 2 22 44 ... 67 75
1 3 40 67 ... 91 143
2 1 13 29 ... 22 57
2 2 45 72 ... 55 121
2 3 56 148 .. 87 166
...................
...................
I could do it the hard way, handling each of the 26 variables explicitly in a loop and accumulating through B. But I want a more practical solution that doesn't require spelling out all the variables.
One website suggested a solution like this:
proc sort data= (drop=percent cum_pct rename=(count=demand cum_freq=cal));
weight var1;
run;
I am not sure there is any option like WEIGHT in PROC SORT, but if it works, I thought I could modify it by putting a numeric variable list in place of Var1, so that PROC SORT would process all the numeric variables:
proc sort data= (drop=percent cum_pct rename=(count=demand cum_freq=cal));
weight _numerical_;
run;
Any ideas?
One way to accomplish this is to use 2 'parallel' arrays, one for your input values and another for the cumulative values.
%LET N = 26 ;
data cum ;
    set have ;
    by A B ;
    array v{*} var1-var&N ;   /* input values */
    array c{*} cum1-cum&N ;   /* running totals */
    retain c . ;
    if first.A then call missing(of c{*}) ; /* reset on new values of A */
    do i = 1 to &N ;
        c{i} + v{i} ;         /* sum statement: a missing total is treated as 0 */
    end ;
    drop i ;
run ;