SAS: retain the largest value of several possible values

This is a SAS question. The following lines for two people are ordered by ascending AdmitNum. Ascending AdmitNum is based on ascending dates, which are omitted. Ages are provided for each AdmitNum. Age decreases between some of the observations. I don't want this to occur. Age must be equal or increase.
If the next age is less than the current age, then I want the current age to be written into the new variable NeedAge. In other words, retain the greater age while it is the greater age.
Person 2 has the wrong age, 43, in three rows. These should be 53. Person 2's age changes to 54 when AdmitNum=5 and this value, 54, should be retained.
After several attempts I have had only partial success. Can someone suggest a way to make NeedAge as shown below? Thanks.
ID AdmitNum HaveAge NeedAge
1 1 51 51
1 2 48 51
1 3 51 51
1 4 49 51
2 1 53 53
2 2 43 53
2 3 43 53
2 4 43 53
2 5 54 54

data have;
input ID AdmitNum HaveAge;
datalines;
1 1 51
1 2 48
1 3 51
1 4 49
2 1 53
2 2 43
2 3 43
2 4 43
2 5 54
;
run;
data want;
set have;
by ID;
if first.ID then NeedAge = HaveAge; /* start each person at their first recorded age */
if HaveAge > NeedAge then NeedAge = HaveAge;
retain NeedAge;
run;

Retain NeedAge across rows and, whenever HaveAge exceeds it, replace NeedAge with HaveAge. A missing NeedAge compares lower than any number in SAS, so the first row always sets it.
data have;
input ID AdmitNum HaveAge;
datalines;
1 1 51
1 2 48
1 3 51
1 4 49
2 1 53
2 2 43
2 3 43
2 4 43
2 5 54
;
run;
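data want;
set have;
by ID;
retain NeedAge;
/* start over for each person so one ID's age never carries into the next */
if first.ID then call missing(NeedAge);
/* a missing NeedAge compares lower than any age, so the first row always sets it */
if HaveAge > NeedAge then NeedAge = HaveAge;
run;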
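Resetting per ID matters when a person starts at a lower age than the previous person's maximum, which the sample data never does (person 2 starts at 53, above person 1's 51). Below is a minimal sketch with made-up rows that shows the reset, using the MAX function instead of the comparison. The dataset names have2 and want2 and the rows for ID 3 are hypothetical and not from the question.

```sas
/* Hypothetical rows: ID 3 starts younger than ID 2's last age */
data have2;
  input ID AdmitNum HaveAge;
  datalines;
2 1 53
2 2 43
2 3 54
3 1 30
3 2 28
3 3 31
;
run;

data want2;
  set have2;
  by ID;
  retain NeedAge;
  if first.ID then NeedAge = HaveAge;     /* restart for each person */
  else NeedAge = max(NeedAge, HaveAge);   /* keep the larger of the two; MAX ignores missing values */
run;
```

With these rows NeedAge comes out as 53, 53, 54 for ID 2 and 30, 30, 31 for ID 3. Without the first.ID branch, ID 3 would inherit 54 from ID 2.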

How to create running conditional sum variables by class and ID in SAS

From a cumulative episode count (if time intervals are less than 10 days, it is considered one episode), I want to calculate a “wide” and “long” version of running episode count based on class by ID.
This is what my data looks like right now.
id Class Date Obsvn Episode_Sum
9 Wide 3/10/2012 1 1
9 Wide 3/12/2012 2 1
9 Wide 7/1/2012 111 2
9 Wide 7/3/2012 2 2
108 Wide 3/31/2011 1 1
108 Long 3/31/2011 1 1
108 Wide 4/17/2011 17 2
108 Wide 6/24/2011 68 3
108 Wide 6/16/2012 358 4
108 Wide 7/20/2012 34 5
108 Wide 7/27/2012 7 5
I achieved the running count by this code:
data want (drop=lag); set have;
by id date;
format lag mmddyy10.;
lag=lag(date);
if first.id then obsvn=1;
else obsvn=max(intck("Day", Lag, date),1);
if first.id then episode_sum=1;
else if obsvn>10 then episode_sum+1;
run;
I want my data to look like this:
id Class Date Obsvn Sum Wide Long
9 Wide 3/10/2012 1 1 1 0
9 Wide 3/12/2012 2 1 1 0
9 Wide 7/1/2012 111 2 2 0
9 Wide 7/3/2012 2 2 2 0
108 Wide 3/31/2011 1 1 1 0
108 Long 3/31/2011 1 1 1 1
108 Wide 4/17/2011 17 2 2 1
108 Wide 6/24/2011 68 3 3 1
108 Wide 6/16/2012 358 4 4 1
108 Wide 7/20/2012 34 5 5 1
108 Wide 7/27/2012 7 5 5 1
But I am getting this:
id Class Date Obsvn Sum Wide Long
9 Wide 3/10/2012 1 1 1 0
9 Wide 3/12/2012 2 1 1 0
9 Wide 7/1/2012 111 2 2 0
9 Wide 7/3/2012 2 2 **1** 0
108 Wide 3/31/2011 1 1 1 **1**
108 Long 3/31/2011 1 1 1 1
108 Wide 4/17/2011 17 2 2 1
108 Wide 6/24/2011 68 3 3 1
108 Wide 6/16/2012 358 4 4 1
108 Wide 7/20/2012 34 5 5 1
108 Wide 7/27/2012 7 5 **1** 1
This is my code to create the episodes by wide and long. I am trying to account for when each ID switches class. How do I achieve this?
/*Calculating Long*/
if (first.id and class in ("Long")) then Episode_Long=1;
else if obsvn>10 and class in ("Long") then Episode_Long+1;
retain Episode_Long;
if (obsvn<10 and class in ("Long")) then Episode_Long=1;
if class not in ("Long") then do;
if first.id and class not in ("Long") then Episode_Long=0;
retain Episode_Long;
end;
/*Calculating Wide */
if (obsvn<10 and class in ("Wide")) then Episode_Wide=1 ;
if (first.id and class in ("Wide")) then Episode_Wide=1;
else if obsvn>10 and class in ("Wide") then Episode_Wide+1;
retain Episode_Wide;
The tricky part is that you have two records for the same DATE in the second ID group. So you want to keep track of that when calculating the change in days.
Here is one way. First let's enter your source data (and desired results).
data have ;
input id Class $ Date :mmddyy. EObsvn ESum EWide ELong ;
format date yymmdd10.;
cards;
9 Wide 3/10/2012 1 1 1 0
9 Wide 3/12/2012 2 1 1 0
9 Wide 7/1/2012 111 2 2 0
9 Wide 7/3/2012 2 2 2 0
108 Wide 3/31/2011 1 1 1 0
108 Long 3/31/2011 1 1 1 1
108 Wide 4/17/2011 17 2 2 1
108 Wide 6/24/2011 68 3 3 1
108 Wide 6/16/2012 358 4 4 1
108 Wide 7/20/2012 34 5 5 1
108 Wide 7/27/2012 7 5 5 1
;
You might want to find the dates where WIDE or LONG gaps exist first.
data long ;
set have ;
by id date;
where class='Long';
if first.date;
lag=lag(date);
if first.id then call missing(lag,obsvn);
else obsvn=max(intck("Day", Lag, date),1);
lflag = missing(lag) or obsvn > 10 ;
keep id date lflag ;
run;
data wide ;
set have ;
by id date;
where class='Wide';
if first.date;
lag=lag(date);
if first.id then call missing(lag,obsvn);
else obsvn=max(intck("Day", Lag, date),1);
wflag = missing(lag) or obsvn > 10 ;
keep id date wflag ;
run;
Then merge it back onto the source by date and calculate your counters.
data want ;
merge have wide long ;
by id date;
if first.date then do ;
lag=lag(date);
format lag yymmdd10.;
if first.id then call missing(lag,obsvn);
else obsvn=max(intck("Day", Lag, date),1);
retain lag obsvn;
end;
if first.id then call missing(sum,wide,long);
if missing(lag) or obsvn > 10 then sum+first.date ;
wide + (wflag and first.date);
long + (lflag and first.date);
run;
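Since the expected values were read in alongside the source data as ESum, EWide and ELong, a quick check can list the rows where the computed counters differ. This is only a sketch: the dataset name mismatch is an assumption, and the source does not say whether every row agrees, so any rows it prints need to be judged against the intended definition of an episode.

```sas
/* Keep only rows where a computed counter differs from its expected value */
data mismatch;
  set want;
  if sum ne ESum or wide ne EWide or long ne ELong;
  keep id class date sum ESum wide EWide long ELong;
run;

proc print data=mismatch noobs;
run;
```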

SAS_Add value for specific rows

I want to assign a value to specific rows. I think an example shows it best. I have the following dataset:
Date Value
01/01/2001 10
02/01/2001 20
03/01/2001 35
04/01/2001 15
05/01/2001 25
06/01/2001 35
07/01/2001 20
08/01/2001 45
09/01/2001 35
My result should be:
Date Value Spec.Value
01/01/2001 10 1
02/01/2001 20 1
03/01/2001 35 1
04/01/2001 15 2
05/01/2001 25 2
06/01/2001 35 2
07/01/2001 20 3
08/01/2001 45 3
09/01/2001 35 3
As you can see, my condition value is 35, and it occurs three times. I need to group my data using this condition value: the group number increases on the row after each 35.
data want;
set have;
retain specvalue 1;
if lag(value) = 35 then do;
specvalue +1;
end;
run;
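To run the step above, the sample rows first need to exist as a dataset called have. A minimal sketch follows. Reading the dates as day/month/year with the ddmmyy10. informat is an assumption, since the question does not say which order is meant.

```sas
/* Sample rows from the question; the date order (dd/mm/yyyy) is assumed */
data have;
  input Date :ddmmyy10. Value;
  format Date ddmmyy10.;
  datalines;
01/01/2001 10
02/01/2001 20
03/01/2001 35
04/01/2001 15
05/01/2001 25
06/01/2001 35
07/01/2001 20
08/01/2001 45
09/01/2001 35
;
run;
```

The counter works because LAG(value) returns the previous row's value, so specvalue is incremented on the row that follows each 35. On these rows that gives 1, 1, 1, 2, 2, 2, 3, 3, 3.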

performing chi squared test in SAS using PROC FREQ

Our university is forcing us to perform the old school chi square test using PROC FREQ (I am aware of the options with proc univariate).
I have generated one theoretical exponential distribution with Beta=15 (and written down the values laboriously), and I've generated 10000 random variables which have an exponential distribution, with beta=15.
I try to first enter the frequencies of my random variables (in each interval) via the datalines command:
data expofaktiska;
input number count;
datalines;
1 2910
2 2040
3 1400
4 1020
5 732
6 531
7 377
8 305
9 210
10 144
11 106
12 66
13 40
14 45
15 29
16 16
17 12
18 8
19 8
20 3
21 2
22 0
23 1
24 2
25 0
26 2
;
run;
This seems to work.
I then try to compare these values to the theoretical values, using the chi square test in proc freq (the one we are supposed to use)
As follows:
proc freq data=expofaktiska;
weight count;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;
I get the following error:
ERROR: The number of TESTP values does not equal the number of levels. For the table of number,
there are 24 levels and 26 TESTP values.
This may be because two intervals contain 0 observations. I don't really see a way around this.
Also, I don't get the chi-square test in the results viewer, nor the "test probability"; I only get the frequency/cumulative frequency of the random variables.
What am I doing wrong? Do both theoretical/actual distributions need to have the same form (probability/frequencies?)
We are using SAS 9.4
Thanks in advance!
/Magnus
You need the ZEROS option on the WEIGHT statement, so that levels with a zero count are kept in the table.
data expofaktiska;
input number count;
datalines;
1 2910
2 2040
3 1400
4 1020
5 732
6 531
7 377
8 305
9 210
10 144
11 106
12 66
13 40
14 45
15 29
16 16
17 12
18 8
19 8
20 3
21 2
22 0
23 1
24 2
25 0
26 2
;
run;
proc freq data=expofaktiska;
weight count / zeros;
tables number / testp=(0.28347 0.20311 0.14554 0.10428 0.07472 0.05354 0.03837 0.02749 0.01969 0.01412 0.01011 0.00724 0.0052 0.00372 0.00266 0.00191 0.00137 0.00098 0.00070 0.00051 0.00036 0.00026 0.00018 0.00013 0.00010 0.00007) chisq;
run;
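The theoretical probabilities do not have to be written down by hand. The listed TESTP values look like exponential CDF differences over intervals of width 5 (for example 1 - exp(-5/15) is about 0.28347), so a sketch under that assumption is shown below. The interval width of 5 and the dataset name testp are inferred or made up here, not stated in the question.

```sas
/* Assumption: 26 intervals of width 5, scale parameter beta = 15 */
data testp;
  beta  = 15;
  width = 5;
  do number = 1 to 26;
    /* P(lower < X <= upper) from the exponential CDF with scale beta */
    p = cdf('EXPONENTIAL', width*number, beta)
      - cdf('EXPONENTIAL', width*(number-1), beta);
    output;
  end;
  keep number p;
  format p 7.5;
run;

proc print data=testp noobs;
run;
```

Under this assumption the 26 probabilities sum to slightly less than 1, because the tail beyond the last interval is left out.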

Cumulative sum in multiple columns in SAS

I have been searching for a solution for a while, but I couldn't find any similar question in the SAS communities. So here is my question: I have a big SAS table, let's say with 2 classes and 26 variables:
A B Var1 Var2 ... Var25 Var26
-----------------------------
1 1 10 20 ... 35 30
1 2 12 24 ... 32 45
1 3 20 23 ... 24 68
2 1 13 29 ... 22 57
2 2 32 43 ... 33 65
2 3 11 76 ... 32 45
...................
...................
I need to calculate the cumulative sum of the all 26 variables through the Class=B, which means that for A=1, it will accumulate through B=1,2,3; and for A=2 it will accumulate through B=1,2,3. The resulting table will be like:
A B Cum1 Cum2 ... Cum25 Cum26
-----------------------------
1 1 10 20 ... 35 30
1 2 22 44 ... 67 75
1 3 42 67 ... 91 143
2 1 13 29 ... 22 57
2 2 45 72 ... 55 122
2 3 56 148 ... 87 167
...................
...................
I can choose the hard way, like describing each of 26 variables in a loop, and then I can find the cumulative sums through B. But I want to find a more practical solution for this without describing all the variables.
One website suggested a solution like this:
proc sort data= (drop=percent cum_pct rename=(count=demand cum_freq=cal));
weight var1;
run;
I am not sure if there is any option like "Weight" in Proc Sort, but if it works then I thought that maybe I can modify it by putting numeric instead of Var1, then the Proc Sort process can do the process for all the numerical values :
proc sort data= (drop=percent cum_pct rename=(count=demand cum_freq=cal));
weight _numerical_;
run;
Any ideas?
One way to accomplish this is to use 2 'parallel' arrays, one for your input values and another for the cumulative values.
%LET N = 26 ;
data cum ;
set have ;
by A B ;
array v{*} var1-var&N ;
array c{*} cum1-cum&N ;
retain c . ;
if first.A then call missing(of c{*}) ; /* reset on new values of A */
do i = 1 to &N ;
c{i} + v{i} ;
end ;
drop i ;
run ;
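As a small check of the parallel-array idea, here is a sketch cut down to the two leftmost columns shown in the question (N = 2). The dataset names have_small and cum_small are hypothetical, and the RETAIN is written with the variable list rather than the array name, which is a stylistic choice here.

```sas
%let N = 2;

/* First two value columns from the question's sample table */
data have_small;
  input A B Var1 Var2;
  datalines;
1 1 10 20
1 2 12 24
1 3 20 23
2 1 13 29
2 2 32 43
2 3 11 76
;
run;

data cum_small;
  set have_small;
  by A B;                       /* input must already be sorted by A B */
  array v{*} var1-var&N;
  array c{*} cum1-cum&N;
  retain cum1-cum&N;            /* carry the running totals across rows */
  if first.A then call missing(of c{*});   /* reset at each new A */
  do i = 1 to &N;
    c{i} + v{i};                /* sum statement treats a missing total as 0 */
  end;
  drop i;
run;
```

Only the macro variable N and the two variable lists change when moving to all 26 columns.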

SAS replacing duplicate values

I have a data set that has duplicate values of v1. I would like v2 values to be replaced by the first value of v2.
Data one;
v1 v2
1 20
1 23
1 21
2 36
3 51
4 44
4 20
I would like data=one to be changed to this:
Data one;
v1 v2
1 20
1 20
1 20
2 36
3 51
4 44
4 44
what procedure do I need to use?
A data step will do (assuming the data is already sorted the way you want):
data one;
set one;
by v1;
if first.v1
then keeper=v2;
else v2=keeper;
retain keeper;
drop keeper;
run;
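To try this on the rows from the question, a sketch that loads them and writes the result to a separate dataset, so the original stays available for comparison. The name one_fixed is an assumption. The logic is the same BY-group step as above.

```sas
/* Sample rows from the question */
data one;
  input v1 v2;
  datalines;
1 20
1 23
1 21
2 36
3 51
4 44
4 20
;
run;

data one_fixed;
  set one;
  by v1;                          /* requires the data to be sorted by v1 */
  retain keeper;
  if first.v1 then keeper = v2;   /* remember the first v2 of each v1 group */
  else v2 = keeper;               /* overwrite later rows with it */
  drop keeper;
run;
```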