Changing values in previous and post records when a numerical condition is met using SAS - sas

data have;
input patient level timepoint;
datalines;
1 0 1
1 0 2
1 0 3
1 3 4
1 0 5
1 0 6
2 0 1
2 4 2
2 0 3
2 3 4
2 0 5
2 0 6
2 0 7
2 2 8
2 0 9
2 0 10
3 3 1
3 0 2
3 0 3
4 0 1
4 0 2
4 0 3
4 0 4
4 1 5
4 0 6
4 0 7
4 0 8
4 0 9
4 0 10
;;
proc print; run;
/*
Condition 1: If a patient has exactly one non-zero value of level (records sorted by timepoint), set level to 2.5 for the record immediately before that time point and to 1.5 for the record before that; likewise, set level to 2.5 for the record immediately after that time point and to 1.5 for the record after that. The levels by timepoint should look like: ... 1.5, 2.5, non-zero value, 2.5, 1.5 ... (Note: the records shown as ... keep level = 0.)
Condition 2: If a patient has two or more non-zero values of level (records sorted by timepoint), find the FIRST non-zero value and set level to 2.5 for the record immediately before it and to 1.5 for the record before that; then find the LAST non-zero value and set level to 2.5 for the record immediately after it and to 1.5 for the record after that. Set every zero value (level = 0) between the first and last non-zero values to 2.5. The levels by timepoint should look like: ... 1.5, 2.5, FIRST non-zero value, 2.5, non-zero value, 2.5, LAST non-zero value, 2.5, 1.5 ...
*/
I've tried data steps using N-1, N-2, N+1, N+2, and arrays/do loops. My first thought was to use multiple arrays so that I could use the index i to reach the previous and next records (i-1/i+1 or i-2/i+2), but I could not work out how to code it. All of this has to be done BY patient, so there may be cases where there is only one record before the first non-zero value rather than two. The same can be true for the records after the last non-zero value. I searched many examples and help pages, but found none that fit my needs. Thanks in advance for any help.
This is how I want the data to look:
data want;
input patient level timepoint;
datalines;
1 0 1
1 1.5 2
1 2.5 3
1 3 4
1 2.5 5
1 1.5 6
2 2.5 1
2 4 2
2 2.5 3
2 3 4
2 2.5 5
2 2.5 6
2 2.5 7
2 2 8
2 2.5 9
2 1.5 10
3 3 1
3 2.5 2
3 1.5 3
4 0 1
4 0 2
4 1.5 3
4 2.5 4
4 1 5
4 2.5 6
4 1.5 7
4 0 8
4 0 9
4 0 10
;;
proc print; run;

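I approached this by first finding the timepoints of the first and last non-zero levels for each patient. Then I merged those back into the original data set and changed levels based on the rules you described. Note that the code compares timepoint values, so it assumes timepoints are consecutive integers within a patient, as in your sample data.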
proc sort data = have;
by patient timepoint;
run;
data have2;
/* first/last hold the timepoints of the first and last non-zero level; */
/* missing means no non-zero level has been seen for this patient */
retain first . last .;
set have;
by patient timepoint;
if level ne 0 and missing(first) then first = timepoint;
if level ne 0 then last = timepoint;
if last.patient then do;
output;
first = .;
last = .;
end;
keep patient first last;
run;
proc sort data=have2;
by patient;
run;
data merged;
merge have have2;
by patient;
/* skip patients with no non-zero level at all (first is missing) */
if level = 0 and not missing(first) then do;
if first-timepoint = 1 then level = 2.5;
if first-timepoint = 2 then level = 1.5;
if last-timepoint = -1 then level = 2.5;
if last-timepoint = -2 then level = 1.5;
if first < timepoint < last then level = 2.5;
end;
drop first last;
run;
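One way to check the result, sketched here under the assumption that the want data set from the question and the merged data set from the answer above both exist, is PROC COMPARE. The name want_sorted is hypothetical and used only for illustration.

```sas
/* Sketch: compare the computed result against the expected data set. */
/* Assumes WANT (from the question) and MERGED (from the answer) exist. */
/* WANT_SORTED is a hypothetical name for the sorted copy of WANT. */
proc sort data=want out=want_sorted;
  by patient timepoint;
run;

/* The ID statement matches rows on patient and timepoint, */
/* so only the level values are compared. */
proc compare base=want_sorted compare=merged;
  id patient timepoint;
run;
```

If the two data sets agree, the PROC COMPARE report should show no unequal values for level.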

Related

Lag function in SAS for checking previous value

In SAS, I would like to create a variable that checks the previous sell indicator: if the sell indicator changed between the previous time period and the current one (from 1 to 0 or from 0 to 1), I assign the value 1 to the ind variable; otherwise ind is 0.
The dataset looks like:
Customer Time Sell_Ind
1 2 1
1 3 0
1 4 0
2 23 0
2 24 0
2 30 0
5 12 1
5 11 0
And so on.
My expected output would be
Customer Time Sell_Ind Ind
1 2 1 0
1 3 0 1
1 4 0 0
2 23 0 0
2 24 0 0
2 30 0 0
5 12 1 0
5 11 0 1
The previous/current check is meant by customer.
I have tried as follows
data mydata;
set original;
By customer;
Lag_sell_ind=lag(sell_ind);
If first.customer then Lag_sell_ind=.;
Run;
But it does not return the expected output.
In sql I would probably use partition by customer over time but I do not know how to do the same in SAS.
You were halfway there; you only need to add one if statement to get the desired output.
data want;
set have;
by customer;
lag=lag(sell_ind);
if first.customer then lag=.;
if sell_ind ne lag and lag ne . then ind = 1;
else ind = 0;
drop lag;
run;
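You can simplify this using the IFN function, as below. IFN evaluates all of its arguments, so lag() is still executed on every row, which keeps its queue in step with the data.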
data have;
input Customer Time Sell_Ind;
datalines;
1 2 1
1 3 0
1 4 0
2 23 0
2 24 0
2 30 0
5 12 1
5 11 0
;
data want;
set have;
by customer;
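Lag_sell_ind = ifn(first.customer, 0, lag(sell_ind));
/* 1 when the indicator differs from the previous row of the same customer, else 0 */
Ind = (not first.customer and Sell_Ind ne Lag_sell_ind);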
Run;

Identify and delete observations that do not meet conditions in Stata

I need help identifying and removing observations that meet certain conditions. My data looks like this:
ID caseID set Var1 Var2
1 1 1 1 0
1 2 1 2 0
1 3 1 3 1
1 4 2 1 0
1 5 2 2 0
1 6 2 3 1
2 7 3 1 0
2 8 3 2 0
2 9 3 3 1
2 10 4 1 0
2 11 4 2 0
2 12 4 3 0
For every set, I want to have one observation in which Var2=1 and two observations in which Var2=0. If they do not meet this condition, I want to delete all observations from the set. For example, I would delete set=4 because Var2=0 for all observations. How can I do this in Stata?
Consider the following new variables:
egen count1 = total(Var2 == 1), by(set)
egen count0 = total(Var2 == 0), by(set)
egen total = total(Var2), by(set)
A literal reading of your question implies that you want to
keep if count1 == 1 & count0 == 2
But if sets are always of size 3 and no values other than 0 or 1 are possible, then you need only count1 == 1 OR count0 == 2 OR total == 1 as a condition.

How to recognise a particular sequence in a dataset and mark it?

How can I find the first "1,0" sequence in column "Flag" within each group and mark it with a 1 in column "Flag2", as shown below?
ID Flag Flag2
1 1
1 1 1
1 0
1 1
1 0
1 0
2 1
2 1
2 1
2 1 1
2 0
2 0
3 0
3 0
3 0
3 0
4 1
4 1 1
4 0
4 1
The problem requires a 'lead' concept (the value from the next row), the counterpart of the lag concept provided by the lag function. There is no built-in lead function, so you need to be creative.
Merge the data to itself, without a by statement, where the second copy is offset by one row (using the firstobs data set option) and has its variables renamed so the lead state can be tested with an if.
A retained variable tracks whether the 1,0 transition has already been observed within the group.
Sample code:
data have;
input ID Flag;
datalines;
1 1
1 1
1 0
1 1
1 0
1 0
2 1
2 1
2 1
2 1
2 0
2 0
3 0
3 0
3 0
3 0
4 1
4 1
4 0
4 1
;
run;
data want;
merge
have
have(firstobs=2 rename=(id=lead_id flag=lead_flag))
;
retain flagged_id;
if (id=lead_id) /* lead is in same group */
and (flag=1) and (lead_flag=0) /* transition identified */
and (flagged_id ne id) then /* first such transition for group */
do;
flag2=1; /* flag the lead transition */
flagged_id = id; /* track id where transition last flagged */
end;
drop lead_: flagged:;
run;

SAS reverse count within ID group

I am used to creating count variables within a group where the count goes upwards +1 at each time using :
data objective ;
set eg ;
count + 1 ;
by id age ;
if first.age then count = 1 ;
run ;
However, I would like to do the reverse, i.e. the first value of age in each id group gets a value of 10 and each subsequent line gets a value 1 less than the preceding line:
data eg ;
input id age desire ;
cards;
1 5 10
1 4 9
1 3 8
1 2 7
1 1 6
2 10 10
2 9 9
2 8 8
2 7 7
2 6 6
2 5 5
2 4 4
2 3 3
2 2 2
2 1 1
3 7 10
3 6 9
3 5 8
3 4 7
3 3 6
3 2 5
3 1 4
;
run;
data objective ;
set eg ;
count - 1 ;
by id age ;
if first.age_ar then count = 10 ;
run ;
Is there a way to do this? The statement count - 1 is not recognised.
The sum statement accepts a negative increment, so you can add -1 without an explicit retain, as follows:
data objective;
set eg;
count + -1;
by id descending age;
if first.id then count = 10;
run;
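As a quick check, here is a minimal sketch that lists any rows where count differs from the desire column supplied in the question's sample data. It assumes the objective data set was created by the step above; mismatch is a hypothetical data set name.

```sas
/* Sketch: keep only rows where the computed count differs from DESIRE. */
/* Assumes OBJECTIVE exists and still carries the DESIRE column from EG. */
/* MISMATCH is a hypothetical data set name. */
data mismatch;
  set objective;
  if count ne desire;   /* subsetting IF: only mismatching rows are kept */
run;

proc print data=mismatch;
run;
```

An empty mismatch data set would indicate that the reverse count reproduces the desired values.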
Try this (see comments in code for explanation):
data objective ;
retain count 10; /* keep count across observations; the initial value 10 is optional */
set eg ;
count = count - 1 ; /* "count - 1" alone is not a valid statement; assign to the retained variable instead */
by id age notsorted; /* notsorted because age is in descending order within id */
if first.id then count = 10 ; /* first.id rather than first.age_ar gives your desired output */
run ;

SAS, Filtering for Highest Values

I currently have a health injury data set of scores 0-6, where 0 is no injury and 6 is fatal injury. This is across 6 categorical body region variables. I'm attempting to construct an Abbreviated Injury Scale, where the three highest scores in an observation would be considered for the calculations. How do I filter the three highest in each row in SAS? Below is an example:
ID A B C D E F
1 0 0 0 3 4 0
2 1 2 1 4 0 0
3 0 0 5 0 0 0
4 1 2 1 5 4 0
So in OBS 1, scores 3, 4, and 0 would be used; OBS 2 - 4, 2, and 1; OBS 3 - 5, 0, and 0; OBS 4 - 5, 4, 2.
I've provided code below to do what you asked, and detailed out the steps enough that you should be able to modify it for many options/uses.
Basically, it takes your data, transposes it as Quentin suggested and then uses proc means to output the top 3 observations for each ID.
DATA NEW;
INPUT ID A B C D E F;
CARDS;
1 0 0 0 3 4 0
2 1 2 1 4 0 0
3 0 0 5 0 0 0
4 1 2 1 5 4 0
;
RUN;
PROC TRANSPOSE DATA=NEW OUT=T_OUT(RENAME=(_NAME_ = VARIABLE COL1=VALUES));
BY ID;
VAR A B C D E F;
PROC PRINT DATA=T_OUT;
RUN;
PROC MEANS DATA=T_OUT NOPRINT;
CLASS ID;
TYPES ID;
VAR VALUES;
OUTPUT OUT=TOP3LIST(RENAME=(_FREQ_=RANK VALUES_MEAN=INDEX_CRITERIA))SUM= MEAN=
IDGROUP(MAX(VALUES) OUT[3] (VALUES VARIABLE)=)/AUTOLABEL AUTONAME;
PROC PRINT DATA=TOP3LIST;
RUN;
***THEN YOU CAN MERGE THIS DATA SET TO YOUR ORIGINAL ONE BY ID TO GET YOUR INDEX CRITERIA ADDED TO IT***;
***THE INDEX_CRITERIA IS A MEAN FROM PROC MEANS BEFORE THE KEEPING OF JUST THE TOP3 VALUES***;
DATA FINAL (DROP=_TYPE_ RANK VALUES_Sum VALUES_1 VALUES_2 VALUES_3 VARIABLE_1 VARIABLE_2 VARIABLE_3);
MERGE NEW TOP3LIST;
INDEX_CRITERIA2=SUM(VALUES_1, VALUES_2, VALUES_3)/3; *THIS CRITERIA IS AVERAGE OF THE KEPT 3 VALUES;
BY ID;
PROC PRINT DATA=FINAL;
RUN;
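If only the three highest scores per row are needed, a shorter alternative to the transpose approach might be the LARGEST function, which returns the k-th largest non-missing value among its arguments. This is a sketch of a different technique, not the method used above; it assumes the NEW data set from the answer, and the names top3_rowwise, top1-top3 and mean_top3 are hypothetical.

```sas
/* Sketch: pick the three highest scores in each row with LARGEST(). */
/* Assumes the NEW data set created above. */
/* TOP3_ROWWISE, TOP1-TOP3 and MEAN_TOP3 are hypothetical names. */
data top3_rowwise;
  set new;
  array scores {*} A B C D E F;
  top1 = largest(1, of scores{*});   /* highest score in the row */
  top2 = largest(2, of scores{*});   /* second highest */
  top3 = largest(3, of scores{*});   /* third highest */
  mean_top3 = mean(top1, top2, top3);
run;

proc print data=top3_rowwise;
run;
```

For ID 1 in the sample data this picks 4, 3 and 0, matching the scores listed in the question.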