SAS creating identifier - sas

I have a dataset with patient having multiple courses during a treatment phase.
Data set looks like:
C 1 1 0
C 0 0 1
C 1 1 0
C 0 0 1
The first two rows: patient start at row1 and finishes at row2. This is the first course of patient C.
The second two rows: patient C again starts at row3 and finishes at row four.
How can I create an identifier for these two courses using the first and last statements in SAS.
Expected output should look like this;
C 1 1 0 23
C 0 0 1 23
C 1 1 0 24
C 0 0 1 24
C 1 1 1 25
The counts for one course should be the same and different from courses to courses within he same patient.
Thanks.

Assuming the third variable, whatever it is, is your 'end state' the following works. Probably not the easiest method but hopefully clear. I don't know if First/Last will actually help in this situation except for when the ID switches.
Idea is look for the V3=1 and then set a flag to 1. If the flag is 1, then the next record increments and resets the flag and the process is continued. Retain is used to hold the values of Flag and Course across the rows.
data have;
input ID $ v1-v3;
cards;
C 1 1 0
C 0 0 1
C 1 1 0
C 0 0 1
D 1 0 0
D 0 1 0
D 0 0 1
;
run;
data want;
set have;
BY ID;
retain flag 0 course;
if first.ID then do;
Course=1;
flag=0;
end;
if flag=1 then do;
course=course+1;
flag=0;
end;
else if v3=1 and flag=0 then flag=1;
run;
proc print;
run;

Related

Concatenation in SAS with name

Hi i have 25 variables like Name1, name2, name3.....name 25. Few names are having 0 data which means few name are have the value 0.
I want to concate all the name from 1 to 25 and drop those name which has 0 values.
i am trying
data test;
set need;
new_name = catx (',',Name1,Name2, Name3, Name4, Name5, Name6, Name7, Name8, Name9, Name10, Name11, Name12, Name13, Name14, Name15, Name16, Name17, Name18, Name19, Name20, Name21, Name22, Name23, Name24, Name25);
); run;
but i am getting name like 0,0,0,0,0,Amit,0,0,0,Dave,0,0,0,Pam,0,0,0,0,Deepka,0,0,0,0,0,0. However, i only need names not 0 value in the outcome. Variable data type is charracter
Any help would be much appreciated
This should work:
data have;
input (Name1-Name25) (:$20.);
datalines;
0 0 0 0 0 Amit 0 0 0 Dave 0 0 0 Pam 0 0 0 0 Deepka 0 0 0 0 0 0
Tom Steve 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Richard 0 0 0 0
;
run;
First I changed 0 with nulls, so catx doesn't concatenate those zeros.
Data have;
set have;
array Name[25];
do i=1 to 25;
if Name[i]='0' then do;
Name[i]='';
end;
end;
run;
Your code then works just fine.
proc sql;
create table want as
select catx (',',Name1,Name2, Name3, Name4, Name5, Name6, Name7, Name8, Name9, Name10, Name11, Name12, Name13, Name14, Name15, Name16, Name17, Name18, Name19, Name20, Name21, Name22, Name23, Name24, Name25) as new_name
from have
;
quit;
This is naturally not the only way to do this. Best way would probably be to fix the issue at the source - to put proper null values into the table instead of 0s.

SAS LOOP - create columns from the records which are having a value

Suppose i have random diagnostic codes, such as 001, v58, ..., 142,.. How can I construct columns from the codes which is 1 for the records?
Input:
id found code
1 1 001
2 0 v58
3 1 v58
4 1 003
5 0 v58
......
......
15000 0 v58
Output:
id code_001 code_v58 code_003 .......
1 1 0 0
2 0 0 0
3 0 1 0
4 1 0 0
5 0 0 0
.........
.........
You will want to TRANSPOSE the values and name the pivoted columns according to data (value of code) with an ID statement.
Example:
In real world data it is often the case that missing diagnoses will be flagged zero, and that has to be done in a subsequent step.
data have;
input id found code $;
datalines;
1 1 001
2 0 v58
2 1 003 /* second diagnosis result for patient 2 */
3 1 v58
4 1 003
5 0 v58
;
proc transpose data=have out=want(drop=_name_) prefix=code_;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
* missing occurs when an id was not diagnosed with a code;
* if that means the flag should be zero (for logistic modeling perhaps)
* the missings need to be changed to zeroes;
data want;
set want;
array codes code_:;
do _n_ = 1 to dim(codes); /* repurpose automatic variable _n_ for loop index */
if missing(codes(_n_)) then codes(_n_) = 0;
end;
run;

SAS and do loop

I'm writing a program in SAS.
Here's the dataset I have:
id huuse days
1 0 4
1 0 3
1 1 12
1 1 1
1 2 15
2 1 13
2 0 16
2 1 18
2 0 44
For each ID, I want to delete the record if variable huuse ne 1, until I get to the first huuse=1. Then I want to keep that record and all subsequent records for that id, no matter what value huuse is. So for id=1, I want to delete the first two records than keep all records for id=1 starting with the 3rd record. For id=2, the first record has huuse=1, so I want to keep all records for id=2.
The data set I want should look like this:
id huuse days
1 0 4
1 0 3
1 1 12
1 1 1
1 2 15
2 1 13
2 0 16
2 1 18
2 0 44
I tried this code, but it removes all records that have huuse ne 1.
data want;
set have;
by id;
do until (huuse=1);
if huuse = 1 then LEAVE;
if huuse ne 1 then DELETE;
END;
run;
I've tried several variations of do loops, but they all do the same thing.
The DATA step is a program with an implicit loop that reads every record of the data set specified in the SET statement. Any program data vector (pdv) variables not coming from the data set are, by default, reset to missing at the top of the implicit loop. You change that behavior using a RETAIN statement to name variables that should not get reset.
So, in your problem you have two situations when a tracking variable is needed. The variable will track the state of the condition Have I seen huuse=1 yet in this group ?. Call this variable one_flag
RETAIN one_flag; so you control when it's value changes
At the start of a BY group one_flag needs to be reset to false (0)
When huuse is first seen as 1 set the flag to true (1)
Example:
data want(drop=one_flag);
set have;
by id;
retain one_flag 0;
if first.id then one_flag = 0;
if not one_flag and huuse = 1 then one_flag = 1;
if one_flag then OUTPUT; * want all rows in group starting at first huuse=1;
run;
You can place the SET and BY statement inside an explicit DO and that changes the operating behavior of the program, especially if the explicit loop is terminated according to a LAST.<var> automatic variable. Such a loop is commonly called a DOW loop by SAS programmers. There is no phrase DOW loop in the SAS documentation.
Example:
data want;
do until (last.id);
set have;
by id;
if not one_flag and huuse=1 then one_flag = 1;
if one_flag then OUTPUT; * want all rows in group starting at first huuse=1;
end;
run;
Because the looping is explicit and never reaches the TOP of the program with in the loop, there is no need to RETAIN the flag variable, nor reset it. Program variables that are not retained are reset automatically at the top of the program, and the top of the program is only reached at the start of the BY group. Learn more about this programming construct in the SGF 2013 paper "The Magnificent DO", Paul M. Dorfman
Your source and result are same :-)
But if I understood your question correctly the solution is quite simple with a retain solution. I add 2 lines to the example to make it clear that I understood correctly.
The code with example table:
data test;
id=1;huuse=0;days=4;output;
id=1;huuse=0;days=3;output;
id=1;huuse=1;days=12;output;
id=1;huuse=1;days=1;output;
id=1;huuse=2;days=15;output;
id=2;huuse=1;days=13;output;
id=2;huuse=0;days=16;output;
id=2;huuse=1;days=18;output;
id=2;huuse=0;days=44;output;
id=3;huuse=0;days=1;output;
id=3;huuse=1;days=2;output;
run;
data test_output;
set test;
retain keep_id -1;
if (keep_id ne id and huuse ne 0) then keep_id=id;
if keep_id = id then output;
run;
/* the results:
id huuse days
1 1 12 1
1 1 1 1
1 2 15 1
2 1 13 2
2 0 16 2
2 1 18 2
2 0 44 2
3 1 2 3
*/

SAS: Extract previous and next observations

I have a dataset like this(type is an indicator):
datetime type
...
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
...
I was trying to extract data with type 1 and also the previous and next one. Just try to extract (-1,+1) window based on type 1.
datetime type
...
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
...
I found a similar post here. I copied and pasted the code, but I am not quite sure what does 'x' mean in his code. SAS gives me 'File WORK.x does not exist'.
Can someone help me out? Thx.
The X data set in the other post is the same source table you are filtering, so the logical order of the code is:
Check every row in the table 'Have', _N_ holds the current row number,
If Type = 1 then Set Have Point=_N_ goes to row _N_ in the 'Have' table and outputs that row to the new table 'want', then continues to the next row. The _N_ can be the pointer to the current, previous or next row. ( The two IF statements handles the cases of first row and last row; where there is no Previous or no Next)
Full Working Code:
data have;
length datetime $23.;
input datetime $ type ;
datalines;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
run;
data want;
set have nobs=nobs;
if type = 1 then do;
current = _N_;
prev = current - 1;
next = current + 1;
if prev > 0 then do;
set have point = prev;
output;
end;
set have point = current;
output;
if next <= nobs then do;
set have point = next;
output;
end;
end;
run;
proc sort data=want noduprecs;
by _all_ ; Run;
Note: I added an extra step proc sort to remove duplicate rows.
Output:
datetime=ddmmyy:10:31:00 type=0
datetime=ddmmyy:10:32:00 type=1
datetime=ddmmyy:10:33:00 type=0
datetime=ddmmyy:10:34:00 type=1
datetime=ddmmyy:10:35:00 type=0
For you example data that does not have any id or group variable it should be pretty straight forward. Instead of thinking about moving back and forth in the file, just create new variables that contain the previous (LAG_TYPE) and next (LEAD_TYPE) value for TYPE. Then your requirement to keep the observations before the one with TYPE=1 is translated to keeping the observations where LEAD_TYPE=1.
Let's convert your sample data into a dataset.
data have ;
input datetime :$15. type ;
cards;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
Rather than actually keeping the required observations I will make a new variable KEEP that will be true for records that meet your criteria.
data want ;
recno+1;
set have end=eof;
lag_type=lag(type);
if not eof then set have(firstobs=2 keep=type rename=(type=lead_type));
else lead_type=.;
keep= (type=1 or lag_type=1 or lead_type=1) ;
run;
Here is the result.
recno datetime type lag_type lead_type keep
1 ddmmyy:10:30:00 0 . 0 0
2 ddmmyy:10:31:00 0 0 1 1
3 ddmmyy:10:32:00 1 0 0 1
4 ddmmyy:10:33:00 0 1 1 1
5 ddmmyy:10:34:00 1 0 0 1
6 ddmmyy:10:35:00 0 1 . 1
Move next observation up and compare two observations at the same row or use lag to compare the current observation and previous observation.
data have;
length datetime $23.;
input datetime $ type ;
datalines;
ddmmyy:10:30:00 0
ddmmyy:10:31:00 0
ddmmyy:10:32:00 1
ddmmyy:10:33:00 0
ddmmyy:10:34:00 1
ddmmyy:10:35:00 0
;
run;
data want;
merge have have(firstobs=2 keep=type rename=(type=_type));
if max(type,_type) or max(type,lag(type)) ;
drop _type;
run;

SAS PROC TABULATE percentages by multiple class variables with missing values

I would like to create a table that has three variables where var2 is a percentage of var1 and var3 is a percentage of var 2, broken down by class variables that have missing values.
To explain, imagine I have data showing who applied, was interviewed, and was hired for a job, e.g.
data job;
input applied interviewed hired;
datalines;
1 1 1
1 1 1
1 1 1
1 1 0
1 1 0
1 1 0
1 0 .
1 0 .
1 0 .
1 0 .
;
run;
it's very easy to create a table that shows the count of who applied, and then the percentage of those who were interviewed and then of those people, the percentage who was hired.
proc tabulate data = job;
var applied interviewed hired;
tables applied * n (interviewed hired) * mean * f=percent6.;
run;
which gives:
applied interviewed hired
10 60% 50%
Now I would like to break that down by several class variables with missing values.
data have;
input sex degree exp applied interviewed hired;
datalines;
0 1 1 1 1 1
1 . 0 1 1 1
. 0 1 1 1 1
0 1 0 1 1 0
1 0 1 1 1 0
0 1 0 1 1 0
1 . 1 1 0 .
0 1 . 1 0 .
. 0 0 1 0 .
1 0 0 1 0 .
;
run;
If I do one class variable at a time it will give me the correct percentages:
proc tabulate data = have format = 6.;
class sex;
var applied interviewed hired;
tables sex, applied * sum (interviewed hired) * mean * f=percent6.;
run;
Is there a way to do all three class variables in the table at once and get the right percentage for each category. so the table looks like:
applied interviewed hired
sex
0 4 75% 33%
1 4 50% 50%
degree
0 4 50% 50%
1 4 75% 33%
exp
0 5 60% 33%
1 4 75% 67%
This is something I must do many, many times and I need to populate tables in a report with the numbers, so I'm looking for a solution where the table can be printed all in one step.
How would you solve this problem?
The problem you're running into is that of missing data. When a case is missing for any class variable, it is eliminated from the entire table, unless you specify MISSING in the proc call. So, for example, your 4th sex=0 who did not interview was missing EXP; so they didn't show up at all in the table, though you would want them showing up in SEX.
You can get the correct numbers, mostly:
proc tabulate data = have format = 6. missing;
class sex degree exp;
var applied interviewed hired;
tables (sex degree exp), applied * sum (interviewed hired) * mean * f=percent6.;
run;
However, you have an extra row that includes those with missing data. You cannot eliminate those rows from the printed output while also including them in the other class calculations; this is just one of those limitations of SAS tabulation. Other PROCs have a similar problem; PROC FREQ is the only one that doesn't do this if you have multiple tables generated, but even then within one table (combined with asterisks) you will have the same issue.
The only way I've found around this is to output the table to a dataset and then filter out those rows, and PROC REPORT or PRINT or TABULATE the data back out.
I think this is close to what you want. You will have to fix the row labels, but it is one PROC TABULATE step.
title;
data have;
input sex degree exp applied interviewed hired;
datalines;
0 1 1 1 1 1
1 . 0 1 1 1
. 0 1 1 1 1
0 1 0 1 1 0
1 0 1 1 1 0
0 1 0 1 1 0
1 . 1 1 0 .
0 1 . 1 0 .
. 0 0 1 0 .
1 0 0 1 0 .
;
run;
proc print;
run;
proc summary data=have missing ;
class sex degree exp;
ways 1;
output out=stats sum(applied)= mean(interviewed hired)= / levels;
run;
data stats2;
set stats;
if n(of sex degree exp) eq 0 then delete;
run;
proc print;
run;
proc tabulate data=stats2;
class _type_ / descend;
class _level_;
var applied interviewed hired;
tables (_type_*_level_),applied*sum='N'*f=8. (interviewed hired)*sum='Percent'*f=percent6.;
run;
/**/
/* applied interviewed hired*/
/*sex */
/* 0 4 75% 33%*/
/* 1 4 50% 50%*/
/*degree */
/* 0 4 50% 50%*/
/* 1 4 75% 33%*/
/*exp */
/* 0 5 60% 33%*/
/* 1 4 75% 67%*/