Assume you have a temporary SAS data set called EPISODES that contains information about hospital episodes. The data set contains the variables ID_NO (patient ID), ADMIT_DATE (date of admission), DISC_DATE (date of discharge), and TOTAL_COST.
Using this data set, create a new data set containing a separate observation for each day of each hospital episode. If a patient had a hospital episode that was 3 days long, they would have three observations in the new data set from that episode -- one for each day.
Each observation in the new data set should have only three variables: the patient identifier ID_NO, the date for that particular day of hospitalization XDATE, and the cost for that day of hospitalization DAILY_COST = TOTAL_COST divided by the number of days in the episode.
My thought is to do this as a loop. Something like the following.
data new_data ( keep = xdate daily_cost id_no );
set input_data ;
do xdate = admit_date to disc_date;
daily_cost = .... ;
output;
end;
run;
*This program block sets up our data set;
data episodes;
INPUT ID_NO $ ADMIT_DATE :mmddyy10. TOTAL_COST DISC_DATE :mmddyy10.;
DATALINES;
1 01/01/2017 3000 01/03/2017
2 01/01/2017 14000 01/14/2017
;
run;
data new_episodes (keep= ID_NO XDATE DAILY_COST);
set episodes;
*Length of stay, counting the discharge date as not hospitalized;
NUM_DAYS= DISC_DATE-ADMIT_DATE;
DAILY_COST= TOTAL_COST/NUM_DAYS;
*Using the Do While loop to write one observation per day;
XDATE=ADMIT_DATE;*initializing our variable;
do while(XDATE<DISC_DATE);
put XDATE=;*writing the date to the log;
output;*outputting the current day before incrementing;
XDATE+1;
end;
format XDATE mmddyy10.;
run;
proc print data=new_episodes;
run;
Since SAS stores dates as the number of days since January 1, 1960, you can just use a DO loop to increment XDATE from ADMIT_DATE to DISC_DATE.
But you need to decide how to count days. If you are admitted on Monday and discharged on Tuesday, is that one day or two days? If it is one day, do you want XDATE records for Monday or Tuesday? Or both?
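To make the two counting conventions concrete, here is a minimal sketch using date literals. The dates and the variable names NIGHTS and DAYS are made up for illustration and are not part of the question.

data _null_;
admit_date = '02JAN2017'd; /* a Monday (illustrative value) */
disc_date = '03JAN2017'd; /* the following Tuesday */
nights = disc_date - admit_date; /* discharge date not counted */
days = disc_date - admit_date + 1; /* both admission and discharge dates counted */
put nights= days=;
run;

The log shows nights=1 days=2, which is exactly the choice you have to make before dividing TOTAL_COST.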
Let's make some test data:
data have;
input id_no $ admit_date :yymmdd. total_cost disc_date :yymmdd.;
format admit_date disc_date yymmdd10.;
put (_all_) (+0);
datalines;
1 2017-01-01 3000 2017-01-04
2 2017-01-01 5000 2017-01-06
3 2020-02-23 500 2020-02-23
;
Here is code that treats Monday to Tuesday as one day. So it doesn't output the discharge date (unless it is the same as the admission date).
data want;
set have ;
if admit_date=disc_date then daily_cost=total_cost;
else daily_cost = total_cost / (disc_date - admit_date);
do xdate=admit_date to max(admit_date,disc_date-1) ;
output;
end;
keep id_no xdate daily_cost;
format xdate yymmdd10.;
run;
Results:
Obs id_no daily_cost xdate
1 1 1000 2017-01-01
2 1 1000 2017-01-02
3 1 1000 2017-01-03
4 2 1000 2017-01-01
5 2 1000 2017-01-02
6 2 1000 2017-01-03
7 2 1000 2017-01-04
8 2 1000 2017-01-05
9 3 500 2020-02-23
If you want to treat a stay from Monday to Tuesday as 2 days then the code is easier.
data want;
set have ;
daily_cost = total_cost / (disc_date - admit_date + 1);
do xdate=admit_date to disc_date ;
output;
end;
keep id_no xdate daily_cost;
format xdate yymmdd10.;
run;
Results:
Obs id_no daily_cost xdate
1 1 750.000 2017-01-01
2 1 750.000 2017-01-02
3 1 750.000 2017-01-03
4 1 750.000 2017-01-04
5 2 833.333 2017-01-01
6 2 833.333 2017-01-02
7 2 833.333 2017-01-03
8 2 833.333 2017-01-04
9 2 833.333 2017-01-05
10 2 833.333 2017-01-06
11 3 500.000 2020-02-23
Please find the dataset below:
ID amt order_type no_of_order
1 200 em 6
1 300 on 5
2 600 em 10
Output desired:
ID amt order_type no_of_order
1 500 on 11
2 600 em 10
Based on the highest amount, I need to pick the order_type.
How can this be achieved in SAS code?
Sounds like you want to get the sum of the two numeric variables for each value of ID and also select one value for ORDER_TYPE. You appear to want the value of ORDER_TYPE that had the largest AMT. Here is a simple way using PROC SUMMARY.
data have;
input ID amt order_type $ no_of_order;
cards;
1 200 em 6
1 300 on 5
2 600 em 10
;
proc summary data=have ;
by id;
var amt no_of_order;
output out=want sum= idgroup(max(amt) out[1] (order_type)=);
run;
Results:
Obs ID _TYPE_ _FREQ_ amt no_of_order order_type
1 1 0 2 500 11 on
2 2 0 1 600 10 em
I have a SAS dataset that I need to transpose from wide format to long format.
Data that I have:
DATES Year1 Year2 Year3
Jan 100 200 300
Data I want:
DATES Year Income
Jan 1 100
Jan 2 200
Jan 3 300
In this scenario the syntax for proc transpose is fairly simple.
proc transpose data=have out=want(rename=(_name_=Year col1=Income));
by dates;
var year:; * the ':' is a wildcard character;
run;
The resulting output:
Obs DATES Year Income
1 Jan year1 100
2 Jan year2 200
3 Jan year3 300
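Note that PROC TRANSPOSE leaves Year as the character names year1, year2, year3, while the question asked for 1, 2, 3. One alternative, sketched below, is a DATA step with an array. It assumes the wide columns are exactly year1 through year3, and the test data step is reconstructed from the question.

data have;
input dates $ year1 year2 year3;
datalines;
Jan 100 200 300
;

data want;
set have;
array yr{*} year1-year3; /* assumed column names */
do Year = 1 to dim(yr); /* Year is a numeric index: 1, 2, 3 */
Income = yr{Year};
output; /* one row per year column */
end;
keep dates Year Income;
run;

This trades the flexibility of the wildcard for a numeric Year without a second pass over the data.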
I have a dataset which looks something like this
/********************************************************************************/
YYMM Sector
1701 Agriculture
1611 Retail
1501 CRE
/*************/
There is another dataset which looks something like this/*************
Customer_ID YYMM
XXXX 1702
XXXX 1701
XXXX 1612
XXXX 1611
XXXX 1610
XXXX 1510
XXXX 1509
/********************************************************/
So basically I want to merge these two datasets on the basis of YYMM and bring in the sectors. But since the first dataset has only a few YYMM values, all I want to do is carry each sector along until a new YYMM from the first dataset is encountered.
So the sector from 1701 to 1612 should be Agriculture, the sector from 1611 to 1502 should be Retail, and for any month before 1501 it has to be CRE.
Can you please tell me how to do it?
Here is a SQL based solution (similar to the one proposed by pinegulf).
Let us create test datasets:
data T01;
length Sector $20;
infile cards;
input YYMM_to Sector;
cards;
1701 Agriculture
1611 Retail
1501 CRE
;
run;
data T02;
length Customer_id $10;
infile cards;
input Customer_ID YYMM;
cards;
AXXX 1702
BXXX 1701
CXXX 1612
DXXX 1611
EXXX 1610
FXXX 1510
GXXX 1509
;
run;
We can add a "YYMM_from" column to T01:
proc sort data=T01;
by YYMM_to;
run;
data T01;
set T01;
by YYMM_to;
YYMM_from=lag(YYMM_to);
if _N_=1 then YYMM_from=0;
run;
proc print data=T01;
run;
We get:
Obs Sector YYMM_to YYMM_from
------------------------------------------
1 CRE 1501 0
2 Retail 1611 1501
3 Agriculture 1701 1611
Then comes the join:
proc sql;
create table T03 as
select a.*, b.Sector
from T02 a LEFT JOIN T01 b
on YYMM_from<a.YYMM<=YYMM_to;
quit;
proc print data=T03;
run;
We get:
Obs Customer_id YYMM Sector
-----------------------------------------
1 DXXX 1611 Retail
2 EXXX 1610 Retail
3 FXXX 1510 Retail
4 GXXX 1509 Retail
5 BXXX 1701 Agriculture
6 CXXX 1612 Agriculture
7 AXXX 1702
Here is a solution with proc format. Since your data is in yymm format you could set the limits logically without converting the data, but I feel more comfortable with actual dates.
data Begin;
input Customer_ID $ YYMM $;
cards;
XXXX 1702
YYYY 1701
ZZZZ 1612
OOOO 1611
AAAA 1610
FFFF 1510
DDDD 1509
; run;
data with_date;
set begin;
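/* input() converts the text pieces explicitly; the two-digit year is resolved by the YEARCUTOFF option */
date = mdy(input(substr(yymm,3,2),2.), 1, input(substr(yymm,1,2),2.) );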
run;
proc format; /*Bins follow the ranges given in the question. Adjust as needed.*/
value sector
low - '1jan2015'd = 'CRE'
'1jan2015'd < - '1nov2016'd = 'Retail'
'1nov2016'd < - high = 'Agriculture'
;
run;
data wanted;
set with_date;
format date sector.;
run;
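The step above only attaches the format, so DATE still holds the numeric date and merely displays as a sector. If you need the sector as an actual character variable, a minimal sketch is to apply the format with PUT. It assumes the WITH_DATE data set and the SECTOR. format created above already exist; the names WANTED_CHAR and SECTOR are made up for illustration.

data wanted_char;
set with_date;
length sector $20;
sector = put(date, sector.); /* store the formatted label as text */
run;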
For more on proc format see http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473474.htm
I have a dataset like the following:
x y
16:00 1
17:00 2
18:00 2
19:00 3
20:00 4
21:00 5
22:00 6
23:00 1
24:00 1
01:00 2
02:00 3
03:00 1
04:00 7
...
I want to plot the relationship between x and y using the following code. I want my x axis to start at 16:00 and end at 04:00. However, using the code below, the x axis starts at 00:00 and ends at 16:00. Can anyone show me how to adjust my code, please? (I don't want to type the order one by one, like order = ("16:00" ..."04:00").)
PROC SGPLOT DATA = data;
SERIES X = x Y = y;
axis order=("16:00:00"t to "03:00:00"t by hour);
TITLE 'Plot';
RUN;
The problem is that values on a numeric X axis cannot be placed out of order, and as SAS time values (seconds since midnight) 1am is less than 11pm. So you cannot go around the clock, so to speak.
A workaround is to turn the time values into datetimes, that is, to add a day component to them. Then you display only the time portion.
data have;
informat x time5. y best.;
format x time5.;
input x y;
datalines;
16:00 1
17:00 2
18:00 2
19:00 3
20:00 4
21:00 5
22:00 6
23:00 1
24:00 1
01:00 2
02:00 3
03:00 1
04:00 7
;
run;
data have;
retain day 0;
set have;
format x_new datetime.;
/*Count Days*/
if x = "24:00"t then
day = day + 1;
x_new = dhms(day,hour(x),minute(x),second(x));
run;
proc sgplot data=have;
series x=x_new y=y;
xaxis valuesformat=tod5.;
run;
Here I am looking for the 24 hour mark to increment the day count. Then creating a new variable to hold the day + the time.
When plotting, tell SAS to use the TODw.d format which only displays the time portion.
I'm trying to transpose a data set using values as variable names and summarize numeric data by group. I tried proc transpose and proc report (across) but couldn't get it to work. The only way I know to do this is with a data step (if/else and sum), but then the changes aren't dynamic.
For example I have this data set:
school name subject picked saving expenses
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
I need this in 1 line: the sum of 'picked' by student name, then the sum of 'picked' by subject. The last 3 columns are the totals for picked, saving and expenses:
school john ruby peter noname math spanish geography nosubject picked saving expenses
raget 15 15 2 0 13 5 2 12 32 22700 8200
Is it possible for this to change dynamically if a new student or subject is added to the school?
It's a little difficult because you're summarising at more than one level, so I've used PROC SUMMARY and chosen different _TYPE_ values. See below:
data have;
infile datalines;
input school $ name $ subject : $10. picked saving expenses;
datalines;
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
;
run;
proc summary data=have;
class school name subject;
var picked saving expenses;
output out=want1 sum(picked)=picked sum(saving)=saving sum(expenses)=expenses;
run;
proc transpose data=want1 (where=(_type_=5)) out=subs (where=(_NAME_='picked'));
by school;
id subject;
run;
proc transpose data=want1 (where=(_type_=6)) out=names (where=(_NAME_='picked'));
by school;
id name;
run;
proc sql;
create table want (drop=_TYPE_ _FREQ_ name subject) as
select
n.*,
s.*,
w.*
from want1 (where=(_TYPE_=4)) w,
names (drop=_NAME_) n,
subs (drop=_NAME_) s
where w.school = n.school
and w.school = s.school;
quit;
I've also tested this code by adding new schools, names and subjects and they do appear in the final table. You'll note that I haven't hardcoded anything (e.g. no reference to math or John), so the code is dynamic enough.
PROC REPORT is an interesting alternative, particularly if you want the printed output rather than as a dataset. You can use ODS OUTPUT to get the output dataset, but it's messy as the variable names aren't defined for some reason (they're "C2" etc.). The printed output of this one is a little messy also as the header rows don't line up, but that can be fixed with some finagling if that's desired.
data have;
input school $ name $ subject $ picked saving expenses;
datalines;
raget John math 10 10500 3500
raget John spanish 5 1200 2000
raget Ruby nosubject 10 5000 1000
raget Ruby nosubject 2 3000 0
raget Ruby math 3 2000 500
raget peter geography 2 1000 0
raget noname nosubject 0 0 1200
;;;;
run;
ods output report=want;
proc report nowd data=have;
columns school (name subject),(picked) picked=picked2 saving expenses;
define picked/analysis sum ' ';
define picked2/analysis sum;
define saving/analysis sum ;
define expenses/analysis sum;
define name/across;
define subject/across;
define school/group;
run;