I have a SAS dataset that I need to transpose from wide format to long format
data that I have:
DATES Year1 Year2 Year3
Jan 100 200 300
Data I want:
DATES Year Income
Jan 1 100
Jan 2 200
Jan 3 300
In this scenario the syntax for proc transpose is fairly simple.
proc transpose data=have out=want(rename=(_name_=Year col1=Income));
by date;
var year:; * the ':' is a wildcard character;
run;
The resulting output:
Obs date Year Income
1 Jan year1 100
2 Jan year2 200
3 Jan year3 300
Related
Please find the dataset below:
ID amt order_type no_of_order
1 200 em 6
1 300 on 5
2 600 em 10
Output desired:
ID amt order_type no_of_order
1 500 on 11
2 600 em 10
based on the highest amount i need to pick the order_type.
How can this be achieved in sas code
Sounds like you want to get the sum of the two numeric variables for each value of ID and also select one value for ORDER_TYPE. You appear to want to take the value of ORDER_TYPE which had the largest AMT. Here is simply way using PROC SUMMARY.
data have;
input ID amt order_type $ no_of_order;
cards;
1 200 em 6
1 300 on 5
2 600 em 10
;
proc summary data=have ;
by id;
var amt no_of_order;
output out=want sum= idgroup(max(amt) out[1] (order_type)=);
run;
Results:
no_of_ order_
Obs ID _TYPE_ _FREQ_ amt order type
1 1 0 2 500 11 on
2 2 0 1 600 10 em
Assume you have a temporary SAS data set called EPISODES that contains information about hospital episodes. The data set contains the variables ID_NO (patient ID), ADMIT_DATE (date of admission), DISC_DATE (date of discharge), and TOTAL_COST.
Using this data set, create a new data set in which you will create a separate observation for each day of each hospital episode. If a patient had a hospital episode that was 3 days long, they would have three views in the new data set from that episode -- one for each day.
Each observation in the new data set should have only three variables: the patient identifier ID_NO, the date for that particular day of hospitalization XDATE, and the cost for that day of hospitalization DAILY_COST = TOTAL_COST divided by the number of days in the episode.
My thought is to do this as a loop. Something like the following.
data new_data;
set input_data ;
do xdate = admit_date to disc_data;
daily_cost = .... ;
output new_data ( keep = xdate daily_cost id_no );
end;
run;
*This program block sets up our data set;
data episodes;
INPUT ID_NO $ ADMIT_DATE mmddyy10. TOTAL_COST DISC_DATE mmddyy10.;
DATALINES;
1 01/01/2017 3000 01/03/2017
2 01/01/2017 14000 01/14/2017
;
run;
data new_episodes (keep= ID_NO XDATE DAILY_COST);
set episodes;
NUM_DAYS= DISC_DATE-ADMIT_DATE;
DAILY_COST= TOTAL_COST/(DISC_DATE-ADMIT_DATE);
*Using the Do While loop to create a matrix of date observations;
XDATE=ADMIT_DATE;*initializing our variable;
do while(XDATE<DISC_DATE);
put XDATE=;
XDATE+1;
output;*outputting the date variable;
end;
format XDATE mmddyy10.;
run;
proc print data=new_episodes;
run;
Since SAS stores dates as number of days you can just use a DO loop to increment XDATE from ADMIT_DATE to DISC_DATE.
But you need to decide how to count dates. If you are admitted on Monday and discharged on Tuesday is that one day or two days? If it is one day then do you want XDATE records for Monday or Tuesday? Or both?
Let's make some test data:
data have;
input id_no $ admit_date :yymmdd. total_cost disc_date :yymmdd.;
format admit_date disc_date yymmdd10.;
put (_all_) (+0);
datalines;
1 2017-01-01 3000 2017-01-04
2 2017-01-01 5000 2017-01-06
3 2020-02-23 500 2020-02-23
;
Here is code that treats Monday to Tuesday as one day. So it doesn't output the discharge date (unless it is the same as the admission date).
data want;
set have ;
if admit_date=disc_date then daily_cost=total_cost;
else daily_cost = total_cost / (disc_date - admit_date);
do xdate=admit_date to max(admit_date,disc_date-1) ;
output;
end;
keep id_no xdate daily_cost;
format xdate yymmdd10.;
run;
Results:
daily_
Obs id_no cost xdate
1 1 1000 2017-01-01
2 1 1000 2017-01-02
3 1 1000 2017-01-03
4 2 1000 2017-01-01
5 2 1000 2017-01-02
6 2 1000 2017-01-03
7 2 1000 2017-01-04
8 2 1000 2017-01-05
9 3 500 2020-02-23
If you want to treat a stay from Monday to Tuesday as 2 days then the code is easier.
data want;
set have ;
daily_cost = total_cost / (disc_date - admit_date + 1);
do xdate=admit_date to disc_date ;
output;
end;
keep id_no xdate daily_cost;
format xdate yymmdd10.;
run;
Results:
daily_
Obs id_no cost xdate
1 1 750.000 2017-01-01
2 1 750.000 2017-01-02
3 1 750.000 2017-01-03
4 1 750.000 2017-01-04
5 2 833.333 2017-01-01
6 2 833.333 2017-01-02
7 2 833.333 2017-01-03
8 2 833.333 2017-01-04
9 2 833.333 2017-01-05
10 2 833.333 2017-01-06
11 3 500.000 2020-02-23
In SAS, how to read the following dates with different formats? (especially 01/05/2018 and 1/6/2018)
01/05/2018
1/6/2018
Jan 05 2018
Jan 6 2018
Any help is greatly appreciated. Thanks!
The ANYDTDTM informat will parse most varieties of human readable date, time or datetime representations into a SAS datetime value. The datepart function of that value will return the SAS date value thereof.
The ANYDTDTE informat will also parse a variety of date, time or datetime representations and return the date part implicitly. However it fails on some of your data items where ANYDTDTM does not.
data _null_;
input
#1 a_datetime_value anydtdtm.
#1 a_date_value anydtdte.
;
hot_date = datepart(a_datetime_value);
put
'_infile_ ' _infile_
/ 'anydtdtm. ' a_datetime_value datetime16.
/ 'datepart() ' hot_date yymmdd10.
/ 'anydtdte. ' a_date_value yymmdd10.
/;
datalines;
01/05/2018
1/6/2018
Jan 05 2018
Jan 6 2018
run;
==== LOG ====
_infile_ 01/05/2018
anydtdtm. 05JAN18:00:00:00
datepart() 2018-01-05
anydtdte. .
_infile_ 1/6/2018
anydtdtm. 06JAN18:00:00:00
datepart() 2018-01-06
anydtdte. 2018-01-06
_infile_ Jan 05 2018
anydtdtm. 05JAN18:00:00:00
datepart() 2018-01-05
anydtdte. .
_infile_ Jan 6 2018
anydtdtm. 06JAN18:00:00:00
datepart() 2018-01-06
anydtdte. .
Read the SAS documentation and conference papers for a greater exploration of the ANYDT** family of informats.
I have this kind of data
Configuration Retail Price month
1 450 Jan
1 520 Feb
1 630 Mar
5 650 Jan
5 320 Feb
5 480 Mar
9 770 Jan
9 180 Feb
9 320 Mar
I want my data to look like this
Configuration Jan Feb Mar
1 450 520 630
5 650 320 480
9 770 180 320
Generating some data to tinker with:
data begin;
length Configuration 3 Retail_Price 3 month $3;
input Configuration Retail_Price month;
datalines;
1 450 Jan
1 520 Feb
1 630 Mar
5 650 Jan
5 320 Feb
5 480 Mar
9 770 Jan
9 180 Feb
9 320 Mar
;
run;
In SAS everything must be sorted right. (Or you can use index, but that different thing) We use PROC SORT for this.
proc sort data= begin; by Configuration month; run;
A way to calculate sum is to utilize PROC MEANS.
proc means data=begin noprint; /*Noprint is for convienience*/
by Configuration month; /*These are the subsets*/
output out = From_means(drop=_TYPE_ _FREQ_) /*Drops are for ease sake*/
sum(Retail_Price)=
;
run;
At this point we have the data in narrow format. A way to transform this to wide format is PROC TRANSPOSE.
proc transpose data=From_means out=Wide_format;
by Configuration;
id month;
run;
Bear in mind that there are multiple other ways to accomplish the same. A popular way is to utilize PROC SQL for almost everything, but in my experience large datasets are better to be handled by SAS proc commands..
I have a dataset like the following:
x y
16:00 1
17:00 2
18:00 2
19:00 3
20:00 4
21:00 5
22:00 6
23:00 1
24:00 1
01:00 2
02:00 3
03:00 1
04:00 7
...
I want to plot the relationship between x and y using the following code. I want my x axis start from 16:00 and end at 04:00. However using the code below, x axis start from 00:00 and end at 16:00. can anyone teach me how to adjust my code please. ( i dont want to type the order one by one like the following order = ("16:00" ..."04:00").
PROC SGPLOT DATA = data;
SERIES X = x Y = y;
axis order=("16:00:00"t to "03:00:00"t by hour);
TITLE 'Plot';
RUN;
So the problem is that numerical X axis values cannot be put out of order. And a time in SAS 1am < 11pm. So you cannot go around the clock, so to say.
A work around is to make the time values date times. That is, add a day component to it. Then you only display the time portion.
data have;
informat x time5. y best.;
format x time5.;
input x y;
datalines;
16:00 1
17:00 2
18:00 2
19:00 3
20:00 4
21:00 5
22:00 6
23:00 1
24:00 1
01:00 2
02:00 3
03:00 1
04:00 7
;
run;
data have;
retain day 0;
set have;
format x_new datetime.;
/*Count Days*/
if x = "24:00"t then
day = day + 1;
x_new = dhms(day,hour(x),minute(x),second(x));
run;
proc sgplot data=have;
series x=x_new y=y;
xaxis valuesformat=tod5.;
run;
Here I am looking for the 24 hour mark to increment the day count. Then creating a new variable to hold the day + the time.
When plotting, tell SAS to use the TODw.d format which only displays the time portion.
Here's what I get