Proc Tabulate: Reordering a Formatted Variable - sas

I created a day variable using the following code:
DAY=datepart(checkin_date_time); /*example of checkin_date_time 1/1/2014 4:44:00*/
format DAY DOWNAME.;
Sample Data:
ID checkin_date_time Admit_Type BED_ORDERED_TO_DISPO
1 1/1/2014 4:40:00 ICU 456
2 1/1/2014 5:64:00 Psych 146
3 1/1/2014 14:48:00 Acute 57
4 1/1/2014 20:34:00 ICU 952
5 1/2/2014 10:00:00 Psych 234
6 1/2/2014 3:48:00 Psych 846
7 1/2/2014 10:14:00 ICU 90
8 1/2/2014 22:27:00 ICU 148
I want to analyze some data using PROC TABULATE where day is one of the class variables and have the days of the week appear in chronological order in the output; however, the output table begins with Tuesday. I would like it to start with Sunday. I've read over the following page http://support.sas.com/resources/papers/proceedings11/085-2011.pdf and tried the proc format invalue code, but it's producing a table where the "day of week" = "21". Not quite sure where to go from here.
Thanks!
proc format;
invalue day_name
'Sunday'=1
'Monday'=2
'Tuesday'=3
'Wednesday'=4
'Thursday'=5
'Friday'=6
'Saturday'=7;
value day_names
1='Sunday'
2='Monday'
3='Tuesday'
4='Wednesday'
5='Thursday'
6='Friday'
7='Saturday';
run;
data Combined_day;
set Combined;
day_of_week = input(day,day_name.);
run;
proc tabulate data = Combined_day;
class Day Admit_Type;
var BED_ORDERED_TO_DISPO ;
format day_of_week day_names.;
table Day*Admit_Type, BED_ORDERED_TO_DISPO * (N Median);
run;

Fundamentally, you are confusing actual values with displayed values (i.e., formats). DATEPART extracts the date portion of a datetime value, and applying a format only changes how that value is displayed, not the underlying value. So DAY below never contains the character values 'Wednesday' or 'Thursday', only the original integer date values (19724 and 19725 for the sample data).
DAY = datepart(checkin_date_time); /* DATE VALUE */
format DAY DOWNAME.; /* FORMATTED DATE VALUE (SAME UNDERLYING DATE VALUE) */
Consider storing the weekday number in its own column using the WEEKDAY function, which returns 1 for Sunday through 7 for Saturday. Then apply your user-defined format in proc tabulate.
data Combined_day;
set Combined;
checkin_date = datepart(checkin_date_time); /* NEW DATE VALUE (NO TIME) */
format checkin_date date9.;
checkin_weekday = weekday(checkin_date); /* NEW INTEGER VALUE OF WEEKDAY (1=SUNDAY) */
run;
proc tabulate data = Combined_day;
class checkin_weekday Admit_Type;
var BED_ORDERED_TO_DISPO ;
format checkin_weekday day_names.; /* APPLY USER DEFINED FORMAT */
table checkin_weekday*Admit_Type, BED_ORDERED_TO_DISPO * (N Median);
run;
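To see the difference between the stored value and the displayed value, a minimal sketch like the following can be run on its own. It uses the first check-in time from the sample data as a datetime literal; the variable name weekday_num is made up for illustration.

```sas
/* Sketch: the same stored date written with and without a format. */
data _null_;
  checkin_date_time = '01JAN2014:04:40:00'dt; /* datetime = seconds since 01JAN1960 */
  day = datepart(checkin_date_time);          /* date = days since 01JAN1960 */
  weekday_num = weekday(day);                 /* 1=Sunday ... 7=Saturday */
  /* First DAY= writes the raw number, second writes it through DOWNAME. */
  put day= day= downame. weekday_num=;
run;
/* Log: day=19724 day=Wednesday weekday_num=4 */
```

Because weekday_num is a real integer from 1 to 7, a CLASS variable built on it sorts Sunday first, and the day_names. format only supplies the labels.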

Related

I want to extract month data from a datetime format column in SAS

I have data in the format 01 Jan 19.00.00 (datetime), and I want to extract only the month name from it.
I tried the code below, but the output is numeric, i.e. 1, 2, 3 and so on. I want the output as either Jan, Feb, Mar or January, February, March.
data want;
set detail;
month = month(datepart(BEGIN_DATE_TIME));
run;
You can use the MONNAME format.
data test;
dt = datetime();
monname = put(datepart(dt),MONNAME.);
put monname=;
run;
If you want "Oct" rather than "October" you can add a width of 3 to the format (MONNAME3.).
If you are using the value in a report the better approach might be to use a date value formatted with MONNAME.
The values of a date formatted variable will be ordered properly when the variable is used in a CLASS or BY statement. If you had instead computed a new variable as the month name, the default ordering of values would be alphabetical.
data want;
set have;
begin_date = datepart(BEGIN_DATE_TIME);
format begin_date MONNAME3.;
run;
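The ordering difference described above can be checked with a small sketch. The dataset name demo and the variable month_text are invented for illustration; the three datetime literals are arbitrary.

```sas
/* Sketch: formatted date variable vs. character month name in PROC FREQ. */
data demo;
  do dt = '15MAR2019:00:00:00'dt, '02JAN2019:00:00:00'dt, '20FEB2019:00:00:00'dt;
    begin_date = datepart(dt);               /* numeric date, displayed via format */
    month_text = put(begin_date, monname3.); /* character copy of the month name */
    output;
  end;
  format begin_date monname3. dt datetime.;
run;

proc freq data=demo;
  /* begin_date is ordered by its underlying date value: Jan, Feb, Mar.     */
  /* month_text is ordered alphabetically as a character value: Feb, Jan, Mar. */
  tables begin_date month_text;
run;
```

Note that a date formatted with MONNAME3. groups on the month name alone, so January 2019 and January 2020 would fall into the same level; a format such as MONYY7. keeps years apart.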

I would like to get Date and Time separately from DATETIME. format

I would like to get only date and time separately from 01JAN13:08:29:00
The format and informat on the variable in the dataset are:
Date Num 8 DATETIME.(format) ANYDTDTM40(informat)
If I run datepart() on 01JAN13:08:29:00, I get 19359 as output, which is not what I want.
The DATEPART function extracts the date value from a datetime value. The date value as you have seen is simply a number. A date format must be applied to a variable holding a date value. Base SAS variables have only two value types, character and numeric.
data want;
now_dtm = datetime();
now_dt = datepart(now_dtm);
now_dt_unformatted = now_dt;
format now_dtm datetime.;
format now_dt date9.; * <----- this is what you need, format stored in data set header information;
run;
proc print data=want;
run;
* you can change the format temporarily during a proc step;
proc print data=want;
format now_dt yymmdd10.; * <---- changes format for duration of proc step;
format now_dt_unformatted mmddyy10.;
run;
Actually 19,359 is exactly the value you want. You started with the number of seconds since 1960 and converted it to the number of days since 1960.
data x ;
dt = '01JAN13:08:29:00'dt ;
date = datepart(dt);
time = timepart(dt);
put (dt date time) (=);
run;
Results
dt=1672648140 date=19359 time=30540
You just need to attach a format to your new variable so that SAS will display the value in a form that humans will recognize. You could use a format like DATE9. to have it show 19,359 as 01JAN2013. Similarly, you need to attach a format to the time part so that it prints in a form that humans will interpret as a time.
format date date9. time time8. ;

Calculating rolling correlations in SAS

I have a data set here.
An excerpt of my data looks like this: (For an enlarged version: http://puu.sh/79NCK.jpg)
(Note: there are no missing values in my dataset)
I wish to calculate the correlation matrix using a rolling window of 1 year. My period starts from 01 Jan 2008. So for example, the correlation between AUT and BEL on 01 Jan 2008 is calculated using the series of values from 01 Jan 2007 to 01 Jan 2008, and likewise for all other pairs. Similarly the correlation between AUT and BEL on 02 Jan 2008 is calculated using the series of values from 02 Jan 2007 to 02 Jan 2008.
Since there will be a different correlation matrix for each day, I want to output each day's correlation matrix into a sheet in excel and name that sheet COV1 (for 01 Jan 2008), COV2 (for 02 Jan 2008), COV3 (for 03 Jan 2008), and so on until COV1566 (for 31 Dec 2013). An excerpt of the output for each sheet is like this: (Note: with the titles included on the top row and first column)
http://puu.sh/79NAy.jpg
I have loaded my datafile into SAS named rolling. For the moment, my code is simply:
proc corr data = mm.rolling;
run;
Which simply calculates the correlation matrix using the entire series of values. I am very new to SAS, any help would be appreciated.
Think about how you might do this by hand if you had an immense amount of patience. (PROC CORR writes the Pearson correlations to a dataset with the OUTP= option.)
proc corr data = mm.rolling outp = correlation_as_of_01jan2008;
where date between '01jan2007'd and '01jan2008'd;
run;
Similarly,
proc corr data = mm.rolling outp = correlation_as_of_02jan2008;
where date between '02jan2007'd and '02jan2008'd;
run;
Thankfully you can use SAS macro programming to achieve a similar effect as shown in this macro:
%macro rollingCorrelations(inputDataset=, refDate=);
/*first get a list of unique dates on or after the reference date*/
proc freq data = &inputDataset. noprint;
where date >="&refDate."d;
table date/out = dates(keep = date);
run;
/*for each date calculate what the window range is, here using a year's length*/
data dateRanges(drop = date);
set dates end = endOfFile
nobs= numDates;
format toDate fromDate date9.;
toDate=date;
fromDate = intnx('year', toDate, -1, 's');
call symputx(compress("toDate"!!_n_), put(toDate,date9.));
call symputx(compress("fromDate"!!_n_), put(fromDate, date9.) );
/*find how many times(numberOfWindows) we need to iterate through*/
if endOfFile then do;
call symputx("numberOfWindows", numDates);
end;
run;
%do i = 1 %to &numberOfWindows.;
/*create a temporary view which has the filtered data that is passed to PROC CORR*/
data windowedDataview / view = windowedDataview;
set &inputDataset.;
where date between "&&fromDate&i."d and "&&toDate&i."d;
drop date;
run;
/*the output dataset from each PROC CORR run will be named
correlations_DDMMMYYYY<from date>_DDMMMYYYY<to date>*/
proc corr data = windowedDataview
outp = correlations_&&fromDate&i.._&&toDate&i. (where=(_type_ = 'CORR'))
noprint;
run;
%end;
/*append all datasets into a single table*/
data all_correlations;
format from to date9.;
set correlations_:
indsname = datasetname
;
from = input(substr(datasetname,19,9),date9.);
to = input(substr(datasetname,29,9), date9.);
run;
%mend rollingCorrelations;
%rollingCorrelations(inputDataset=rolling, refDate=01JAN2008)
The final output from the above macro has from and to columns that identify which date range each correlation matrix refers to. Run it and examine the results.
I don't think Excel can accommodate over 1500 tabs anyway, so it is best to keep everything in a single table. The final table had 81K rows and the whole process ran in 2.5 minutes.
Update: to sort them by from and to
proc sort data = ALL_CORRELATIONS;
by from to;
run;

Extract month and year for new variable

I have a dataset with a variable that has the date of orders (MMDDYY10.). I need to extract the month and year to look at orders by month--year because there are multiple years. I am vaguely aware of the month and year functions, but how can I use them together to create a new month bin?
Ideally I would create a bin for each month/year so my output can look something like:
Date
Item 2011OCT 2011NOV 2011DEC 2012JAN ...
a 50 40 30 20
b 15 20 25 30
c 1 2 3 4
total
Here is a sample code I created:
data dsstabl;
set dsstabl;
order_month = month(AC_DATE_OF_IMAGE_ORDER);
order_year = year(AC_DATE_OF_IMAGE_ORDER);
order = compress(order_month||order_year);
run;
proc freq data=dsstabl;
table item * order;
run;
Thanks in advance!
When you are doing your analysis, use an appropriate format. MONYY. sounds like the right one. That will sort properly and will group values accordingly.
Something like:
proc means data=yourdata;
class datevar;
format datevar MONYY7.;
var whatever;
run;
So your table:
proc tabulate data=dsstabl;
class item datevar;
format datevar MONYY7.;
tables item,datevar*n;
run;
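Or you can use a neat trick: INTNX with an increment of 0 and 'B' (beginning) alignment moves each date to the first day of its month, so every date in the same month and year shares one value.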
data my_data_with_months;
set my_data;
MONTH = INTNX('month', NORMAL_DATE, 0, 'B');
run;
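The new MONTH variable is still a date value, so it sorts chronologically and can be displayed with a format such as MONYY7.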
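A minimal sketch of what the INTNX call returns, using an arbitrary date literal; the variable names order_date and month_start are made up for illustration.

```sas
/* Sketch: snap a date to the first day of its month with INTNX. */
data _null_;
  order_date = '17OCT2011'd;
  month_start = intnx('month', order_date, 0, 'B'); /* 0 intervals, aligned to Beginning */
  put order_date= date9. month_start= date9. month_start= monyy7.;
run;
/* Log: order_date=17OCT2011 month_start=01OCT2011 month_start=OCT2011 */
```

The bin is a real date rather than a text label, so it still works with date functions and date formats.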

SAS-How to format arrays dynamically based on information in one column

I'm new to SAS, and would greatly appreciate anyone who can help me formulate a code. Can someone please help me with formatting changing arrays based on the first column values?
So basically here's the original data:
Category Name1 Name2......... (Changes invariably)
#ofpeople 20 30
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
AmountDed 5 6
I would like to apply the dollar10.2 format to the values under Name1 through however many Name# columns exist, but only for rows where Category is 'AmountBilled', 'AmountPaid', or 'AmountDed'.
Thank you so much for your help!
You can't conditionally format a column (like you might in excel). A variable/column has one format for the entire column. There are tricks to get around this, but they're invariably more complex than should be considered useful.
You can store the formatted value in a character variable, but it loses the ability to do math.
data have;
input category :$12. name1 name2; *$12. so that #ofproviders and AmountBilled are not truncated;
datalines;
#ofpeople 20 30
#ofproviders 10 5
#ofclaims 40 25
AmountBilled 50 100
AmountPaid 11 35
AmountDed 5 6
;;;;
run;
data want;
set have;
array names name:; *colon is wildcard (starts with);
array newnames $10 newname1-newname10; *Arbitrarily 10, can be whatever;
if substr(category,1,6)='Amount' then do;
do _t = 1 to dim(names);
newnames[_t] = put(names[_t],dollar10.2);
end;
end;
run;
You could programmatically work out the upper bound of the newname array (newname10 above) using PROC CONTENTS or SQL's DICTIONARY.COLUMNS / SAS's SASHELP.VCOLUMN. Alternatively, you could write out the original dataset as a three-column dataset with many rows for each category (was it this way to begin with, prior to a PROC TRANSPOSE?) and put the character variable there (not needing an array). To me that's the cleanest option.
data have_t;
set have;
array names name:;
format nameval $10.;
do namenum = 1 to dim(names);
if substr(category,1,6)='Amount' then nameval = put(names[namenum],dollar10.2 -l);
else nameval=put(names[namenum],10. -l); *left aligning here, change this if you want otherwise;
output; *now we have (namenum) rows per line. Test for missing(name) if you want only nonmissing rows output (if not every row has same number of names);
end;
run;
proc transpose data=have_t out=want_T(drop=_name_) prefix=name;
by category notsorted;
var nameval;
run;
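The first suggestion above, working out the array size programmatically, might look roughly like this. It is a sketch that assumes the HAVE dataset from this answer lives in WORK and that every column to be converted has a name starting with "name"; the macro variable n_names is invented for illustration, and the TRIMMED keyword needs SAS 9.3 or later.

```sas
/* Sketch: count the NAME columns, then size the character array to match. */
proc sql noprint;
  select count(*) into :n_names trimmed
  from dictionary.columns
  where libname = 'WORK'            /* libname and memname are stored in upper case */
    and memname = 'HAVE'
    and upcase(name) like 'NAME%';  /* assumed naming pattern */
quit;

data want;
  set have;
  array names name:;                             /* existing numeric columns */
  array newnames $10 newname1-newname&n_names.;  /* one character column per NAME column */
  if substr(category, 1, 6) = 'Amount' then do _t = 1 to dim(names);
    newnames[_t] = put(names[_t], dollar10.2);
  end;
  drop _t;
run;
```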
Finally, depending on what you're actually doing with this, you may have superior options in terms of the output method. If you're doing PROC REPORT for example, you can use compute blocks to set the style (format) of the column conditionally in the report output.
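As a rough sketch of the PROC REPORT approach, CALL DEFINE inside a compute block can change a cell's format row by row. The dataset report_demo is a made-up three-row subset of the sample data, and the columns are defined as DISPLAY so they can be referenced by name.

```sas
/* Sketch: conditional cell formats in PROC REPORT via CALL DEFINE. */
data report_demo;
  input category :$12. name1 name2;
  datalines;
#ofpeople 20 30
AmountBilled 50 100
AmountPaid 11 35
;
run;

proc report data=report_demo;
  column category name1 name2;
  define category / display;
  define name1 / display;
  define name2 / display;
  /* category sits to the left of each column, so its value is available here */
  compute name1;
    if substr(category, 1, 6) = 'Amount' then
      call define(_col_, 'format', 'dollar10.2');
  endcomp;
  compute name2;
    if substr(category, 1, 6) = 'Amount' then
      call define(_col_, 'format', 'dollar10.2');
  endcomp;
run;
```

The underlying values stay numeric; only the report output changes. With many Name# columns the compute blocks would need to be generated, for example with a macro loop.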