SAS aggregate weekly, monthly

For the data set below (the actual one is several thousand rows long), I would like SAS to aggregate the income daily (there are many income lines per machine every day), weekly, and monthly by machine. Weeks start on Monday and months start on the 1st, in any given year. Is there straightforward code for this? Any help is appreciated.
MachineNo Date income
1 01Jan2012 1500
1 02Jan2012 2000
1 27Aug2012 300
2 02Jan2012 1200
2 15Jun2012 50
3 03Mar2012 1000
4 08Apr2012 500

proc expand and proc timeseries are excellent tools for accumulation and aggregation to different frequencies of series. You can combine both with by-group processing to convert to any time period that you need.
Step 1: Sort by MachineNo and Date
proc sort data=have;
by MachineNo Date;
run;
Step 2: Find the min/max end dates of your series for date alignment
The format=date9. option is important: some SAS/ETS and HPF procedures require date literals for certain arguments.
proc sql noprint;
select min(date) format=date9.,
max(date) format=date9.
into :min_date,
:max_date
from have;
quit;
Step 3: Align each MachineNo by start/end date, and accumulate days per MachineNo
The below code will get you aligned daily accumulation, remove duplicate days per machine, and set Income on any missing days to 0. This step will also guarantee that your series has equal time intervals per by-group, allowing you to run hierarchical time-series analyses without violating the equal-spaced interval assumption.
proc timeseries data=have
out=want_day;
by MachineNo;
id date interval=day
align=both
start="&min_date"d
end="&max_date"d;
var income / accumulate=total setmiss=0;
run;
Step 4: Aggregate aligned Daily to Weekly shifted by 1 day, Monthly
SAS time intervals can be both multiplied and shifted. Since the standard week starts on a Sunday, we want to shift it by 1 day so that it starts on a Monday.
Standard Week
2 3 4 5 6 7 1
Mon Tue Wed Thu Fri Sat Sun
Shifted
1 2 3 4 5 6 7
Mon Tue Wed Thu Fri Sat Sun
Intervals follow the format:
TimeInterval<Multiplier>.<Shift>
The standard shift interval is 1. For all intents and purposes, consider 1 as 0: 1 means it's unshifted. 2 means it's shifted by 1 period. Thus, for a week to start on a Monday, we want to use the interval Week.2.
proc expand data=want_day
out=want_week
from=day
to=week.2;
by MachineNo;
id date;
convert income / method=aggregate observed=total;
run;
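To see what the shift does, one can compare the start of the week under week and week.2 for a single date. This is only a small sketch: the date is arbitrary and the variable names sun_start and mon_start are illustrative, not part of the answer above.

```sas
data _null_;
    d = '04Jan2012'd;                       /* a Wednesday */
    sun_start = intnx('week',   d, 0, 'b'); /* default week: begins on Sunday */
    mon_start = intnx('week.2', d, 0, 'b'); /* shifted week: begins on Monday */
    put sun_start= date9. mon_start= date9.;
    /* log shows: sun_start=01JAN2012 mon_start=02JAN2012 */
run;
```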
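Step 5: Aggregate aligned Daily to Monthly
Weeks do not nest evenly within months, so the monthly totals are aggregated from the aligned daily series rather than from the weekly one.
proc expand data=want_day
out=want_month
from=day
to=month;
by MachineNo;
id date;
convert income / method=aggregate observed=total;
run;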

In case you don't have a license for SAS/ETS, here's another way.
For the monthly data, you can apply a format to the date in a proc means step so that the dates are grouped by month in the output.
I think the WEEKW. format starts weeks on Monday, but its output may not be in a form you want, so to use this method for weekly totals you would need to create a new week variable first.
proc means data=have nway noprint;
class machineno date;
format date monyy7.;
var income;
output out=want sum(income)=income;
run;
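For the weekly totals, a minimal sketch of the "new variable for week" idea could look like the following. It assumes the input data set is named have, as above, and that date is a numeric SAS date; the names have_week, week_start and want_week are made up for illustration.

```sas
/* Derive the Monday that starts each date's week */
data have_week;
    set have;
    /* 'week.2' is the weekly interval shifted to start on Monday;
       'b' aligns the result to the beginning of that interval */
    week_start = intnx('week.2', date, 0, 'b');
    format week_start date9.;
run;

/* Sum income per machine and week */
proc means data=have_week nway noprint;
    class machineno week_start;
    var income;
    output out=want_week(drop=_type_ _freq_) sum(income)=income;
run;
```

Unlike the proc timeseries approach, this only produces rows for weeks that actually contain data.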


Calculate average of the last x years

I have the following data
Date value_idx
2002-01-31 .
2002-01-31 24.533
2002-01-31 26.50
2018-02-28 25.2124
2019-09-12 22.251
2019-01-31 24.214
2019-05-21 25.241
2019-05-21 .
2020-05-21 25.241
2020-05-21 23.232
I would need to calculate the average of value_idx of the last 3 years and 7 years.
I tried first to calculate it as follows:
proc sql;
create table table1 as
select date, avg(value_idx) as avg_value_idx
from table
group by date;
quit;
The problem is that I do not know how to calculate the average of value_idx over the last few years rather than for each date. So I think I should extract the year, group by that, and then calculate the average.
I hope someone can help me with this.
You can use CASE to decide which records contribute to which MEAN. You need to clarify what you mean by last 2 or last 7 years. This code will find the value of the maximum date and then compare the year of that date to the year of the other dates.
proc sql;
select
mean(case when year(max_date)-year(date) < 2 then value_idx else . end) as mean_yr2
,mean(case when year(max_date)-year(date) < 7 then value_idx else . end) as mean_yr7
from have,(select max(date) as max_date from have)
;
quit;
Results
mean_yr2 mean_yr7
------------------
24.0358 24.2319
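If "last N years" is instead meant as a rolling window that ends at the latest date (rather than a comparison of calendar years), one possible variation is sketched below. It assumes the same have data set with a numeric SAS date; the 3- and 7-year windows and the column names mean_3yr and mean_7yr are illustrative choices.

```sas
proc sql;
    select
        /* keep rows dated after the same calendar day N years before max_date */
        mean(case when date > intnx('year', max_date, -3, 'same')
                  then value_idx else . end) as mean_3yr,
        mean(case when date > intnx('year', max_date, -7, 'same')
                  then value_idx else . end) as mean_7yr
    from have, (select max(date) as max_date from have);
quit;
```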
The best way to do this sort of thing in SAS is with native PROCs, as they have a lot of functionality related to grouping.
In this case, we use multilabel formats to control the grouping. I assume you mean 'Last Three Years' as in calendar 2018/2019/2020 and 'Last Seven Years' as calendar 2014-2020. Presumably you can see how to modify this for other time periods - so long as you aren't trying to make the time period relative to each data point.
We create a format that uses the MULTILABEL option (which allows data points to fall in multiple categories), and the NOTSORTED option (to allow us to force the ordering of the labels, otherwise SEVEN is earlier than THREE).
Then, we use it in PROC TABULATE, enabling it with MLF (MultiLabel Format) and preloadfmt order=data which again keeps the ordering correct. This produces a report with the two averages only.
data have;
informat date yymmdd10.;
input Date value_idx;
datalines;
2002-01-31 .
2002-01-31 24.533
2002-01-31 26.50
2017-02-28 25.2124
2017-09-12 22.251
2018-01-31 24.214
2018-05-21 25.241
2019-05-21 .
2020-05-21 25.241
2020-05-21 23.232
;;;;
run;
proc format;
value yeartabfmt (multilabel notsorted)
'01JAN2018'd-'31DEC2020'd = 'Last Three Years'
'01JAN2014'd-'31DEC2020'd = 'Last Seven Years'
other=' '
;
run;
proc tabulate data=have;
class date/mlf preloadfmt order=data;
var value_idx;
format date yeartabfmt.;
tables date,value_idx*mean;
run;

In a SAS application, set a date parameter and use it with concatenated columns to retrieve data for a certain period

The date in the table is not stored in a single column: days are in the day column, months in the month column, and years in the year column.
I concatenated the columns and used that concatenation in the WHERE clause together with the parameter I created, but I got no results.
I assume you are querying a date dimension table, and you want to extract the record that matches a certain date.
Solution:
I created a dates table to match with,
data dates;
input key day month year ;
datalines;
1 19 2 2018
2 20 2 2018
3 21 2 2018
4 22 2 2018
;;;
run;
In the where clause, I break the date '20feb2018'd into its parts using the day, month and year functions. In SAS, date literals must be written in quotes followed by d, as in '20feb2018'd.
proc sql;
select * from dates
/*if you want to match todays' date: replace '20feb2018'd with today()*/
where day('20feb2018'd)=day and month('20feb2018'd)=month and year('20feb2018'd)=year;
quit;
It is not totally clear what you are looking for, but if you want to compare against a date built from the day, month and year columns, use the mdy function in the where clause as shown below.
proc sql;
select * from dates
where mdy(month,day, year) between '19feb2018'd and '21feb2018'd ;
quit;
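If the date comes from a parameter, one way to pass it in is through a macro variable. This is a sketch under the assumption that the parameter arrives as text in date9 form; the macro variable name target_date is hypothetical.

```sas
%let target_date = 20feb2018;   /* hypothetical parameter value */

proc sql;
    select * from dates
    /* double quotes are required so that the macro variable resolves */
    where mdy(month, day, year) = "&target_date"d;
quit;
```

Building a real SAS date with mdy avoids comparing a concatenated character string against a date value, which is a likely reason a concatenation-based where clause returns no rows.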

SAS: Calculating rolling skew of previous 30 days

I want to calculate the skew of a timeseries (stock returns) of the previous 30 days on a rolling basis (thus, getting daily values).
Dataset looks like:
Stock date month year return
1SF7 1/07/2016 7 2016 0.94
1SF7 5/07/2016 7 2016 0.91
1SF7 6/07/2016 7 2016 0.82
1SF7 7/07/2016 7 2016 0.95
..........
Currently, I tried proc means and just calculate month-end skewness
proc means data=have; by year month;
output out= want (drop= _freq_ _type_ ) skew(return)=Skew_monthly;
run;
Does anyone have an idea for rolling skewness? I know there is a question here that asks for rolling skewness, but the answer to it only outputs one value per 30 days, and I want daily values.
Thankful for any input!
Marc
Thanks, I managed it with the array version:
data want;
array p{0:29} _temporary_; /* circular buffer holding the last 30 returns */
set have;
by stock;
if first.stock then call missing(of p{*}); /* reset the buffer for each stock */
p{mod(_n_,30)} = return;
skew = skewness(of p{*});
run;
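Note that the window in the array approach is the last 30 observations, not 30 calendar days, and that the first rows of each stock are computed on a partially filled buffer. A possible variation that only reports skewness once a full window is available is sketched below; it assumes the data are sorted by stock and date and uses the variable names from the sample data, and n_obs is an illustrative helper variable.

```sas
data want;
    array p{0:29} _temporary_;      /* circular buffer of the last 30 returns */
    set have;
    by stock;
    if first.stock then do;
        call missing(of p{*});      /* clear the buffer for a new stock */
        n_obs = 0;                  /* restart the per-stock counter */
    end;
    n_obs + 1;                      /* sum statement: value is retained across rows */
    p{mod(n_obs, 30)} = return;
    /* leave skew missing until 30 observations have been collected */
    if n_obs >= 30 then skew = skewness(of p{*});
run;
```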

use data step generate next observation

Case 1
Suppose the data are sorted by year then by month (always have 3 observations in data).
Year Month Index
2014 11 1.1
2014 12 1.5
2015 1 1.2
I need to copy the Index of last month to new observation
Year Month Index
2014 11 1.1
2014 12 1.5
2015 1 1.2
2015 2 1.2
Case 2
Year is removed from data. So we only have Month and Index.
Month Index
1 1.2
11 1.1
12 1.5
Data is always collected from consecutive 3 months. So 1 is the last month.
Still, ideal output is
Month Index
1 1.2
2 1.2
11 1.1
12 1.5
I solved it by creating another dataset that only contains Month (1, 2, ..., 12) and then right joining the original dataset to it twice. But I think there is a more elegant way to deal with this.
Case 1 can be a straightforward data step. Add end=eof to the set statement to create a variable eof that has the value 1 when the data step is reading the last row of the data set. An output statement in the data step outputs a row during each iteration. If eof=1, a do block runs that increments the month by 1 (rolling over from December to January of the next year) and outputs another row.
data want;
set have end=eof;
output;
if eof then do;
if month=12 then year=year+1; /* December rolls over into the next year */
month=mod(month,12)+1; /* 12 -> 1, otherwise month+1 */
output;
end;
run;
For case 2, I would switch to an sql solution. Self join the table to itself on month, incremented by 1 in the second table. Use the coalesce function to keep the values from the existing table if it exists. If not, use the values from the second table. Since a case crossing December-January will produce 5 months, limit the output to four rows using the outobs= option in proc sql to exclude the unwanted second January.
proc sql outobs=4;
create table want as
select
coalesce(t1.month,mod(t2.month,12)+1) as month,
coalesce(t1.index,t2.index) as index
from
have t1
full outer join have t2
on t1.month = t2.month+1
order by
coalesce(t1.month,t2.month+1)
;
quit;

week function giving strange result

I am using the week function to clean some data and will eventually order the weeks. I used week() on the date 8/26/2011 and got 34, and when I passed the date 01/13/2012 to the function it returned 2. I thought I would get the number of weeks since Jan 1, 1960?
As per the WEEK Function documentation, the default U descriptor specifies the number of the week within the year, with Sunday being deemed the 1st day of the week. (You can use V if you want Monday to be considered the 1st day instead.)
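As a quick illustration of the descriptors, the sketch below prints the week number of one of the dates from the question under each of the three descriptors the WEEK function accepts. The variable names are illustrative only.

```sas
data _null_;
    d = '26AUG2011'd;
    wk_u = week(d, 'u');   /* default: week of year, Sunday is the first day of the week */
    wk_v = week(d, 'v');   /* ISO 8601 week number, Monday is the first day of the week */
    wk_w = week(d, 'w');   /* week of year, Monday is the first day of the week */
    put wk_u= wk_v= wk_w=;
run;
```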
The week function calculates the week within the date's own year. The answer to the implied question, "how do I calculate the number of weeks since 1/1/1960 [or some arbitrary date]," is the intck function.
data have;
input datevar date9.;
datalines;
01JAN1960
02JAN2013
13JAN2012
26AUG2011
;;;;
run;
data want;
set have;
wks = intck('week',0,datevar); /* # of weeks from 0 to datevar [0=1/1/1960]. */
/* Can replace 0 with any other date variable. */
run;