I am using the INTNX function to calculate month intervals. I'm finding that the results are frequently one day off from what I would expect... For example, look at this code:
data test;
olddate='20140531';
oldsasdate=input(olddate,yymmdd8.);
newsasdate=intnx('month',oldsasdate,-17);
newdate=put(newsasdate,yymmdd8.);
run;
In this code, I try to find the date 17 months before 05/31/2014. I would expect the function to return 11/30/2012, but it instead returns 12/1/2012. Any idea what's going on here? Is there a way to fix this?
By default, INTNX aligns the result to the beginning of the interval, which here is the first day of the month. It counts interval boundaries: each time the date moves from the last day of one month to the first day of the next, one interval has been crossed.
So,
data _null_;
x = intnx('month','31MAY2014'd, -1);
put x= date9.;
run;
This returns 01APR2014, not 30APR2014.
You can change this to 'same' alignment with the optional fourth argument (available in SAS 9.2+, I believe).
data _null_;
x = intnx('month','31MAY2014'd, -1,'s');
put x= date9.;
run;
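As a small sketch applied to the dates in the question, the step below compares the alignment options side by side. Note that 17 months before May 2014 is December 2012, so 30NOV2012 would need an offset of -18 with end alignment. The variable names are only illustrative.

```sas
data _null_;
  /* default alignment: beginning of the target month */
  b  = intnx('month', '31MAY2014'd, -17);        /* 01DEC2012 */
  /* 'same' alignment: same day of month, capped at month end */
  s  = intnx('month', '31MAY2014'd, -17, 's');   /* 31DEC2012 */
  /* 'end' alignment: last day of the target month */
  e  = intnx('month', '31MAY2014'd, -17, 'e');   /* 31DEC2012 */
  /* one month further back, end-aligned */
  e2 = intnx('month', '31MAY2014'd, -18, 'e');   /* 30NOV2012 */
  put b= date9. s= date9. e= date9. e2= date9.;
run;
```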
How do I convert the default timestamp "0001-01-01-00.00.00.000000" in SAS? I tried the code below, but it returned a null value. Can someone help with this, please?
data _NULL_;
x = "0001-01-01-00.00.00.000000";
rlstime = input(x,anydtdtm26.);
call symput('rlstime',rlstime);
run;
%put rlst: &rlstime;
As far as I remember, SAS cannot do that. Dates and timestamps before 1 January 1582 don't exist for SAS. Do you need the value, or can you just replace it with a null value? If you really need it, you could map it to another valid timestamp, split it into separate columns (year, month, etc.), or just keep it as a string. In your example you only write the timestamp to the log, so no conversion is necessary.
The earliest date that SAS will handle is 1st January, 1582. Additionally, a colon character should be used to delimit the time from the date, as well as the hours, minutes and seconds. Therefore, your code may be adjusted to the following:
data _NULL_;
x = "1582-01-01:00:00:00.000000";
rlstime = input(x,anydtdtm26.);
call symput('rlstime',rlstime);
run;
%put rlst: &rlstime;
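If the original sentinel value has to stay in the data, one option is to treat anything before 1582 as missing and parse the rest explicitly. This is only a sketch: it assumes the string always has the fixed layout yyyy-mm-dd-hh.mm.ss.ffffff shown in the question, and the variable yr is introduced here just for the check.

```sas
data _null_;
  length x $26;
  x = "0001-01-01-00.00.00.000000";
  yr = input(substr(x, 1, 4), 4.);            /* year part of the string */
  if yr < 1582 then rlstime = .;              /* outside the SAS date range: treat as missing */
  else rlstime = dhms(input(substr(x, 1, 10), yymmdd10.),  /* date part */
                      input(substr(x, 12, 2), 2.),         /* hours     */
                      input(substr(x, 15, 2), 2.),         /* minutes   */
                      input(substr(x, 18, 9), 9.));        /* seconds with fraction */
  call symputx('rlstime', rlstime);           /* symputx strips blanks and avoids conversion notes */
run;
%put rlst: &rlstime;
```

For the sentinel value above, the macro variable should end up holding a missing value rather than a datetime.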
For an if-query, I would like to create a macro variable that holds the frequency of the underlying time series. I tried to get some descriptive statistics from PROC TIMESERIES, but unfortunately they do not include the frequency.
The underlying time series does not necessarily include all periods of the frequency. From my point of view, that rules out a simple count with PROC SQL.
Does anyone know an efficient procedure to determine the frequency without computing it on my own (in a data step or PROC SQL code)?
You can use the OUTSPECTRA= option to learn what kind of seasonality the series has. Based on the data, give PROC TIMESERIES your best guess of the interval (day, month, etc.). In the example below, we know we want to forecast by month, but we do not know what seasonality the series has.
proc timeseries data=sashelp.air outspectra=spectra;
id date interval=month;
var air;
run;
Plot this spectra dataset in proc sgplot and you'll see something that looks like this:
proc sgplot data=spectra;
where NOT missing(period);
series x=period y=p;
run;
This line will naturally increase over time, but we're looking for bumps in the line. Notice the large bump somewhere between 0 and 24 months and the several smaller bumps before it. Let's zoom in on that by filtering out the longer periods.
proc sgplot data=spectra;
where period < 24 and NOT missing(period);
series x=period y=p;
run;
It's pretty clear that there is a strong seasonality of 12, with potentially smaller cycles at 3 and 6 months. Based on this spectral plot, we can conclude that our seasonality should be 12.
You can turn this into a macro to help identify the season if you'd like. Simply search for the largest bump within a reasonable timeframe. In our case we'll choose 36 because we do not suspect that we have any seasonality > 36 months.
proc sort data=spectra;
by period;
run;
data identify_period;
set spectra;
by period;
where NOT missing(period) AND period LE 36;
delta = abs(p - lag(p) );
run;
proc sql;
select period, max(delta) as max_delta
from identify_period
having delta = max(delta)
;
quit;
Output:
PERIOD max_delta
12 163712
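To use the result in later macro logic, the same query can write the period into a macro variable. This is a sketch that assumes the identify_period data set from the step above already exists; the macro variable name season is made up for illustration.

```sas
proc sql noprint;
  /* store the period with the largest jump in a macro variable */
  select period into :season trimmed
  from identify_period
  having delta = max(delta)
  ;
quit;
%put &=season;
```

With the sashelp.air example, this should report a season of 12, matching the output above.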
I don't know how to do this without data step logic, but you could wrap the data step in a macro as follows:
%macro get_frequency(data,date_variable,output_variable);
proc sort data=&data (keep=&date_variable) out=__tempsorted;
by &date_variable;
run;
data _null_;
set __tempsorted end=lastobs;
prevdate=lag(&date_variable);
if _n_ > 1 then do;
interval_number+1;
interval_total + (&date_variable - prevdate);
end;
if lastobs then do;
average_interval = interval_total/interval_number;
frequency = round(365.25/average_interval);
call symputx("&output_variable", frequency, 'G'); /* 'G' stores the result in the global symbol table so it is still available after the macro ends */
end;
run;
proc datasets nolist;
delete __tempsorted;
run;
quit;
%mend get_frequency;
Then you can call the macro on your original data set timeseries to examine the variable date and create a new macro variable frequency1 with the required frequency.
data work.timeseries;
input date date. value;
format date date9.;
datalines;
01Oct18 3000
01Nov18 4000
01Dec18 6500
01Jan19 7000
01Feb19 4000
01Mar19 5000
01Apr19 7500
01May19 4800
01Jun19 4500
;
run;
%get_frequency(timeseries,date,frequency1)
%put &=frequency1;
This seems to work on your sample data, where each date is the first of the month. If your dates are evenly spaced (e.g. always near month start/end, or always near mid-month), this macro should work OK. Obviously, if you have multiple observations per date, it will give a completely incorrect frequency.
I need help with a question.
I need to subtract two dates in SAS Studio. I have the following:
%let date_star = %SYSFUNC( DATETIME());
%let date_end = %SYSFUNC( DATETIME());
But I don't know how to subtract these variables.
Thanks for your help.
Use the INTCK() function to return the number of interval boundaries of a given kind that lie between two dates, times, or datetime values. The possible values of interval are listed in Date and Time Intervals.
%MACRO want;
%let date_start = %SYSFUNC(DATETIME());
data _null_;
rc=SLEEP(10,1); /* Sleep for 10 seconds */
run;
%let date_end = %SYSFUNC(DATETIME());
%put %sysfunc(intck(second, &date_start., &date_end.));
%MEND;
%want;
Result is 10 seconds, as expected.
So you haven't really created any variables there, just macro variables. Normally you would want to use SAS code to work with data, not macro code, but you can do it in a pinch.
You also do not have two DATE values. SAS stores dates as the number of days since 1 January 1960. What you have instead are two DATETIME values. The DATETIME() function returns the number of seconds since midnight on 1 January 1960, so the difference between two datetime values is a number of seconds.
The datetime value returned by DATETIME() will include fractions of a second. To perform floating point arithmetic in macro code you need to use the %SYSEVALF() function. The %EVAL() function that is used by default to evaluate conditions, like in a %IF statement, only handles integer arithmetic.
%let elapsed_time=%sysevalf(&date_end - &date_star);
If you would like to see the value in hours, minutes and seconds then you could apply a format to it.
%put Elapsed time was %sysfunc(putn(&elapsed_time,time15.3));
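For comparison, the same calculation can be done in a data step, where ordinary floating-point arithmetic applies and no %SYSEVALF() is needed. This is a minimal sketch; the variable names and the 2-second pause are arbitrary.

```sas
data _null_;
  start   = datetime();        /* seconds since 01JAN1960:00:00:00 */
  rc      = sleep(2, 1);       /* pause for 2 seconds (unit = 1 second) */
  finish  = datetime();
  elapsed = finish - start;    /* difference in seconds, including fractions */
  put elapsed= time12.3;       /* display as h:mm:ss.fff */
run;
```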
I am taking a scripting class and I have no idea what I'm doing!
For my assignment, I am supposed to print min/max/mean/std for each year. The .csv file I was given to use has a year column with the years as
1949.083
1949.167
1949.25
1949.333
1949.417
1949.5
1949.583
1949.667
1949.75
1949.833
1949.917
1950
1950.083
1950.167
and so on, all the way to 1960.
Assuming I am using PROC MEANS, is there a way to maybe combine the years so I can print a single set of calculations (min/max/mean/std) for each year? As in one set of calculations for the year 1949 (data values from 1949-1949.917), another one for 1950 (data values from 1950-1950.917), etc. Not sure if I'm making sense! I've been looking everywhere for hours and I can't figure it out! :(
If you want PROC MEANS to calculate separate statistics per year you can use a CLASS statement. With a CLASS statement it will define the groups based on the formatted value. So if you just use the format 4. with the variable YEAR then each value will be mapped to a simple 4 digit value.
proc means data=have min max mean std ;
class year;
format year 4.;
var analysis_var ;
run;
But that will round values like 1949.667 to 1950 and not 1949. If you want to ignore the fractional part of the year, you can use the INT() function. First create a new variable, and then use that new variable in the CLASS statement.
data step1;
set have;
yrnum = int(year);
run;
proc means data=step1 min max mean std ;
class yrnum ;
var analysis_var ;
run;
I have a question regarding moving averages. I use PROC EXPAND (cmovave 3), but I suppose those three days can be non-consecutive. I want to avoid averaging across missing days and compute the moving average only over adjacent days.
Is there any way to do this? Put another way: how can I select the part of my data set where I have values for consecutive periods (days)? I hope you can give me some examples for this problem.
Use PROC EXPAND to make sure every period of the time series interval is present. Then use a data step to calculate ma3 with the LAGn() functions.
If your data already has the correct time series interval, then skip the PROC EXPAND step.
data test;
start = "01JAN2013"d;
format date date9.
value best.;
do i=1 to 365;
r = ranuni(1);
value = rannor(1);
date = intnx('weekday',start,i);
dummy=1;
if r > .33 then output;
end;
drop i start r;
run;
proc expand data=test out=test2 to=weekday ;
id date;
var dummy;
run;
data test(drop=dummy);
merge test2 test;
by date;
ma3 = (value + lag(value) + lag2(value))/3;
run;
I use the DUMMY variable so that EXPAND will convert the series to WEEKDAY. Then drop it afterwards.
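To make the "adjacent days only" rule explicit, the average can be computed only when all three values are present. The sketch below uses a small made-up data set that already contains every weekday, with a missing value standing in for a gap. Like the LAGn() approach above, it is a trailing average, not the centered one that cmovave produces.

```sas
data gaps;
  input date :date9. value;
  format date date9.;
  datalines;
01JAN2013 1
02JAN2013 2
03JAN2013 3
04JAN2013 .
07JAN2013 5
08JAN2013 6
09JAN2013 7
;
run;

data ma;
  set gaps;
  v1 = lag(value);    /* previous weekday */
  v2 = lag2(value);   /* two weekdays back */
  /* average only when all three adjacent values are non-missing */
  if nmiss(value, v1, v2) = 0 then ma3 = mean(value, v1, v2);
  drop v1 v2;
run;

proc print data=ma;
run;
```

In this sample, ma3 is filled in only for 03JAN2013 and 09JAN2013, the two rows that end a run of three consecutive non-missing weekdays.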