I have a stock price dataset which has observations in miliseconds (Variables: STOCK DATE TIME(in ms) PRICE. It is sorted by stock, date, and time.
I now need a dataset where the freuqency is 1-second intervals. The price variable should be the prevailing price at the second.
I tried proc expand:
proc expand data=have out=want to=second;
id stock date time; run;
But it does not work that way.
Any help is appreciated!
M
got it: proc timeseries with id time and interval=second works!
Related
For an if-query I would like to create a macro varibale giving the respective frequency of the underlying time
series. I tried to get some descriptive statistics from proc time series. However, they unfortunately do not include the figure for the frequency.
The underlying times series does not necessarily conclude all periods of the frequency. That excludes a selected count by proc sql from my point of view.
Does anyone know an efficient procedure to determine the frequency without computing the frequency on my own (in a data step or a proc sql code)?
You can use the outspectra statement to help learn what kind of seasonality it has. Based on the data, give PROC TIMESERIES your best guess of day, month, etc. In the example below, we know we want to forecast by month but we do not know what seasonality it has.
proc timeseries data=sashelp.air outspectra=spectra;
id date interval=month;
var air;
run;
Plot this spectra dataset in proc sgplot and you'll see something that looks like this:
proc sgplot data=spectra;
where NOT missing(period);
series x=period y=p;
run;
This line will naturally increase over time, but we're looking for a bumps in the line. Notice the large bump somewhere between 0 and 24 months and the several smaller bumps before it. Let's zoom in on that by filtering out the longer periods.
proc sgplot data=spectra;
where period < 24 and NOT missing(period);
series x=period y=p;
run;
It's pretty clear that there is a strong seasonality of 12, with potentially smaller cycles at 3 and 6 months. From this plot, we can conclude that our seasonality should be 12 based on our spectra plot.
You can turn this into a macro to help identify the season if you'd like. Simply search for the largest bump within a reasonable timeframe. In our case we'll choose 36 because we do not suspect that we have any seasonality > 36 months.
proc sort data=spectra;
by period;
run;
data identify_period;
set spectra;
by period;
where NOT missing(period) AND period LE 36;
delta = abs(p - lag(p) );
run;
proc sql;
select period, max(delta) as max_delta
from identify_period
having delta = max(delta)
;
quit;
Output:
PERIOD max_delta
12 163712
I don't know how to do this without data step logic, but you could wrap the data step in a macro as follows:
%macro get_frequency(data,date_variable,output_variable);
proc sort data=&data (keep=&date_variable) out=__tempsorted;
by &date_variable;
run;
data _null_;
set __tempsorted end=lastobs;
prevdate=lag(&date_variable);
if _n_ > 1 then do;
interval_number+1;
interval_total + (&date_variable - prevdate);
end;
if lastobs then do;
average_interval = interval_total/interval_number;
frequency = round(365.25/average_interval);
call symput ("&output_variable",left(put(frequency,best32.)));
end;
run;
proc datasets nolist;
delete __tempsorted;
run;
quit;
%mend get_frequency;
Then you can call the macro on your original data set timeseries to examine the variable date and create a new macro variable frequency1 with the required frequency.
data work.timeseries;
input date date. value;
format date date9.;
datalines;
01Oct18 3000
01Nov18 4000
01Dec18 6500
01Jan19 7000
01Feb19 4000
01Mar19 5000
01Apr19 7500
01May19 4800
01Jun19 4500
;
run;
%get_frequency(timeseries,date,freqency1)
%put &=frequency1;
This seems to work on your sample data where each date is the first of the month. If your dates are evenly distributed (e.g. always near month start/end, or always near mid-month etc.) then this macro should work ok. Obviously if you have multiple observations per date then it will give the completely incorrect frequency.
Noob SAS user here.
I have a hospital data set with patientID and a variable that counts the days between admission and discharge.
Those patients who had more than one hospital admission show up with the same patientID and with a record of how many days they were in hospital each time.
I want to sum the total days in hospital per patient, and then only have one patientID record with the sum of all hospital days across all stays. Does anyone know how I would go about this?
You want to select distinct the sum of days_in_hospital and group by patientID This will get what you want:
proc sql;
create table want as
select distinct
patientID,
sum(days_in_hospital) as sum_of_days
from have
group by patientID;
quit;
Alternatively you can use proc summary.
proc summary data= hospital_data nway;
class patientID;
var days;
output out=summarized_data (drop = _type_ _freq_) sum=;
run;
This creates a new dataset called summarized_data which has the summed days for each patientID. (The nway option removes the overall summary row, and the drop statement removes extra default summary columns you don't need.)
I have a dataset with visitors and weather variables. I'm trying to forecast visitors based on the weather variables. Since the dataset only consists of visitors in season there is missing values and gaps for every year. When running proc reg in sas it's all okay but the issue comes when i'm using proc VARMAX. I cannot run the regression due to missing values. How can i tackle this?
proc varmax data=tivoli4 printall plots=forecast(all);
id obs interval=day;
model lvisitors = rain sunshine averagetemp
dfebruary dmarch dmay djune djuly daugust doctober dnovember ddecember
dwednesday dthursday dfriday dsaturday dsunday
d_24Dec2016 d_05Dec2013 d_24Dec2017 d_24Dec2014 d_24Dec2015 d_24Dec2019
d_24Dec2018 d_24Sep2012 d_06Jul2015
d_08feb2019 d_16oct2014 d_15oct2019 d_20oct2016 d_15oct2015 d_22sep2017 d_08jul2015
d_20Sep2019 d_08jul2016 d_16oct2013 d_01aug2012 d_18oct2012 d_23dec2012 d_30nov2013 d_20sep2014 d_17oct2012 d_17jun2014
dFrock2012 dFrock2013 dFrock2014 dFrock2015 dFrock2016 dFrock2017 dFrock2018 dFrock2019
dYear2015 dYear2016 dYear2017
/p=7 q=2 Method=ml dftest;
garch p=1 q=1 form=ccc OUTHT=CONDITIONAL;
restrict
ar(3,1,1)=0, ar(4,1,1)=0, ar(5,1,1)=0,
XL(0,1,13)=0, XL(0,1,14)=0, XL(0,1,13)=0, XL(0,1,27)=0, XL(0,1,38)=0, XL(0,1,42)=0;
output lead=10 out=forecast;
run;
As with any forecast, you will first need to prepare your time-series. You should first run through your data through PROC TIMESERIES to fill-in or impute missing values. The impute choice that is most appropriate is dependent on your variables. The below code will:
Sum lvisitors by day and set missing values to 0
Set missing values of averagetemp to average
Set missing values of rain, sunshine, and your variables starting with d to 0 (assuming these are indicators)
Code:
proc timeseries data=have out=want;
id obs interval = day
setmissing = 0
notsorted
;
var lvisitors / accumulate=total;
crossvar averagetemp / accumulate=none setmissing=average;
crossvar rain sunshine d: / accumulate=none;
run;
Important Time Interval Consideration
Depending on your data, this could bias your error rate and estimates since you always know no one will be around in the off-season. If you have many missing values for off-season data, you will want to remove those rows.
Since PROC VARMAX does not support custom time intervals, you can instead create a simple time identifier. You can alternatively turn this into a format for proc format and converttime_id at the end.
data want;
set have;
time_id+1;
run;
proc varmax data=want;
id time_id interval=day;
...
output lead=10 out=myforecast;
run;
data myforecast;
merge myforecast
want(keep=time_id date)
;
by time_id;
run;
Or, if you made a format:
data myforecast;
set myforecast;
date = put(time_id, timeid.);
drop time_id;
run;
I have the DAILY returns of industry portfolios in SAS.
I would like to calculate the WEEKLY returns.
The daily returns are in percentage so I think that should just be the sum of returns during each week.
Obvious problems I am facing is that the weeks can have a different number of days in.
The table I have in SAS is in the following format:
INDUSTRY_NUMBER DATE DAILY_RETURN
Any help would be greatly appreciated.
I have tried this:
proc expand data=Day_result
out=Week_result from=day to=week;
Industry_Number Trading_Date;
convert Value_weighted_return / method=aggregate observed=total;
run;
The daily data is in Day_Result when I remove the forth line i.e.
proc expand data=Day_result
out=Week_result from=day to=week;
convert Value_weighted_return / method=aggregate observed=total;
run;
This works as in it does what I want it to do but it doesn't do it for each category it does it for the whole table.
So if I have 40 categories I want the weekly returns for each category.
The second set of code provides the weekly return for every category.
EXAMPLE DATA:
data have;
format trading_date date9.;
infile datalines dlm=',';
input trading_date:ddmmyy10. industry_number value_weighted_return;
datalines;
19/01/2000,1, -0.008
20/01/2000,1, 0.008
23/01/2000,1, 0.008
24/01/2000,1, -0.007
25/01/2000,1, -0.009
26/01/2000,1, 0.008
27/01/2000,1, -0.008
30/01/2000,1, 0.003
31/01/2000,1, -0.001
01/02/2000,1, 0.004
02/02/2000,2, -0.008
03/02/2000,2, -0.005
06/02/2000,2, -0.004
07/02/2000,2, -0.009
08/02/2000,2, 0.002
09/02/2000,2, 0.006
10/02/2000,2, 0.008
13/02/2000,2, 0.008
14/02/2000,2, 0.002
15/02/2000,2, 0.01
16/02/2000,2, -0.008
;
run;
Sort your data by INDUSTRY_NUMBER Trading_Date, use INDUSTRY_NUMBER as a by-group, identify your time variable.
proc sort data=have;
by industry_number trading_date;
run;
Next, convert your data into a time-series to remove any time gaps. Set any missing days as the previous value since it does not change on those trading days (e.g. weekends, bank holidays, etc.).
proc timeseries data=have
out=have_ts;
by industry_number;
id trading_date interval=day
setmissing=previous
accumulate=average
;
var value_weighted_return;
run;
Finally, take the time-series output and convert it from day to week. Since you are using weights, you may want to use average rather than total.
proc expand data=have_ts
out=have_ts_week
from=day
to=week
;
by industry_number;
id trading_date;
convert Value_weighted_return / method=aggregate observed=average;
run;
I have a dataset of transactional data per week. (quantity, price, week, etc.)
However in the dataset i have two prices for the same week.
eg two observations for week 28 (one at price 5.03 and one at price 5.20)
what i want to do is calculate the weighted average price depending on the quantity and sum the quantity for the two different obs so that i have only one obs for week 28.
this happens frequently so i would like to be able to do this quickly without editing manually all prices and quantities.
Oh and this is in SAS btw!
Thanks!
PROC SUMMARY with the WEIGHT statement applied against price will calculate this for you.
proc summary data=have nway;
class week;
var quantity;
var price / weight=quantity;
output out=want (drop=_:) sum(quantity)= mean(price)=;
run;