Independent variable to find seasonality effect? - sas

I'm not sure if this is the right place to ask, but any help is greatly appreciated. I'm working in SAS Forecast Studio.
This is my time series dataset (quarterly data):
Date e.g. 1-Jan-80, 1-Apr-80, 1-Jul-80
DateQ e.g. 1980Q1, 1980Q2, 1980Q3
Year e.g. 1980, 1981, 1982
GDP (dependent variable) e.g. 2650.1
T e.g. 1, 2, 3
Which of these variables should I use as the independent variable in a linear regression to evaluate whether there is a seasonal effect, or should I create a new quarterly variable?

Seasonal effects should not be identified using simple linear regression on the time variable when analyzing time-series data. But, to answer your question, use date with the intnx() function to convert it to quarter.
data want;
format quarter yyq.;
set have;
quarter = intnx('quarter', date, 0, 'B');
run;
Seasonal effects can be identified a number of ways:
1. Graphing it
If a time series has a seasonal effect, it will tend to be clear. Simply looking at a graph of the data will let you know whether it is seasonal by your chosen interval.
In sashelp.air, it's very clear that there is a 12-month season.
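A minimal sketch of such a plot, using the same sashelp.air data set (variables date and air):

```sas
/* Plot the monthly airline series and look for a pattern that repeats every 12 points */
proc sgplot data=sashelp.air;
    series x=date y=air;
run;
```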
2. Spectral Density Analysis
proc timeseries will give you a spectral analysis to help identify significant seasons within the data. Peaks indicate possible cycles or seasons. You will need to restrict the output to a reasonable range of periods, since the density may increase significantly beyond a certain period and no longer be representative of the true season.
Forecast Studio and Time Series Studio will do this for you and can give you similar output to the below.
proc timeseries data=sashelp.air
outspectra=outspectra;
id date interval=month;
var air;
spectra;
run;
proc sgplot data=outspectra;
where period BETWEEN 1 AND 24;
scatter x=period y=p;
series x=period y=p;
run;
We can see a strong indicator for a seasonality of 12. We also see some potential 3-month and 6-month cycles that could be tested within a model for significance.
3. ACF/PACF/IACF plots
Your ACF/PACF/IACF plots in Forecast Studio will also help you identify clear seasons.
The classic decaying suspension-bridge look is indicative of a seasonal effect. Note that the autocorrelation increases around lag 12 and then decreases again. Additionally, the significant negative spikes at lag 12 in the PACF and IACF plots are further indicators of a significant seasonal effect at 12.
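Outside of Forecast Studio, similar plots can be produced in code. This is a rough sketch with proc arima on sashelp.air; the choice of nlag=36 is just an assumption to show three full years of lags:

```sas
ods graphics on;

/* The IDENTIFY statement prints the ACF, PACF and IACF; */
/* with ODS Graphics enabled they are also plotted.      */
proc arima data=sashelp.air;
    identify var=air nlag=36;
run;
quit;
```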
Model Building and Testing
Tools like the seasonal augmented Dickey-Fuller test that are available in Forecast Studio can help you identify whether you've captured seasonality and achieved stationarity after differencing.
The selection boxes in the Series view allow you to quickly add simple or seasonal differencing. Selecting (1) for simple differencing will add one simple difference, i.e.:
y = y - lag(y)
Selecting (1) for seasonal differencing will add 1 seasonal difference. Note that when you create a project in Forecast Studio, the season is automatically diagnosed and assumed. This should be done after doing our diagnostics above for our best guess as to what the true season is. In our case, we've assumed our season is 12. This would be equivalent to:
y = y - lag12(y)
We can then use stationarity tests to ensure we've achieved stationarity. In our case, we'll add 1 simple and seasonal difference.
Notice how our white noise plot has improved and our spikes at 12 have decreased to non-significance. Additionally, our stationarity tests are looking good and significant - that is, there is no unit root present.
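If you prefer code to the point-and-click view, roughly the same check can be sketched with proc arima. The lag orders in adf=(0,1,2) are an arbitrary choice for illustration, not something Forecast Studio prescribes:

```sas
proc arima data=sashelp.air;
    /* air(1,12) applies one simple difference and one seasonal (lag-12) difference */
    /* stationarity=(adf=...) requests augmented Dickey-Fuller tests on the result  */
    identify var=air(1,12) nlag=24 stationarity=(adf=(0,1,2));
run;
quit;
```

Small p-values in the ADF table reject the unit-root hypothesis for the differenced series.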
Adding Seasonal or Cyclical Effects
Your model choice will dictate how seasonal or cyclical effects are added. Differencing in an ARIMA model will take care of seasonality. Dummy variables can be used for additional cyclical effects in the ARIMA model. For example:
data want;
set have;
q1 = (qtr(date) = 1);
q2 = (qtr(date) = 2);
q3 = (qtr(date) = 3);
run;
UCMs can take care of all of these by adding both seasonal and cyclical effects. Holt-Winters ESMs take care of trend and seasonality without requiring dummy variables. Your modeling goals and performance considerations for each type of model will dictate which model you choose.
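As a rough check with the quarterly dummies above, one option is a regression with autocorrelated errors rather than plain OLS. This sketch assumes the data set want also contains the gdp and t variables from the question; nlag=1 is an arbitrary starting point, not a recommendation:

```sas
proc autoreg data=want;
    /* trend plus three quarterly dummies (Q4 is the baseline), AR(1) errors */
    model gdp = t q1 q2 q3 / nlag=1;
    /* joint test that all three seasonal dummies are zero */
    test q1 = 0, q2 = 0, q3 = 0;
run;
```

A significant joint test suggests a seasonal effect, but treat it as a supplement to the diagnostics above rather than a replacement.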

Related

Add stars to p<.05 in correlation matrix in Stata

I'm hoping to add one star for p<.05 and two stars for p<.001 in a correlation matrix in Stata. This is the code that I'm currently using. The code still generates a correlation matrix, but no stars appear in places where they should. Thanks for your help!
asdoc corr RELATIONSHIP anxiety BEH_SIM SIM_VALUES sptconf NEG_EFFICACY spteffort SPTEFFORT_OTHER COOP_MOTIV COMP_MOTIV, star(0.5), replace
First, you need to use pwcorr rather than corr to be able to add stars to your correlation matrix. Second, you should not have the second comma right after the star option.
For example, the code below will output a correlation matrix with one star if significant at a 10% level, two stars if significant at a 5% level, and three stars if significant at a 1% level.
asdoc pwcorr var1 var2 var3, star(all) replace
I do not believe you can specify the number of stars and the significance levels the way you would like to using asdoc. You can specify a custom significance level by using star(.05) rather than star(all) as I do above, but this will put one star by every correlation coefficient significant at a 5% level, and I do not think you can specify more than one level at a time.
The author of asdoc is Professor Attaullah Shah. He is very helpful and responsive so you might ask him. If not currently possible, if you ask he may add your suggestion to a future asdoc update. Here is a link to his website: https://fintechprofessor.com/2019/06/01/export-correlation-table-to-word-with-stars-and-significance-level-using-asdoc/

Proc reg using by variable (month): How do you take average of all coefficients across all months?

How do you take an average of the coefficients across all months?
Please refer to this earlier question:
How do I perform regression by month on the same SAS data set?
The comments in the linked question provide the code to get the estimates in a data set. Then you would run a PROC MEANS on the saved data set to get the averages. But you could also run the model without which a variable to get the monthly estimates alone. In general, it isn't common to average parameter estimates this way, except in a bootstrapping process.
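The two steps might look like the sketch below. The data set name have and the variables y, x and month are placeholders, since the original model is not shown here:

```sas
/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=have;
    by month;
run;

/* One regression per month; OUTEST= stores one row of estimates per BY group */
proc reg data=have outest=est noprint;
    by month;
    model y = x;
run;
quit;

/* Average the intercept and slope across the monthly regressions */
proc means data=est mean;
    var intercept x;
run;
```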

SAS propensity score matching: Observations considered for matching in PSMATCH is less than the total observations available in the data set

I am using SAS procedure PSMATCH to balance the cohorts. I am calculating the propensity score separately using logistic regression and then using the generated dataset in PSMATCH using PSDATA. I am doing multiple iterations of matching (to get the best results) by bringing variation in region, method (Optimal, Greedy and variable ratio), distance variable, caliper value and ratio. Please find the code below:
proc psmatch data=work.&data_set. region=&region_var.;
class &cat_var.;
psdata treatvar = case_cntrl_fl(Treated='1') PS=prop_score;
match method=&mtch_method.(&k_method.=&k_val.) exact=&exact_mtch_var.
stat=&stat_var. caliper(mult=stddev)=&caliper_var.;
assess lps ps var=(prop_score &covar_asses.) / plots = (boxplot cloudplot);
output out(obs=match)=WORK.psm ps=ps lps=lps matchid=_MatchID matchwgt = _MATCHWGT_;
run;
My concern is the number of observations considered for matching (i.e. All Observations). The logistic regression data set contains 531 observations in Treatment Arm 1 and 3252 in Treatment Arm 2. However, the PSMATCH report lists only 446 observations for Treatment Arm 1 and 2784 for Treatment Arm 2 under All Observations. The result is the same regardless of the variations in the PSMATCH methods.
Can somebody help me understand the possible reason of drop in counts?
You likely have missing values in your data. If any variable used in the procedure is missing for a row, that entire row is excluded from the analysis.
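A quick way to check is to count missing values per variable in the input data set. The data set name work.mydata below is a placeholder for whatever &data_set. resolves to:

```sas
/* N and NMISS for every numeric variable; a nonzero NMISS on a variable */
/* used in PSMATCH would explain the dropped rows                        */
proc means data=work.mydata n nmiss;
    var _numeric_;
run;
```

Note that _numeric_ covers numeric variables only; character CLASS or EXACT variables would need a separate check, for example proc freq with the missing option on the tables statement.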

How to do proportionate stratified sampling without replacement?

I want to select my sample in Stata 13 based on three stratum variables with 12 strata in total (size - two strata; sector - three strata; intangible intensity - two strata). The selection should be proportional without replacement.
However, I can only find disproportionate selection commands that select for instance x% of each stratum.
Can anyone help me out with this problem?
Thank you for this discussion. I think I know where my problem was.
The command "gsample" can select strata based on different variables. Therefore, I thought I had to define three different stratum variables. But the solution should be simpler.
There are 12 strata in total (the large firms with high intensity in sector 1, the small firms with high intensity in sector 1, and so on), with each firm in the sample falling into one of the strata.
All I have to do is create a variable "strataident" with values from 1 to 12 identifying the different strata. I do this for the population dataset, so the number of firms falling into each stratum is representative of the population. The following code will give me a stratified random sample that is representative of the population.
gsample 10, percent strata(strataident) wor
This command works as well and is much easier, see the example in 1:
gsample 10, percent wor strata(size sector intensity)
The problem is that strata may "overlap", so you probably have to rebalance the sample after the initial draw.
Now the question is how this can be implemented. The final sample should represent the proportions of the population as closely as possible.

Simulating random effects / mixed models in SAS

I'm trying to create a simulation of drug concentration based on the dose of a drug given. I have some preliminary data, and I used a random effects model in which log(dose) predicts log(drug concentration), modelling subject as a random effect.
The results of that analysis are below. I want to take these results and simulate similar data in SAS, so I can look at the effect of changing doses on the resulting concentration of drug in the body. I know that when I simulate the data, I need to ensure the random slope is correlated with the random intercept, but I'm unsure exactly how to do that. Any example code would be appreciated.
Random effects:
Formula: ~LDOS | RANDID
Structure: General positive-definite, Log-Cholesky parametrization
            StdDev     Corr
(Intercept) 0.15915378 (Intr)
LDOS        0.01783609 0.735
Residual    0.05790635
Fixed effects:
LCMX ~ LDOS
            Value    Std.Error  DF t-value  p-value
(Intercept) 3.340712 0.04319325 16 77.34339 0
LDOS        1.000386 0.01034409 11 96.71090 0
Correlation:
     (Intr)
LDOS -0.047
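One possible way to simulate from these estimates is a data step that builds the correlated random intercept and slope from two independent standard normals (a 2x2 Cholesky factorisation). This is only a sketch: the number of subjects, the dose levels and the seed are made up, and it assumes the logs in the model are natural logs.

```sas
data sim;
    call streaminit(12345);                 /* arbitrary seed for reproducibility */

    /* estimates copied from the model output above */
    beta0    = 3.340712;                    /* fixed intercept            */
    beta1    = 1.000386;                    /* fixed slope on LDOS        */
    sd_int   = 0.15915378;                  /* SD of random intercept     */
    sd_slope = 0.01783609;                  /* SD of random slope         */
    rho      = 0.735;                       /* intercept-slope correlation*/
    sd_res   = 0.05790635;                  /* residual SD                */

    do randid = 1 to 100;                   /* hypothetical number of subjects */
        z1 = rand('normal');
        z2 = rand('normal');
        b0 = sd_int * z1;                                   /* random intercept */
        b1 = sd_slope * (rho*z1 + sqrt(1 - rho**2)*z2);     /* correlated slope */
        do dose = 10, 20, 40, 80;           /* hypothetical dose levels */
            ldos = log(dose);               /* natural log (assumption) */
            lcmx = (beta0 + b0) + (beta1 + b1)*ldos + sd_res*rand('normal');
            cmax = exp(lcmx);               /* back-transformed concentration */
            output;
        end;
    end;
    keep randid dose ldos lcmx cmax;
run;
```

Because b1 shares the z1 term with b0, the two random effects have correlation rho by construction. Refitting the mixed model to sim should roughly recover the estimates above, which is a useful sanity check before changing the doses.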