Stata Predict GARCH

I want to do something very easy, but it doesn't work!
I need to see the predictions (and errors) of a GARCH model. The main variable is "dowclose", and my idea is to check whether the GARCH model fits this variable well.
I'm using this simple code, but the predictions are all 0's:
webuse dow1.dta
arch dowclose, noconstant arch(1) garch(1)
predict dow_hat, y
ARCH Results:
ARCH family regression
Sample: 1 - 9341 Number of obs = 9341
Distribution: Gaussian Wald chi2(.) = .
Log likelihood = -76191.43 Prob > chi2 = .
------------------------------------------------------------------------------
| OPG
dowclose | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
arch |
L1. | 1.00144 6.418855 0.16 0.876 -11.57929 13.58217
|
garch |
L1. | -.001033 6.264372 -0.00 1.000 -12.27898 12.27691
|
_cons | 56.60589 620784.7 0.00 1.000 -1216659 1216772
------------------------------------------------------------------------------

This is to be expected: the mean equation has no covariates and no intercept, so there's nothing to predict and every fitted value is 0.
Here's a simple OLS regression that makes the problem apparent:
. sysuse auto
(1978 Automobile Data)
. reg price, nocons
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 0, 74) = 0.00
Model | 0 0 . Prob > F = .
Residual | 3.4478e+09 74 46592355.7 R-squared = 0.0000
-------------+------------------------------ Adj R-squared = 0.0000
Total | 3.4478e+09 74 46592355.7 Root MSE = 6825.9
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
. predict phat
(option xb assumed; fitted values)
. sum phat
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
phat | 74 0 0 0 0
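For the GARCH case, a minimal sketch of what could be predicted instead, assuming the goal is to judge the volatility fit rather than the level of the series. Modelling log returns with a constant in the mean equation is an assumption made here, not part of the original question, and the variable names ln_close, ret_hat, ret_res and ret_var are hypothetical:

```stata
webuse dow1, clear
* Assumption: model daily log returns rather than the price level
generate double ln_close = ln(dowclose)
* Keep the constant so the mean equation is not empty
arch D.ln_close, arch(1) garch(1)
* Fitted mean (just the constant here), residuals, and conditional variance
predict ret_hat, xb
predict ret_res, residuals
predict ret_var, variance
summarize ret_hat ret_res ret_var
```

The conditional variance in ret_var is the part of the model that GARCH actually fits, so comparing it with the squared residuals is one way to assess the fit.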

Related

How can I adjust a coefplot for the constant value of categorical variable estimation?

I have a dataset in Stata that looks something like this
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
dv2 | 1,904 .5395645 .427109 -1.034977 1.071396
xvar | 1,904 3.074055 1.387308 1 5
with xvar being a categorical independent variable and dv2 a dependent variable of interest.
I am estimating a simple model with the categorical variable as a dummy:
reg dv2 ib4.xvar
eststo myest
Source | SS df MS Number of obs = 1,904
-------------+---------------------------------- F(4, 1899) = 13.51
Model | 9.60846364 4 2.40211591 Prob > F = 0.0000
Residual | 337.540713 1,899 .177746558 R-squared = 0.0277
-------------+---------------------------------- Adj R-squared = 0.0256
Total | 347.149177 1,903 .182422058 Root MSE = .4216
------------------------------------------------------------------------------
dv2 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
xvar |
A | .015635 .0307356 0.51 0.611 -.044644 .075914
B | .1435987 .029325 4.90 0.000 .0860861 .2011113
C | .1711176 .0299331 5.72 0.000 .1124124 .2298228
E | .1337754 .0295877 4.52 0.000 .0757477 .1918032
|
_cons | .447794 .020191 22.18 0.000 .4081952 .4873928
------------------------------------------------------------------------------
These are the results. As you can see, B, C and E have larger effects than D, which is the excluded category.
However, coefplot does not account for the fact that, for a categorical variable, the effect of each category is composite: true_A = D + A, where D is the constant.
coefplot myest, scheme(s1color) vert
The plot shows the constant as the largest coefficient and the others as smaller.
Is there a systematic way to adjust for this and plot the true coefficients and SEs of each category?
Thanks a lot for your help.
In response to your second comment, here is an example of how you can use marginsplot to plot estimated effects from a linear regression.
sysuse auto, clear
replace price = price/100
reg price i.rep78, cformat(%9.2f)
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
rep78 |
2 | 14.03 23.56 0.60 0.554 -33.04 61.10
3 | 18.65 21.76 0.86 0.395 -24.83 62.13
4 | 15.07 22.21 0.68 0.500 -29.31 59.45
5 | 13.48 22.91 0.59 0.558 -32.28 59.25
|
_cons | 45.65 21.07 2.17 0.034 3.55 87.74
------------------------------------------------------------------------------
margins i.rep78, cformat(%9.2f)
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
rep78 |
1 | 45.65 21.07 2.17 0.034 3.55 87.74
2 | 59.68 10.54 5.66 0.000 38.63 80.73
3 | 64.29 5.44 11.82 0.000 53.42 75.16
4 | 60.72 7.02 8.64 0.000 46.68 74.75
5 | 59.13 8.99 6.58 0.000 41.18 77.08
------------------------------------------------------------------------------
marginsplot
Note that these margins are the constant plus the corresponding coefficient; the margin for the base category is the constant itself.
The marginsplot command then plots these marginal estimates together with their confidence intervals.
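If the plot has to come from coefplot rather than marginsplot, one possible route is to post the margins as estimation results and plot those. This is a sketch under the assumption that the community-contributed coefplot package is installed (ssc install coefplot); the stored-results name levels is hypothetical:

```stata
sysuse auto, clear
replace price = price/100
regress price i.rep78
* The composite effect of one category, with its own standard error
lincom _cons + 2.rep78
* Post the category-level predictions so they become e(b) and e(V)
margins i.rep78, post
estimates store levels
* Requires the community-contributed coefplot package
coefplot levels, vertical
```

Because margins applies the delta method, each plotted point carries the standard error of the category level itself, not of its difference from the base category.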

How to include dummy variables in ivreg model?

I have the following model:
ivreg ldemand social_housing transport year (lprice = utilities)
However, I want to include year as a set of dummy variables.
How can I do it in Stata?
Using i.year doesn't work for the ivreg command.
Cross-posted on Statalist.
The command ivreg does not allow factor variables:
. webuse hsng2, clear
. ivreg rent pcturban i.region (hsngval = faminc)
factor variables not allowed
r(101);
However, you can still use the xi prefix to create dummies on the fly:
. xi: ivreg rent pcturban i.region (hsngval = faminc)
i.region _Iregion_1-4 (naturally coded; _Iregion_1 omitted)
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 50
-------------+---------------------------------- F(5, 44) = 9.10
Model | 12735.4667 5 2547.09334 Prob > F = 0.0000
Residual | 48507.6533 44 1102.44667 R-squared = 0.2079
-------------+---------------------------------- Adj R-squared = 0.1179
Total | 61243.12 49 1249.85959 Root MSE = 33.203
------------------------------------------------------------------------------
rent | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsngval | .0038683 .0008958 4.32 0.000 .0020629 .0056737
pcturban | -.4980121 .5179779 -0.96 0.342 -1.541928 .5459039
_Iregion_2 | 1.528672 15.14086 0.10 0.920 -28.98572 32.04306
_Iregion_3 | 7.74279 15.10906 0.51 0.611 -22.70752 38.1931
_Iregion_4 | -40.61235 19.60999 -2.07 0.044 -80.13369 -1.091002
_cons | 88.26681 31.69154 2.79 0.008 24.39671 152.1369
------------------------------------------------------------------------------
Instrumented: hsngval
Instruments: pcturban _Iregion_2 _Iregion_3 _Iregion_4 faminc
------------------------------------------------------------------------------
It is important to note that according to the command's help file:
Out-of-date command
ivreg is an out-of-date command as of Stata 10. ivreg has been replaced with the ivregress command.
Thus, it is best to switch to ivregress instead:
. ivregress 2sls rent pcturban i.region (hsngval = faminc), small
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 50
-------------+------------------------------ F( 5, 44) = 9.10
Model | 12735.4667 5 2547.09334 Prob > F = 0.0000
Residual | 48507.6533 44 1102.44667 R-squared = 0.2079
-------------+------------------------------ Adj R-squared = 0.1179
Total | 61243.12 49 1249.85959 Root MSE = 33.203
------------------------------------------------------------------------------
rent | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsngval | .0038683 .0008958 4.32 0.000 .0020629 .0056737
pcturban | -.4980121 .5179779 -0.96 0.342 -1.541928 .5459039
|
region |
N Cntrl | 1.528672 15.14086 0.10 0.920 -28.98572 32.04306
South | 7.74279 15.10906 0.51 0.611 -22.70752 38.1931
West | -40.61235 19.60999 -2.07 0.044 -80.13369 -1.091002
|
_cons | 88.26681 31.69154 2.79 0.008 24.39671 152.1369
------------------------------------------------------------------------------
Instrumented: hsngval
Instruments: pcturban 2.region 3.region 4.region faminc
Type help ivregress from Stata's command prompt for more details.

Stata Not Dropping Variables (in regression) due to Multicollinearity and I think it should

I am running a simple regression of race times against temperature just to develop some basic intuition. My data-set is very large and each observation is the race completion time of a unit in a given race, in a given year.
For starters I am running a very simple regression of race time on temperature bins.
Summary of temp variable:
|
Variable | Obs Mean Std. Dev Min Max
------------+--------------------------------------------
avg_temp_scc| 8309434 54.3 9.4 0 89
Summary of time variable:
Variable | Obs Mean Std. Dev Min Max
------------+--------------------------------------------
chiptime | 8309434 267.5 59.6 122 1262
I decided to make 10 degree bins for temperature and regress time against those.
The code is:
egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
reg chiptime i.temp_trial
The output is
Source | SS df MS Number of obs = 8309434
---------+------------------------------ F( 8,8309425) =69509.83
Model | 1.8525e+09 8 231557659 Prob > F = 0.0000
Residual | 2.7681e+108309425 3331.29368 R-squared = 0.0627
-----+-------------------------------- Adj R-squared = 0.0627
Total | 2.9534e+108309433 3554.22521 Root MSE = 57.717
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+----------------------------------------------------------------
temp_trial |
10 | -26.63549 2.673903 -9.96 0.000 -31.87625 -21.39474
20 | 10.23883 1.796236 5.70 0.000 6.71827 13.75939
30 | -16.1049 1.678432 -9.60 0.000 -19.39457 -12.81523
40 | -13.97918 1.675669 -8.34 0.000 -17.26343 -10.69493
50 | -10.18371 1.675546 -6.08 0.000 -13.46772 -6.899695
60 | -.6865365 1.675901 -0.41 0.682 -3.971243 2.59817
70 | 44.42869 1.676883 26.49 0.000 41.14206 47.71532
80 | 23.63064 1.766566 13.38 0.000 20.16824 27.09305
_cons | 273.1366 1.675256 163.04 0.000 269.8531 276.42
So Stata correctly drops one of the temperature bins (in this case 0-10).
Now I manually created the bins and ran the regression again:
gen temp0 = 1 if temp_trial==0
replace temp0 = 0 if temp_trial!=0
gen temp1 = 1 if temp_trial == 10
replace temp1 = 0 if temp_trial != 10
gen temp2 = 1 if temp_trial==20
replace temp2 = 0 if temp_trial!=20
gen temp3 = 1 if temp_trial==30
replace temp3 = 0 if temp_trial!=30
gen temp4=1 if temp_trial==40
replace temp4=0 if temp_trial!=40
gen temp5=1 if temp_trial==50
replace temp5=0 if temp_trial!=50
gen temp6=1 if temp_trial==60
replace temp6=0 if temp_trial!=60
gen temp7=1 if temp_trial==70
replace temp7=0 if temp_trial!=70
gen temp8=1 if temp_trial==80
replace temp8=0 if temp_trial!=80
reg chiptime temp0 temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8
The output is:
Source | SS df MS Number of obs = 8309434
---------+------------------------------ F( 9,8309424) =61786.51
Model | 1.8525e+09 9 205829030 Prob > F = 0.0000
Residual | 2.7681e+108309424 3331.29408 R-squared = 0.0627
--------+------------------------------ Adj R-squared = 0.0627
Total | 2.9534e+108309433 3554.22521 Root MSE = 57.717
--------------------------------------------------------------------------
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+----------------------------------------------------------------
temp0 | -54.13245 6050.204 -0.01 0.993 -11912.32 11804.05
temp1 | -80.76794 6050.204 -0.01 0.989 -11938.95 11777.42
temp2 | -43.89362 6050.203 -0.01 0.994 -11902.08 11814.29
temp3 | -70.23735 6050.203 -0.01 0.991 -11928.42 11787.94
temp4 | -68.11162 6050.203 -0.01 0.991 -11926.29 11790.07
temp5 | -64.31615 6050.203 -0.01 0.992 -11922.5 11793.87
temp6 | -54.81898 6050.203 -0.01 0.993 -11913 11803.36
temp7 | -9.703755 6050.203 -0.00 0.999 -11867.89 11848.48
temp8 | -30.5018 6050.203 -0.01 0.996 -11888.68 11827.68
_cons | 327.269 6050.203 0.05 0.957 -11530.91 12185.45
Note the bins are exhaustive of the entire data set and stata is including a constant in the regression and none of the bins are getting dropped. Is this not incorrect? Given that the constant is being included in the regression, shouldn't one of the bins get dropped to make it the "base case"? I feel as though I am missing something obvious here.
Edit:
Here is a dropbox link for the data and do file:
It contains only the two variables under consideration. The file is 129 mb. I also have a picture of my output at the link.
This too is not an answer, but an extended comment, since I'm tired of fighting with the 600-character limit and the freeze on editing after 5 minutes.
In the comment thread on the original post, @user52932 wrote
Thank you for verifying this. Can you elaborate on what exactly this
precision issue is? Does this only cause problems in this
multicollinearity issue? Could it be that when I am using factor
variables this precision issue may cause my estimates to be wrong?
I want to be unambiguous that the results from the regression using factor variables are as correct as those of any well-specified regression can be.
In the regression using dummy variables, the model was misspecified to include a set of multicollinear variables. Stata is then faulted for failing to detect the multicollinearity.
But there's no magic test for multicollinearity. It's inferred from characteristics of the cross-products matrix. In this case the cross-products matrix represents 8.3 million observations, and despite Stata's use of double-precision throughout, the calculated matrix passed Stata's test and was not detected as containing a multicollinear set of variables. This is the locus of the precision problem to which I referred. Note that by reordering the observations, the accumulated cross-products matrix differed enough so that it now failed Stata's test, and the misspecification was detected.
Now look at the results in the original post obtained from this misspecified regression. Note that if you add 54.13245 to the coefficients on each of the dummy variables and subtract the same amount from the constant, the resulting coefficients and constant are identical to those in the regression using factor variables. This is the textbook definition of the problem with multicollinearity - not that the coefficient estimates are wrong, but that the coefficient estimates are not uniquely defined.
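The shift can be checked by hand with a few display commands. This is only arithmetic on the coefficients printed in the original post; the scalar name shift is introduced here for illustration:

```stata
* Shift taken from the temp0 coefficient in the misspecified regression
scalar shift = 54.13245
* Dummy coefficients move up by the shift ...
display "temp1 -> " %9.5f (-80.76794 + shift)
display "temp7 -> " %9.5f (-9.703755 + shift)
* ... and the constant moves down by the same amount
display "_cons -> " %9.4f (327.269 - shift)
```

Up to rounding, the results match the 10 and 70 rows and the constant of the factor-variable regression, and temp0 plus the shift is exactly 0, which corresponds to the omitted base bin.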
In a comment above, @user52932 wrote
I am unsure what Stata is using as the base case in my data.
The answer is that Stata used no base case; the results are what are to be expected when a set of multicollinear variables is included among the independent variables.
So this question is a reminder to us that statistical packages like Stata cannot infallibly detect multicollinearity. As it turns out, that's part of the genius of factor variable notation, I realize now. With factor variable notation, you tell Stata to create a set of dummy variables that by definition will be multicollinear, and since it understands that relationship between the dummy variables, it can eliminate the multicollinearity ex ante, before constructing the cross-products matrix, rather than attempt to infer the problem ex post, using the cross-products matrix's characteristics.
We should not be surprised that Stata occasionally fails to detect multicollinearity, but rather gratified that it does as well as it does at doing so. After all, the second model is indeed a misspecification, which constitutes an unambiguous violation of the assumptions of OLS regression on the user's part.
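The difference between the two routes can be sketched on a small dataset, where the numerical check has no trouble. The auto data and the names origin1, origin2 and dsum are assumptions chosen for illustration; _rmcoll is Stata's programmer command for flagging collinear variables:

```stata
sysuse auto, clear
* Exhaustive dummies: origin1 (Domestic) and origin2 (Foreign)
quietly tabulate foreign, generate(origin)
* The dummies sum to 1 in every row, so together they replicate the constant
egen byte dsum = rowtotal(origin1 origin2)
assert dsum == 1
* Ex post route: infer the collinearity numerically from the data
_rmcoll origin1 origin2
display "variables flagged as collinear: " r(k_omitted)
* Ex ante route: factor-variable notation fixes the base level up front
regress price i.foreign
```

The assert on dsum is an exact check that does not depend on floating-point accumulation, so it is a cheap way to confirm by hand that a set of manual dummies is exhaustive.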
This may not be an "answer" but it's too long for a comment, so I write it here.
My results are different. At the final regression, one variable is dropped:
. clear all
. set obs 8309434
number of observations (_N) was 0, now 8,309,434
. set seed 1
. gen avg_temp_scc = floor(90*uniform())
. egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
. gen chiptime = rnormal()
. reg chiptime i.temp_trial
Source | SS df MS Number of obs = 8,309,434
-------------+---------------------------------- F(8, 8309425) = 0.88
Model | 7.07729775 8 .884662219 Prob > F = 0.5282
Residual | 8308356.5 8,309,425 .999871411 R-squared = 0.0000
-------------+---------------------------------- Adj R-squared = -0.0000
Total | 8308363.58 8,309,433 .9998713 Root MSE = .99994
------------------------------------------------------------------------------
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
temp_trial |
10 | .0010732 .0014715 0.73 0.466 -.0018109 .0039573
20 | .0003255 .0014713 0.22 0.825 -.0025581 .0032092
30 | .0017061 .0014713 1.16 0.246 -.0011776 .0045897
40 | .0003128 .0014717 0.21 0.832 -.0025718 .0031973
50 | .0007142 .0014715 0.49 0.627 -.0021699 .0035983
60 | .0021693 .0014716 1.47 0.140 -.0007149 .0050535
70 | -.0008265 .0014715 -0.56 0.574 -.0037107 .0020577
80 | -.0005001 .0014714 -0.34 0.734 -.0033839 .0023837
|
_cons | -.0006364 .0010403 -0.61 0.541 -.0026753 .0014025
------------------------------------------------------------------------------
. * "qui tab temp_trial, gen(temp)" is more convenient than "forv ..."
. forv k = 0/8 {
2. gen temp`k' = temp_trial==`k'0
3. }
. reg chiptime temp0-temp8
note: temp6 omitted because of collinearity
Source | SS df MS Number of obs = 8,309,434
-------------+---------------------------------- F(8, 8309425) = 0.88
Model | 7.07729775 8 .884662219 Prob > F = 0.5282
Residual | 8308356.5 8,309,425 .999871411 R-squared = 0.0000
-------------+---------------------------------- Adj R-squared = -0.0000
Total | 8308363.58 8,309,433 .9998713 Root MSE = .99994
------------------------------------------------------------------------------
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
temp0 | -.0021693 .0014716 -1.47 0.140 -.0050535 .0007149
temp1 | -.0010961 .0014719 -0.74 0.456 -.003981 .0017888
temp2 | -.0018438 .0014717 -1.25 0.210 -.0047282 .0010407
temp3 | -.0004633 .0014717 -0.31 0.753 -.0033477 .0024211
temp4 | -.0018566 .0014721 -1.26 0.207 -.0047419 .0010287
temp5 | -.0014551 .0014719 -0.99 0.323 -.00434 .0014298
temp6 | 0 (omitted)
temp7 | -.0029958 .0014719 -2.04 0.042 -.0058808 -.0001108
temp8 | -.0026694 .0014718 -1.81 0.070 -.005554 .0002152
_cons | .0015329 .0010408 1.47 0.141 -.0005071 .0035729
------------------------------------------------------------------------------
The differences from your setup are: (i) different data (I generated random numbers), and (ii) I used a forvalues loop instead of creating the variables manually. Yet I see no errors in your code.

Retrieving standard errors after the command nlcom

In Stata the command nlcom employs the delta method to test nonlinear hypotheses about estimated coefficients. The command displays the standard errors in the results window, though unfortunately does not save them anywhere.
What is available after estimation is just the matrix r(V), but I cannot figure out how to use it to compute the standard errors.
You need to use the post option, like this:
. sysuse auto
(1978 Automobile Data)
. reg price mpg weight
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 2, 71) = 14.74
Model | 186321280 2 93160639.9 Prob > F = 0.0000
Residual | 448744116 71 6320339.67 R-squared = 0.2934
-------------+------------------------------ Adj R-squared = 0.2735
Total | 635065396 73 8699525.97 Root MSE = 2514
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -49.51222 86.15604 -0.57 0.567 -221.3025 122.278
weight | 1.746559 .6413538 2.72 0.008 .467736 3.025382
_cons | 1946.069 3597.05 0.54 0.590 -5226.245 9118.382
------------------------------------------------------------------------------
. nlcom ratio: _b[mpg]/_b[weight], post
ratio: _b[mpg]/_b[weight]
------------------------------------------------------------------------------
price | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ratio | -28.34844 58.05769 -0.49 0.625 -142.1394 85.44254
------------------------------------------------------------------------------
. di _se[ratio]
58.057686
This standard error is the square root of the entry from the variance matrix r(V):
. matrix list r(V)
symmetric r(V)[1,1]
ratio
ratio 3370.6949
. di sqrt(3370.6949)
58.057686
Obviously you need to take square roots of the diagonal elements of r(V). Here's an approach that returns the standard errors as variables in a one-observation data set.
sysuse auto, clear
reg mpg weight turn
nlcom (v1: 1/_b[weight]) (v2: _b[weight]/_b[turn])
mata: se = sqrt(diagonal(st_matrix("r(V)")))'
clear
getmata (se1 se2 ) = se /* supply names as needed */
list
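The same standard errors can also be pulled out with Stata's matrix commands alone, without Mata and without clearing the data. A minimal sketch, where the matrix name V and the scalar names se_v1 and se_v2 are hypothetical:

```stata
sysuse auto, clear
quietly regress mpg weight turn
quietly nlcom (v1: 1/_b[weight]) (v2: _b[weight]/_b[turn])
* Copy r(V) right away, before another command overwrites r()
matrix V = r(V)
local names : colnames V
local k = colsof(V)
* Standard errors are the square roots of the diagonal of V
forvalues i = 1/`k' {
    local nm : word `i' of `names'
    scalar se_`nm' = sqrt(V[`i',`i'])
    display "`nm'  se = " se_`nm'
}
```

Storing the values as scalars keeps the estimation results from regress intact, which the post option does not.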

Using user-written command chest in Stata for change-in-estimate effects

I'm using the user-written command chest in Stata to look at the change-in-estimate with the variables in my model.
After running the linear regression of
regress age allelecount gender htn_g dm_g lipid_g i.hx_smoking b_bmi hx_med_asa if cadhx2==0
I run the chest command
chest allelecount, backward nograph
but I only get output for one variable
chest allelecount, backward
Change-in-estimate
regress regression. Outcome: age
number of obs = 476 Exposure: allelecount
----------------------------------------------------------
Variables |
removed | Coef. [95% Conf. Interval] Change, %
----------+-----------------------------------------------
Adj.All | -0.3691 -0.6819 -0.0564
-lipid_g | -0.3688 -0.6804 -0.0571 -0.0996
----------------------------------------------------------
Can anyone explain this?
Using the auto data of Stata, I find no problem:
sysuse auto
regress price mpg rep78 headroom
Source | SS df MS Number of obs = 69
-------------+------------------------------ F( 3, 65) = 7.51
Model | 148497605 3 49499201.8 Prob > F = 0.0002
Residual | 428299354 65 6589220.82 R-squared = 0.2575
-------------+------------------------------ Adj R-squared = 0.2232
Total | 576796959 68 8482308.22 Root MSE = 2566.9
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -289.3462 62.53921 -4.63 0.000 -414.2456 -164.4467
rep78 | 670.8971 343.5213 1.95 0.055 -15.16242 1356.957
headroom | -300.0293 398.0516 -0.75 0.454 -1094.993 494.9346
_cons | 10921.33 2153.003 5.07 0.000 6621.487 15221.17
chest mpg,backward
Change-in-estimate
regress regression. Outcome: price
number of obs = 69 Exposure: mpg
----------------------------------------------------------
Variables |
removed | Coef. [95% Conf. Interval] Change, %
----------+-----------------------------------------------
Adj.All | -289.3462 -411.9208 -166.7715
-headroom | -271.6425 -384.8719 -158.4132 -6.1185
-rep78 | -226.3607 -332.1613 -120.5600 -16.6697
----------------------------------------------------------