Running the manual's example of the mixed command in Stata:
use http://www.stata-press.com/data/r13/pig
mixed weight week || id:
I get the following results:
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log likelihood = -1014.9268
Iteration 1: log likelihood = -1014.9268
Computing standard errors:
Mixed-effects ML regression                     Number of obs     =        432
Group variable: id                              Number of groups  =         48
                                                Obs per group:
                                                              min =          9
                                                              avg =        9.0
                                                              max =          9
                                                Wald chi2(1)      =   25337.49
Log likelihood = -1014.9268                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
weight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
week | 6.209896 .0390124 159.18 0.000 6.133433 6.286359
_cons | 19.35561 .5974059 32.40 0.000 18.18472 20.52651
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity |
var(_cons) | 14.81751 3.124226 9.801716 22.40002
-----------------------------+------------------------------------------------
var(Residual) | 4.383264 .3163348 3.805112 5.04926
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 472.65 Prob >= chibar2 = 0.0000
My question is: can I programmatically access the estimates of the 'Random-effects Parameters', var(_cons) and var(Residual)?
I tried going over return list and ereturn list, but these estimates don't seem to be available there.
I found one option on UCLA's website:
* var(_cons)
_diparm lns1_1_1, f(exp(#)^2) d(2*exp(#)^2)
* var(Residual)
_diparm lnsig_e, f(exp(#)^2) d(2*exp(#)^2)
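For completeness, here is a minimal sketch of capturing those quantities after the model above. It assumes, as the follow-up thread below also does, that _diparm leaves the transformed estimate and its standard error in r(est) and r(se):
use http://www.stata-press.com/data/r13/pig
mixed weight week || id:
* var(_cons): back-transform the stored log-standard-deviation lns1_1_1
_diparm lns1_1_1, f(exp(#)^2) d(2*exp(#)^2)
scalar var_cons = r(est)
scalar var_cons_se = r(se)
* var(Residual): same transformation applied to lnsig_e
_diparm lnsig_e, f(exp(#)^2) d(2*exp(#)^2)
scalar var_resid = r(est)
scalar var_resid_se = r(se)
display "var(_cons) = " var_cons "   var(Residual) = " var_resid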
I need to compare two different estimation methods and see whether their results are statistically the same. One of my estimation methods is SUR (seemingly unrelated regression), and I estimated my 11 models using
sureg (Y1 trend X1 ... X106) (Y2 trend X1 ... X181) ... (Y11 trend X1 ... X130)
Then I estimated a single model, as shown in the following:
glm Y1 trend X1 ... X106
Now I need to test whether the parameter estimates of X1 to X106 coming from sureg are equal to the glm estimates of the same variables, using the Hausman specification test. I couldn't figure out how to store the parameter estimates for a specific equation of an SUR system.
I couldn't find what I should pass to estimates store to keep only a subset of the SUR estimates.
It's not easy to give a working example using my own crowded data, but let me present the same problem using Stata's auto data.
. sysuse auto
(1978 automobile data)
. sureg (price mpg headroom) (trunk weight length) (gear_ratio turn headroom)
Seemingly unrelated regression
------------------------------------------------------------------------------
Equation             Obs   Params        RMSE    "R-squared"      chi2   P>chi2
------------------------------------------------------------------------------
price                 74        2     2576.37        0.2266     21.24   0.0000
trunk                 74        2    2.912933        0.5299     82.93   0.0000
gear_ratio            74        2    .3307276        0.4674     65.12   0.0000
------------------------------------------------------------------------------
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
price        |
mpg | -258.2886 57.06953 -4.53 0.000 -370.1428 -146.4344
headroom | -419.4592 390.4048 -1.07 0.283 -1184.639 345.7201
_cons | 12921.65 2025.737 6.38 0.000 8951.277 16892.02
-------------+----------------------------------------------------------------
trunk        |
weight | -.0010525 .0013499 -0.78 0.436 -.0036983 .0015933
length | .1735274 .0471176 3.68 0.000 .0811785 .2658762
_cons | -15.6766 5.182878 -3.02 0.002 -25.83485 -5.518345
-------------+----------------------------------------------------------------
gear_ratio   |
turn | -.0652416 .0097031 -6.72 0.000 -.0842594 -.0462238
headroom | -.0601831 .0505198 -1.19 0.234 -.1592001 .0388339
_cons | 5.781748 .3507486 16.48 0.000 5.094293 6.469202
------------------------------------------------------------------------------
. glm (price mpg headroom)
Iteration 0: log likelihood = -686.17715
Generalized linear models                         Number of obs   =         74
Optimization     : ML                             Residual df     =         71
                                                  Scale parameter =    6912463
Deviance         =  490784895.4                   (1/df) Deviance =    6912463
Pearson          =  490784895.4                   (1/df) Pearson  =    6912463
Variance function: V(u) = 1                       [Gaussian]
Link function    : g(u) = u                       [Identity]
                                                  AIC             =   18.62641
Log likelihood   = -686.1771533                   BIC             =   4.91e+08
------------------------------------------------------------------------------
| OIM
price | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mpg | -259.1057 58.42485 -4.43 0.000 -373.6163 -144.5951
headroom | -334.0215 399.5499 -0.84 0.403 -1117.125 449.082
_cons | 12683.31 2074.497 6.11 0.000 8617.375 16749.25
------------------------------------------------------------------------------
As you can see, for the price model the glm parameter estimate of the mpg coefficient is -259.10, while the estimate of the same variable in the SUR system is -258.29.
Now I want to test whether the GLM and SUR parameter estimates are statistically equal.
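One possible route, sketched below on the auto data: store both fits and ask hausman to compare only the matching equation via its equations() option (the stored names are arbitrary, and whether the Hausman test's assumptions suit this particular comparison is a separate question):
sysuse auto, clear
sureg (price mpg headroom) (trunk weight length) (gear_ratio turn headroom)
estimates store sur
glm price mpg headroom
estimates store single
* compare equation 1 of the SUR fit with the single-equation fit
hausman single sur, equations(1:1) constant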
I am fitting the mixed model below:
. mixed y trt || clst:trt, nocons reml dfmethod(sat)
Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = -1295.3123
Iteration 1: log restricted-likelihood = -1295.3098
Iteration 2: log restricted-likelihood = -1295.3098
Computing standard errors:
Computing degrees of freedom:
Mixed-effects REML regression Number of obs = 919
Group variable: clst Number of groups = 49
Obs per group:
min = 1
avg = 18.8
max = 30
DF method: Satterthwaite DF: min = 888.00
avg = 900.91
max = 913.83
F(1, 913.83) = 0.40
Log restricted-likelihood = -1295.3098 Prob > F = 0.5251
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
trt | .1455914 .2290005 0.64 0.525 -.3038366 .5950193
_cons | .3951269 .2241477 1.76 0.078 -.0447941 .835048
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
clst: Identity |
var(trt) | .0341507 .0173905 .0125877 .092652
-----------------------------+------------------------------------------------
var(Residual) | .9546016 .0453034 .8698131 1.047655
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 9.46 Prob >= chibar2 = 0.0010
. return list
scalars:
r(level) = 95
matrices:
r(table) : 9 x 4
Next, I calculate the ICC as follows:
. nlcom (icc_est: (exp(_b[lns1_1_1:_cons])^2)/((exp(_b[lns1_1_1:_cons])^2)+(exp(_b[lnsig_e:_cons])^2)))
icc_est: (exp(_b[lns1_1_1:_cons])^2)/((exp(_b[lns1_1_1:_cons])^2)+(exp(_b[lnsig_e:_cons])^2))
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
icc_est | .0345392 .0171907 2.01 0.045 .0008461 .0682323
------------------------------------------------------------------------------
How can I save the results in the dataset?
I want to keep all three tables shown: fixed effects, random effects, and the ICC results.
Consider the following reproducible example using Stata's pig toy dataset:
webuse pig, clear
mixed weight week || id:week, nocons reml dfmethod(sat)
nlcom (icc_est: (exp(_b[lns1_1_1:_cons])^2)/((exp(_b[lns1_1_1:_cons])^2)+(exp(_b[lnsig_e:_cons])^2))), post
------------------------------------------------------------------------------
weight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
icc_est | .1380299 .0265754 5.19 0.000 .0859431 .1901167
------------------------------------------------------------------------------
The following works for me:
generate double coef = _b[icc_est]
generate double se = _se[icc_est]
generate p = string(2 * normal(-abs(_b[icc_est] / _se[icc_est])), "%9.3f")
generate double lower = _b[icc_est] + _se[icc_est] * invnormal(0.025)
generate double upper = _b[icc_est] + _se[icc_est] * invnormal(0.975)
list coef se p lower upper in 1
+-------------------------------------------------------+
| coef se p lower upper |
|-------------------------------------------------------|
1. | .13802987 .02657538 0.000 .08594308 .19011667 |
+-------------------------------------------------------+
save mydata.dta
The process is similar for the results of the main model.
As a follow-up: getting the random-effect variance and its SE, and the residual variance and its SE, takes only a few more lines of code. As the previous reply indicated, the results from the main model are obtained in the same way as the ICC results. See the code below.
mixed y trt || clst:trt, nocons reml dfmethod(sat)
gen double fixedcoef = _b[trt]
gen double fixedse = _se[trt]
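* convert the ln(sd) stored in lns1_1_1 to a variance and its delta-method SE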
_diparm lns1_1_1, f(exp(#)^2) d(2*exp(#)^2)
gen double randomcoef = r(est)
gen double randomse = r(se)
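* same back-transformation for the residual log-standard-deviation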
_diparm lnsig_e, f(exp(#)^2) d(2*exp(#)^2)
gen double residcoef = r(est)
gen double residse = r(se)
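To end up with a one-observation dataset holding just these results, something along the following lines should work (the file name is hypothetical), mirroring the save step above:
keep in 1
keep fixedcoef fixedse randomcoef randomse residcoef residse
save myresults.dta, replace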
I want to do something very simple, but it doesn't work!
I need to see the predictions (and errors) of a GARCH model. The main variable is dowclose, and my idea is to check whether the GARCH model fits this variable well.
I'm using this simple code, but the predictions are just 0's:
webuse dow1.dta
arch dowclose, noconstant arch(1) garch(1)
predict dow_hat, y
ARCH Results:
ARCH family regression
Sample: 1 - 9341                                  Number of obs   =       9341
Distribution: Gaussian                            Wald chi2(.)    =          .
Log likelihood = -76191.43                        Prob > chi2     =          .
------------------------------------------------------------------------------
| OPG
dowclose | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
arch |
L1. | 1.00144 6.418855 0.16 0.876 -11.57929 13.58217
|
garch |
L1. | -.001033 6.264372 -0.00 1.000 -12.27898 12.27691
|
_cons | 56.60589 620784.7 0.00 1.000 -1216659 1216772
------------------------------------------------------------------------------
This is to be expected: you have no covariates and no intercept, so there's nothing to predict.
Here's a simple OLS regression that makes the problem apparent:
. sysuse auto
(1978 Automobile Data)
. reg price, nocons
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 0, 74) = 0.00
Model | 0 0 . Prob > F = .
Residual | 3.4478e+09 74 46592355.7 R-squared = 0.0000
-------------+------------------------------ Adj R-squared = 0.0000
Total | 3.4478e+09 74 46592355.7 Root MSE = 6825.9
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
. predict phat
(option xb assumed; fitted values)
. sum phat
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
phat | 74 0 0 0 0
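If the goal is to examine the fit, a sketch along these lines gives the mean equation something to predict and also recovers the conditional variance; the constant and the AR(1) term are illustrative choices, not the only reasonable specification:
webuse dow1, clear
* keep the constant and add an AR(1) term so the mean equation is non-trivial
arch dowclose, ar(1) arch(1) garch(1)
predict dow_hat, y          // fitted values of dowclose
predict dow_err, residuals  // prediction errors
predict dow_var, variance   // conditional variance from the GARCH part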
In Stata, the nlcom command employs the delta method to test nonlinear hypotheses about estimated coefficients. It displays the standard errors in the Results window but unfortunately does not save them anywhere.
What is available after estimation is just the matrix r(V), but I cannot figure out how to use it to compute the standard errors.
You need to use the post option, like this:
. sysuse auto
(1978 Automobile Data)
. reg price mpg weight
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 2, 71) = 14.74
Model | 186321280 2 93160639.9 Prob > F = 0.0000
Residual | 448744116 71 6320339.67 R-squared = 0.2934
-------------+------------------------------ Adj R-squared = 0.2735
Total | 635065396 73 8699525.97 Root MSE = 2514
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -49.51222 86.15604 -0.57 0.567 -221.3025 122.278
weight | 1.746559 .6413538 2.72 0.008 .467736 3.025382
_cons | 1946.069 3597.05 0.54 0.590 -5226.245 9118.382
------------------------------------------------------------------------------
. nlcom ratio: _b[mpg]/_b[weight], post
ratio: _b[mpg]/_b[weight]
------------------------------------------------------------------------------
price | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ratio | -28.34844 58.05769 -0.49 0.625 -142.1394 85.44254
------------------------------------------------------------------------------
. di _se[ratio]
58.057686
This standard error is the square root of the entry from the variance matrix r(V):
. matrix list r(V)
symmetric r(V)[1,1]
ratio
ratio 3370.6949
. di sqrt(3370.6949)
58.057686
Obviously you need to take square roots of the diagonal elements of r(V). Here's an approach that returns the standard errors as variables in a one-observation data set.
sysuse auto, clear
reg mpg weight turn
nlcom (v1: 1/_b[weight]) (v2: _b[weight]/_b[turn])
mata: se = sqrt(diagonal(st_matrix("r(V)")))'
clear
getmata (se1 se2) = se /* supply names as needed */
list
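If you would rather not clear the data in memory, an alternative sketch posts the same square roots back into a Stata matrix instead of variables:
sysuse auto, clear
reg mpg weight turn
nlcom (v1: 1/_b[weight]) (v2: _b[weight]/_b[turn])
* write the delta-method SEs from r(V) into a Stata matrix named se
mata: st_matrix("se", sqrt(diagonal(st_matrix("r(V)")))')
matrix list se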
I was trying to examine whether Stata uses, in the model NormalReg below (a sample ML model), the initial values I supplied from a previous reg. However, looking at iteration 0, it seems that my initial values are not taken into account. Any help fixing this issue would be highly appreciated.
set seed 123
set obs 1000
gen x = runiform()*2
gen u = rnormal()*5
gen y = 2 + 2*x + u
reg y x
Source | SS df MS Number of obs = 1000
-------------+------------------------------ F( 1, 998) = 52.93
Model | 1335.32339 1 1335.32339 Prob > F = 0.0000
Residual | 25177.012 998 25.227467 R-squared = 0.0504
-------------+------------------------------ Adj R-squared = 0.0494
Total | 26512.3354 999 26.5388743 Root MSE = 5.0227
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.99348 .2740031 7.28 0.000 1.455792 2.531168
_cons | 2.036442 .3155685 6.45 0.000 1.417188 2.655695
------------------------------------------------------------------------------
cap program drop NormalReg
program define NormalReg
args lnlk xb sigma2
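* per-observation log-density of y ~ N(xb, sigma2)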
qui replace `lnlk' = -ln(sqrt(`sigma2'*2*_pi)) - ($ML_y1-`xb')^2/(2*`sigma2')
end
ml model lf NormalReg (reg: y = x) (sigma2:)
ml init reg:x = `=_b[x]'
ml init reg:_cons = `=_b[_cons]'
ml max,iter(1) trace
ml max,iter(1) trace
initial: log likelihood = -<inf> (could not be evaluated)
searching for feasible values .+
feasible: log likelihood = -28110.03
rescaling entire vector .+.
rescale: log likelihood = -14623.922
rescaling equations ...+++++.
rescaling equations ....
rescale eq: log likelihood = -3080.0872
------------------------------------------------------------------------------
Iteration 0:
Parameter vector:
reg: reg: sigma2:
x _cons _cons
r1 3.98696 1 32
log likelihood = -3080.0872
------------------------------------------------------------------------------
Iteration 1:
Parameter vector:
reg: reg: sigma2:
x _cons _cons
r1 2.498536 1.773872 24.10726
log likelihood = -3035.3553
------------------------------------------------------------------------------
convergence not achieved
Number of obs = 1000
Wald chi2(1) = 86.45
Log likelihood = -3035.3553 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
reg |
x | 2.498536 .2687209 9.30 0.000 1.971853 3.02522
_cons | 1.773872 .3086854 5.75 0.000 1.16886 2.378885
-------------+----------------------------------------------------------------
sigma2 |
_cons | 24.10726 1.033172 23.33 0.000 22.08228 26.13224
------------------------------------------------------------------------------
Warning: convergence not achieved
Apparently, if you want ml to evaluate the likelihood at the specified initial values at iteration 0, you must also supply an initial value for sigma2. Change the last section of your code to:
matrix rmse = e(rmse)
scalar mse = rmse[1,1]^2
ml model lf NormalReg (reg: y = x) (sigma2:)
ml init reg:x = `=_b[x]'
ml init reg:_cons = `=_b[_cons]'
ml init sigma2:_cons = `=scalar(mse)'
ml maximize, trace
Note that the ML estimate of sigma^2 will differ from the squared root mean square error, because ML does not adjust for degrees of freedom: with n = 1,000 and 2 mean parameters, sigma2 = (998/1000)*rmse^2.
Stuff like this is very sensitive: you are trusting that the results from the previous regression are still visible at the exact point the program runs. That could be undermined, directly or indirectly, by several different operations. It is safest to capture the values you need right away and pass them to your program explicitly at the point it runs.
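A minimal sketch of that advice: capture what is needed into locals immediately after reg and hand them to ml init explicitly (run this as one block, e.g. from a do-file, so the locals persist across commands):
reg y x
local b_x = _b[x]
local b_c = _b[_cons]
local s2 = e(rmse)^2 * e(df_r)/e(N)   // variance on the ML scale
ml model lf NormalReg (reg: y = x) (sigma2:)
ml init reg:x = `b_x'
ml init reg:_cons = `b_c'
ml init sigma2:_cons = `s2'
ml maximize, trace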