I tried creating a table using the community-contributed family of commands estout:
esttab est1 est2 est3 using table3.tex, se label nobaselevels ///
star(* 0.10 ** 0.05 *** 0.01) cell((coef(fmt(%9.2f)) sd(fmt(%9.2f)))) ///
drop(_Iprovince* _Iyear*) stats(year province robust r2 N, ///
label("Year Fixed Effects" "Province Fixed Effects" "Robust SE" "R-squared")) ///
replace booktabs
However, Stata produces the following error:
coefficient _Iprovince* not found
These are "fixed effect" dummies that I want to drop from the table.
The code works fine when I take out cell().
Finally, how can I also round the coefficient estimates and standard errors?
Unless you have a very old version of Stata, don't use xi to create your FEs. Use factor variable notation i.province and i.year instead.
The principal problem with your code is that cell() should contain b instead of coef (Stata cannot drop coefficients that are not included in the table, and they are not included unless you ask for them):
sysuse auto
eststo est1: reg price mpg i.rep78
esttab est1, ///
stats(b year province robust r2 N, label("Year Fixed Effects" "Province Fixed Effects" "Robust SE" "R-squared")) ///
replace booktabs drop(*.rep78) se label nobaselevels star(* 0.10 ** 0.05 *** 0.01) cell((b(fmt(%9.2f)) se(fmt(%9.2f))))
Note the reproducible example on a shared dataset.
The code does not run because you are using the suboption sd in esttab instead of se:
sysuse auto, clear
eststo est1: xi: reg price mpg i.rep78
i.rep78 _Irep78_1-5 (naturally coded; _Irep78_1 omitted)
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(5, 63) = 4.39
Model | 149020603 5 29804120.7 Prob > F = 0.0017
Residual | 427776355 63 6790100.88 R-squared = 0.2584
-------------+---------------------------------- Adj R-squared = 0.1995
Total | 576796959 68 8482308.22 Root MSE = 2605.8
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -280.2615 61.57666 -4.55 0.000 -403.3126 -157.2103
_Irep78_2 | 877.6347 2063.285 0.43 0.672 -3245.51 5000.78
_Irep78_3 | 1425.657 1905.438 0.75 0.457 -2382.057 5233.371
_Irep78_4 | 1693.841 1942.669 0.87 0.387 -2188.274 5575.956
_Irep78_5 | 3131.982 2041.049 1.53 0.130 -946.7282 7210.693
_cons | 10449.99 2251.041 4.64 0.000 5951.646 14948.34
------------------------------------------------------------------------------
esttab est1, cell((coef(fmt(%9.2f)) sd(fmt(%9.2f)))) label nobaselevels ///
star(* 0.10 ** 0.05 *** 0.01) stats(b r2 N) drop(_Irep78*)
coefficient _Irep78* not found
r(111);
If you use the correct suboption se the code runs:
esttab est1, cell((coef(fmt(%9.2f)) se(fmt(%9.2f)))) label nobaselevels ///
star(* 0.10 ** 0.05 *** 0.01) stats(r2 N) drop(_Irep78*)
----------------------------------------------
(1)
Price
coef se
----------------------------------------------
Mileage (mpg) 61.58
Constant 2251.04
----------------------------------------------
r2 0.26
N 69.00
----------------------------------------------
However, the suboption coeflabels (coef in your code) is only meant to
specify labels for the coefficients, not to include them in the table.
Instead, you need to use the suboption b, as suggested in
@Dimitriy's answer:
esttab est1, cell((b(fmt(%9.2f)) se(fmt(%9.2f)))) label nobaselevels ///
star(* 0.10 ** 0.05 *** 0.01) stats(r2 N) drop(_Irep78*)
----------------------------------------------
(1)
Price
b se
----------------------------------------------
Mileage (mpg) -280.26 61.58
Constant 10449.99 2251.04
----------------------------------------------
r2 0.26
N 69.00
----------------------------------------------
Here's the full output in esttab (without dropping anything):
esttab est1, cell((b(fmt(%9.2f)) se(fmt(%9.2f)))) label nobaselevels ///
star(* 0.10 ** 0.05 *** 0.01) stats(r2 N)
----------------------------------------------
(1)
Price
b se
----------------------------------------------
Mileage (mpg) -280.26 61.58
rep78==2 877.63 2063.28
rep78==3 1425.66 1905.44
rep78==4 1693.84 1942.67
rep78==5 3131.98 2041.05
Constant 10449.99 2251.04
----------------------------------------------
r2 0.26
N 69.00
----------------------------------------------
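On the rounding part of the question, here is a minimal sketch that assumes factor-variable notation and uses the auto dataset as a stand-in. esttab's own b() and se() options set the number of decimals without cell(), and indicate() is one way to replace the dropped dummies with a single Yes/No row. The row title "Repair record FE" and the statistic labels are illustrative choices, not part of the original code:

```stata
sysuse auto, clear
estimates clear
eststo est1: regress price mpg i.rep78

* b(2) and se(2) request two decimal places for coefficients and standard errors
* indicate() collapses the rep78 dummies into one Yes/No row (assumed row title)
esttab est1, b(2) se(2) label nobaselevels ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    indicate("Repair record FE = *.rep78") ///
    stats(r2 N, fmt(2 0) labels("R-squared" "Observations"))
```

If the dummies should simply vanish without an indicator row, drop(*.rep78) can be used in place of indicate().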
I need to compare two estimation methods and test whether their results are statistically the same. One of the methods is SUR (Seemingly Unrelated Regression), with which I estimated my 11 models:
sureg (Y1 trend X1 .... X106) (Y2 trend X1..... X181) ..... (Y11 trend X1 .... X130)
Then I estimated a single-equation OLS model as follows:
glm(Y1 trend X1 ...... X106)
Now I need to test whether the parameter estimates of X1 to X106 from sureg are equal to the glm estimates of the same variables. I need to use a Hausman specification test, but I couldn't figure out how to store the parameter estimates for a specific equation of an SUR system.
I couldn't find what I should add to estimates store XXX to subset part of the SUR estimates.
It's not easy to give a working example using my own crowded data, so let me present the same problem using Stata's auto data.
. sysuse auto
(1978 automobile data)
. sureg (price mpg headroom) (trunk weight length) (gear_ratio turn headroom)
Seemingly unrelated regression
------------------------------------------------------------------------------
Equation Obs Params RMSE "R-squared" chi2 P>chi2
------------------------------------------------------------------------------
price 74 2 2576.37 0.2266 21.24 0.0000
trunk 74 2 2.912933 0.5299 82.93 0.0000
gear_ratio 74 2 .3307276 0.4674 65.12 0.0000
------------------------------------------------------------------------------
------------------------------------------------------------------------------
| Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
price |
mpg | -258.2886 57.06953 -4.53 0.000 -370.1428 -146.4344
headroom | -419.4592 390.4048 -1.07 0.283 -1184.639 345.7201
_cons | 12921.65 2025.737 6.38 0.000 8951.277 16892.02
-------------+----------------------------------------------------------------
trunk |
weight | -.0010525 .0013499 -0.78 0.436 -.0036983 .0015933
length | .1735274 .0471176 3.68 0.000 .0811785 .2658762
_cons | -15.6766 5.182878 -3.02 0.002 -25.83485 -5.518345
-------------+----------------------------------------------------------------
gear_ratio |
turn | -.0652416 .0097031 -6.72 0.000 -.0842594 -.0462238
headroom | -.0601831 .0505198 -1.19 0.234 -.1592001 .0388339
_cons | 5.781748 .3507486 16.48 0.000 5.094293 6.469202
------------------------------------------------------------------------------
. glm (price mpg headroom)
Iteration 0: log likelihood = -686.17715
Generalized linear models Number of obs = 74
Optimization : ML Residual df = 71
Scale parameter = 6912463
Deviance = 490784895.4 (1/df) Deviance = 6912463
Pearson = 490784895.4 (1/df) Pearson = 6912463
Variance function: V(u) = 1 [Gaussian]
Link function : g(u) = u [Identity]
AIC = 18.62641
Log likelihood = -686.1771533 BIC = 4.91e+08
------------------------------------------------------------------------------
| OIM
price | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
mpg | -259.1057 58.42485 -4.43 0.000 -373.6163 -144.5951
headroom | -334.0215 399.5499 -0.84 0.403 -1117.125 449.082
_cons | 12683.31 2074.497 6.11 0.000 8617.375 16749.25
------------------------------------------------------------------------------
As you can see, for the price equation the glm estimate of the mpg coefficient is -259.1057, while the SUR estimate for the same variable is -258.2886.
Now I want to test whether the GLM and SUR parameter estimates are statistically equal.
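For the narrower problem of isolating one equation, here is a minimal sketch on the auto example. It is an illustration under stated assumptions, not a full Hausman test. After sureg, each coefficient carries its equation name, so a single coefficient can be addressed as _b[eqname:varname], and the matching blocks of e(b) and e(V) can be extracted by equation name. The matrix names b_price and V_price are made up for this illustration:

```stata
sysuse auto, clear
quietly sureg (price mpg headroom) (trunk weight length) (gear_ratio turn headroom)

* single coefficient and standard error from the price equation
display _b[price:mpg]
display _se[price:mpg]

* coefficient vector and covariance block of the price equation only
matrix b = e(b)
matrix V = e(V)
matrix b_price = b[1, "price:"]
matrix V_price = V["price:", "price:"]
matrix list b_price
matrix list V_price
```

The extracted block could then be compared against the single-equation estimates by hand; whether a Hausman-type comparison is appropriate here is a separate question that this sketch does not settle.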
I display the results of two regressions using the community-contributed command esttab:
sysuse auto, clear
quietly reg price weight
est store ols
quietly nl (price = {b0} + {b1} * weight)
est store nls
esttab *
--------------------------------------------
(1) (2)
price price
--------------------------------------------
main
weight 2.044***
(5.42)
_cons -6.707 -6.707
(-0.01) (-0.01)
--------------------------------------------
b1
_cons 2.044***
(5.42)
--------------------------------------------
N 74 74
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
How can I make the b1 coefficient from the nl command appear in the weight row?
The easiest way of doing this is the following:
sysuse auto, clear
estimates clear
regress price weight
estimates store ols
nl (price = {b0} + {b1} * weight)
matrix b = e(b)
matrix V = e(V)
matrix coleq b = " "
matrix coleq V = " "
matrix colnames b = _cons weight
matrix colnames V = _cons weight
erepost b = b V = V, rename
estimates store nls
Results:
esttab ols nls
--------------------------------------------
(1) (2)
price price
--------------------------------------------
weight 2.044*** 2.044***
(5.42) (5.42)
_cons -6.707 -6.707
(-0.01) (-0.01)
--------------------------------------------
N 74 74
--------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Note that erepost is a community-contributed command, which you can download from SSC:
ssc install erepost
I am running a regression on categorical variables in Stata:
regress y i.age i.birth
Part of the regression results output is below:
coef
age
28 .1
29 -.2
birth
1958 .2
1959 .5
I want the above results to be shown in the reverse order, so that I can export them to Excel using the putexcel command:
coef
age
29 -.2
28 .1
birth
1959 .5
1958 .2
I tried sorting the birth and age variables before regression, but this does not work.
Can someone help?
You cannot directly reverse the factor levels of a variable in the regression output.
However, if your end goal is to create a table in Microsoft Excel, one way to do this is the following:
sysuse auto.dta, clear
estimates clear
keep if !missing(rep78)
tabulate rep78, generate(rep)
regress price mpg weight rep2-rep5
estimates store r1
regress price mpg weight rep5 rep4 rep3 rep2
estimates store r2
Normal results:
esttab r1 using results.csv, label refcat(rep2 "Repair record", nolabel)
------------------------------------
(1)
Price
------------------------------------
Mileage (mpg) -63.10
(-0.72)
Weight (lbs.) 2.093**
(3.29)
Repair record
rep78== 2.0000 753.7
(0.39)
rep78== 3.0000 1349.4
(0.76)
rep78== 4.0000 2030.5
(1.12)
rep78== 5.0000 3376.9
(1.78)
Constant -599.0
(-0.15)
------------------------------------
Observations 69
------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Reversed results:
esttab r2 using results.csv, label refcat(rep5 "Repair record", nolabel)
------------------------------------
(1)
Price
------------------------------------
Mileage (mpg) -63.10
(-0.72)
Weight (lbs.) 2.093**
(3.29)
Repair record
rep78== 5.0000 3376.9
(1.78)
rep78== 4.0000 2030.5
(1.12)
rep78== 3.0000 1349.4
(0.76)
rep78== 2.0000 753.7
(0.39)
Constant -599.0
(-0.15)
------------------------------------
Observations 69
------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Note that here I am using the community-contributed command esttab to export the results.
You can make further tweaks if you fiddle with its options.
EDIT:
This solution manually creates dummies for esttab. Alternatively, you can create a new variable with the reverse coding and use the opposite base level, as @NickCox demonstrates in his solution.
You can reverse the coding and apply value labels to insist on what you will see:
sysuse auto, clear
generate rep78_2 = 6 - rep78
label define new 1 "5" 2 "4" 3 "3" 4 "2" 5 "1"
label values rep78_2 new
regress mpg i.rep78_2
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(4, 64) = 4.91
Model | 549.415777 4 137.353944 Prob > F = 0.0016
Residual | 1790.78712 64 27.9810488 R-squared = 0.2348
-------------+---------------------------------- Adj R-squared = 0.1869
Total | 2340.2029 68 34.4147485 Root MSE = 5.2897
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rep78_2 |
4 | -5.69697 2.02441 -2.81 0.006 -9.741193 -1.652747
3 | -7.930303 1.86452 -4.25 0.000 -11.65511 -4.205497
2 | -8.238636 2.457918 -3.35 0.001 -13.14889 -3.32838
1 | -6.363636 4.066234 -1.56 0.123 -14.48687 1.759599
|
_cons | 27.36364 1.594908 17.16 0.000 24.17744 30.54983
------------------------------------------------------------------------------
regress mpg ib5.rep78_2
Source | SS df MS Number of obs = 69
-------------+---------------------------------- F(4, 64) = 4.91
Model | 549.415777 4 137.353944 Prob > F = 0.0016
Residual | 1790.78712 64 27.9810488 R-squared = 0.2348
-------------+---------------------------------- Adj R-squared = 0.1869
Total | 2340.2029 68 34.4147485 Root MSE = 5.2897
------------------------------------------------------------------------------
mpg | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rep78_2 |
5 | 6.363636 4.066234 1.56 0.123 -1.759599 14.48687
4 | .6666667 3.942718 0.17 0.866 -7.209818 8.543152
3 | -1.566667 3.863059 -0.41 0.686 -9.284014 6.150681
2 | -1.875 4.181884 -0.45 0.655 -10.22927 6.479274
|
_cons | 21 3.740391 5.61 0.000 13.52771 28.47229
------------------------------------------------------------------------------
If you wanted to see the same variable name as before, you could also do the following:
drop rep78
rename rep78_2 rep78
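Since the stated goal is an export with putexcel, here is a rough sketch of that last step. It assumes the putexcel syntax of Stata 14 or later, and results.xlsx is a hypothetical file name. After regress, r(table) holds the coefficients, standard errors and related statistics in the order of the regression output, so the reversed ordering carries over to the spreadsheet:

```stata
sysuse auto, clear
generate rep78_2 = 6 - rep78
label define new 1 "5" 2 "4" 3 "3" 4 "2" 5 "1"
label values rep78_2 new
quietly regress mpg ib5.rep78_2

* r(table) has b, se, t, pvalue, ... as rows; transpose so each coefficient is a row
matrix T = r(table)'

* results.xlsx is a hypothetical file name
putexcel set results.xlsx, replace
putexcel A1 = matrix(T), names
```

The row names written to the sheet are the internal coefficient names rather than the value labels, so they may need relabelling in Excel.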
Below you can find my code:
#delimit;
local fixed_effect "Yes";
estout pre_post using output.xls, cells(b(star fmt(4) keep(post
`ctrlVars')) t(par fmt(2) keep(post `ctrlVars')))
legend starlevels( * 0.10 ** 0.05 *** 0.010) stats(`fixed_effect' r2 N
labels("Industry fixed effects" "Adjusted R-squared")) varlabels(_cons
Constant) append;
This produces the following error message:
( invalid name
"Industry fixed effects invalid name
"Adjusted R-squared invalid name
) invalid name
r(7);
What is wrong?
EDIT:
Sorry for not being clear enough. This is what I would like to have:
----------------------------
(1)
Industry FEs
b/t
----------------------------
mpg -174.3133*
(-1.99)
headroom -520.2934
(-1.23)
length 31.3659
(1.30)
Constant 5540.3487
(0.94)
----------------------------
Industry FE Yes
Adjusted R~d 0.2454
N 74
----------------------------
* p<0.10, ** p<0.05, *** p<0.010
I can reproduce your problem using Stata's toy dataset auto as follows:
sysuse auto, clear
regress price mpg headroom length
#delimit;
esttab ., cells(b(star fmt(4)) t(par fmt(2)))
legend starlevels( * 0.10 ** 0.05 *** 0.010) stats(r2 N
label("Industry fixed effects" "Adjusted R-squared")) varlabels(_cons
Constant);
( invalid name
"Industry fixed effects invalid name
"Adjusted R-squared invalid name
) invalid name
r(7);
This error happens because you are using the options of the community-contributed command estout incorrectly: labels() is a sub-option of stats() and thus it has to be separated using a comma. In addition, you need the standalone option mlabels() to specify a custom model name:
esttab ., cells(b(star fmt(4)) t(par fmt(2))) legend ///
starlevels(* 0.10 ** 0.05 *** 0.010) stats(r2 N, labels("Adjusted R-squared")) ///
mlabels("Industry FEs") varlabels(_cons Constant)
----------------------------
(1)
Industry FEs
b/t
----------------------------
mpg -174.3133*
(-1.99)
headroom -520.2934
(-1.23)
length 31.3659
(1.30)
Constant 5540.3487
(0.94)
----------------------------
Adjusted R~d 0.2454
N 74.0000
----------------------------
* p<0.10, ** p<0.05, *** p<0.010
Note that #delimit also appears to cause some issues here.
EDIT:
You need to use estadd for that:
sysuse auto, clear
regress price mpg headroom length
estadd local fe Yes
esttab ., cells(b(star fmt(4)) t(par fmt(2))) legend ///
starlevels(* 0.10 ** 0.05 *** 0.010) stats(fe r2 N, ///
labels("Industry FE" "Adjusted R-squared")) ///
mlabels("Industry FEs") varlabels(_cons Constant)
----------------------------
(1)
Industry FEs
b/t
----------------------------
mpg -174.3133*
(-1.99)
headroom -520.2934
(-1.23)
length 31.3659
(1.30)
Constant 5540.3487
(0.94)
----------------------------
Industry FE Yes
Adjusted R~d 0.2454
N 74.0000
----------------------------
* p<0.10, ** p<0.05, *** p<0.010
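The same idea extends to several models side by side. The following is a sketch under assumptions: the rep78 dummies stand in for industry fixed effects, the model names m1 and m2 and the column titles are illustrative, and e(r2_a) is used because that is the adjusted R-squared that regress stores (e(r2) is the unadjusted one):

```stata
sysuse auto, clear
estimates clear

* model without the stand-in fixed effects
eststo m1: regress price mpg headroom length
estadd local fe "No"

* model with rep78 dummies standing in for industry fixed effects
eststo m2: regress price mpg headroom length i.rep78
estadd local fe "Yes"

esttab m1 m2, cells(b(star fmt(4)) t(par fmt(2))) legend ///
    starlevels(* 0.10 ** 0.05 *** 0.010) drop(*.rep78) ///
    stats(fe r2_a N, labels("Industry FE" "Adjusted R-squared" "N")) ///
    mlabels("No FEs" "Industry FEs") varlabels(_cons Constant)
```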
I am running several simple regressions and I wish to save the value of the significance (P > |t|) of a regression for a given coefficient in a local macro.
For example, I know that:
local consCoeff = _b[_cons]
will save the coefficient for the constant, and that with _se[_cons] I can get the standard error. However, there doesn't seem to be any documentation on how to get the significance.
It would be best if the underscore format worked (like _pt etc.), but anything will do.
There is no need to calculate anything yourself because Stata already does that for you.
For example:
. sysuse auto, clear
(1978 Automobile Data)
. regress price weight mpg
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 14.74
Model | 186321280 2 93160639.9 Prob > F = 0.0000
Residual | 448744116 71 6320339.67 R-squared = 0.2934
-------------+---------------------------------- Adj R-squared = 0.2735
Total | 635065396 73 8699525.97 Root MSE = 2514
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | 1.746559 .6413538 2.72 0.008 .467736 3.025382
mpg | -49.51222 86.15604 -0.57 0.567 -221.3025 122.278
_cons | 1946.069 3597.05 0.54 0.590 -5226.245 9118.382
------------------------------------------------------------------------------
The results are also returned in matrix r(table):
. matrix list r(table)
r(table)[9,3]
weight mpg _cons
b 1.7465592 -49.512221 1946.0687
se .64135379 86.156039 3597.0496
t 2.7232382 -.57468079 .54101802
pvalue .00812981 .56732373 .59018863
ll .46773602 -221.30248 -5226.2445
ul 3.0253823 122.27804 9118.3819
df 71 71 71
crit 1.9939434 1.9939434 1.9939434
eform 0 0 0
So for the p-value of, say weight, you type:
. matrix A = r(table)
. local pval = A[4,1]
. display `pval'
.00812981
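Hard-coding A[4,1] depends on the position of the row and the column. A slightly more robust sketch looks the cell up by name with the built-in matrix functions rownumb() and colnumb(); the macro name pval is kept from the example above:

```stata
sysuse auto, clear
quietly regress price weight mpg
matrix A = r(table)

* look up the cell by row and column name instead of by position
local pval = A[rownumb(A, "pvalue"), colnumb(A, "weight")]
display `pval'
```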
The t-stat for the coefficient is the coefficient divided by the standard error. The p-value can then be calculated using the ttail function with the appropriate degrees of freedom. Since you are looking for the two-tailed p-value, the result gets multiplied by two.
In your case, the following should do it:
local consPvalue = (2 * ttail(e(df_r), abs(_b[_cons]/_se[_cons])))
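As a quick sanity check, sketched here on the auto dataset as a stand-in for your own regression, the hand-computed value can be compared with the p-value that Stata stores in r(table):

```stata
sysuse auto, clear
quietly regress price weight mpg
matrix A = r(table)

* two-sided p-value for the constant, computed by hand
local consPvalue = 2 * ttail(e(df_r), abs(_b[_cons] / _se[_cons]))

* the two displayed numbers should agree up to display precision
display `consPvalue'
display A[rownumb(A, "pvalue"), colnumb(A, "_cons")]
```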