I have a panel dataset with the following years:
tab year
year | Freq. Percent Cum.
------------+-----------------------------------
2000 | 31 12.55 12.55
2001 | 31 12.55 25.10
2002 | 30 12.15 37.25
2003 | 31 12.55 49.80
2004 | 31 12.55 62.35
2005 | 31 12.55 74.90
2006 | 31 12.55 87.45
2007 | 31 12.55 100.00
------------+-----------------------------------
Total | 247 100.00
When I run xtreg dv iv i.year, I see that year 2000 is not included, and 2007 is omitted:
xtreg local_gr rtxdum i.year
note: 2007.year omitted because of collinearity
Random-effects GLS regression Number of obs = 247
Group variable: province_n~e Number of groups = 31
R-sq: Obs per group:
within = 0.6194 min = 7
between = 0.0016 avg = 8.0
overall = 0.2356 max = 8
Wald chi2(7) = 341.51
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
local_gr | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rtxdum | -753799.7 291543.7 -2.59 0.010 -1325215 -182384.5
|
year |
2001 | 388246 291543.7 1.33 0.183 -183169.2 959661.2
2002 | 745406.4 294294.5 2.53 0.011 168599.8 1322213
2003 | 1175610 291543.7 4.03 0.000 604194.4 1747025
2004 | 1773982 291543.7 6.08 0.000 1202567 2345397
2005 | 2600005 291543.7 8.92 0.000 2028589 3171420
2006 | 4425318 291543.7 15.18 0.000 3853903 4996734
2007 | 0 (omitted)
|
_cons | 1564670 447832.4 3.49 0.000 686934.1 2442405
-------------+----------------------------------------------------------------
sigma_u | 2217878.8
sigma_e | 1150064.9
rho | .78809251 (fraction of variance due to u_i)
------------------------------------------------------------------------------
The message says 2007 was omitted due to collinearity, but I don't understand why year 2000 does not show up in the results at all.
Because it is the base level. You can see it by using the allbaselevels option:
webuse nlswork, clear
xtset idcode
xtreg ln_w grade tenure i.race not_smsa south, allbaselevels
Random-effects GLS regression Number of obs = 28,091
Group variable: idcode Number of groups = 4,697
R-sq: Obs per group:
within = 0.1005 min = 1
between = 0.4498 avg = 6.0
overall = 0.3305 max = 15
Wald chi2(6) = 6509.50
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grade | .07605 .0018128 41.95 0.000 .0724969 .0796031
tenure | .0361319 .0006298 57.37 0.000 .0348975 .0373663
|
race |
white | 0 (base)
black | -.0530121 .0102916 -5.15 0.000 -.0731832 -.0328409
other | .0762678 .0415911 1.83 0.067 -.0052492 .1577849
|
not_smsa | -.1289554 .0074296 -17.36 0.000 -.1435172 -.1143936
south | -.0786512 .0075533 -10.41 0.000 -.0934555 -.063847
_cons | .6759773 .0244723 27.62 0.000 .6280125 .7239421
-------------+----------------------------------------------------------------
sigma_u | .26440074
sigma_e | .30295598
rho | .43235646 (fraction of variance due to u_i)
------------------------------------------------------------------------------
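If you would rather use a different base year, you can set it with the ib. operator from fvvarlist. A minimal sketch using the variable names from the regression in the question (so purely illustrative):

* make 2001 the base year instead of 2000; allbaselevels still shows it
xtreg local_gr rtxdum ib2001.year, allbaselevels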
Related: this is similar to the question posed here, but I think I am not employing the approach correctly.
I used help fvvarlist to guide me on interactions.
I am employing a triple interaction with 3 binary variables:
As a toy model, let us assume:
x = gender (1 = male, 0 = female)
y = health (1 = good, 0 = poor)
z = employment (1 = employed, 0 = not employed)
using the following regression:
reg x##y##z if state == "NY" & year >1985
I am interested in the results for 1.x#1.y#1.z, but this coefficient is omitted.
1.x#1.y#1.z omitted because of collinearity
Is there a way I can keep this interaction?
It would be best to verify that you actually have this combination in your data, for example with egen, group(). You should also use i. prefixes to keep Stata from treating your variables as continuous, which has the added benefit of a more informative message: interaction identifies no observations in the sample rather than a mysterious collinearity note.
Here is a reproducible example:
. sysuse auto, clear
(1978 automobile data)
. sum mpg weight
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 74 21.2973 5.785503 12 41
weight | 74 3019.459 777.1936 1760 4840
. gen efficient = mpg > 21
. lab define efficient 0 "Inefficient" 1 "Efficient"
. lab val efficient efficient
. gen heavy = weight > 3e3
. lab define heavy 0 "Light" 1 "Heavy"
. lab val heavy heavy
. egen group = group(foreign efficient heavy), label(group)
. tab group, sort
group(foreign efficient |
heavy) | Freq. Percent Cum.
---------------------------+-----------------------------------
Domestic Inefficient Heavy | 34 45.95 45.95
Foreign Efficient Light | 15 20.27 66.22
Domestic Efficient Light | 13 17.57 83.78
Foreign Inefficient Light | 5 6.76 90.54
Domestic Efficient Heavy | 3 4.05 94.59
Domestic Inefficient Light | 2 2.70 97.30
Foreign Inefficient Heavy | 2 2.70 100.00
---------------------------+-----------------------------------
Total | 74 100.00
. reg price c.foreign##c.efficient##c.heavy, robust
note: c.foreign#c.efficient#c.heavy omitted because of collinearity.
Linear regression Number of obs = 74
F(6, 67) = 74.67
Prob > F = 0.0000
R-squared = 0.2830
Root MSE = 2606.9
-----------------------------------------------------------------------------------------------
| Robust
price | Coefficient std. err. t P>|t| [95% conf. interval]
------------------------------+----------------------------------------------------------------
foreign | 3007.6 960.0626 3.13 0.003 1091.307 4923.893
efficient | 513.2308 394.6504 1.30 0.198 -274.4948 1300.956
|
c.foreign#c.efficient | -1810.164 1071.875 -1.69 0.096 -3949.636 329.3076
|
heavy | 3283.176 696.5873 4.71 0.000 1892.782 4673.571
|
c.foreign#c.heavy | 2462.724 1196.996 2.06 0.044 73.50896 4851.938
|
c.efficient#c.heavy | -2783.741 744.4813 -3.74 0.000 -4269.732 -1297.75
|
c.foreign#c.efficient#c.heavy | 0 (omitted)
|
_cons | 3739 332.9212 11.23 0.000 3074.486 4403.514
-----------------------------------------------------------------------------------------------
. reg price i.foreign##i.efficient##i.heavy, robust
note: 1.foreign#1.efficient#1.heavy identifies no observations in the sample.
Linear regression Number of obs = 74
F(6, 67) = 74.67
Prob > F = 0.0000
R-squared = 0.2830
Root MSE = 2606.9
------------------------------------------------------------------------------------------
| Robust
price | Coefficient std. err. t P>|t| [95% conf. interval]
-------------------------+----------------------------------------------------------------
foreign |
Foreign | 3007.6 960.0626 3.13 0.003 1091.307 4923.893
|
efficient |
Efficient | 513.2308 394.6504 1.30 0.198 -274.4948 1300.956
|
foreign#efficient |
Foreign#Efficient | -1810.164 1071.875 -1.69 0.096 -3949.636 329.3076
|
heavy |
Heavy | 3283.176 696.5873 4.71 0.000 1892.782 4673.571
|
foreign#heavy |
Foreign#Heavy | 2462.724 1196.996 2.06 0.044 73.50896 4851.938
|
efficient#heavy |
Efficient#Heavy | -2783.741 744.4813 -3.74 0.000 -4269.732 -1297.75
|
foreign#efficient#heavy |
Foreign#Efficient#Heavy | 0 (empty)
|
_cons | 3739 332.9212 11.23 0.000 3074.486 4403.514
------------------------------------------------------------------------------------------
There are no foreign, efficient, heavy cars in the data. When you let Stata know that you have categorical variables on the right-hand side, you get an understandable message that explains why the triple interaction is missing.
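For completeness, you can confirm the empty cell directly with count, using the same derived variables as above:

. count if foreign == 1 & efficient == 1 & heavy == 1
  0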
I have a dataset in Stata that looks something like this:
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
dv2 | 1,904 .5395645 .427109 -1.034977 1.071396
xvar | 1,904 3.074055 1.387308 1 5
with xvar being a categorical independent variable and dv2 the dependent variable of interest.
I am estimating a simple model with the categorical variable entered as a set of dummies, using level 4 (D) as the base:
reg dv2 ib4.xvar
eststo myest
Source | SS df MS Number of obs = 1,904
-------------+---------------------------------- F(4, 1899) = 13.51
Model | 9.60846364 4 2.40211591 Prob > F = 0.0000
Residual | 337.540713 1,899 .177746558 R-squared = 0.0277
-------------+---------------------------------- Adj R-squared = 0.0256
Total | 347.149177 1,903 .182422058 Root MSE = .4216
------------------------------------------------------------------------------
dv2 | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
xvar |
A | .015635 .0307356 0.51 0.611 -.044644 .075914
B | .1435987 .029325 4.90 0.000 .0860861 .2011113
C | .1711176 .0299331 5.72 0.000 .1124124 .2298228
E | .1337754 .0295877 4.52 0.000 .0757477 .1918032
|
_cons | .447794 .020191 22.18 0.000 .4081952 .4873928
------------------------------------------------------------------------------
These are the results. As you can see, B, C, and E have larger effects than D, which is the excluded category.
However, coefplot does not account for the fact that, with a categorical variable, each reported coefficient is relative to the base: the true level for A is true_A = D + A, i.e., the constant plus the coefficient for A.
coefplot myest, scheme(s1color) vert
As you can see, the plot shows the constant as the largest coefficient, while the others appear smaller.
Is there a systematic way I can adjust for this problem and plot the true coefficients and SEs of each category?
Thanks a lot for your help
In response to your second comment, here is an example of how you can use marginsplot to plot estimated effects from a linear regression.
sysuse auto, clear
replace price = price/100
reg price i.rep78, cformat(%9.2f)
------------------------------------------------------------------------------
price | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
rep78 |
2 | 14.03 23.56 0.60 0.554 -33.04 61.10
3 | 18.65 21.76 0.86 0.395 -24.83 62.13
4 | 15.07 22.21 0.68 0.500 -29.31 59.45
5 | 13.48 22.91 0.59 0.558 -32.28 59.25
|
_cons | 45.65 21.07 2.17 0.034 3.55 87.74
------------------------------------------------------------------------------
margins i.rep78, cformat(%9.2f)
------------------------------------------------------------------------------
| Delta-method
| Margin std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
rep78 |
1 | 45.65 21.07 2.17 0.034 3.55 87.74
2 | 59.68 10.54 5.66 0.000 38.63 80.73
3 | 64.29 5.44 11.82 0.000 53.42 75.16
4 | 60.72 7.02 8.64 0.000 46.68 74.75
5 | 59.13 8.99 6.58 0.000 41.18 77.08
------------------------------------------------------------------------------
marginsplot
Note that these values are the constant plus the appropriate coefficient.
Running marginsplot then produces a plot of these marginal estimates and their confidence intervals.
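If you would rather stay with coefplot, as in the original question, note that coefplot plots whatever estimation results are active, so you can post the margins first. A sketch of the same example:

sysuse auto, clear
replace price = price/100
reg price i.rep78
margins i.rep78, post    // replace e(b)/e(V) with the margins themselves
coefplot, vertical       // plot the margins and their confidence intervals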
I have the following model:
ivreg ldemand social_housing transport year (lprice = utilities)
However, I want to include year as a set of dummy variables.
How can I do it in Stata?
Using i.year doesn't work for the ivreg command.
Cross-posted on Statalist.
The command ivreg does not allow factor variables:
. webuse hsng2, clear
. ivreg rent pcturban i.region (hsngval = faminc)
factor variables not allowed
r(101);
However, you can still use the xi prefix to create dummies on the fly:
. xi: ivreg rent pcturban i.region (hsngval = faminc)
i.region _Iregion_1-4 (naturally coded; _Iregion_1 omitted)
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 50
-------------+---------------------------------- F(5, 44) = 9.10
Model | 12735.4667 5 2547.09334 Prob > F = 0.0000
Residual | 48507.6533 44 1102.44667 R-squared = 0.2079
-------------+---------------------------------- Adj R-squared = 0.1179
Total | 61243.12 49 1249.85959 Root MSE = 33.203
------------------------------------------------------------------------------
rent | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsngval | .0038683 .0008958 4.32 0.000 .0020629 .0056737
pcturban | -.4980121 .5179779 -0.96 0.342 -1.541928 .5459039
_Iregion_2 | 1.528672 15.14086 0.10 0.920 -28.98572 32.04306
_Iregion_3 | 7.74279 15.10906 0.51 0.611 -22.70752 38.1931
_Iregion_4 | -40.61235 19.60999 -2.07 0.044 -80.13369 -1.091002
_cons | 88.26681 31.69154 2.79 0.008 24.39671 152.1369
------------------------------------------------------------------------------
Instrumented: hsngval
Instruments: pcturban _Iregion_2 _Iregion_3 _Iregion_4 faminc
------------------------------------------------------------------------------
It is important to note that according to the command's help file:
Out-of-date command
ivreg is an out-of-date command as of Stata 10. ivreg has been replaced with the ivregress command.
Thus, it is best to switch to ivregress instead:
. ivregress 2sls rent pcturban i.region (hsngval = faminc), small
Instrumental variables (2SLS) regression
Source | SS df MS Number of obs = 50
-------------+------------------------------ F( 5, 44) = 9.10
Model | 12735.4667 5 2547.09334 Prob > F = 0.0000
Residual | 48507.6533 44 1102.44667 R-squared = 0.2079
-------------+------------------------------ Adj R-squared = 0.1179
Total | 61243.12 49 1249.85959 Root MSE = 33.203
------------------------------------------------------------------------------
rent | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
hsngval | .0038683 .0008958 4.32 0.000 .0020629 .0056737
pcturban | -.4980121 .5179779 -0.96 0.342 -1.541928 .5459039
|
region |
N Cntrl | 1.528672 15.14086 0.10 0.920 -28.98572 32.04306
South | 7.74279 15.10906 0.51 0.611 -22.70752 38.1931
West | -40.61235 19.60999 -2.07 0.044 -80.13369 -1.091002
|
_cons | 88.26681 31.69154 2.79 0.008 24.39671 152.1369
------------------------------------------------------------------------------
Instrumented: hsngval
Instruments: pcturban 2.region 3.region 4.region faminc
Type help ivregress from Stata's command prompt for more details.
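Applied to the model in the question (variable names taken from there, so this is illustrative only), the factor-variable version would be:

ivregress 2sls ldemand social_housing transport i.year (lprice = utilities)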
I want to do something very simple, but it doesn't work!
I need to see the predictions (and errors) of a GARCH model. The main variable is dowclose, and my idea is to check whether the GARCH model fits this variable well.
I'm using this simple code, but the predictions are just 0s:
webuse dow1, clear
arch dowclose, noconstant arch(1) garch(1)
predict dow_hat, y
ARCH Results:
ARCH family regression
Sample: 1 - 9341 Number of obs = 9341
Distribution: Gaussian Wald chi2(.) = .
Log likelihood = -76191.43 Prob > chi2 = .
------------------------------------------------------------------------------
| OPG
dowclose | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
arch |
L1. | 1.00144 6.418855 0.16 0.876 -11.57929 13.58217
|
garch |
L1. | -.001033 6.264372 -0.00 1.000 -12.27898 12.27691
|
_cons | 56.60589 620784.7 0.00 1.000 -1216659 1216772
------------------------------------------------------------------------------
This is to be expected: you have no covariates and no intercept, so there's nothing to predict.
Here's a simple OLS regression that makes the problem apparent:
. sysuse auto
(1978 Automobile Data)
. reg price, nocons
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 0, 74) = 0.00
Model | 0 0 . Prob > F = .
Residual | 3.4478e+09 74 46592355.7 R-squared = 0.0000
-------------+------------------------------ Adj R-squared = 0.0000
Total | 3.4478e+09 74 46592355.7 Root MSE = 6825.9
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
. predict phat
(option xb assumed; fitted values)
. sum phat
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
phat | 74 0 0 0 0
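To get non-degenerate predictions from arch, the mean equation needs something in it. One possibility, shown as a sketch rather than a modeling recommendation, is to model the differenced series with a constant and an AR term, then predict both the fitted mean and the conditional variance:

webuse dow1, clear
* keep the constant and add an AR term so the mean equation is not empty
arch D.dowclose, ar(1) arch(1) garch(1)
predict dow_hat, y          // fitted values of the differenced series
predict dow_var, variance   // conditional variance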
I'm using the user-written command chest in Stata to look at the change-in-estimate with the variables in my model.
After running the linear regression of
regress age allelecount gender htn_g dm_g lipid_g i.hx_smoking b_bmi hx_med_asa if cadhx2==0
I run the chest command
chest allelecount, backward nograph
but I only get output for one variable
chest allelecount, backward
Change-in-estimate
regress regression. Outcome: age
number of obs = 476 Exposure: allelecount
----------------------------------------------------------
Variables |
removed | Coef. [95% Conf. Interval] Change, %
----------+-----------------------------------------------
Adj.All | -0.3691 -0.6819 -0.0564
-lipid_g | -0.3688 -0.6804 -0.0571 -0.0996
----------------------------------------------------------
Can anyone explain this?
Using Stata's auto data, I find no problem:
sysuse auto
regress price mpg rep78 headroom
Source | SS df MS Number of obs = 69
-------------+------------------------------ F( 3, 65) = 7.51
Model | 148497605 3 49499201.8 Prob > F = 0.0002
Residual | 428299354 65 6589220.82 R-squared = 0.2575
-------------+------------------------------ Adj R-squared = 0.2232
Total | 576796959 68 8482308.22 Root MSE = 2566.9
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -289.3462 62.53921 -4.63 0.000 -414.2456 -164.4467
rep78 | 670.8971 343.5213 1.95 0.055 -15.16242 1356.957
headroom | -300.0293 398.0516 -0.75 0.454 -1094.993 494.9346
_cons | 10921.33 2153.003 5.07 0.000 6621.487 15221.17
chest mpg, backward
Change-in-estimate
regress regression. Outcome: price
number of obs = 69 Exposure: mpg
----------------------------------------------------------
Variables |
removed | Coef. [95% Conf. Interval] Change, %
----------+-----------------------------------------------
Adj.All | -289.3462 -411.9208 -166.7715
-headroom | -271.6425 -384.8719 -158.4132 -6.1185
-rep78 | -226.3607 -332.1613 -120.5600 -16.6697
----------------------------------------------------------
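For intuition, the Change, % column is just the percentage change in the exposure coefficient when a covariate is removed. You can reproduce the -headroom row by hand:

regress price mpg rep78 headroom
scalar b_full = _b[mpg]
regress price mpg rep78 if e(sample)    // drop headroom, keep the same sample
display %6.4f 100*(_b[mpg] - b_full)/b_full    // -6.1185, matching chest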