I have run a regression of the type
reg foo i.year
and would like to plot the yearly effects. The regression result table looks like this:
         foo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |
        2001 |   .1253994   .0047826    26.22   0.000     .1160255    .1347734
        2002 |     .06168   .0045566    13.54   0.000      .052749    .0706109
        2003 |   .1324228    .005008    26.44   0.000      .122607    .1422385
        2004 |   .1177605   .0051766    22.75   0.000     .1076143    .1279066
        2005 |   .1007163    .005018    20.07   0.000      .090881    .1105516
        2006 |   .0792936   .0047979    16.53   0.000     .0698897    .0886974
Unfortunately, when I use coefplot, vertical, the x-axis labels read Survey year=2001, Survey year=2002, and so on, which consumes a lot of space. I understand that coeflabels() allows me to relabel coefficients, but do I have to do that for every single one? What if I had 30 years? Is there a more generic way of relabeling them?
It sounds like a weird solution, but it did work for me.
Simply attach any value label to your survey year variable and coefplot should then recognize the years as their values.
If attaching an arbitrary value label does not work, you can loop over the years and define a value label that maps each year to itself:
levelsof year, local(years)
foreach lvl of local years {
    lab def year `lvl' "`lvl'", modify
}
lab val year year
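Putting it together, a minimal sketch of the whole workflow with the question's foo and year (the middle block is just the loop from above; whether coefplot picks up the labels this way is exactly the claim being tested):
* illustrative end-to-end workflow: fit, label the years, plot
reg foo i.year
levelsof year, local(years)
foreach lvl of local years {
    lab def year `lvl' "`lvl'", modify
}
lab val year year
coefplot, vertical    // x-axis should now read 2001, 2002, ... instead of Survey year=2001, ...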
So I'm in a bit of trouble. I need to impose a constraint in Stata to run an mlogit, but it keeps reporting that the equation has not been found. Can someone help?
It is hard to determine exactly what is going wrong since you don't provide complete output or an MCVE. Error 303 means:
You referred to a coefficient or stored result corresponding to an
equation or outcome that cannot be found.
This means you are probably defining the constraint incorrectly, referring to the numeric outcome values rather than the value labels that Stata uses to name each equation.
Instead of
constraint 1 [insure=3]:age =-1*[insure=2]:age
try something like this:
. webuse sysdsn1
(Health insurance data)
. label list insure
insure:
1 Indemnity
2 Prepaid
3 Uninsure
. constraint 1 [Uninsure]:age =-1*[Prepaid]:age
. mlogit insure age male i.site, constraints(1) nolog
Multinomial logistic regression Number of obs = 615
Wald chi2(7) = 20.43
Log likelihood = -544.32915 Prob > chi2 = 0.0047
( 1) [Prepaid]age + [Uninsure]age = 0
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
age | -.0054932 .0046759 -1.17 0.240 -.0146579 .0036714
male | .4673414 .1980637 2.36 0.018 .0791436 .8555392
|
site |
2 | -.0030463 .2045162 -0.01 0.988 -.4038907 .397798
3 | -.4001761 .2179144 -1.84 0.066 -.8272804 .0269282
|
_cons | .1940359 .2672973 0.73 0.468 -.3298572 .717929
-------------+----------------------------------------------------------------
Uninsure |
age | .0054932 .0046759 1.17 0.240 -.0036714 .0146579
male | .4015065 .3642457 1.10 0.270 -.3124019 1.115415
|
site |
2 | -1.192478 .4670081 -2.55 0.011 -2.107797 -.2771591
3 | -.1393011 .3592712 -0.39 0.698 -.8434596 .5648575
|
_cons | -1.868834 .3596923 -5.20 0.000 -2.573818 -1.16385
------------------------------------------------------------------------------
I am working with several kinds of regressions in Stata (probit, logit, quantile regression, ...). I would like to know how to predict the dependent variable at the regressors' sample means. This is straightforward for OLS, but I don't see how to get it for a quantile regression.
The margins command is useful for this:
. sysuse auto
(1978 Automobile Data)
. qreg price weight length i.foreign, nolog
Median regression Number of obs = 74
Raw sum of deviations 71102.5 (about 4934)
Min sum of deviations 54411.29 Pseudo R2 = 0.2347
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | 3.933588 1.328718 2.96 0.004 1.283543 6.583632
length | -41.25191 45.46469 -0.91 0.367 -131.9284 49.42456
|
foreign |
Foreign | 3377.771 885.4198 3.81 0.000 1611.857 5143.685
_cons | 344.6489 5182.394 0.07 0.947 -9991.31 10680.61
------------------------------------------------------------------------------
. margins, at((mean) _continuous (base) _factor)
Warning: cannot perform check for estimable functions.
Adjusted predictions Number of obs = 74
Model VCE : IID
Expression : Linear prediction, predict()
at : weight = 3019.459 (mean)
length = 187.9324 (mean)
foreign = 0
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 4469.386 418.7774 10.67 0.000 3648.597 5290.175
This predicts the median at the means of the continuous covariates and at the base level of the factor variables (so you avoid nonsensical values like a fractionally pregnant observation).
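If you instead want the factor variables held at their sample means as well (e.g. at the observed share of foreign cars rather than at the base level), margins, atmeans does that after the same qreg fit; a minimal sketch:
* hold every covariate, continuous or factor, at its sample mean
margins, atmeans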
I'm running a regression in Stata for which I would like to use cluster2 (http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/se_programming.htm).
I encounter the following problem. Stata reports factor variables and time-series operators not allowed. I am using a large vector of controls, extensively applying the methods Stata offers for interactions.
For example: state##c.wind_speed##L.c.relative_humidity. cluster2, and other Stata packages as well, do not allow such expressions as independent variables. Is there a productive way to create such a long vector of interaction variables myself?
I believe that one can trick ivreg2 by Baum-Schaffer-Stillman into running OLS with two-way clustering and interactions, like this:
. webuse nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivreg2 ln_w grade c.age##c.ttl_exp tenure, cluster(idcode year)
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on idcode and year
Number of clusters (idcode) = 4697 Number of obs = 28099
Number of clusters (year) = 15 F( 5, 14) = 674.29
Prob > F = 0.0000
Total (centered) SS = 6414.823933 Centered R2 = 0.3206
Total (uncentered) SS = 85448.21266 Uncentered R2 = 0.9490
Residual SS = 4357.997339 Root MSE = .3938
---------------------------------------------------------------------------------
| Robust
ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
grade | .0734785 .002644 27.79 0.000 .0682964 .0786606
age | -.0005405 .002259 -0.24 0.811 -.0049681 .0038871
ttl_exp | .0656393 .0068499 9.58 0.000 .0522138 .0790648
|
c.age#c.ttl_exp | -.0010539 .0002217 -4.75 0.000 -.0014885 -.0006194
|
tenure | .0197137 .0029555 6.67 0.000 .013921 .0255064
_cons | .5165052 .0529343 9.76 0.000 .4127559 .6202544
---------------------------------------------------------------------------------
Included instruments: grade age ttl_exp c.age#c.ttl_exp tenure
------------------------------------------------------------------------------
Just to be sure, compare that to the OLS coefficients:
. reg ln_w grade c.age##c.ttl_exp tenure
Source | SS df MS Number of obs = 28,099
-------------+---------------------------------- F(5, 28093) = 2651.79
Model | 2056.82659 5 411.365319 Prob > F = 0.0000
Residual | 4357.99734 28,093 .155127517 R-squared = 0.3206
-------------+---------------------------------- Adj R-squared = 0.3205
Total | 6414.82393 28,098 .228301798 Root MSE = .39386
---------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
grade | .0734785 .0010414 70.55 0.000 .0714373 .0755198
age | -.0005405 .000663 -0.82 0.415 -.0018401 .0007591
ttl_exp | .0656393 .0030809 21.31 0.000 .0596007 .0716779
|
c.age#c.ttl_exp | -.0010539 .0000856 -12.32 0.000 -.0012216 -.0008862
|
tenure | .0197137 .0008568 23.01 0.000 .0180344 .021393
_cons | .5165052 .0206744 24.98 0.000 .4759823 .557028
---------------------------------------------------------------------------------
You don't include a verifiable example. See https://stackoverflow.com/help/mcve for key advice.
At first sight, however, the problem is that cluster2 is an oldish program written in 2006/2007 whose syntax statement just doesn't allow factor variables.
You could try hacking a clone of the program to fix that; I have no idea whether that would be sufficient.
No specific comment is possible on the "other Stata packages" you imply have the same problem, except that it may well arise for the same reason. Factor variables were introduced in Stata 11 in 2009 (see help fvvarlist for documentation), and older programs won't allow them without modification.
In general, I would ask questions like this on Statalist. It's quite likely that this program has been superseded by some different program.
If you find a Stata program on the internet without a help file, as appears to be the case here, it is usually an indicator that the program was written ad hoc and is not being maintained. In this case, it is evident also that the program has not been updated in the 6 years since Stata 11.
You could also, as you imply, just create the interaction variables yourself. I don't think anyone has written a really general tool to automate that: there would be no point (since 2009) in a complicated alternative to factor variable notation.
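That said, if you do want to build the interactions by hand for an older command such as cluster2, here is a rough sketch using the variable names from your example (it assumes the data are tsset or xtset so the lag operator works, and that state is a numeric categorical variable):
* illustrative only: hand-made interaction terms for a pre-Stata-11 command
gen wind_x_lhumid = wind_speed * L.relative_humidity
levelsof state, local(states)
foreach s of local states {
    gen windXstate`s' = wind_speed * (state == `s')   // state-specific slope shifts
}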
I'm trying to run a multinomial logit and my independent variables are categorical. I have two dummy variables: edu1 for those with high school degrees and edu2 for those with college degrees (edu1=1 denotes those with a high school degree, edu1=0 those without). I want results that I can compare against those who have college degrees. However, when I run mlogit with edu*, the model automatically includes edu1 but not edu2. Is there a way to reverse this and include edu2 instead of edu1?
You can't have both in the model unless you drop the constant. Google "dummy variable trap" to see why. Here's an example:
. webuse sysdsn1, clear
(Health insurance data)
. recode male (0=1) (1=0), gen(female)
(644 differences between male and female)
. mlogit insure male female, nocons nolog
Multinomial logistic regression Number of obs = 616
Wald chi2(4) = 149.44
Log likelihood = -553.40712 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
male | .3001046 .1703301 1.76 0.078 -.0337363 .6339455
female | -.1772065 .0968274 -1.83 0.067 -.3669847 .0125718
-------------+----------------------------------------------------------------
Uninsure |
male | -1.529395 .3059244 -5.00 0.000 -2.128996 -.9297944
female | -1.989585 .1884768 -10.56 0.000 -2.358993 -1.620177
------------------------------------------------------------------------------
. mlogit insure male, nolog
Multinomial logistic regression Number of obs = 616
LR chi2(2) = 6.38
Prob > chi2 = 0.0413
Log likelihood = -553.40712 Pseudo R2 = 0.0057
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
male | .477311 .1959283 2.44 0.015 .0932987 .8613234
_cons | -.1772065 .0968274 -1.83 0.067 -.3669847 .0125718
-------------+----------------------------------------------------------------
Uninsure |
male | .46019 .3593233 1.28 0.200 -.2440708 1.164451
_cons | -1.989585 .1884768 -10.56 0.000 -2.358993 -1.620177
------------------------------------------------------------------------------
Note that in the second specification the constant is the female effect and males get the constant plus the male coefficient. This matches what you get with the no-constant specification above.
If you have other dummies in the model, things get a bit more complicated. The constant will correspond to all the omitted categories from each set of dummy variables.
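Applied to the education example, the practical fix is simply to put the dummy you want into the model yourself, or, if you still have the underlying categorical variable, to pick the base level with factor-variable notation. A rough sketch (y stands in for your outcome and edu for a hypothetical underlying education variable):
* include edu2 and leave edu1 out by hand
mlogit y edu2
* or, with an underlying categorical edu (1 = high school, 2 = college),
* choose the base level explicitly
mlogit y ib2.edu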
I was wondering if there's a way to include panel-specific or just varying trends in a first-difference regression when clustering on the panel id and the time variable.
Here's an example with Stata:
. webuse nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivreg2 S1.(ln_wage tenure) , cluster(idcode year)
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on idcode and year
Number of clusters (idcode) = 3660 Number of obs = 10528
Number of clusters (year) = 8 F( 1, 7) = 2.81
Prob > F = 0.1378
Total (centered) SS = 1004.098948 Centered R2 = 0.0007
Total (uncentered) SS = 1035.845686 Uncentered R2 = 0.0314
Residual SS = 1003.36326 Root MSE = .3087
------------------------------------------------------------------------------
| Robust
S.ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tenure |
S1. | .0076418 .0042666 1.79 0.073 -.0007206 .0160043
|
_cons | .0501738 .0070986 7.07 0.000 .0362608 .0640868
------------------------------------------------------------------------------
Included instruments: S.tenure
------------------------------------------------------------------------------
. ivreg2 S1.(ln_wage tenure i.c_city), cluster(idcode year)
factor variables not allowed
r(101);
In the specification above, the constant corresponds to a common time trend. Putting the factor variable outside the seasonal difference operator errors as well.
I understand that the differencing operator does not play well with factor variables or interactions, but I feel there must be some hack to get around that.
The use of ivreg2 is a bit of a red herring: I am not doing IV estimation, I just want two-way clustering.
You get the same solution as @Metrics if you do xi: ivreg2 S1.(ln_wage tenure) i.ind_code, cluster(idcode year)
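If you would rather not rely on xi, another workaround is to expand the factor variable into ordinary dummies yourself before calling ivreg2; a rough sketch with ind_code from the example above (whether the dummies belong in levels or in differences depends on the trend specification you have in mind):
* illustrative workaround: hand-made dummies avoid the factor-variable restriction
webuse nlswork, clear
xtset idcode year                         // harmless if already declared
tabulate ind_code, generate(ind_)         // creates ind_1, ind_2, ...
drop ind_1                                // keep one category as the base
ivreg2 S1.(ln_wage tenure) ind_*, cluster(idcode year)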