I am running a multilevel model with an interaction:
melogit outcome i.var1##i.var2 || lev2:, or
I also tried using runmlwin as it is a lot faster.
runmlwin outcome i.var1##i.var2 cons, level2(lev2: cons) level1(_n:) nopause or discrete(distribution(binomial) link(logit) denominator(cons))
The estimates are very similar.
The problem is that I want to use pwcompare after runmlwin the same way I can after melogit, but it throws an error. How can I get pairwise comparisons?
Edited:
I was able to calculate the contrast manually from the e(b) matrix. However, I have trouble calculating confidence intervals. For a fixed-effects model I can use e(V) to get the variance needed for the confidence bounds, but for a random-intercept model I cannot work out the variance. For example, in the output below I am trying to get the SE for (urban#1) vs (urban#0), matching the pwcompare output, but to no avail.
. webuse bangladesh, clear
(Bangladesh Fertility Survey, 1989)
.
. gen children1 = 0 if inlist(children ,0,1)
(1,050 missing values generated)
.
. replace children1 = 1 if inlist(children ,2,3)
(1,050 real changes made)
.
. melogit c_use i.urban##i.children1 || district:
Fitting fixed-effects model:
Iteration 0: log likelihood = -1258.2413
Iteration 1: log likelihood = -1256.5543
Iteration 2: log likelihood = -1256.5533
Iteration 3: log likelihood = -1256.5533
Refining starting values:
Grid node 0: log likelihood = -1248.2739
Fitting full model:
Iteration 0: log likelihood = -1248.2739 (not concave)
Iteration 1: log likelihood = -1236.1736
Iteration 2: log likelihood = -1235.4412
Iteration 3: log likelihood = -1235.4296
Iteration 4: log likelihood = -1235.4296
Mixed-effects logistic regression Number of obs = 1,934
Group variable: district Number of groups = 60
Obs per group:
min = 2
avg = 32.2
max = 118
Integration method: mvaghermite Integration pts. = 7
Wald chi2(3) = 61.28
Log likelihood = -1235.4296 Prob > chi2 = 0.0000
---------------------------------------------------------------------------------
c_use | Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+----------------------------------------------------------------
urban |
urban | .7510512 .1632623 4.60 0.000 .4310631 1.071039
1.children1 | .5938377 .1216843 4.88 0.000 .3553409 .8323344
|
urban#children1 |
urban#1 | -.1030926 .2136775 -0.48 0.629 -.5218927 .3157076
|
_cons | -1.049338 .114393 -9.17 0.000 -1.273545 -.8251322
----------------+----------------------------------------------------------------
district |
var(_cons)| .2063726 .0705496 .1056001 .4033108
---------------------------------------------------------------------------------
LR test vs. logistic model: chibar2(01) = 42.25 Prob >= chibar2 = 0.0000
.
. pwcompare i.urban##i.children1
Pairwise comparisons of marginal linear predictions
Margins : asbalanced
-------------------------------------------------------------------------
| Unadjusted
| Contrast Std. Err. [95% Conf. Interval]
------------------------+------------------------------------------------
c_use |
urban |
urban vs rural | .6995049 .1172543 .4696907 .9293191
|
children1 |
1 vs 0 | .5422914 .1075603 .331477 .7531058
|
urban#children1 |
(rural#1) vs (rural#0) | .5938377 .1216843 .3553409 .8323344
(urban#0) vs (rural#0) | .7510512 .1632623 .4310631 1.071039
(urban#1) vs (rural#0) | 1.241796 .1639783 .9204047 1.563188
(urban#0) vs (rural#1) | .1572135 .1540996 -.1448162 .4592432
(urban#1) vs (rural#1) | .6479586 .1538558 .3464068 .9495104
(urban#1) vs (urban#0) | .4907451 .1765231 .1447661 .8367241
-------------------------------------------------------------------------
.
. mat list e(V)
symmetric e(V)[10,10]
c_use: c_use: c_use: c_use: c_use: c_use: c_use: c_use: c_use: /:
0b. 1. 0b. 1. 0b.urban# 0b.urban# 1o.urban# 1.urban# var(
urban urban children1 children1 0b.children1 1o.children1 0b.children1 1.children1 _cons _cons[dis~t])
c_use:
0b.urban 0
1.urban 0 .02665456
0b.children1 0 0 0
1.children1 0 .00885747 0 .01480706
0b.urban#
0b.children1 0 0 0 0 0
0b.urban#
1o.children1 0 0 0 0 0 0
1o.urban#
0b.children1 0 0 0 0 0 0 0
1.urban#
1.children1 0 -.02432051 0 -.01465235 0 0 0 .04565806
_cons 0 -.00890614 0 -.00901114 0 0 0 .00873016 .01308577
/:
var(
_cons[dis~t]) 0 -.00073136 0 .0003903 0 0 0 .00039822 -.0009874 .00497724
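For reference, the (urban#1) vs (urban#0) row of the pwcompare table is just the sum of two fixed-effect coefficients, so its point estimate and standard error can be reproduced from e(b) and e(V) after melogit. A minimal sketch (lincom works after any estimator that posts e(b) and e(V), so it may also work after runmlwin, provided the coefficient names match):
* (urban#1) vs (urban#0) compares the two children1 cells within urban, so the
* contrast is _b[1.children1] + _b[1.urban#1.children1] and its variance is
* V(1.children1) + V(1.urban#1.children1) + 2*Cov(1.children1, 1.urban#1.children1)
lincom 1.children1 + 1.urban#1.children1
This reproduces the contrast .4907451 with standard error .1765231 shown by pwcompare above.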
.
Similar to the question posed here, but I think I am not employing it correctly.
I used help fvvarlist to guide me on interactions.
I am employing a triple interaction with 3 binary variables:
As a toy model, let us assume:
x = gender (1 = male, 0 = female)
y = health (1 = good, 0 = poor)
z = employment (1 = employed, 0 = not employed)
using the following regression:
reg x##y##z if state == "NY" & year > 1985
I am interested in the results for 1.x#1.y#1.z, but this coefficient is omitted.
1.x#1.y#1.z omitted because of collinearity
Is there a way I can keep this interaction?
It would be best to verify that you actually have this combination in your data with egen, group.
You should also use i. prefixes so Stata does not treat your variables as continuous, which has the added benefit of a more informative message: the interaction identifies no observations in the sample rather than a mysterious collinearity note.
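Applied to the toy model above, the check and the factor-variable regression would look something like this (a sketch only: depvar stands in for whatever outcome variable your actual regression uses):
* tabulate which of the eight (x, y, z) cells actually occur in the estimation sample
egen cell = group(x y z) if state == "NY" & year > 1985, label
tab cell, missing
* factor-variable syntax, so Stata treats x, y, and z as categorical
reg depvar i.x##i.y##i.z if state == "NY" & year > 1985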
Here is a reproducible example:
. sysuse auto, clear
(1978 automobile data)
. sum mpg weight
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
mpg | 74 21.2973 5.785503 12 41
weight | 74 3019.459 777.1936 1760 4840
. gen efficient = mpg > 21
. lab define efficient 0 "Inefficient" 1 "Efficient"
. lab val efficient efficient
. gen heavy = weight > 3e3
. lab define heavy 0 "Light" 1 "Heavy"
. lab val heavy heavy
. egen group = group(foreign efficient heavy), label(group)
. tab group, sort
group(foreign efficient |
heavy) | Freq. Percent Cum.
---------------------------+-----------------------------------
Domestic Inefficient Heavy | 34 45.95 45.95
Foreign Efficient Light | 15 20.27 66.22
Domestic Efficient Light | 13 17.57 83.78
Foreign Inefficient Light | 5 6.76 90.54
Domestic Efficient Heavy | 3 4.05 94.59
Domestic Inefficient Light | 2 2.70 97.30
Foreign Inefficient Heavy | 2 2.70 100.00
---------------------------+-----------------------------------
Total | 74 100.00
. reg price c.foreign##c.efficient##c.heavy, robust
note: c.foreign#c.efficient#c.heavy omitted because of collinearity.
Linear regression Number of obs = 74
F(6, 67) = 74.67
Prob > F = 0.0000
R-squared = 0.2830
Root MSE = 2606.9
-----------------------------------------------------------------------------------------------
| Robust
price | Coefficient std. err. t P>|t| [95% conf. interval]
------------------------------+----------------------------------------------------------------
foreign | 3007.6 960.0626 3.13 0.003 1091.307 4923.893
efficient | 513.2308 394.6504 1.30 0.198 -274.4948 1300.956
|
c.foreign#c.efficient | -1810.164 1071.875 -1.69 0.096 -3949.636 329.3076
|
heavy | 3283.176 696.5873 4.71 0.000 1892.782 4673.571
|
c.foreign#c.heavy | 2462.724 1196.996 2.06 0.044 73.50896 4851.938
|
c.efficient#c.heavy | -2783.741 744.4813 -3.74 0.000 -4269.732 -1297.75
|
c.foreign#c.efficient#c.heavy | 0 (omitted)
|
_cons | 3739 332.9212 11.23 0.000 3074.486 4403.514
-----------------------------------------------------------------------------------------------
. reg price i.foreign##i.efficient##i.heavy, robust
note: 1.foreign#1.efficient#1.heavy identifies no observations in the sample.
Linear regression Number of obs = 74
F(6, 67) = 74.67
Prob > F = 0.0000
R-squared = 0.2830
Root MSE = 2606.9
------------------------------------------------------------------------------------------
| Robust
price | Coefficient std. err. t P>|t| [95% conf. interval]
-------------------------+----------------------------------------------------------------
foreign |
Foreign | 3007.6 960.0626 3.13 0.003 1091.307 4923.893
|
efficient |
Efficient | 513.2308 394.6504 1.30 0.198 -274.4948 1300.956
|
foreign#efficient |
Foreign#Efficient | -1810.164 1071.875 -1.69 0.096 -3949.636 329.3076
|
heavy |
Heavy | 3283.176 696.5873 4.71 0.000 1892.782 4673.571
|
foreign#heavy |
Foreign#Heavy | 2462.724 1196.996 2.06 0.044 73.50896 4851.938
|
efficient#heavy |
Efficient#Heavy | -2783.741 744.4813 -3.74 0.000 -4269.732 -1297.75
|
foreign#efficient#heavy |
Foreign#Efficient#Heavy | 0 (empty)
|
_cons | 3739 332.9212 11.23 0.000 3074.486 4403.514
------------------------------------------------------------------------------------------
There are no cars in the data that are simultaneously foreign, efficient, and heavy, and when you let Stata know that you have categorical variables on the right-hand side, you get an understandable note about why the triple interaction is empty.
I am running a simple regression of race times against temperature just to develop some basic intuition. My dataset is very large; each observation is the race completion time of a unit in a given race in a given year.
For starters I am running a very simple regression of race time on temperature bins.
Summary of the temp variable:

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+----------------------------------------------------------
avg_temp_scc |   8309434        54.3          9.4         0         89

Summary of the time variable:

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+----------------------------------------------------------
    chiptime |   8309434       267.5         59.6       122       1262
I decided to make 10 degree bins for temperature and regress time against those.
The code is:
egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
reg chiptime i.temp_trial
The output is:

      Source |       SS           df       MS          Number of obs =  8309434
-------------+------------------------------          F(8, 8309425) = 69509.83
       Model |  1.8525e+09         8   231557659       Prob > F      =   0.0000
    Residual |  2.7681e+10   8309425  3331.29368       R-squared     =   0.0627
-------------+------------------------------          Adj R-squared =   0.0627
       Total |  2.9534e+10   8309433  3554.22521       Root MSE      =   57.717
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------+----------------------------------------------------------------
temp_trial |
10 | -26.63549 2.673903 -9.96 0.000 -31.87625 -21.39474
20 | 10.23883 1.796236 5.70 0.000 6.71827 13.75939
30 | -16.1049 1.678432 -9.60 0.000 -19.39457 -12.81523
40 | -13.97918 1.675669 -8.34 0.000 -17.26343 -10.69493
50 | -10.18371 1.675546 -6.08 0.000 -13.46772 -6.899695
60 | -.6865365 1.675901 -0.41 0.682 -3.971243 2.59817
70 | 44.42869 1.676883 26.49 0.000 41.14206 47.71532
80 | 23.63064 1.766566 13.38 0.000 20.16824 27.09305
_cons | 273.1366 1.675256 163.04 0.000 269.8531 276.42
So Stata correctly drops one of the bins (in this case 0-10) as the base category.
Now I manually created the bins and ran the regression again:
gen temp0 = 1 if temp_trial==0
replace temp0 = 0 if temp_trial!=0
gen temp1 = 1 if temp_trial == 10
replace temp1 = 0 if temp_trial != 10
gen temp2 = 1 if temp_trial==20
replace temp2 = 0 if temp_trial!=20
gen temp3 = 1 if temp_trial==30
replace temp3 = 0 if temp_trial!=30
gen temp4=1 if temp_trial==40
replace temp4=0 if temp_trial!=40
gen temp5=1 if temp_trial==50
replace temp5=0 if temp_trial!=50
gen temp6=1 if temp_trial==60
replace temp6=0 if temp_trial!=60
gen temp7=1 if temp_trial==70
replace temp7=0 if temp_trial!=70
gen temp8=1 if temp_trial==80
replace temp8=0 if temp_trial!=80
reg chiptime temp0 temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8
The output is:
      Source |       SS           df       MS          Number of obs =  8309434
-------------+------------------------------          F(9, 8309424) = 61786.51
       Model |  1.8525e+09         9   205829030       Prob > F      =   0.0000
    Residual |  2.7681e+10   8309424  3331.29408       R-squared     =   0.0627
-------------+------------------------------          Adj R-squared =   0.0627
       Total |  2.9534e+10   8309433  3554.22521       Root MSE      =   57.717
--------------------------------------------------------------------------
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+----------------------------------------------------------------
temp0 | -54.13245 6050.204 -0.01 0.993 -11912.32 11804.05
temp1 | -80.76794 6050.204 -0.01 0.989 -11938.95 11777.42
temp2 | -43.89362 6050.203 -0.01 0.994 -11902.08 11814.29
temp3 | -70.23735 6050.203 -0.01 0.991 -11928.42 11787.94
temp4 | -68.11162 6050.203 -0.01 0.991 -11926.29 11790.07
temp5 | -64.31615 6050.203 -0.01 0.992 -11922.5 11793.87
temp6 | -54.81898 6050.203 -0.01 0.993 -11913 11803.36
temp7 | -9.703755 6050.203 -0.00 0.999 -11867.89 11848.48
temp8 | -30.5018 6050.203 -0.01 0.996 -11888.68 11827.68
_cons | 327.269 6050.203 0.05 0.957 -11530.91 12185.45
Note that the bins are exhaustive of the entire dataset, Stata is including a constant in the regression, and none of the bins is being dropped. Is this not incorrect? Given that a constant is included, shouldn't one of the bins be dropped to serve as the "base case"? I feel as though I am missing something obvious here.
Edit:
Here is a Dropbox link for the data and do-file:
It contains only the two variables under consideration. The file is 129 MB. I also have a picture of my output at the link.
This too is not an answer, but an extended comment, since I'm tired of fighting with the 600-character limit and the freeze on editing after 5 minutes.
In the comment thread on the original post, @user52932 wrote
Thank you for verifying this. Can you elaborate on what exactly this precision issue is? Does this only cause problems in this multicollinearity issue? Could it be that when I am using factor variables this precision issue may cause my estimates to be wrong?
I want to be unambiguous that the results from the regression using factor variables are as correct as those of any well-specified regression can be.
In the regression using dummy variables, the model was misspecified to include a set of multicollinear variables. Stata is then faulted for failing to detect the multicollinearity.
But there's no magic test for multicollinearity. It's inferred from characteristics of the cross-products matrix. In this case the cross-products matrix represents 8.3 million observations, and despite Stata's use of double-precision throughout, the calculated matrix passed Stata's test and was not detected as containing a multicollinear set of variables. This is the locus of the precision problem to which I referred. Note that by reordering the observations, the accumulated cross-products matrix differed enough so that it now failed Stata's test, and the misspecification was detected.
Now look at the results in the original post obtained from this misspecified regression. Note that if you add 54.13245 to the coefficients on each of the dummy variables and subtract the same amount from the constant, the resulting coefficients and constant are identical to those in the regression using factor variables. This is the textbook definition of the problem with multicollinearity - not that the coefficient estimates are wrong, but that the coefficient estimates are not uniquely defined.
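You can check that shift directly with the numbers from the two tables:
display -80.76794 + 54.13245   // temp1 coefficient plus 54.13245 = -26.63549, the factor-variable coefficient on bin 10
display 327.269 - 54.13245     // dummy-model constant minus 54.13245 = 273.13655, the factor-variable constant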
In a comment above, @user52932 wrote
I am unsure what Stata is using as the base case in my data.
The answer is that Stata used no base case; the results are what one should expect when a set of multicollinear variables is included among the independent variables.
So this question is a reminder to us that statistical packages like Stata cannot infallibly detect multicollinearity. As it turns out, that's part of the genius of factor variable notation, I realize now. With factor variable notation, you tell Stata to create a set of dummy variables that by definition will be multicollinear, and since it understands that relationship between the dummy variables, it can eliminate the multicollinearity ex ante, before constructing the cross-products matrix, rather than attempt to infer the problem ex post, using the cross-products matrix's characteristics.
We should not be surprised that Stata occasionally fails to detect multicollinearity, but rather gratified that it does so as well as it does. After all, the second model is indeed misspecified, which constitutes an unambiguous violation of the assumptions of OLS regression on the user's part.
This may not be an "answer" but it's too long for a comment, so I write it here.
My results are different. At the final regression, one variable is dropped:
. clear all
. set obs 8309434
number of observations (_N) was 0, now 8,309,434
. set seed 1
. gen avg_temp_scc = floor(90*uniform())
. egen temp_trial = cut(avg_temp_scc), at(0,10,20,30,40,50,60,70,80,90)
. gen chiptime = rnormal()
. reg chiptime i.temp_trial
Source | SS df MS Number of obs = 8,309,434
-------------+---------------------------------- F(8, 8309425) = 0.88
Model | 7.07729775 8 .884662219 Prob > F = 0.5282
Residual | 8308356.5 8,309,425 .999871411 R-squared = 0.0000
-------------+---------------------------------- Adj R-squared = -0.0000
Total | 8308363.58 8,309,433 .9998713 Root MSE = .99994
------------------------------------------------------------------------------
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
temp_trial |
10 | .0010732 .0014715 0.73 0.466 -.0018109 .0039573
20 | .0003255 .0014713 0.22 0.825 -.0025581 .0032092
30 | .0017061 .0014713 1.16 0.246 -.0011776 .0045897
40 | .0003128 .0014717 0.21 0.832 -.0025718 .0031973
50 | .0007142 .0014715 0.49 0.627 -.0021699 .0035983
60 | .0021693 .0014716 1.47 0.140 -.0007149 .0050535
70 | -.0008265 .0014715 -0.56 0.574 -.0037107 .0020577
80 | -.0005001 .0014714 -0.34 0.734 -.0033839 .0023837
|
_cons | -.0006364 .0010403 -0.61 0.541 -.0026753 .0014025
------------------------------------------------------------------------------
. * "qui tab temp_trial, gen(temp)" is more convenient than "forv ..."
. forv k = 0/8 {
2. gen temp`k' = temp_trial==`k'0
3. }
. reg chiptime temp0-temp8
note: temp6 omitted because of collinearity
Source | SS df MS Number of obs = 8,309,434
-------------+---------------------------------- F(8, 8309425) = 0.88
Model | 7.07729775 8 .884662219 Prob > F = 0.5282
Residual | 8308356.5 8,309,425 .999871411 R-squared = 0.0000
-------------+---------------------------------- Adj R-squared = -0.0000
Total | 8308363.58 8,309,433 .9998713 Root MSE = .99994
------------------------------------------------------------------------------
chiptime | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
temp0 | -.0021693 .0014716 -1.47 0.140 -.0050535 .0007149
temp1 | -.0010961 .0014719 -0.74 0.456 -.003981 .0017888
temp2 | -.0018438 .0014717 -1.25 0.210 -.0047282 .0010407
temp3 | -.0004633 .0014717 -0.31 0.753 -.0033477 .0024211
temp4 | -.0018566 .0014721 -1.26 0.207 -.0047419 .0010287
temp5 | -.0014551 .0014719 -0.99 0.323 -.00434 .0014298
temp6 | 0 (omitted)
temp7 | -.0029958 .0014719 -2.04 0.042 -.0058808 -.0001108
temp8 | -.0026694 .0014718 -1.81 0.070 -.005554 .0002152
_cons | .0015329 .0010408 1.47 0.141 -.0005071 .0035729
------------------------------------------------------------------------------
The differences from yours are: (i) different data (I generated random numbers), and (ii) I used a forvalues loop instead of creating the dummies manually. Still, I see no errors in your code.
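For reference, the tab-based shortcut mentioned in the comment inside the log above would be (a sketch; the stub tbin is arbitrary, chosen so it does not clash with the temp0-temp8 variables already created):
qui tab temp_trial, gen(tbin)   // creates tbin1-tbin9, one indicator per observed bin
reg chiptime tbin1-tbin9        // exhaustive dummies plus a constant: one should be dropped as the base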
I have a categorical variable n_produttore with levels A-B-C-D-E-F and the output of a multinomial logit (mlogit). How do I create the confusion matrix?
mlogit n_produttore UVAtevola NUTILIZZODIVOLTE EXPOSTFIORITURA DIMENSIONE DOSI
Iteration 0: log likelihood = -898.93386
Iteration 1: log likelihood = -868.27679
Iteration 2: log likelihood = -864.38774
Iteration 3: log likelihood = -864.28614
Iteration 4: log likelihood = -864.26279
Iteration 5: log likelihood = -864.25805
Iteration 6: log likelihood = -864.25724
Iteration 7: log likelihood = -864.25705
Iteration 8: log likelihood = -864.25701
Iteration 9: log likelihood = -864.257
Multinomial logistic regression Number of obs = 929
LR chi2(25) = 69.35
Prob > chi2 = 0.0000
Log likelihood = -864.257 Pseudo R2 = 0.0386
----------------------------------------------------------------------------------
n_produttore | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+----------------------------------------------------------------
ALTRO | (base outcome)
-----------------+----------------------------------------------------------------
A |
UVAtevola | -.1579633 .3240817 -0.49 0.626 -.7931517 .4772251
NUTILIZZODIVOLTE | -.2306291 .0957196 -2.41 0.016 -.418236 -.0430221
EXPOSTFIORITURA | -.1879822 .2277447 -0.83 0.409 -.6343536 .2583893
DIMENSIONE | -.0621528 .022512 -2.76 0.006 -.1062755 -.01803
DOSI | .0749472 .0469926 1.59 0.111 -.0171565 .167051
_cons | -.9914935 .2967274 -3.34 0.001 -1.573068 -.4099185
-----------------+----------------------------------------------------------------
B |
UVAtevola | -1.125263 .5444485 -2.07 0.039 -2.192362 -.0581633
NUTILIZZODIVOLTE | -.0667538 .0966905 -0.69 0.490 -.2562637 .1227562
EXPOSTFIORITURA | -.769514 .2891922 -2.66 0.008 -1.33632 -.2027077
DIMENSIONE | -.0293445 .022586 -1.30 0.194 -.0736122 .0149232
DOSI | -.0451004 .1109894 -0.41 0.684 -.2626356 .1724349
_cons | -1.361353 .3900545 -3.49 0.000 -2.125846 -.5968602
-----------------+----------------------------------------------------------------
C |
UVAtevola | -1.232848 1.075072 -1.15 0.251 -3.33995 .8742545
NUTILIZZODIVOLTE | -.1639186 .2256885 -0.73 0.468 -.6062599 .2784227
EXPOSTFIORITURA | -.154228 .5543342 -0.28 0.781 -1.240703 .9322469
DIMENSIONE | -.0993675 .0590232 -1.68 0.092 -.2150508 .0163159
DOSI | .0816812 .1273864 0.64 0.521 -.1679916 .3313541
_cons | -2.727106 .7170044 -3.80 0.000 -4.132409 -1.321803
-----------------+----------------------------------------------------------------
D |
UVAtevola | -14.83818 1290.627 -0.01 0.991 -2544.421 2514.745
NUTILIZZODIVOLTE | -.3792106 .4314916 -0.88 0.379 -1.224919 .4664973
EXPOSTFIORITURA | -.4976473 .8798813 -0.57 0.572 -2.222183 1.226888
DIMENSIONE | -.0976071 .0905061 -1.08 0.281 -.2749958 .0797817
DOSI | -.2036094 .4729157 -0.43 0.667 -1.130507 .7232883
_cons | -2.242187 1.425189 -1.57 0.116 -5.035506 .5511316
-----------------+----------------------------------------------------------------
E |
UVAtevola | .7193533 .2948825 2.44 0.015 .1413942 1.297312
NUTILIZZODIVOLTE | -.1058946 .0921645 -1.15 0.251 -.2865337 .0747446
EXPOSTFIORITURA | -.4057074 .2529228 -1.60 0.109 -.901427 .0900122
DIMENSIONE | -.0641196 .025192 -2.55 0.011 -.113495 -.0147442
DOSI | .0965401 .0441483 2.19 0.029 .010011 .1830692
_cons | -1.615742 .3101875 -5.21 0.000 -2.223698 -1.007786
----------------------------------------------------------------------------------
predict prob*
egen pred_max = rowmax(prob*)
(23 missing values generated)
.
.
.
. g pred_choice = .
(952 missing values generated)
.
. forvalues i = 1/6 {
2.
. replace pred_choice = `i' if (pred_max == prob`i')
3.
. }
(951 real changes made)
(23 real changes made)
(23 real changes made)
(23 real changes made)
(23 real changes made)
(24 real changes made)
.
.
.
. local produttore_lab: value label n_produttore
.
. label values pred_choice `produttore_lab'
.
. tab pred_choice n_produttore
pred_choic | n_produttore
e | ALTRO A B C D E | Total
-----------+------------------------------------------------------------------+----------
ALTRO | 666 95 67 14 6 80 | 928
E | 21 1 0 0 0 2 | 24
-----------+------------------------------------------------------------------+----------
Total | 687 96 67 14 6 82 | 952
where: n_produttore = ALTRO A B C D E
UVAtevola = dummy (0 or 1)
NUTILIZZODIVOLTE = 1...15
EXPOSTFIORITURA = dummy (0 or 1)
DIMENSIONE = 1...50 kg
DOSI = 1...20
Your code seems to be taken from an answer by Tzygmund McFarlane on Statalist.org. I reproduce the complete, working example below:
webuse sysdsn1, clear
mlogit insure age male nonwhite i.site
predict prob*
egen pred_max = rowmax(prob*)
g pred_choice = .
forv i=1/3 {
replace pred_choice = `i' if (pred_max == prob`i')
}
local insure_lab: value label insure
label values pred_choice `insure_lab'
tab pred_choice insure
Like I said, it works. So unless you give more information on the problem you have at hand, people may not be able to help you. There may be an issue with your model specification, with your data structure, a combination, or something else. Your statement
... but it doesn't work.
gives nothing for people to work with. Good practice is to post exact input/output, including errors. Please read the complete Asking section in https://stackoverflow.com/help.
I want to do something very easy, but it doesn't work!
I need to see the predictions (and errors) of a GARCH model. The main variable is dowclose, and my idea is to check whether the GARCH model fits this variable well.
I'm using this simple code, but the predictions are just 0's:
webuse dow1.dta
arch dowclose, noconstant arch(1) garch(1)
predict dow_hat, y
ARCH Results:
ARCH family regression
Sample: 1 - 9341 Number of obs = 9341
Distribution: Gaussian Wald chi2(.) = .
Log likelihood = -76191.43 Prob > chi2 = .
------------------------------------------------------------------------------
| OPG
dowclose | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
arch |
L1. | 1.00144 6.418855 0.16 0.876 -11.57929 13.58217
|
garch |
L1. | -.001033 6.264372 -0.00 1.000 -12.27898 12.27691
|
_cons | 56.60589 620784.7 0.00 1.000 -1216659 1216772
------------------------------------------------------------------------------
This is to be expected: you have no covariates and no intercept, so there's nothing to predict.
Here's a simple OLS regression that makes the problem apparent:
. sysuse auto
(1978 Automobile Data)
. reg price, nocons
Source | SS df MS Number of obs = 74
-------------+------------------------------ F( 0, 74) = 0.00
Model | 0 0 . Prob > F = .
Residual | 3.4478e+09 74 46592355.7 R-squared = 0.0000
-------------+------------------------------ Adj R-squared = 0.0000
Total | 3.4478e+09 74 46592355.7 Root MSE = 6825.9
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
------------------------------------------------------------------------------
. predict phat
(option xb assumed; fitted values)
. sum phat
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
phat | 74 0 0 0 0
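If the goal is to see what the GARCH model is actually fitting, one option is to give it a mean equation with something to predict and to ask predict for the conditional variance as well, since that is what the ARCH/GARCH terms model. A sketch (the generated variable names are arbitrary; dow1.dta is already tsset, as the original arch call requires):
webuse dow1, clear
gen d_dow = D.dowclose            // daily change in the index
arch d_dow, arch(1) garch(1)      // constant in the mean equation, GARCH(1,1) variance
predict d_hat, y                  // fitted values from the mean equation
predict h_hat, variance           // fitted conditional variance
summarize d_hat h_hat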
I am trying to check whether Stata uses the initial values, taken from a previous reg, for the model NormalReg (a sample model) below. However, judging from iteration 0, it does not seem to take my initial values into account. Any help with fixing this will be highly appreciated.
set seed 123
set obs 1000
gen x = runiform()*2
gen u = rnormal()*5
gen y = 2 + 2*x + u
reg y x
Source | SS df MS Number of obs = 1000
-------------+------------------------------ F( 1, 998) = 52.93
Model | 1335.32339 1 1335.32339 Prob > F = 0.0000
Residual | 25177.012 998 25.227467 R-squared = 0.0504
-------------+------------------------------ Adj R-squared = 0.0494
Total | 26512.3354 999 26.5388743 Root MSE = 5.0227
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.99348 .2740031 7.28 0.000 1.455792 2.531168
_cons | 2.036442 .3155685 6.45 0.000 1.417188 2.655695
------------------------------------------------------------------------------
cap program drop NormalReg
program define NormalReg
args lnlk xb sigma2
qui replace `lnlk' = -ln(sqrt(`sigma2'*2*_pi)) - ($ML_y-`xb')^2/(2*`sigma2')
end
ml model lf NormalReg (reg: y = x) (sigma2:)
ml init reg:x = `=_b[x]'
ml init reg:_cons = `=_b[_cons]'
ml max,iter(1) trace
ml max,iter(1) trace
initial: log likelihood = -<inf> (could not be evaluated)
searching for feasible values .+
feasible: log likelihood = -28110.03
rescaling entire vector .+.
rescale: log likelihood = -14623.922
rescaling equations ...+++++.
rescaling equations ....
rescale eq: log likelihood = -3080.0872
------------------------------------------------------------------------------
Iteration 0:
Parameter vector:
reg: reg: sigma2:
x _cons _cons
r1 3.98696 1 32
log likelihood = -3080.0872
------------------------------------------------------------------------------
Iteration 1:
Parameter vector:
reg: reg: sigma2:
x _cons _cons
r1 2.498536 1.773872 24.10726
log likelihood = -3035.3553
------------------------------------------------------------------------------
convergence not achieved
Number of obs = 1000
Wald chi2(1) = 86.45
Log likelihood = -3035.3553 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
reg |
x | 2.498536 .2687209 9.30 0.000 1.971853 3.02522
_cons | 1.773872 .3086854 5.75 0.000 1.16886 2.378885
-------------+----------------------------------------------------------------
sigma2 |
_cons | 24.10726 1.033172 23.33 0.000 22.08228 26.13224
------------------------------------------------------------------------------
Warning: convergence not achieved
Apparently, if you want ml to evaluate the likelihood at the specified initial values at iteration 0, you must also supply an initial value for sigma2. Change the last section of your code to:
matrix rmse = e(rmse)
scalar mse = rmse[1,1]^2
ml model lf NormalReg (reg: y = x) (sigma2:)
ml init reg:x = `=_b[x]'
ml init reg:_cons = `=_b[_cons]'
ml init sigma2:_cons = `=scalar(mse)'
ml maximize, trace
Note that the ML estimate of sigma^2 will differ from the squared root mean square error because ML does not adjust for degrees of freedom: with n = 1,000, sigma2 = (998/1000)*rmse^2.
Stuff like this is very sensitive. You are trusting that the results from the previous regression are still visible at the exact point you reference them. That could be undermined, directly or indirectly, by several different operations. It is best to capture the values you want to use and feed them to your program explicitly, via its arguments or options, at the point it runs.
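A minimal sketch of that advice: capture everything needed in locals immediately after regress, before anything else can disturb the stored results:
reg y x
local b_x    = _b[x]
local b_cons = _b[_cons]
local sig2_0 = e(rmse)^2              // starting value for sigma2
ml model lf NormalReg (reg: y = x) (sigma2:)
ml init reg:x = `b_x'
ml init reg:_cons = `b_cons'
ml init sigma2:_cons = `sig2_0'
ml maximize, trace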