GMM program evaluator does not like my temporary variable

As far as I can tell, the two programs in the code below are identical. In the first program, I just assign the parameter to a scalar. In the second program, I store this scalar for each observation in a temporary variable.
Mathematically, that should be the same, yet the second program produces the warnings "numerical derivatives are approximate" and "flat or discontinuous region encountered".
Why can't the derivatives be computed properly in the second approach?
clear
set obs 10000
set seed 42
gen x = runiform() * 10
gen eps = rnormal()
gen y = 2 + .3 * x + eps
capture program drop testScalar
program testScalar
    syntax varlist [if], at(name)
    scalar b0 = `at'[1,1]
    scalar b1 = `at'[1,2]
    replace `varlist' = y - b0 - b1*x
end
capture program drop testTempvar
program testTempvar
    syntax varlist [if], at(name)
    tempvar tmp
    scalar b0 = `at'[1,1]
    scalar b1 = `at'[1,2]
    gen `tmp' = b1
    replace `varlist' = y - b0 - `tmp'*x
end
gmm testScalar, nequations(1) nparameters(2) instr(x) winitial(identity) onestep
gmm testTempvar, nequations(1) nparameters(2) instr(x) winitial(identity) onestep
Output:
. gmm testScalar, nequations(1) nparameters(2) instr(x) winitial(identity) onestep
(10,000 real changes made)
Step 1
Iteration 0: GMM criterion Q(b) = 417.93313
Iteration 1: GMM criterion Q(b) = 1.690e-23
Iteration 2: GMM criterion Q(b) = 3.568e-30
note: model is exactly identified
GMM estimation

Number of parameters =   2
Number of moments    =   2
Initial weight matrix: Identity                   Number of obs   =     10,000

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /b1 |   2.022865   .0200156   101.06   0.000     1.983635    2.062095
         /b2 |   .2981147    .003465    86.04   0.000     .2913235    .3049059
------------------------------------------------------------------------------
Instruments for equation 1: x _cons
. gmm testTempvar, nequations(1) nparameters(2) instr(x) winitial(identity) onestep
(10,000 real changes made)
Step 1
Iteration 0: GMM criterion Q(b) = 417.93313
numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 1: GMM criterion Q(b) = 8.073e-17
numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 2: GMM criterion Q(b) = 8.073e-17 (backed up)
note: model is exactly identified
GMM estimation

Number of parameters =   2
Number of moments    =   2
Initial weight matrix: Identity                   Number of obs   =     10,000

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /b1 |   2.022865   .0201346   100.47   0.000     1.983402    2.062328
         /b2 |   .2981147   .0034933    85.34   0.000      .291268    .3049613
------------------------------------------------------------------------------
Instruments for equation 1: x _cons

In the program testTempvar you need to generate the temporary variable tmp as type double:
generate double `tmp' = b1
In other words, this is a precision problem. By default, generate creates a float variable, and rounding b1 to float precision swallows the tiny perturbations that gmm applies to the parameters when computing numerical derivatives, so the moment condition looks flat.
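For completeness, here is the evaluator with that one-word fix applied (testTempvar2 is just a name chosen here so both versions can coexist):
capture program drop testTempvar2
program testTempvar2
    syntax varlist [if], at(name)
    tempvar tmp
    scalar b0 = `at'[1,1]
    scalar b1 = `at'[1,2]
    // double precision preserves the small parameter perturbations
    // that gmm uses when computing numerical derivatives
    generate double `tmp' = b1
    replace `varlist' = y - b0 - `tmp'*x
end
gmm testTempvar2, nequations(1) nparameters(2) instr(x) winitial(identity) onestep
With the double, the warnings should disappear and the run should match testScalar.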

Related

Find (x,y) subpixel coordinates of a maximum using discrete quadratic interpolation

I have to find the subpixel (x,y) coordinates of the maximum value given a set of discrete points. In my case, I run the cv::matchTemplate function, which slides a model window along an image and returns a score value for each pixel position. The result is an image with the score for each position, plus the location (x0, y0) of the maximum value. For example, these are the values around the maximum value found:
        x_1    x0     x1
y_1  | 0.91 | 0.89 | 0.90 |
y0   | 0.92 | 0.99 | 0.89 |
y1   | 0.95 | 0.95 | 0.90 |
I would like to use a quadratic interpolation to find where are the subpixel point coordinates of the interpolated maximum value, using just the nearest neighbors.
In a 1d case, I use this formula (assuming x0 is the origin):
interpolated_x = (x_1-x1)/(2.*(x_1-2.*x0+x1));
For example, with the values
   x_1    x0     x1
| 0.92 | 0.99 | 0.89 |
you get interpolated_x = -0.08823, which is correctly slightly to the left of x0.
Is there some C++ code for the 2d case?
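A common approximation (an assumption here, not something stated in the question) is to treat the 2d case as separable and apply the 1d formula once along the row and once along the column through the maximum. A quick numerical check of that idea, using the 3x3 grid above:
// Separable subpixel refinement: 1-d parabola fit along each axis
scalar dx = (0.92 - 0.89) / (2*(0.92 - 2*0.99 + 0.89))   // row through y0
scalar dy = (0.89 - 0.95) / (2*(0.89 - 2*0.99 + 0.95))   // column through x0
display "subpixel offset from (x0, y0): (" dx ", " dy ")" // (-.088, .214)
This ignores cross-curvature; a full 2d quadratic fit to all nine neighbors would also use the mixed xy term.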

Marginal effects from estat mfx after asclogit

I am trying to understand how Stata calculates both the probability that an alternative is selected and the marginal effect calculated at the mean when I run estat mfx after estimating a McFadden / conditional logit model using asclogit.
For example:
asclogit H t, case(ID) alternatives(AQ) casevars(Medicaid) ///
basealternative(1) vce(cluster Medicaid)
estat mfx, varlist(Medicaid)
My goal is to re-create the results by estimating the same model using clogit and manually calculating the equivalent marginal effects. I am able to reproduce the conditional logit estimates generated by asclogit using clogit, but I get stuck reproducing the postestimation calculations.
I have not been able to reproduce the computed probability of each alternative being selected, which, from reading the documentation for estat mfx, I learned is evaluated at the values labeled X in the table output.
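For reference, the quantity being reproduced is the standard conditional logit choice probability evaluated at those X values, which is what the code below builds: Pr(j) = XB_j / (XB_1 + ... + XB_5), where XB_j = exp(coefficients times alternative j's covariates, evaluated at their means).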
Here are the estat mfx probability figures, transcribed in case the picture didn't come out:
. matrix baseline = r(pr_1)\r(pr_2)\r(pr_3)\r(pr_4)\r(pr_5)
. matrix list baseline
baseline[5,1]
            c1
r1   .04077232
r2   .15206384
r3   .01232535
r4   .10465885
r5   .69017964
Keep in mind that the variables beginning with Valpha and VMedicaid are case-specific variables I created for the clogit command. They are, respectively, the intercept and an indicator for Medicaid coverage.
Here is what I have:
clogit H t Valpha* i.VMedicaid_2 i.VMedicaid_3 i.VMedicaid_4 i.VMedicaid_5 , ///
group(ID) vce(cluster Medicaid)
* Reproducing probability an alternative selected calculated by estat mfx
// calculate covariate means to plug into probability calculations
local V t Valpha* VMedicaid_* Medicaid
foreach var of varlist `V' {
    summarize `var'
    scalar `var'_MN = r(mean)
}
// alternative specific ZB
scalar zb = _b[t]*t_MN
// numerators attempt 1
foreach j of numlist 2/5 {
    scalar XB`j' = exp(zb + (_b[Valpha_`j']*Valpha_`j'_MN) + ///
        (_b[1.VMedicaid_`j']*VMedicaid_`j'_MN))
    di "this is `j': " XB`j'
}
// numerators attempt 2: the documentation for estat mfx says the probability
// is evaluated with Medicaid = 0.68, the Medicaid coverage rate across cases,
// rather than the means of the various VMedicaid_ variables used to estimate
// clogit. Replaced the intercept mean with 1.
foreach j of numlist 2/5 {
    scalar XB`j' = exp(zb + (_b[Valpha_`j']) + (_b[1.VMedicaid_`j']*Medicaid_MN))
    di "this is `j': " XB`j'
}
scalar XB1 = exp(zb)
// denominator
scalar DNM = XB1 + XB2 + XB3 + XB4 + XB5
// Baseline
foreach j of numlist 1/5 {
    scalar PRB`j' = XB`j'/DNM
    di "The probability of choosing hospital `j' is: " PRB`j'
}
The results I get are the following:
The probability of choosing hospital 1 is: .14799075
The probability of choosing hospital 2 is: .21019437
The probability of choosing hospital 3 is: .09046377
The probability of choosing hospital 4 is: .18383085
The probability of choosing hospital 5 is: .36752026

Fixed effects in Stata

Very new to Stata, so struggling a bit with using fixed effects. The data here is made up, but bear with me. I have a bunch of dummy variables that I am doing regression with. My dependent variable is a dummy that is 1 if a customer bought something and 0 if not. My fixed effect is whether or not there was a yellow sign out front (a dummy variable again). My independent variable is whether the store manager said hi or not (another dummy).
Basically, I want my output to look like this (with standard errors, obviously):
                   Yellow sign    No yellow sign
Manager said hi    estimate       estimate
You can use the ## operator in a regression to get a saturated model with fixed effects:
First, input data such that you have a binary outcome (bought), an independent variable (saidhi), and a fixed-effects variable (sign). saidhi should be correlated with your outcome (so there is a portion of saidhi that is uncorrelated with bought and a portion that is), and your FE variable should be correlated with both bought and saidhi (otherwise there is no point in having it in your regression if you are only interested in the effect of saidhi).
clear
set obs 100
set seed 45
gen bought = runiform() > 0.5 // Binary y, 50/50 probability
gen saidhi = runiform() + runiform()^2*bought
gen sign = runiform() + runiform()*saidhi + runiform()*bought > 0.66666 // Binary FE, correlated with both x and y
replace saidhi = saidhi > 0.5
Now, run your regression:
* y = x + FE + x*FE + cons
reg bought saidhi##sign, r
exit
Your output should be:
Linear regression                               Number of obs     =        100
                                                F(3, 96)          =      13.34
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1703
                                                Root MSE          =     .46447

------------------------------------------------------------------------------
             |               Robust
      bought |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.saidhi |   .3571429   .2034162     1.76   0.082    -.0466351    .7609209
      1.sign |   .3869048   .1253409     3.09   0.003      .138105    .6357046
             |
 saidhi#sign |
        1 1  |  -.1427489   .2373253    -0.60   0.549    -.6138359    .3283381
             |
       _cons |   .0714286   .0702496     1.02   0.312    -.0680158     .210873
------------------------------------------------------------------------------
1.saidhi is the effect of saidhi when sign == 0. 1.sign is the effect of the sign alone, i.e. when saidhi == 0. The part under saidhi#sign describes the interaction between these two variables (i.e. the marginal effect of both being 1 at the same time; keep in mind the total effect of both being 1 includes the previous two terms). Your constant represents the average value of bought when both are 0 (this is the same as you would get from sum bought if saidhi == 0 & sign == 0).
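To get output shaped like the two-cell table in the question, one option (a sketch using standard postestimation commands, not part of the original answer) is:
* Effect of saidhi when sign == 1: base effect plus the interaction
lincom 1.saidhi + 1.saidhi#1.sign
* Predicted mean of bought in every saidhi/sign cell, with standard errors
margins saidhi#sign
lincom combines coefficients after any estimation command, and margins tabulates the cell means sketched in the question.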

How to simulate pairs from a joint distribution

I have two normal distributions X and Y with a given covariance between them and variances for both X and Y. I want to simulate (say 200) pairs of points from the joint distribution, but I can't seem to find a command/way to do this. I want to eventually plot these points in a scatter plot.
So far, I have:
set obs 100
set seed 1
gen y = 64*rnormal(1, 5/64)
gen x = 64*rnormal(1, 5/64)
matrix D = (1, .5 | .5, 1)
drawnorm x, y, cov(D)
but this makes an error saying that x and y already exist.
Also, once I have a sample, how would I plot the drawnorm output as a scatter?
A related approach for generating correlated data is to use the corr2data command:
clear
set obs 100
set seed 1
matrix D = (1, .5 \ .5, 1)
drawnorm x1 y1, cov(D)
corr2data x2 y2, cov(D)
. summarize x*
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          x1 |        100    .0630304    1.036762  -2.808194   2.280756
          x2 |        100    1.83e-09           1  -2.332422   2.238905

. summarize y*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          y1 |        100   -.0767662    .9529448  -2.046532   2.726873
          y2 |        100    3.40e-09           1  -2.492884   2.797518
It is important to note that, unlike drawnorm, the corr2data approach does not generate a random sample from an underlying population; it constructs data whose sample moments match the specified ones exactly, which is why x2 and y2 above have means of essentially zero and standard deviations of exactly 1.
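A quick way to see the difference (run after the code above):
correlate x1 y1   // close to 0.5, but varies with the seed
correlate x2 y2   // exactly 0.5 by construction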
You can then create a scatter plot as follows:
scatter x1 y1
Or to compare the two approaches in a single graph:
twoway scatter x1 y1 || scatter x2 y2
EDIT:
For specific means and variances you need to specify the mean vector μ and covariance matrix Σ in drawnorm. For example, to draw two random variables that are jointly normally distributed with means of 8 and 12, and variances 5 and 8 respectively, you type:
matrix mu = (8, 12)
scalar cov = 0.4 * sqrt(5 * 8) // assuming a correlation of 0.4
matrix sigma = (5, cov \ cov, 8)
drawnorm double x y, means(mu) cov(sigma)
The mean and cov options of drawnorm are both documented in the help file.
Here is an almost minimal example:
. clear
. set obs 100
number of observations (_N) was 0, now 100
. set seed 1
. matrix D = (1, .5 \ .5, 1)
. drawnorm x y, cov(D)
As the help for drawnorm explains, you must supply new variable names. As x and y already exist, drawnorm threw you out. You also had a superfluous comma that would have triggered a syntax error.
help scatter tells you about scatter plots.

Stata - How to get residuals for original equation using estimates for differenced equation

I have variables y, x1 and x2. I estimate a differenced equation (without the intercept) using reg d.(y x1 x2), nocons. Now I want to get the residuals for the original variables using the estimated coefficients. I can do it by
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
But would there be an easier way? I need to keep those generated residuals for future use. Here is a complete minimal example.
clear all
set obs 100
gen id = floor((_n-1)/5)+1
by id, sort: gen year = 1990+_n
xtset id year
set seed 1
gen x1 = rnormal()
gen x2 = rnormal()
gen y = rnormal()
*** Data generated ***
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
I wonder if there is a flexible approach, because sometimes I want to completely change variable names for the regression (e.g., reg dy dx1 dx2, nocons, not just reg d.(y x1 x2)). I thought perhaps predict might be helpful, but I don't know. Would it be possible to avoid typing the variable names explicitly?
predict will not work since it will create residuals on the differenced scale. You want residuals in terms of the original y, which is unusual, so there is no off-the-shelf solution.
I think the easiest path is to do something like this:
reg d.(y x1 x2), nocons coefl
local vars : colnames e(b)  // get a list of coefficient names
foreach x of local vars {
    local xvar = subinstr("`x'", "D.", "", 1)  // strip the D. prefix from the coefficient name
    local diff "`diff' - _b[`x']*`xvar'"
}
gen resid = y `diff'
If you have covariates like dx1 and dx2, you can modify the prefix stripper like this:
local xvar = subinstr("`x'","d","",1) // strip out the first d prefix
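As a quick sanity check (a sketch; resid2 is just a throwaway name used here), the loop-built residuals should match the manual computation from the question:
matrix b = e(b)
gen resid2 = y - b[1,1]*x1 - b[1,2]*x2
assert reldif(resid, resid2) < 1e-12  // both constructions should agree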