How to make Stata margins work for user-written model - stata

I wonder, what requirements must a user-written estimation and/or prediction program satisfy in order for standard Stata margins command to be able to work with it?
I have created a toy "estimation" program with a prediction module, but when I run margins, dydx(x) after myreg y x, Stata throws r(103) ("too many specified") and produces nothing. Can anyone modify my code so that margins could work with it?
Yes, I know that if e(predict) is not returned, margins assume linear prediction and work OK, but eventually I need to write a nonlinear model and estimate marginal effects for it.
program mypred
version 13
syntax name [if] [in]
marksample touse
local newVar = "`1'"
mat b = e(b)
local columnNames: colfullnames b
tokenize `columnNames'
gen `newVar' = b[1,1] + b[1,2] * `2'
end
program myreg, eclass
version 13
syntax varlist(min=2 max=2) [if] [in]
marksample touse
tempname b V
matrix input b = (1.1, 2.3)
matrix input V = (9, 1 \ 1, 4)
matrix colnames b = _cons `2'
matrix colnames V = _cons `2'
matrix rownames V = _cons `2'
ereturn post b V, esample(`touse')
ereturn local predict "mypred"
ereturn local cmd "myreg"
ereturn display
end

I don't have a complete answer. If there is such a one-stop location within the Stata documentation that answers your question, I'm not aware of it.
The recommendation is to read, at least, the whole entry: [R] margins. Here is a list of conditions that should be considered:
margins cannot be used after estimation commands that do not produce
full variance matrices, such as exlogistic and expoisson (see [R]
exlogistic and [R] expoisson).
margins is all about covariates and
cannot be used after estimation commands that do not post the
covariates, which eliminates gmm (see [R] gmm).
margins cannot be used
after estimation commands that have an odd data organization, and that
excludes asclogit, asmprobit, asroprobit, and nlogit (see [R]
asclogit, [R] asmprobit, [R] asroprobit, and [R] nlogit).
From another subsection:
... as of Stata 11, you are supposed to set in e(marginsok) the list
of options allowed with predict that are okay to use with margins.
Consider also inspecting (help viewsource) user-written commands from experienced user/programmers who allow for this in their commands. Maarten Buis is one of them. (You can run search maarten buis, all to search within Stata.)

Related

Multinomial Logit Fixed Effects: Stata and R

I am trying to run a multinomial logit with year fixed effects in mlogit in Stata (panel data: year-country), but I do not get standard errors for some of the models. When I run the same model using multinom in R I get both coefficients and standard errors.
I do not use Stata frequently, so I may be missing something or I may be running different models in Stata and R and should not be comparing them in the first place. What may be happening?
So a few details about the simple version of the model of interest:
I created a data example to show what the problem is
Dependent variable (will call it DV1) with 3 categories of -1, 0, 1 (unordered and 0 as reference)
Independent variables: 2 continuous variables, 3 binary variables, interaction of 2 of the 3 binary variables
Years: 1995-2003
Number of observations in the model: 900
In R I run the code and get coefficients and standard errors as below.
R version of code creating data and running the model:
## Fabricate example data
library(fabricatr)
data <- fabricate(
N = 900,
id = rep(1:900, 1),
IV1 = draw_binary(0.5, N = N),
IV2 = draw_binary(0.5, N = N),
IV3 = draw_binary(0.5, N = N),
IV4 = draw_normal_icc(mean = 3, N = N, clusters = id, ICC = 0.99),
IV5 = draw_normal_icc(mean = 6, N = N, clusters = id, ICC = 0.99))
library(AlgDesign)
DV = gen.factorial(c(3), 1, center=TRUE, varNames=c("DV"))
year = gen.factorial(c(9), 1, center=TRUE, varNames=c("year"))
DV = do.call("rbind", replicate(300, DV, simplify = FALSE))
year = do.call("rbind", replicate(100, year, simplify = FALSE))
year[year==-4]= 1995
year[year==-3]= 1996
year[year==-2]= 1997
year[year==-1]= 1998
year[year==0]= 1999
year[year==1]= 2000
year[year==2]= 2001
year[year==3]= 2002
year[year==4]= 2003
data1=cbind(data, DV, year)
data1$DV1 = relevel(factor(data1$DV), ref = "0")
## Save data as csv file (to use in Stata)
library(foreign)
write.csv(data1, "datafile.csv", row.names=FALSE)
## Run multinom
library(nnet)
model1 <- multinom(DV1 ~ IV1 + IV2 + IV3 + IV4 + IV5 + IV1*IV2 + as.factor(year), data = data1)
Results from R
When I run the model using mlogit (without fixed effects) in Stata I get both coefficients and standard errors.
So I tried including year fixed effects in the model using Stata three different ways and none worked:
femlogit
factor-variable and time-series operators not allowed
depvar and indepvars may not contain factor variables or time-series operators
mlogit
fe option: fe not allowed
used i.year: omits certain variables and/or does not give me standard errors and only shows coefficients (example in code below)
* Read file
import delimited using "datafile.csv", clear case(preserve)
* Run regression
mlogit DV1 IV1 IV2 IV3 IV4 IV5 IV1##IV2 i.year, base(0) iterate(1000)
Stata results
xtmlogit
error - does not run
error message: total number of permutations is 2,389,461,218; this many permutations require a considerable amount of memory and can result in long run times; use option force to proceed anyway, or consider using option rsample()
Fixed effects and non-linear models (such as logits) are an awkward combination. In a linear model you can simply add dummies/demean to get rid of a group-specific intercept, but in a non-linear model none of that works. I mean you could do it technically (which I think is what the R code is doing) but conceptually it is very unclear what that actually does.
Econometricians have spent a lot of time working on this, which has led to some work-arounds, usually referred to as conditional logit. IIRC this is what's implemented in femlogit. I think the mistake in your code is that you tried to include the fixed effects through a dummy specification (i.year). Instead, you should xtset your data and then run femlogit without the dummies.
xtset year
femlogit DV1 IV1 IV2 IV3 IV4 IV5 IV1##IV2
Note that these conditional logit models can be very slow. Personally, I'm more a fan of running two one-vs-all linear regressions (1=1 and 0/-1 set to zero, then -1=1 and 0/1 set to zero). However, opinions are divided (Wooldridge appears to be a fan too, many others very much not so).

Graph evolution of quantile non-linear coefficient: can it be done with grqreg? Other options?

I have the following model:
Y_{it} = alpha_i + B1*weight_{it} + B2*Dummy_Foreign_{i} + B3*(weight*Dummy_Foreign)_ {it} + e_{it}
and I am interested on the effect on Y of weight for foreign cars and to graph the evolution of the relevant coefficient across quantiles, with the respective standard errors. That is, I need to see the evolution of the coefficients (B1+ B3). I know this is a non-linear effect, and would require some sort of delta method to obtain the variance-covariance matrix to obtain the standard error of (B1+B3).
Before I delve into writing a program that attempts to do this, I thought I would try and ask if there is a way of doing it with grqreg. If this is not possible with grqreg, would someone please guide me into how they would start writing a code that computes the proper standard errors, and graphs the quantile coefficient.
For a cross section example of what I am trying to do, please see code below.
I use grqred to generate the evolution of the separate coefficients (but I need the joint one)-- One graph for the evolution of (B1+B3) with it's respective standard errors.
Thanks.
(I am using Stata 14.1 on Windows 10):
clear
sysuse auto
set scheme s1color
gen gptm = 1000/mpg
label var gptm "gallons / 1000 miles"
gen weight_foreign= weight*foreign
label var weight_foreign "Interaction weight and foreign car"
qreg gptm weight foreign weight_foreign , q(.5)
grqreg weight weight_foreign , ci ols olsci reps(40)
*** Question 1: How to constuct the plot of the coefficient of interest?
Your second question is off-topic here since it is statistical. Try the CV SE site or Statalist.
Here's how you might do (1) in a cross section, using margins and marginsplot:
clear
set more off
sysuse auto
set scheme s1color
gen gptm = 1000/mpg
label var gptm "gallons / 1000 miles"
sqreg gptm c.weight##i.foreign, q(10 25 50 75 95) reps(500) coefl
margins, dydx(weight) predict(outcome(q10)) predict(outcome(q25)) predict(outcome(q50)) predict(outcome(q75)) predict(outcome(q95)) at(foreign=(0 1))
marginsplot, xdimension(_predict) xtitle("Quantile") ///
legend(label(1 "Domestic") label(2 "Foreign")) ///
xlabel(none) xlabel(1 "Q10" 2 "Q25" 3 "Q50" 4 "Q75" 5 "Q95", add) ///
title("Marginal Effect of Weight By Origin") ///
ytitle("GPTM")
This produces a graph like this:
I didn't recast the CI here since it would look cluttered, but that would make it look more like your graph. Just add recastci(rarea) to the options.
Unfortunately, none of the panel quantile regression commands play nice with factor variables and margins. But we can hack something together. First, you can calculate the sums of coefficients with nlcom (instead of more natural lincom, which the lacks the post option), store them, and use Ben Jann's coefplot to graph them. Here's a toy example to give you the main idea where we will look at the effect of tenure for union members:
set more off
estimates clear
webuse nlswork, clear
gen tXu = tenure*union
local quantiles 1 5 10 25 50 75 90 95 99 // K quantiles that you care about
local models "" // names of K quantile models for coefplot to graph
local xlabel "" // for x-axis labels
local j=1 // counter for quantiles
foreach q of numlist `quantiles' {
qregpd ln_wage tenure union tXu, id(idcode) fix(year) quantile(`q')
nlcom (me_tu:_b[tenure]+_b[tXu]), post
estimates store me_tu`q'
local models `"`models' me_tu`q' || "'
local xlabel `"`xlabel' `j++' "Q{sub:`q'}""'
}
di "`models'
di `"`xlabel'"'
coefplot `models' ///
, vertical bycoefs rescale(100) ///
xlab(none) xlabel(`xlabel', add) ///
title("Marginal Effect of Tenure for Union Members On Each Conditional Quantile Q{sub:{&tau}}", size(medsmall)) ///
ytitle("Wage Change in Percent" "") yline(0) ciopts(recast(rcap))
This makes a dromedary curve, which suggests that the effect of tenure is larger in the middle of the wage distribution than at the tails:

Post e(b) vector from a custom program in Stata

I wrote a program that computes a weighted regression and now I want my estimation results to be stored as an e(b) vector so that the bootstrap command can easily access the results, but I keep getting an error. My program looks like:
capture program drop mytest
program mytest, eclass
version 13
syntax varlist [if]
marksample touse
// mata subroutine creates matrix `b', such as mata: bla("`varlist'", "`touse'")
tempname b
matrix `b' = (1\2\3)
ereturn post `b'
end
mytest town_id
ereturn list
But I keep getting a conformability error r(503); upon running the script. When I instead post an ordinary matrix such as ereturn matrix x = b, everything works fine but I would like to have my coefficients stored 'properly' in an e(b) vector.
I checked Stata's documentation but was unable to find out why this is not working. Their advice is to code
tempname b V
// produce coefficient vector `b' and variance–covariance matrix `V'
ereturn post `b' `V', obs(`nobs') depname(`depn') esample(`touse')
The options of ereturn post are all optional. Could anyone tell me what I am missing here? Thanks!
Use a "row" vector instead of a "column" vector. If you check, for example, the stored results of regress, you'll see that this is what is expected.
capture program drop mytest
program mytest, eclass
version 13
syntax varlist [if]
marksample touse
// mata subroutine creates matrix `b', such as mata: bla("`varlist'", "`touse'")
tempname b
matrix `b' = (1,2,3)
ereturn post `b'
end
*----- tests -----
clear
sysuse auto
// mytest test
mytest mpg weight
ereturn list
matrix list e(b)
// regress example
regress price weight mpg
ereturn list
matrix list e(b)

Stata output files in surveys

I have some survey data which I'm using Stata to analyze. I want to compute means of one variable by group and save those means to a Stata file. My code looks like this:
svyset [iw=wtsupp], sdrweight(repwtp1-repwtp160) vce(sdr)
svy: mean x
I tried
svy: by grp: mean x
but that did not work. I could save each mean to a separate file by simply saying
svy: mean x if grp==1
but that's inefficient. Is there a better way?
Saving results to a file like one can use SAS ODS to capture results is also a need. I am not talking about the log here. I need the means and the associated group. I'm thinking
estimates save [path],replace
but I'm not sure if that will give me a Stata file or the group if I can figure out how to use by processing.
Here's a simpler approach that creates a data set of the displayed estimation results: estimated means, standard errors, confidence limits, z statistics, and p-values. svy: mean is called with the over() option, which does away with the need for the foreach loop and computes standard errors appropriate for subpopulation analysis. The estimation results are contained in the returned matrix r(table), which is converted by the svmat command to a Stata data set. While svmat maintains column names, it does not preserve row (group) names, so it is necessary to merge these in to the created data set.
set more off
use http://www.stata-press.com/data/r13/ss07ptx, clear
svyset _n [pw= pwgtp], sdrweight(pwgtp*) vce(sdr)
************************************************ *
* Set name of grouping variable in double quotes *
* in the next line. *
* ************************************************
local gpname "sex"
tempvar gp
egen `gp' = group(`gpname')
preserve
tempfile t1
bys `gp': keep if _n==1
keep `gp' `gpname'
save `t1'
restore
svy: mean agep , over(`gp')
matrix a = r(table)'
clear
qui svmat double a, names(col)
gen `gp'=_n
merge 1:1 `gp' using `t1'
keep `gpname' b se z pvalue ll ul
order `gpname'
save results, replace
list
Edited 10/28
This version contains legibility improvements and the outcome variable and saved datasets are specified in a local macro. Therefore the analyst need not touch the foreach block. Easier to write and read matrix subscript expressions are used instead of the el matrix function: thus m[1,1] instead of el("m",1,1).
sysuse auto, clear
svyset _n
************************************************ *
* Set names of grouping variable and results data *
* set in double quotes in the next line. *
* ************************************************
local yvar mpg // variable for mean
local gpname "foreign"
local d_results "results"
tempvar gp
gen `gp' = `gpname'
tempname memhold
postfile `memhold' ///
`gpname' n mean se sd using `d_results', replace
levelsof `gp', local(lg)
foreach x of local lg{
svy, subpop(if `gp'==`x'): mean `yvar'
matrix m = e(b)
matrix v = e(V)
matrix a = e(V_srssub)
matrix b = e(_N_subp)
matrix c = e(_N)
scalar gx = `x'
scalar mean = m[1,1]
scalar sem = sqrt(v[1,1])
scalar sd = sqrt(b[1,1]*a[1,1])
scalar n = c[1,1]
post `memhold' (gx) (n) (mean) (sem) (sd)
}
postclose `memhold'
use results, clear
list

generating bootstrap standard errors for large number of scalars

Suppose I have four scalars: call them dea_1 dea_2 dea_3 dea_4. They are output from a program samprogram (not shown here).
Now I use the bootstrap command in Stata with these scalars to get bootstrapped standard errors.
set seed 123
bootstrap dea_1=r(dea_1)dea_2=r(dea_2)dea_3=r(dea_3)dea_4=r(dea_4), reps(100): samprogram
This is fine but in my original program, I calculate 30 scalars, dea_1 dea_2 ... dea_30. Now I want to avoid writing each of these 30 scalars in the bootstrap command and for this purpose I wrote a loop as follows:
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
This works, but gives the output for each scalar one at a time. However, I am looking for code which avoids writing all scalars in the bootstrap command but still gives the output for all at the same time (i.e. like the output from the following command)
set seed 345
bootstrap dea_1=r(dea_1)dea_2=r(dea_2)dea_3=r(dea_3)dea_4=r(dea_4)[omitted]...dea_30=r(dea_30), reps(100): samprogram
Any help in this regard will be highly appreciated.
This yields to building-up the contents of a local macro step-by-step.
set seed 123
forval i = 1/30 {
local call `call' dea_`i'=r(dea_`i')
}
bootstrap `call', reps(100) : samprogram
If need be, blank out the macro beforehand by
local call
More discussion in http://www.stata-journal.com/sjpdf.html?articlenum=pr0005 [free .pdf]
(LATER) Note that contrary to your assertion the code
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
would not work as intended. First time round, for example, bootstrap would be looking for r(dea_dea_1) and would return missing for every sample. The code for calling bootstrap repeatedly could simply be
set seed 234
forvalues i = 1/30 {
bootstrap dea_`i'=r(dea_`i'), reps(100): samprogram
}
but that would be a bad idea when you can do what you want in one call.
An alternative solution would be to make your program eclass and return the results in the matrix e(b). This allows the shortcut bootstrap _b, reps(100): samprogram. Below is an example. The key points here are that the different scalars are stored in the row vector `b', which is returned by the program as the row vector e(b) whith the command:
ereturn post `b', esample(`touse')
A complete example is here:
clear all
program define sim, eclass
syntax varlist(numeric) [if] [in], by(varname numeric)
marksample touse
markout `touse' `by'
local k : word count `varlist'
tempname b m0
matrix `b' = J(1,`k',.)
local i = 1
foreach var of local varlist {
sum `var' if `touse' & `by', meanonly
scalar `m0' = r(mean)
sum `var' if `touse' & !`by', meanonly
matrix `b'[1,`i'] = `m0' - r(mean)
local i = `i' + 1
}
ereturn post `b', esample(`touse')
end
sysuse auto
bootstrap _b, reps(100) : sim price mpg length weight trunk, by(foreign)