Stata - Cohort Study - Display Crude Risk Ratio (like r(rr_crude)) - stata

I'm starting to use Stata 14. I'm trying to do some basic risk ratio analysis, but I don't know how to extract single results. Given the following code:
clear all
webuse ugdp
cs case exposed [fw=pop], by(age)
we get an output with four risk ratios, for both age categories, a crude one and a M-H one. With
dis r(rr)
I get the last (?) ratio, but is it possible to specify it? Like
dis r(rr_crude)
dis r(rr_mh)
or something like that? I haven't found a solution. Or is it possible to do something like saving the output in a matrix and indicating it with row and column indices?
I haven't found a solution in the documentation.

Edit:
Just create scalars, which persist in memory
clear all
webuse ugdp
cs case exposed [fw=pop], by(age)
scalar rr_mh = r(rr)
Then use glm:
glm case exposed [fw = pop], family(binomial) link(log)
scalar rr_crude = exp(_b[exposed])
or
cs case exposed [fw = pop]
scalar rr_crude = r(rr)
In either case:
di rr_crude
di rr_mh

Related

How to get y axis range in Stata

Suppose I am using some twoway graph command in Stata. Without any action on my part Stata will choose some reasonable values for the ranges of both y and x axes, based both upon the minimum and maximum y and x values in my data, but also upon some algorithm that decides when it would be prettier for the range to extend instead to a number like '0' instead of '0.0139'. Wonderful! Great.
Now suppose that after (or while) I draw my graph, I want to slap some very important text onto it, and I want to be choosy about precisely where the text appears. Having the minimum and maximum values of the displayed axes would be useful: how can I get these min and max numbers? (Either before or while calling the graph command.)
NB: I am not asking how to set the y or x axis ranges.
Since this issue has been a bit of a headache for me for quite some time and I believe there is no good solution out there yet I wanted to write up two ways in which I was able to solve a similar problem to the one described in the post. Specifically, I was able to solve the issue of gray shading for part of the graph using these.
Define a global macro in the code generating the axis labels This is the less elegant way to do it but it works well. Locate the tickset_g.class file in your ado path. The graph twoway command uses this to draw the axes of any graph. There, I defined a global macro in the draw program that takes the value of the omin and omax locals after they have been set to the minimum between the axis range and data range (the command that does this is local omin = min(.scale.min,omin) and analogously for the max), since the latter sometimes exceeds the former. You could also define the global further up in that code block to only get the axis extent. You can then access the axis range using the globals after the graph command (and use something like addplot to add to the previously drawn graph). Two caveats for this approach: using global macros is, as far as I understand, bad practice and can be dangerous. I used names I was sure wouldn't be included in any program with the prefix userwritten. Also, you may not have administrator privileges that allow you to alter this file based on your organization's decisions. However, it is the simpler way. If you prefer a more elegant approach along the lines of what Nick Cox suggested, then you can:
Use the undocumented gdi natscale command to define your own axis labels The gdi commands are the internal commands that are used to generate what you see as graph output (cf. https://www.stata.com/meeting/dcconf09/dc09_radyakin.pdf). The tickset_g.class uses the gdi natscale command to generate the nice numbers of the axes. Basic documentation is available with help _natscale, basically you enter the minimum and maximum, e.g. from a summarize return, and a suggested number of steps and the command returns a min, max, and delta to be used in the x|ylabel option (several possible ways, all rather straightforward once you have those numbers so I won't spell them out for brevity). You'd have to adjust this approach in case you use some scale transformation.
Hope this helps!
I like Nick's suggestion, but if you're really determined, it seems that you can find these values by inspecting the output after you set trace on. Here's some inefficient code that seems to do exactly what you want. Three notes:
when I import the log file I get this message:
Note: Unmatched quote while processing row XXXX; this can be due to a formatting problem in the file or because a quoted data element spans multiple lines. You should carefully inspect your data after importing. Consider using option bindquote(strict) if quoted data spans multiple lines or option bindquote(nobind) if quotes are not used for binding data.
Sometimes the data fall outside of the min and max range values that are chosen for the graph's axis labels (but you can easily test for this).
The log linesize is actually important to my code below because the key values must fall on the same line as the strings that I use to identify the helpful rows.
* start a log (critical step for my solution)
cap log close _all
set linesize 255
log using "log", replace text
* make up some data:
clear
set obs 3
gen xvar = rnormal(0,10)
gen yvar = rnormal(0,.01)
* turn trace on, run the -twoway- call, and then turn trace off
set trace on
twoway scatter yvar xvar
set trace off
cap log close _all
* now read the log file in and find the desired info
import delimited "log.log", clear
egen my_string = concat(v*)
keep if regexm(my_string,"forvalues yf") | regexm(my_string,"forvalues xf")
drop if regexm(my_string,"delta")
split my_string, parse("=") gen(new)
gen axis = "vertical" if regexm(my_string,"yf")
replace axis = "horizontal" if regexm(my_string,"xf")
keep axis new*
duplicates drop
loc my_regex = "(.*[0-9]+)\((.*[0-9]+)\)(.*[0-9]+)"
gen min = regexs(1) if regexm(new3,"`my_regex'")
gen delta = regexs(2) if regexm(new3,"`my_regex'")
gen max_temp= regexs(3) if regexm(new3,"`my_regex'")
destring min max delta , replace
gen max = min + delta* int((max_temp-min)/delta)
*here is the info you want:
list axis min delta max

Value of coefficient (Beta1) at different values of other covariate (X2), hopefully graphed

(cross-posted at http://www.statalist.org/forums/forum/general-stata-discussion/general/1370770-margins-plot-of-treatment-effect-rather-than-y-for-values-of-a-covariate)
I'm running a multivariate regression (outcome variable is continuous, happens to be GPA). The covariate of interest is a dummy variable for treatment status; another of the covariates is a pre-score. We want to look at how the treatment effect differs at various values of pre-score. The structure of the model is not complicated:
regress GPA treatment pre_score X3 X4 X5...
What I want is a graph that shows what the treatment effect is (values of Beta1) at various values of pre-score (X2). It's straightforward to get a graph with values of the OUTCOME at various values of X2:
margins, at(pre_score= (1(0.25)5)) post
marginsplot
I have consulted an array of resources and tried alternatives using marginscontplot, coefplot with recast, the dy/dx option, and so forth. I remain unsuccessful. But this seems like something that there must be a way to do; wanting to know if a treatment effect varies for values of a control (say, income) must be common.
Can anyone direct me to the right command, or options for Margins, to output values of Beta1 (coefficient on treatment dummy), rather than of Y (GPA), at values of the pre_score?
Question was resolved at Statalist. Turns out that Margins alone can't do what I was trying to; the model needs to be run with an interaction term. Then it's simple.

Calculating difference in survival functions at time t in Stata

I am estimating a Cox model in Stata using stcox. I estimate the model at
stcox treat x1 x2 x3
I can then use the stcurve command to plot the survival function for treatment and control groups, with the x1, x2 and x3 variables set at their means by doing
stcurve, survival at1(treat=0) at2(treat=1)
However, I would also like to calculate the difference in the survival function at specific, discrete points in time. For instance, I'd like to know the probability of survival to 1 year for treated and control groups, with x's set to their means. I think I might be able to do this with the sts generate command and its adjustfor option, but I am a little confused about whether I should use by or strata when using sts generate and I'm also not sure how to hold the control variables at their means rather than at 0. The Stata help pages suggest I can center the values of the controls by subtracting x1's mean from x1, but I am not sure if I am reading this correctly.
I wrote an answer to a similar question which might be useful here, but do not have enough reputation to answer this as a comment, so here goes:
You can do stcox treat x1 x2 x3 and stcurve, survival at1(treat=0) at2(treat=1) outfile(stcurve.dta). In the file stcurve.dta you will have the data that produced the graph, which could be used for you purposes of looking up specific timepoints (not terribly familiar with stcox and stcurve but think it should work).
regarding sts generate: I should use by(treat) if you only want to see the difference between treatment groups. However, using by vs strata is more of a statistical question which I am not qualified to answer, You are reading the help pages correctly, but cannot say if this approach makes statistical sense and do not know if sts generate allows for several variables in adjustfor. My suggestion is just try it and see if it works. Make your own mean variable and use that to make a new x to use in sts generate if the data you need is not in stcurve.dta. Something like this:
local varlist = x1 x2 x3
foreach x in `varlist' {
egen mean_`x'=mean(`x')
gen adj_`x'=`x' - mean_`x'
}
*
sts generate survival=s, by(treat) adjustfor(adj_x1)
then you will have your data in the variable called survival. another possibility is using sts list by(treat) adjustfor(adj_x1) compare instead of sts generate

Dummy and Heckman

I'm using Heckman Selection Model which are two consist of 2 equation. i'm using Probit as a selection equation and multiple regression as a result equation.
how can put in dummy variables in those equation ?
Do we have to make the variables into logaritmic form ?
How can I make logaritmic variables with stata ?
Thank you..
Here's an example of how you might do what you ask. The example looks at the effect of being a union member on log wages:
webuse union3
gen log_wage = ln(wage)
etregress log_wage age grade i.smsa i.black tenure, treat(union = i.south i.black tenure) twostep
etregress estimates an average treatment effect of an endogenous binary-treatment variable. In plain English, that means the "first-stage" is a probit. Estimation is by either full maximum likelihood or a two-step consistent estimator, as above.
The dummies are created on the fly by putting an i. in front of the covariates. This is called factor variable notation, and it also makes interactions a breeze. You can also do tab race, gen(d_) to create d_1, d_2, and d_3 (3 race dummies, one of which you can drop).

What could be causing errors when estimating coefficients with xtgls in stata for unbalanced panel data over 4 years?

I am using unbalanced panel data for 4 years. In trying to decide which time variant model (xtgls, xtreg, re, or xtgee) is most appropriate for my analysis, I am trying to estimate coefficients for xtgls under both the homoskedasticity and hetero assumptions. When I run this model with the hetero option, I get very high z-scores (>30) and a significant effect on a term that is insig in all other models.
Also, when I attempt to run lrtest comparing the hetero and homoskedastic models I get an error that reads “hetero does not contain scalar e(ll)”. I read that one way to address this is to add option igls, which supposedly gives the same coeff as the model without the igls option. However, my model will not converge with the igls option. I thought these odd results for the hetero xtgls model could be because some time invariant variable was miscoded (i.e. person coded as female = 1 for one year and female = 0 for another year). I checked my 2 ivs and this is not the case. I can’t figure out what else could be causing this.
So my specific questions are:
Why would I be getting this error - “hetero does not contain scalar e(ll)” - for the lrtest comparing the homo and hetero models? What does it mean?
Below is my stata code:
xtgls continuous_DV IV1 IV2 IV1xIV2, i(person_id) panels(hetero)
estimates store hetero
xtgls continuous_DV IV1 IV2 IV1xIV2, i(person_id)
local df=e(N_g)-1
disp `df'
lrtest hetero ., df(`df')
I ran xttest3 which indicated errors are hetero.
Is igls an appropriate work around for the error I am getting following the lrtest (“hetero does not contain scalar e(ll)”)? If so, what could be causing this model with the igls option not to converge? Below is the code:
xtgls continuous_DV IV1 IV2 IV1xIV2, i(person_id) panels(hetero) igls
In Stata,
the xtgls command does not estimate a log likelihood because it is not maximum likelihood estimation. So you cannot get a log-likelihood test out of that model. To get a log-likelihood, you need to use the setup you had above but instead use the igls option. That is an appropriate workaround and is entirely appropriate; I don't think you need to start by slashing your dataset.
Alternatively, you can use a different estimator. GLS is appropriate when you have few, wide panels. If you have really short panels (only a couple years per individual), you should probably use something like xtreg.
http://www.stata.com/support/faqs/statistics/xtgls-versus-regress/