Calculating difference in survival functions at time t in Stata

I am estimating a Cox model in Stata using stcox. I estimate the model as
stcox treat x1 x2 x3
I can then use the stcurve command to plot the survival function for treatment and control groups, with the x1, x2 and x3 variables set at their means by doing
stcurve, survival at1(treat=0) at2(treat=1)
However, I would also like to calculate the difference in the survival function at specific, discrete points in time. For instance, I'd like to know the probability of survival to 1 year for treated and control groups, with x's set to their means. I think I might be able to do this with the sts generate command and its adjustfor option, but I am a little confused about whether I should use by or strata when using sts generate and I'm also not sure how to hold the control variables at their means rather than at 0. The Stata help pages suggest I can center the values of the controls by subtracting x1's mean from x1, but I am not sure if I am reading this correctly.

I wrote an answer to a similar question which might be useful here, but do not have enough reputation to answer this as a comment, so here goes:
You can run stcox treat x1 x2 x3 followed by stcurve, survival at1(treat=0) at2(treat=1) outfile(stcurve.dta). The file stcurve.dta will then contain the data that produced the graph, which you can use to look up specific time points (I am not terribly familiar with stcox and stcurve, but I think it should work).
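As a rough sketch of that lookup, the following uses Stata's shipped drugtr example data in place of your dataset (drug and age stand in for treat and the x variables, and the time point 20 is arbitrary). I am assuming the saved file stores the two curves as surv1 and surv2 and analysis time as _t; run describe to confirm the names in your version before relying on them.

```stata
* Example data (already stset): drug is the treatment, age a covariate
webuse drugtr, clear
stcox drug age

* Save the plotted curves; other covariates are held at their means
stcurve, survival at1(drug=0) at2(drug=1) outfile(stcurve, replace)

use stcurve, clear
describe                       // check the variable names first

* ASSUMPTION: curves are stored as surv1/surv2 and time as _t
keep if _t <= 20               // 20 is an arbitrary time point
sort _t
generate diff = surv2 - surv1  // treated minus control
list _t surv1 surv2 diff in L  // last row at or before t = 20
```

Because the estimated survival function is a step function, the last row at or before the chosen time gives its value at that time.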
Regarding sts generate: I would use by(treat) if you only want to see the difference between treatment groups. However, by versus strata is more of a statistical question, which I am not qualified to answer. You are reading the help pages correctly, but I cannot say whether this approach makes statistical sense, and I do not know whether sts generate allows several variables in adjustfor. My suggestion is to just try it and see if it works. If the data you need are not in stcurve.dta, make your own mean variable and use it to create a centered x for sts generate. Something like this:
* no "=" here: with "=" Stata would try to evaluate x1 x2 x3 as an expression
local varlist x1 x2 x3
foreach x in `varlist' {
egen mean_`x'=mean(`x')
gen adj_`x'=`x' - mean_`x'
}
*
sts generate survival=s, by(treat) adjustfor(adj_x1)
Then you will have your data in the variable called survival. Another possibility is to use sts list, by(treat) adjustfor(adj_x1) compare instead of sts generate.

Related

Pooled OLS with industry specific effects (reghdfe)

I have a panel dataset of multiple firms over 8 years, and I'm trying to run a pooled OLS with industry-specific effects, using the reghdfe command to control for a categorical variable (NAICS Industry Code). I typed
reghdfe DV IV control variables i.year, absorb(NAICS Industry Code)
Is this the correct way to use the command? Is it correct to use i.year within the variables or should I add it to the absorbed variables?
In addition, I'm using a fixed-effects panel regression with clustered standard errors. Do I have to cluster the standard errors in reghdfe as well, or is it sufficient to do it only in the fixed-effects panel regression?
You should include your variable year in the absorb() option to match the intended use of reghdfe:
reghdfe y x, absorb(naics year)
Alternatively, you can also use reg y x i.naics i.year.
I assume NAICS codes to be numeric; otherwise, you might need to transform the variable to numeric, e.g. using egen num_naics= group(naics).
Note: The R-squared rests on different assumptions and might differ between the two commands.
Note_2: If your question is specifically about coding, everyone is better off when you provide example data. Statistical questions might be better suited for Cross Validated.
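On the clustering part of the question: standard errors are set per estimation command, so clustering in one regression does not carry over to another. A minimal sketch, using the nlswork example data rather than your firm panel (ln_wage, tenure, ind_code, year and idcode are stand-ins for your variables), and assuming reghdfe is installed from SSC:

```stata
* ssc install reghdfe   // user-written; also requires ftools
webuse nlswork, clear

* Industry and year effects are absorbed; SEs are clustered on the panel id
reghdfe ln_wage tenure, absorb(ind_code year) vce(cluster idcode)
```

Whether the panel identifier is the right clustering level for your data is a modelling decision, not something the command settles for you.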

Value of coefficient (Beta1) at different values of other covariate (X2), hopefully graphed

(cross-posted at http://www.statalist.org/forums/forum/general-stata-discussion/general/1370770-margins-plot-of-treatment-effect-rather-than-y-for-values-of-a-covariate)
I'm running a multivariate regression (outcome variable is continuous, happens to be GPA). The covariate of interest is a dummy variable for treatment status; another of the covariates is a pre-score. We want to look at how the treatment effect differs at various values of pre-score. The structure of the model is not complicated:
regress GPA treatment pre_score X3 X4 X5...
What I want is a graph that shows what the treatment effect is (values of Beta1) at various values of pre-score (X2). It's straightforward to get a graph with values of the OUTCOME at various values of X2:
margins, at(pre_score= (1(0.25)5)) post
marginsplot
I have consulted an array of resources and tried alternatives using marginscontplot, coefplot with recast, the dy/dx option, and so forth. I remain unsuccessful. But this seems like something that there must be a way to do; wanting to know if a treatment effect varies for values of a control (say, income) must be common.
Can anyone direct me to the right command, or options for Margins, to output values of Beta1 (coefficient on treatment dummy), rather than of Y (GPA), at values of the pre_score?
Question was resolved at Statalist. Turns out that Margins alone can't do what I was trying to; the model needs to be run with an interaction term. Then it's simple.
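For readers landing here, a minimal sketch of that interaction approach, using the auto example data instead of the GPA data (foreign plays the role of the treatment dummy and mpg the role of pre_score; the grid of mpg values is arbitrary):

```stata
sysuse auto, clear

* Full interaction between the 0/1 "treatment" and the continuous covariate
regress price i.foreign##c.mpg weight

* Effect of foreign (a 0 to 1 contrast) evaluated at each value of mpg
margins, dydx(foreign) at(mpg=(15(5)35))

* Plot the effect, not the predicted outcome, against mpg
marginsplot
```

Without the interaction term the model forces the treatment effect to be the same at every value of the covariate, which is why margins had nothing to vary in the original specification.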

Stata - Cohort Study - Display Crude Risk Ratio (like r(rr_crude))

I'm starting to use Stata 14. I'm trying to do some basic risk ratio analysis, but I don't know how to extract single results. Given the following code:
clear all
webuse ugdp
cs case exposed [fw=pop], by(age)
we get an output with four risk ratios, for both age categories, a crude one and a M-H one. With
dis r(rr)
I get the last (?) ratio, but is it possible to specify it? Like
dis r(rr_crude)
dis r(rr_mh)
or something like that? Or is it possible to save the output in a matrix and refer to it by row and column indices?
I haven't found a solution in the documentation.
Edit:
Just create scalars, which persist in memory
clear all
webuse ugdp
cs case exposed [fw=pop], by(age)
scalar rr_mh = r(rr)
Then use glm:
glm case exposed [fw = pop], family(binomial) link(log)
scalar rr_crude = exp(_b[exposed])
or
cs case exposed [fw = pop]
scalar rr_crude = r(rr)
In either case:
di rr_crude
di rr_mh
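If you are unsure which stored result holds which ratio, return list prints everything the last r-class command left behind, which is a quick way to see what is available after cs. This sketch only inspects the results and assumes nothing about their names:

```stata
webuse ugdp, clear
cs case exposed [fw=pop], by(age)

* Show every r() result stored by the command just run
return list
```

Copy anything you need into a scalar straight away, since the next r-class command overwrites r().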

create a set of continuous variables from a factor variable and a continuous variable

In Stata, I have a factor variable with 50 levels (state) and an integer-valued variable (year). I want to create 50 new variables: 50 interactions of state indicators with the year variable. Is there a way to do this without writing 50 lines of code?
I can produce the 50 state dummies with tabulate state, generate (state), but I don't know how to get further than that without writing a line to create each individual state-year variable.
I want to use the new state-year variables in a regression. Stata's factor notation makes it easy to include the state-year variables as regressors without creating them beforehand (e.g., with a command like regress y i.state#c.year), but some add-on functions don't support factor notation.
You can try using xi, both as a stand-alone command to create indicator and interaction terms, and as a command prefix. A nonsensical example:
clear all
set more off
sysuse auto
* stand-alone
xi i.rep78*mpg
* as prefix
xi: regress price i.rep78*mpg
Run help xi for all the details.
Edit
To make this a bit clearer, suppose the regress command did not admit the use of either factor variable notation or the xi: prefix. Then using the xi stand-alone syntax you could create the indicator and interaction terms (which answers your original question) and then use those terms with the regress command:
sysuse auto, clear
xi i.rep78*mpg
regress price mpg _Irep78* _IrepXmpg*
(Remember to use Stata's help capabilities. Running search interactions, for example, leads you to xi......Interaction expansion.)
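If you would rather control the variable names yourself, a loop over the levels of the factor variable is another option. This is only a sketch on the auto data (rep78 stands in for state and mpg for year; the rep78_#_mpg names are my own choice), and unlike xi it creates a variable for every level, so you need to drop a base category yourself before using them in a regression:

```stata
sysuse auto, clear

* Collect the distinct values of the factor variable
levelsof rep78, local(levels)

* One interaction variable per level: mpg where rep78 equals that level, else 0
foreach l of local levels {
    generate rep78_`l'_mpg = (rep78 == `l') * mpg if !missing(rep78)
}

describe rep78_*_mpg
```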

Storing the cluster robust standard error to create a new variable— Stata 12 for Mac

I need to store the value for the cluster robust standard error in order to use it to create a new variable.
I am able to get the cluster robust standard error with the mean command, but Stata does not store this value.
Do you have any suggestions about how to calculate the cluster robust standard error for an estimate and then store this value in order to use it to create a new variable?
I think this might almost do the trick. There might be a more elegant way to do this. Toy data, nonsensical example:
/* Get some data */
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
/* get the standard error of the constant, which is the mean */
svy: reg zinc
display _se[_cons]
generate se = _se[_cons]
/* Verify that this is correct */
svy: mean zinc
However, you also want to cluster, which complicates things. I think if you only have survey weights (aka first stage clusters), you can do:
reg zinc [pweight=finalwgt], cluster(region)
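To then store the clustered standard error in a variable, as the question asks, something along these lines should work. It is a sketch on the same toy data, with region as a stand-in cluster variable (far too few clusters for real use) and se_reg, se_mean and V as names I made up:

```stata
webuse nhanes2f, clear

* Option 1: intercept-only regression; the SE of the constant is the SE of the mean
regress zinc [pweight=finalwgt], vce(cluster region)
generate se_reg = _se[_cons]

* Option 2: mean with clustered SEs; take the SE from the variance matrix
mean zinc [pweight=finalwgt], vce(cluster region)
matrix V = e(V)
generate se_mean = sqrt(V[1,1])
```

Either variable holds the same constant in every observation and can be used in later generate statements.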
There might be a way to do what you want with -gllamm-, which is a user-written command. You should ask this question on Statalist if you don't get much of a response here.