Stata -- Durbin-Watson statistic by company id in a large dataset

I am trying to run regressions in Stata by company_id using a large dataset. The goal is to get one line per company_id with the results of the regression. The following code gives me the beta coefficient, standard error, adjusted R-squared and N. I also need to include the Durbin-Watson statistic and have not managed to do that so far. Can someone help? Thanks.
statsby _b _se r2 = e(r2_a) _N, by (company_id) saving($path\SC_results_`i'.dta, replace): regress ret sptr_ret

A small program that combines regress and dwstat into one command should help. Here's an attempt.
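capture program drop reg_dw
program reg_dw, rclass
    * run the regression, then compute Durbin-Watson from its residuals
    syntax varlist
    regress `varlist'
    dwstat
    * pass the statistic back so statsby can collect it as r(dw)
    return scalar dw=r(dw)
end
webuse invest2,clear
* dwstat needs tsset data; the observation number serves as the time index here
gen index=_n
tsset index
statsby _b _se r2 = e(r2_a) dw=r(dw) _N, by (company) saving(x.dta, replace): reg_dw invest market
use x, clear
tab _eq2_dw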
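As a sanity check on what dwstat reports, the statistic can be reproduced by hand: it is the sum of squared first differences of the residuals divided by the sum of squared residuals. A minimal sketch for a single company of the same invest2 data; the names res, dres2, res2, num and den are made up for illustration:

```stata
webuse invest2, clear
keep if company == 1
gen index = _n
tsset index
regress invest market
predict double res, residuals
* squared first difference of the residuals (missing for the first observation)
gen double dres2 = (res - L.res)^2
gen double res2  = res^2
quietly summarize dres2
scalar num = r(sum)
quietly summarize res2
scalar den = r(sum)
display "DW by hand = " num/den
* built-in version for comparison; it leaves the statistic in r(dw)
dwstat
```

The two numbers should agree up to rounding, which is a quick way to confirm that r(dw) is the quantity you want before collecting it for every company.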

Related

Fixed effects regression with a loop is not working

I run the following fixed effects regression in a loop, and I always get the error "option if is not allowed"!
levelsof Sic, local(Sic)
xtset Year
foreach i of local Sic {
xtreg y mq r d, fe if Sic == `i'
eststo
}
If I run the same loop with an ordinary OLS regression, it works without any problems. Why?
The if qualifier should come before the comma, options afterwards.
levelsof Sic, local(Sic)
foreach i of local Sic {
eststo: xtreg y mq r d if Sic == `i', fe
}
I assume it worked with OLS because you did not have to specify any options.
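The general pattern is command varlist [if] [in] [, options], so anything after the comma is read as an option. A small self-contained sketch of the same loop on the nlswork example data, which is not the asker's data; race stands in for Sic, and e(N) is displayed instead of calling eststo so that no user-written package is needed:

```stata
webuse nlswork, clear
xtset idcode year
levelsof race, local(races)
foreach r of local races {
    * if qualifier first, then the comma, then the fe option
    quietly xtreg ln_wage age tenure if race == `r', fe
    display "race `r': N = " e(N)
}
```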

Can you use lincom with interaction models coded with ## in Stata?

I am trying to use lincom to sum regression coefficients in a Stata model that codes interaction using ##. Exposure and bmi are continuous variables. Sex is binary.
regr bmi c.exposure##sex covar1 covar2 covar3 i.covar4 i.covar5
lincom chemical + chemical#sex
The regression works just fine, but lincom gives the following error:
exposure##sex invalid name
Alternatively, if I code the second line as
lincom chemical + c.chemical##sex
then I get
invalid matrix stripe;
c.l10_mep_i_sg2_pg##sex
Am I doing something wrong or is this not possible with # interaction coding?
Try adding the , coeflegend option at the end of your regression command. This will allow you to see what Stata calls each coefficient.
Here's a reproducible example:
sysuse auto
reg price i.foreign##c.mpg, coeflegend
lincom 1.foreign+ 1.foreign#c.mpg*25
Alternatively, this sort of thing can usually be done much more easily with margins:
margins, dydx(foreign) at(mpg=25)
Both of these give you the marginal effect of foreign origin on price when miles per gallon is 25.
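To see that the names shown by coeflegend are the ones lincom expects, the same combination can be built by hand from _b[]. A short sketch; by_hand is just an illustrative scalar name:

```stata
sysuse auto, clear
regress price i.foreign##c.mpg
* same linear combination, built directly from the stored coefficients
scalar by_hand = _b[1.foreign] + 25*_b[1.foreign#c.mpg]
lincom 1.foreign + 1.foreign#c.mpg*25
display "by hand: " by_hand "   lincom: " r(estimate)
```

The point estimates should match; lincom additionally gives the standard error and confidence interval.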

Syntax for bootstrap estimates from ttest command

I am attempting to demonstrate characteristics of various tests for small samples of data. I would like to demonstrate the performance of the t-test, the t-test with bootstrap estimation and the ranksum test. I am interested in obtaining the p-value for each test on multiple sets of data using simulate. However, I cannot obtain t-test estimates using the bootstrap prefix and the ttest command.
The data is generated by:
clear
set obs 60
gen level = abs(rnormal(0,1))
gen group = "A"
replace group = "B" if [_n] >30
bootstrap, reps(100): ttest level, by(group)
bootstrap _b, reps(100): ttest level, by(group)
bootstrap boot_p = e(p), reps(100): ttest level, by(group)
The errors for each of the procedures in order are:
expression list required
invalid expression: _b
'e(p)' evaluated to missing in full sample
These results are not consistent with the documentation for the bootstrap prefix. Is there some problem with specification of e or r class objects and ttest ?
Edit:
Understanding now that r-class is the correct group of scalars, I still do not generate a variable 'p' given the code provided in the solution. Additionally:
clear
set more off
set obs 60
gen level = abs(rnormal(0,1))
gen group = "A"
replace group = "B" if [_n] >30
bootstrap p=r(p), reps(100): ttest level, by(group)
display r(p)
does not return the p-value.
ttest is an r-class command and stores its results in r(). You seem to expect it to save results in e(), like an e-class command. As a rule, e-class commands are the ones that fit models; ttest is not in this category.
The two-sided p-value is stored in r(p), as indicated in help ttest:
clear
set more off
set obs 60
gen level = abs(rnormal(0,1))
gen group = "A"
replace group = "B" if [_n] >30
bootstrap p=r(p), reps(100): ttest level, by(group)
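Since the stated goal is one p-value per simulated dataset, one option is to wrap the tests in a small r-class program and let simulate collect the returned scalars. This is only a sketch under assumptions: onesample, p_t and p_rs are made-up names, group is coded as a 0/1 numeric variable rather than a string, and the ranksum p-value is computed from r(z) with the normal approximation.

```stata
capture program drop onesample
program define onesample, rclass
    * draw one dataset of 60 observations in two groups of 30
    drop _all
    set obs 60
    gen level = abs(rnormal(0,1))
    gen byte group = _n > 30
    * two-sided p-value from the t-test
    quietly ttest level, by(group)
    return scalar p_t = r(p)
    * two-sided p-value from the rank-sum test (normal approximation)
    quietly ranksum level, by(group)
    return scalar p_rs = 2*normal(-abs(r(z)))
end

simulate p_t = r(p_t) p_rs = r(p_rs), reps(200) seed(12345) nodots: onesample
summarize p_t p_rs
* share of replications rejecting at the 5% level
count if p_t < 0.05
count if p_rs < 0.05
```

After simulate, the dataset in memory holds one row per replication, so the p-values are ordinary variables that can be summarized or plotted.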

Stata: Combine table command with ttest and output latex

For regression output, I usually use a combination of eststo to store estimations, estadd to add the R2 and additional tests, then esttab to output the lot.
I need to do the same with the table command. I need the mean, median and N for a variable across three by variables and would like to add stars for the result of a ttest==1 on the mean and signtest==1 on the median. I have three by variables, so I've been using table to collate the mean, median and N, which I'm calling like the following pseudo-code:
sysuse auto,clear
table foreign rep78 , ///
contents(mean price median price n price) format(%9.2f)
ttest price==1, by(foreign rep78)
signtest price=1, by(foreign rep78)
I've tried esttab and estpost to no avail. I've also looked at tabstat, tablemat and summarize as alternatives to table, but they don't allow three by variables.
How can I create this table, add the stars for the ttest and signtest p-values and output the full table?
The main point in your question seems to be producing a LaTeX table. However, you show "pseudo-code" that looks much like Stata code, except that it is not legal syntax.
In particular, for ttest you can only have one variable in the by() option. But notice that ttest also allows the by: prefix (you can use both, in fact). The two serve different purposes. On the other hand, signtest does not allow a by() option but it does allow the by: prefix. So you should probably clarify what you want to do before creating the table.
If you are trying to use the by: prefix in both cases and afterwards produce a table, you can create a grouping variable, and put the commands in a loop. In this way, you can try tabulating the saved results for each group using the ESTOUT module (by Ben Jann in SSC). Something like:
*clear all
set more off
sysuse auto
keep price foreign rep78
* create group variable
egen grou = group(foreign rep78)
* tests by group
forvalues i = 1/8 {
ttest price == 1 if grou == `i'
signtest price = 1 if grou == `i'
*<complete with estout syntax>
}
See help by, help egen (the group function), help estout and help saved results.
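If the estout route proves awkward, the per-group results can also be collected with the built-in postfile commands and exported from there. This swaps in a different technique from the one suggested above, and it is only a sketch: the variable names are made up, and the two-sided sign-test p-value is recomputed from r(N_pos) and r(N_neg) with the binomial distribution rather than read from a stored p-value.

```stata
sysuse auto, clear
keep price foreign rep78
egen grou = group(foreign rep78)
quietly summarize grou
local G = r(max)

tempname h
tempfile results
postfile `h' grou n_obs mean_price p_ttest med_price p_sign using `results'
forvalues i = 1/`G' {
    * t-test of mean price against 1 within the group
    quietly ttest price == 1 if grou == `i'
    local n  = r(N_1)
    local mu = r(mu_1)
    local pt = r(p)
    * sign test of median price against 1 within the group
    quietly signtest price = 1 if grou == `i'
    local ps = min(1, 2*binomial(r(N_pos) + r(N_neg), min(r(N_pos), r(N_neg)), 0.5))
    quietly summarize price if grou == `i', detail
    post `h' (`i') (`n') (`mu') (`pt') (r(p50)) (`ps')
}
postclose `h'
use `results', clear
list, noobs
```

Run it in one go from a do-file, since the local macros and the temporary file disappear once the run ends. Stars can then be generated from p_ttest and p_sign before exporting the table.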

Storing the predictions and coefficients from Stata for n replications

I have the following code, in which I am trying to repeat the estimation n times and store the predictions and coefficients for further use.
capture program drop mypro
program define mypro
drop _all
sysuse auto
bsample
reg mpg price headroom
mat mycoef=e(b)
gen mypri=mycoef[1,1]
gen myhead=mycoef[1,2]
gen mycons=mycoef[1,3]
predict x1b
end
simulate, seed(10) reps(10) nodots : mypro
By default, simulate gives only the coefficients from the 10 different samples. However, I am trying to save each sample dataset along with the coefficients mypri, myhead, mycons, and the prediction x1b. Is it possible to do this using simulate, or do I need to use a loop?
Updated as per comment of Nick:
capture program drop mypro
program define mypro
set seed 1
local r=10
forvalues i=1/`r'{
drop _all
sysuse auto
bsample
reg mpg price headroom
mat mycoef=e(b)
gen mypri=mycoef[1,1]
gen myhead=mycoef[1,2]
predict x1b
save data`i',replace
}
end
You are calling simulate to run a program that draws a bootstrap sample and fits a regression.
sysuse auto
bootstrap : reg mpg price headroom
is a much simpler approach. Look at the documentation for bootstrap to learn more.
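If the replicate-level coefficients are needed for further use, bootstrap can write them to a file with its saving() option. A minimal sketch; bs_coefs is a hypothetical file name:

```stata
sysuse auto, clear
* keep the coefficient vector from each of the 10 bootstrap replications
bootstrap _b, reps(10) seed(10) saving(bs_coefs, replace): regress mpg price headroom
use bs_coefs, clear
list, noobs
```

This keeps one row of coefficients per replication. It does not keep the resampled datasets or the predictions, so if those are really needed, a loop with save (as in the updated question) is still the way to go.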