Stata 'rolling' command: saving variables instead of .dta files

I am using the rolling command in a foreach loop:
use "MyFile.dta"
tsset time, monthly
foreach i of varlist var1 var2 {
rolling _b, window(12) saving(beta_`i'): reg `i' DependentVariable
}
Now, this code saves a different file for each rolling regression. What I would really like is to save each vector of betas obtained from the rolling estimation as a variable.
The final result I would like to obtain is a dataset with a time variable and a "beta_var#" variable for each rolling regression:
time | beta_var1 | beta_var2
_________|___________|__________
1990m1 | ## | ##
1990m2 | ## | ##
... | ## | ##
2000m12 | ## | ##
1990m1 | ## | ##
(PS: secondary question: is there a shortcut to indicate a varlist = to all the variables in the dataset?)

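I misread your post and my initial answer does not give what you ask for. Here's one way. It is neither elegant nor very efficient, but it works (just change the directory names; rolling writes temp.dta to the working directory, so that directory must be the one used in the paths below):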
clear all
set more off
* Do not mix with previous trials
capture erase "/home/roberto/results.dta"
* Load data
sysuse sp500
tsset date
* Set fixed independent variable
local var open
foreach depvar of varlist high low close volume {
rolling _b, window(30) saving(temp, replace): regress `depvar' `var'
use "/home/roberto/temp.dta", clear
rename (_b_`var' _b_cons) (b_`depvar' b_cons_`depvar')
capture noisily merge 1:1 start end using "/home/roberto/results.dta", assert(match)
capture noisily drop _merge
save "/home/roberto/results.dta", replace
sysuse sp500, clear
tsset date
}
* Delete auxiliary file
capture erase "/home/roberto/temp.dta"
* Check results
use "/home/roberto/results.dta"
browse
Maybe other solutions can be proposed using postfile or concatenating vectors of results and converting to a dataset using svmat. I'm not sure.
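A variant of the loop above that avoids hard-coded paths can be sketched with tempfile and preserve/restore. This is only a sketch under assumptions: the macro names piece and results are made up for illustration, and it relies on rolling leaving the data in memory untouched when saving() is specified.

```stata
sysuse sp500, clear
tsset date

* Temporary files: Stata deletes them when the do-file finishes
tempfile piece results
local first 1

foreach depvar of varlist high low close volume {
    * One rolling regression per dependent variable, written to a temp file
    quietly rolling _b, window(30) saving(`piece', replace): regress `depvar' open

    preserve
    use `piece', clear
    rename (_b_open _b_cons) (b_`depvar' b_cons_`depvar')
    * From the second pass on, add the coefficients collected so far
    if !`first' {
        merge 1:1 start end using `results', nogenerate assert(match)
    }
    save `results', replace
    restore

    local first 0
}

* One row per window, one pair of coefficient variables per regression
use `results', clear
describe
```

Because the original data are restored after each pass, there is no need to reload and tsset the dataset inside the loop.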
Original answer
Use the saving() option with replace and provide only one file name (drop the macro suffix):
clear all
set more off
webuse lutkepohl2
tsset qtr
rolling _b, window(30) saving(results, every(5) replace): regress dln_inv dln_inc dln_consump
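On the secondary question about a shortcut for all variables: the keyword _all (or the wildcard *) stands for every variable in the dataset. A minimal sketch using the auto dataset, chosen here only for illustration:

```stata
sysuse auto, clear

* _all (or *) expands to every variable in the dataset
foreach v of varlist _all {
    display "`v'"
}

* To leave some variables out, ds with the not option
* returns the remaining names in r(varlist)
ds make, not
local others `r(varlist)'
display "`others'"
```

In the rolling loop, a list built this way could exclude the time variable and the fixed regressor before looping.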

Related

Why merge with update does not work as intended?

I am trying to update some missing values in a dataset with values from another.
Here is an example in Stata 14.2:
sysuse auto, clear
// save in order to merge below
save auto, replace
// create some missing to update
replace length = . if length < 175
// drop one observation so the two datasets are not identical, as in my real example
drop if _n == _N
merge 1:1 make using auto, nogen keep(master match_update) update
The code above only keeps the observations updated (26 observations). It is exactly the same result if one uses keep(match_update) instead.
Why is Stata not keeping all observations in the master dataset?
Note that not using match_update is not helpful either, as it removes all observations.
My current workaround is to rename original variables, merge all, and then replace if original was missing. However, this defeats the point of using the update option, and it is cumbersome for updating many variables.
Personally, I always prefer to manually drop / keep observations using the _merge variable as it is more transparent and less error prone.
However, the following does what you want:
merge 1:1 make using auto, nogenerate keep(master match match_update) update
Result                          # of obs.
-----------------------------------------
not matched                             0
matched                                73
    not updated                        47
    missing updated                    26
    nonmissing conflict                 0
-----------------------------------------
You can confirm that this is the case as follows:
sysuse auto, clear
save auto, replace
replace length = . if length < 175
drop if _n == _N
merge 1:1 make using auto, update
drop if _merge == 2
drop _merge
save m1
sysuse auto, clear
save auto, replace
replace length = . if length < 175
drop if _n == _N
merge 1:1 make using auto, nogen keep(master match match_update) update
save m2
cf _all using m1
display r(Nsum)
0

Regression in Stata by industry: How to get categories in a variable as the title for the resulting regression output?

I have two variables in my firm-level dataset containing the industrial classification and the industry name to which that company belongs. For a given id_class, industry_name might be missing in the data. See below
| id_class | industry_name |
|----------|-------------------|
| 10 | auto |
| 11 | telecommunication |
| 12 | . |
I'm doing regressions by industry using the levelsof command to save each category in id_class to a local macro to allow me to loop through each category
levelsof id_class, local(id_class_list)
foreach i of local id_class_list {
reg y x if id_class == `i'
}
I want to save the estimated coefficients for each regression to a table (I know how to do this part), but I want the table to have the title contained in the industry_name variable. How can I do this?
You can use the macro extended functions for extracting data attributes, such as variable value labels, like this:
sysuse auto
levelsof foreign, local(list)
foreach v of local list {
local vl: label (foreign) `v'
di "Car Origin is `vl':"
reg price if foreign==`v'
}
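If the industry name is stored in a separate string variable rather than in a value label, as in the question, one possible approach is to pull the name with levelsof inside the loop. The sketch below fakes the question's layout with the auto dataset; id_class and industry_name are built here only for illustration, and it assumes each id_class has at most one distinct name.

```stata
sysuse auto, clear

* Mimic the question's layout: a numeric class and a string name
gen id_class = foreign
decode foreign, generate(industry_name)

levelsof id_class, local(id_class_list)
foreach i of local id_class_list {
    * Name(s) observed for this class; empty if the name is missing
    quietly levelsof industry_name if id_class == `i', local(name) clean
    if `"`name'"' == "" {
        local name "id_class `i'"
    }
    display as text "Industry: `name'"
    regress price weight if id_class == `i'
}
```

The fallback to the numeric code covers classes whose name is missing.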
Also have a look at statsby: check out its help file and manual entry.
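As a rough illustration of what statsby does, here is a minimal sketch on the auto dataset; the regression of price on weight is chosen only as an example.

```stata
sysuse auto, clear

* Run one regression per level of foreign; the coefficients
* replace the data in memory, one observation per group
statsby _b, by(foreign) clear: regress price weight

* Each row holds the group identifier and its coefficients
list foreign _b_weight _b_cons
```

The resulting dataset can then be merged with a lookup table of industry names or exported as a table.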

Computing and plotting difference in group means

In what follows I plot the mean of an outcome of interest (price) by a grouping variable (foreign) for each possible value taken by the fake variable time:
sysuse auto, clear
gen time = rep78 - 3
bysort foreign time: egen avg_p = mean(price)
scatter avg_p time if (foreign==0 & time>=0) || ///
scatter avg_p time if (foreign==1 & time>=0), ///
legend(order(1 "Domestic" 2 "Foreign")) ///
ytitle("Average price") xlab(#3)
What I would like to do is to plot the difference in the two group means over time, not the two separate means.
I am surely missing something, but to me it looks complicated because the information about the averages is stored "vertically" (in avg_p).
Arguably the easiest way to do this is to use linear regression to estimate the differences:
/* Regression Way */
drop if time < 0 | missing(time)
reg price i.foreign##i.time
margins, dydx(foreign) at(time =(0(1)2))
marginsplot, noci title("Foreign vs Domestic Difference in Price")
If regression is hard to wrap your mind around, the other way involves mangling the data with a reshape:
/* Transform the Data */
keep price time foreign
collapse (mean) price, by(time foreign)
reshape wide price, i(time) j(foreign)
gen diff = price1-price0
tw connected diff time
Here is another approach. graph dot will happily plot means.
sysuse auto, clear
set scheme s1color
collapse price if inrange(rep78, 3, 5), by(foreign rep78)
reshape wide price, i(rep78) j(foreign)
rename price0 Domestic
label var Domestic
rename price1 Foreign
label var Foreign
graph dot (asis) Domestic Foreign, over(rep78) vertical ///
marker(1, ms(Oh)) marker(2, ms(+))

How to efficiently create lag variable using Stata

I have panel data (time: date, name: ticker). I want to create 10 lags for variables x and y. Now I create each lag variable one by one using the following code:
by ticker: gen lag1 = x[_n-1]
However, this looks messy.
Can anyone tell me how can I create lag variables more efficiently, please?
Shall I use a loop or does Stata have a more efficient way of handling this kind of problem?
@Robert has shown you the streamlined way of doing it. For completeness, here is the "traditional", boring way:
clear
set more off
*----- example data -----
set obs 2
gen id = _n
expand 20
bysort id: gen time = _n
tsset id time
set seed 12345
gen x = runiform()
gen y = 10 * runiform()
list, sepby(id)
*----- what you want -----
// "traditional" loop
forvalues i = 1/10 {
gen x_`i' = L`i'.x
gen y_`i' = L`i'.y
}
list, sepby(id)
And a combination:
// a combination
foreach v in x y {
tsrevar L(1/10).`v'
rename (`r(varlist)') `v'_#, addnumber
}
If the purpose is to create lagged variables for use in some estimation, note that you can use time-series operators directly within many estimation commands; there is no need to create the lagged variables in the first place. See help tsvarlist.
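For instance, with the same example data as above, lags can be requested inside the estimation command itself. The choice of three lags and of regress is arbitrary here:

```stata
clear
set obs 2
gen id = _n
expand 20
bysort id: gen time = _n
tsset id time
set seed 12345
gen x = runiform()
gen y = 10 * runiform()

* Regress y on the first three lags of x without creating any variables
regress y L(1/3).x
```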
You can loop to do this but you can also take advantage of tsrevar to generate temporary lagged variables. If you need permanent variables, you can use rename group to rename them.
clear
set obs 2
gen id = _n
expand 20
bysort id: gen time = _n
tsset id time
set seed 12345
gen x = runiform()
gen y = 10 * runiform()
tsrevar L(1/10).x
rename (`r(varlist)') x_#, addnumber
tsrevar L(1/10).y
rename (`r(varlist)') y_#, addnumber
Note that if you are doing this to calculate a statistic on a rolling window, check out tsegen (from SSC)

Stata output files in surveys (proportions)

I need to modify the code below which I'm using on some CPS data to capture insurance coverage. I need to output a file with the percent covered by Census region (there are four). It should look something like this:
region n percent
1 xxx xx
2 xxx xx
3 xxx xx
4 xxx xx
I could live with two rows defining the percentages covered and not covered in each region if necessary, but I really only need the percentage covered.
Here's the code I'm using:
svyset [iw=hinswt], sdrweight(repwt1-repwt160) vce(sdr)
tempname memhold
postfile `memhold' region_rec n prop using Insurance, replace
levelsof region_rec, local(lf)
foreach x of local lf{
svy, subpop(if region_rec==`x' & age>=3 & age<=17): proportion hcovany
scalar forx = `x'
scalar prop = _b[hcovany]
matrix b = e(_N_subp)
matrix c = e(_N)
scalar n = el(c,1,1)
post `memhold' (forx) (n) (prop)
}
postclose `memhold'
use Insurance, clear
list
This is what it produces:
Survey: Proportion estimation     Number of obs    =      210648
                                  Population size  =   291166198
                                  Subpop. no. obs  =       10829
                                  Subpop. size     =  10965424.5
                                  Replications     =         160

      _prop_1: hcovany = Not covered

--------------------------------------------------------------
             |                 SDR
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
hcovany      |
     _prop_1 |   .0693129   .0046163      .0602651    .0783607
     Covered |   .9306871   .0046163      .9216393    .9397349
--------------------------------------------------------------
[hcovany] not found
r(111);
I can't figure out how to get around the error message at the bottom and get it to save the results. I think a SE and CV would be a desirable feature as well, but I'm not sure how to handle that within the matrix framework.
EDIT: Additional output
+----------------------------------+
| region~c n prop se |
|----------------------------------|
| 1 9640 .9360977 2 |
| 2 12515 .9352329 2 |
| 3 14445 .8769684 2 |
| 4 13241 .8846368 2 |
+----------------------------------+
Try replacing _b[hcovany] with _b[some-value-label]. To be clear, the following nonsensical example is similar to your code, but instead of using _b[sex], where sex is a variable, it uses _b[Male], where Male is a value label for sex. Subpopulation sizes and standard errors are also saved.
clear all
set more off
webuse nhanes2f
svyset [pweight=finalwgt]
tempname memhold
tempfile results
postfile `memhold' region nsubpop maleprop stderr using `results', replace
levelsof region, local(lf)
foreach x of local lf{
svy, subpop(if region == `x' & inrange(age, 20, 40)): proportion sex
post `memhold' (`x') (e(N_subpop)) (_b[Male]) (_se[Male])
}
postclose `memhold'
use `results', clear
list
If we were to use _b[sex] instead of _b[Male], we would get the same r(111) error as in your original post.
For this example, let's see what the matrix e(b), containing the estimated proportions, looks like:
. matrix list e(b)

e(b)[1,2]
          sex:       sex:
         Male     Female
y1  .48821487  .51178513
Therefore, if we wanted to extract the proportions for females instead of males, we could use _b[Female].
Yet another option is to save the estimation result in a matrix and use numerical subscripts:
<snip>
matrix b = e(b)
post `memhold' (`x') (b[1,2])
<snip>
There are other slight changes like the use of inrange and direct use of returned estimation results with post.
Also, you may want to take a look at help _variables and its link:
[U] 13.5 Accessing coefficients and standard errors.
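Since the question also asks about a standard error and a CV, here is a hedged sketch that takes both from the stored matrices by position. Coefficient names after proportion may differ across Stata releases, so numeric subscripts sidestep the naming question. The scalar names prop, se and cv are illustrative, and the CV is computed here simply as the standard error divided by the estimate.

```stata
clear all
webuse nhanes2f
svyset [pweight=finalwgt]

svy, subpop(if region == 1 & inrange(age, 20, 40)): proportion sex

* Copy the estimates and their variance matrix
matrix b = e(b)
matrix V = e(V)

* Column 1 corresponds to the first category of sex
scalar prop = b[1,1]
scalar se   = sqrt(V[1,1])
scalar cv   = se / prop

display "proportion = " prop "   se = " se "   cv = " cv
```

Inside the loop, these scalars could be passed to post in the same way as the other results.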