I have data with an income variable and a weight variable, and I want to assign observations to 5% quantile groups (20 groups) by year.
Is there a way to do that?
For the weight I can use regular xtile:
xtile quan = salary [aw=weight], n(20)
And for the years I can use xtile from egenmore:
egen quan = xtile(salary), by(year) nq(20)
But how can I use weights and group by year at the same time?
There is a weights() option, as stated in help egenmore:
clear
set more off
sysuse auto
keep mpg foreign weight
// egenmore
egen mpg4 = xtile(mpg), by(foreign) nq(4) weights(weight)
// compare with xtile
xtile mpg4_1 = mpg [aweight=weight] if foreign, nq(4)
xtile mpg4_2= mpg [aweight=weight] if !foreign, nq(4)
egen mpg42 = rowtotal(mpg4_1 mpg4_2)
assert mpg4 == mpg42
sort foreign mpg weight
list, sepby(foreign)
In the ado-file for egen's xtile function, you can check how weights are set:
if "`weights'" ~= "" {
local weight "[aw = `weights']"
}
See viewsource _gxtile.ado.
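Applied to the variables named in the question, the call would presumably be egen quan = xtile(salary), by(year) nq(20) weights(weight). If installing egenmore is not an option, a loop over years with the official xtile command is one alternative. The following is only a sketch on made-up data: the variables year, salary and weight are simulated here and stand in for the real ones.

```stata
clear
set seed 12345
set obs 600
gen year   = 2000 + mod(_n, 3)      // hypothetical years 2000-2002
gen salary = exp(rnormal(10, 1))    // hypothetical income variable
gen weight = 0.5 + runiform()       // hypothetical weight variable

gen byte quan = .
quietly levelsof year, local(years)
foreach y of local years {
    tempvar q
    // weighted 5% quantile groups within one year
    quietly xtile `q' = salary [aw=weight] if year == `y', nq(20)
    quietly replace quan = `q' if year == `y'
    drop `q'
}
tabulate year quan
```

Each pass computes the weighted quantile groups for a single year and copies them into quan, which is what the by() and weights() options of the egen function do in one step.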
In Stata's auto data the following command creates all missing values: why?
bysort mpg: egen n1 = mean(price) if rep78[_n]!=rep78
For example take the 14 mpg group:
price   mpg   rep78
11385    14       3
14500    14       2
 6303    14       4
12990    14       .
 5379    14       4
13466    14       3
I expected n1 in the first row to be mean(14500, 6303, 12990, 5379). That is, I want the mean after excluding the first and last rows, because for them rep78[_n]==rep78 (both equal 3). Instead, I get all missing values.
The subscript [_n] is harmless but vacuous here as referring to the current observation. So the condition is just equivalent to rep78 != rep78 or rep78[_n] != rep78[_n] -- which is never true and so no observations satisfy the condition and the mean is returned as missing.
You're hoping or imagining that the prefix by: implies comparisons within a group, but at best that works only if subscripts are explicit and different.
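Both points can be seen in a small sketch on the auto data. The variable name price_gap is made up for the illustration.

```stata
sysuse auto, clear

* [_n] refers to the current observation, so this condition is never true
count if rep78[_n] != rep78
assert r(N) == 0

* under by:, an explicit and different subscript is interpreted within groups:
* price[_n-1] is the previous observation in the same mpg group
bysort mpg (price): gen price_gap = price - price[_n-1]

* the first observation of each group has no predecessor, hence missing
by mpg: assert missing(price_gap) if _n == 1
```

The subscript [_n-1] compares neighbouring observations within a group, but no subscript compares an observation with all the others in its group, which is what the question needs.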
This works for your problem:
sysuse auto, clear
gen wanted = .
quietly forval i = 1/`=_N' {
    su price if mpg == mpg[`i'] & rep78 != rep78[`i'], meanonly
    replace wanted = r(mean) in `i'
}
There may be a way to do this with rangestat or rangerun from SSC, or otherwise, in which case a better solution may follow.
EDIT: The OP's code suggestion in comments
bysort mpg rep78: egen sum_m_r_price = sum(price)
bysort mpg rep78: egen count_m_r_price = count(price)
bysort mpg: egen sum_r_price = sum(price)
bysort mpg: egen count_r_price = count(price)
gen b_wanted = ( sum_r_price-sum_m_r_price)/ (count_r_price-count_m_r_price)
appears equivalent.
In turn, this version using rangestat (from SSC) should be faster than that:
rangestat (sum) sum2=price (count) count2=price, i(rep78 0 0) by(mpg)
rangestat (sum) sum1=price (count) count1=price, i(mpg 0 0)
gen double wanted = (sum1 - sum2) / (count1 - count2)
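One way to check that the loop and the egen calculation above agree is to run both on the auto data and compare them observation by observation. This is a sketch only: the variable names sum_mr, n_mr, sum_m and n_m are made up, and total() is used in place of the older sum() function name.

```stata
sysuse auto, clear

* loop version: mean price of cars with the same mpg but a different rep78
gen wanted = .
quietly forval i = 1/`=_N' {
    su price if mpg == mpg[`i'] & rep78 != rep78[`i'], meanonly
    replace wanted = r(mean) in `i'
}

* egen version: (group total - subgroup total) / (group count - subgroup count)
bysort mpg rep78: egen double sum_mr = total(price)
by mpg rep78:     egen n_mr = count(price)
bysort mpg:       egen double sum_m = total(price)
by mpg:           egen n_m = count(price)
gen b_wanted = (sum_m - sum_mr) / (n_m - n_mr)

* both missing, or equal up to rounding
assert (missing(wanted) & missing(b_wanted)) | reldif(wanted, b_wanted) < 1e-6
```

When every car in an mpg group has the same rep78, the loop finds no observations and the egen version divides 0 by 0, so both return missing.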
Consider the following toy example:
sysuse auto, clear
tab foreign, sum(price)
            |          Summary of Price
   Car type |        Mean   Std. Dev.       Freq.
------------+------------------------------------
   Domestic |   6,072.423   3,097.104          52
    Foreign |   6,384.682   2,621.915          22
------------+------------------------------------
      Total |   6,165.257   2,949.496          74
How can I save the results in an Excel file?
Using the community-contributed command esttab, the following works for me:
sysuse auto, clear
egen m_total = mean(price)
egen s_total = sd(price)
scalar mtotal = m_total
scalar stotal = s_total
scalar N = _N
collapse (mean) Mean=price (sd) StdDev=price (count) Freq = price, by(foreign)
set obs 3
replace Mean = mtotal in 3
replace StdDev = stotal in 3
replace Freq = N in 3
mkmat Mean StdDev Freq, matrix(A)
esttab matrix(A) using myfilename.xls, varlabels(r1 Domestic r2 Foreign r3 Total) ///
title(" Summary of Price") mlabels(none)
 Summary of Price
---------------------------------------------------
                     Mean       StdDev         Freq
---------------------------------------------------
Domestic         6072.423     3097.104           52
Foreign          6384.682     2621.915           22
Total            6165.257     2949.496           74
---------------------------------------------------
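If a genuine .xlsx workbook is wanted and no community-contributed command is available, the official export excel command may be an alternative. The following is a minimal sketch, not a replacement for the table above: the file name summary_price.xlsx and the variable name CarType are made up, and the Total row is left out for brevity.

```stata
sysuse auto, clear
preserve
collapse (mean) Mean=price (sd) StdDev=price (count) Freq=price, by(foreign)
* turn the value labels Domestic/Foreign into a string column
decode foreign, gen(CarType)
export excel CarType Mean StdDev Freq using "summary_price.xlsx", ///
    firstrow(variables) replace
restore
```

The preserve and restore pair brings the original data back after collapse has reduced the data to one row per group.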
I am creating a summary statistics table using the community-contributed command estout.
The code looks like this:
sysuse auto, clear
eststo clear
eststo: estpost ttest price mpg weight headroom trunk if rep78 ==3, by(foreign)
eststo: estpost ttest price mpg weight headroom trunk if rep78 ==4, by(foreign)
estout, cells("mu_1 mu_2 b(star)")
The result looks as follows:
--------------------------------------------------------------------------------------------
                    est1                                        est2
                    mu_1         mu_2            b              mu_1         mu_2            b
--------------------------------------------------------------------------------------------
price           6607.074     4828.667     1778.407          5881.556     6261.444    -379.8889
mpg                   19     23.33333    -4.333333          18.44444     24.88889    -6.444444**
weight          3442.222         2010     1432.222***       3532.222     2207.778     1324.444***
headroom        3.222222     2.666667     .5555556          3.444444          2.5     .9444444*
trunk           15.59259     12.33333     3.259259          16.66667     10.33333     6.333333**
--------------------------------------------------------------------------------------------
I would like to know how I could stack est1 and est2 on top of each other.
The command estout cannot automatically stack results from stored estimates. Consequently, the use of eststo is redundant. In this case, the easiest way to obtain the desired output is to simply create two matrices with the results and stack one on top of the other.
For example:
sysuse auto, clear
matrix A = J(5, 3, .)
local i 0
foreach var of varlist price mpg weight headroom trunk {
    local ++i
    ttest `var' if rep78 == 3, by(foreign)
    matrix A[`i', 1] = r(mu_1)
    matrix A[`i', 2] = r(mu_2)
    matrix A[`i', 3] = r(mu_1) - r(mu_2)
    local matnamesA `matnamesA' "rep78==3:`var'"
}
matrix rownames A = `matnamesA'
matrix B = J(5, 3, .)
local i 0
foreach var of varlist price mpg weight headroom trunk {
    local ++i
    ttest `var' if rep78 == 4, by(foreign)
    matrix B[`i', 1] = r(mu_1)
    matrix B[`i', 2] = r(mu_2)
    matrix B[`i', 3] = r(mu_1) - r(mu_2)
    local matnamesB `matnamesB' "rep78==4:`var'"
}
matrix rownames B = `matnamesB'
matrix C = A \ B
esttab matrix(C), nomtitles collabels("mu_1" "mu_2" "diff")
---------------------------------------------------
                     mu_1         mu_2         diff
---------------------------------------------------
rep78==3
price            6607.074     4828.667     1778.407
mpg                    19     23.33333    -4.333333
weight           3442.222         2010     1432.222
headroom         3.222222     2.666667     .5555556
trunk            15.59259     12.33333     3.259259
---------------------------------------------------
rep78==4
price            5881.556     6261.444    -379.8889
mpg              18.44444     24.88889    -6.444444
weight           3532.222     2207.778     1324.444
headroom         3.444444          2.5     .9444444
trunk            16.66667     10.33333     6.333333
---------------------------------------------------
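The two loops above differ only in the value of rep78, so they could be folded into a single nested loop that appends one row at a time. This is a sketch using official commands only. The matrix name T is made up, and the stacked matrix is called C as above.

```stata
sysuse auto, clear
capture matrix drop C
foreach r of numlist 3 4 {
    foreach var of varlist price mpg weight headroom trunk {
        quietly ttest `var' if rep78 == `r', by(foreign)
        // one row per variable: both group means and their difference
        matrix T = (r(mu_1), r(mu_2), r(mu_1) - r(mu_2))
        matrix rownames T = "rep78==`r':`var'"
        matrix C = nullmat(C) \ T
    }
}
matrix colnames C = mu_1 mu_2 diff
matrix list C
```

Because nullmat() lets the first row be appended to a matrix that does not exist yet, the dimensions need not be fixed in advance with J(), and further values of rep78 only require extending the numlist.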
I want to store results from ordinary least squares (OLS) regressions in Stata within a double loop.
Here is the structure of my code:
foreach i2 of numlist 1 2 3{
    foreach i3 of numlist 1 2 3 4{
        quiet: eststo: reg dep covariates, robust
    }
}
The end goal is to have a table in Excel consisting of twelve rows (one for each model) and seven columns (number of observations, estimated constant, five estimated coefficients).
Any suggestions on how I can do this?
Such a table can be created simply by using the community-contributed command esttab:
sysuse auto, clear
eststo clear
eststo m1: quietly regress price weight
eststo m2: quietly regress price weight mpg
quietly esttab
matrix A = r(coefs)'
matrix C = r(stats)'
tokenize "`: rownames A'"
forvalues i = 1 / `=rowsof(A)' {
    if strmatch("``i''", "*b*") matrix B = nullmat(B) \ A[`i', 1...]
}
matrix C = B , C
matrix rownames C = "Model 1" "Model 2"
Result:
esttab matrix(C) using table.csv, eqlabels(none) mlabels(none) varlabels("Model 1" "Model 2")
----------------------------------------------------------------
                   weight          mpg        _cons            N
----------------------------------------------------------------
Model 1          2.044063                 -6.707353           74
Model 2          1.746559    -49.51222     1946.069           74
----------------------------------------------------------------
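When every model in the double loop uses the same covariates, as in the question, the rows could also be collected directly from e(N) and e(b) inside the loop. The following is only a sketch under that assumption. The dependent variables, the covariates, the subsets defined by foreign, and the matrix names R and n are all made up to give the loops something to run on.

```stata
sysuse auto, clear
capture matrix drop R
local rnames
foreach y of varlist price length {        // hypothetical outer loop
    foreach f of numlist 0 1 {             // hypothetical inner loop
        quietly regress `y' weight mpg if foreign == `f', robust
        matrix n = J(1, 1, e(N))
        // one row per model: N followed by the coefficient vector
        matrix R = nullmat(R) \ (n, e(b))
        local rnames `rnames' `y'_`f'
    }
}
matrix colnames R = N weight mpg _cons
matrix rownames R = `rnames'
matrix list R
```

The resulting matrix has one row per model and can be passed to esttab matrix(R) using a file name, as in the answer above.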
How can I create a new variable by dividing an existing variable by its interquartile range (IQR)? I have done it the long way, as follows.
Sample data and code are as follows:
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
foreach var of varlist read-socst {
    egen `var'75 = pctile(`var'), p(75)
    egen `var'25 = pctile(`var'), p(25)
    gen `var'q =`var'75 - `var'25
    drop `var'75 `var'25
}
gen readI = read/readq
gen sciI = science/scienceq
The simplest way is just to use summarize results directly:
sysuse auto, clear
quietly foreach v of var price-foreign {
    su `v', detail
    gen `v'q = `v' / (r(p75) - r(p25))
}
The egen route is overkill if it means creating new variables for each original variable, just to hold the quartiles or the IQR as repeated constants. But egen comes into its own when you want to do this by groups:
bysort foreign: egen mpg_upq = pctile(mpg), p(75)
by foreign: egen mpg_loq = pctile(mpg), p(25)
gen mpg_Q = mpg / (mpg_upq - mpg_loq)
Note that the IQR can be 0, and will often be 0 for indicator variables.
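A guard against a zero IQR could look like the following sketch. The indicator rare and the suffix _iqr are made up for the illustration. rare equals 1 for only a handful of cars, so both of its quartiles are 0.

```stata
sysuse auto, clear
gen byte rare = price > 12000      // hypothetical indicator, mostly 0

foreach v of varlist price mpg rare {
    quietly summarize `v', detail
    local iqr = r(p75) - r(p25)
    if `iqr' > 0 {
        gen `v'_iqr = `v' / `iqr'
    }
    else {
        display as text "`v': IQR is 0, no scaled variable created"
    }
}
```

Skipping such variables avoids creating a new variable that is missing in every observation because of a division by zero.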