Consider the following toy example:
sysuse auto, clear
tab foreign, sum(price)
| Summary of Price
Car type | Mean Std. Dev. Freq.
------------+------------------------------------
Domestic | 6,072.423 3,097.104 52
Foreign | 6,384.682 2,621.915 22
------------+------------------------------------
Total | 6,165.257 2,949.496 74
How can I save the results in an Excel file?
Using the community-contributed command esttab, the following works for me:
sysuse auto, clear
egen m_total = mean(price)
egen s_total = sd(price)
scalar mtotal = m_total
scalar stotal = s_total
scalar N = _N
collapse (mean) Mean=price (sd) StdDev=price (count) Freq = price, by(foreign)
set obs 3
replace Mean = mtotal in 3
replace StdDev = stotal in 3
replace Freq = N in 3
mkmat Mean StdDev Freq, matrix(A)
esttab matrix(A) using myfilename.xls, varlabels(r1 Domestic r2 Foreign r3 Total) ///
title(" Summary of Price") mlabels(none)
Summary of Price
---------------------------------------------------
Mean StdDev Freq
---------------------------------------------------
Domestic 6072.423 3097.104 52
Foreign 6384.682 2621.915 22
Total 6165.257 2949.496 74
---------------------------------------------------
Related
for simplification, let's assume following script to create a simple regression table:
sysuse auto
eststo clear
qui regress price weight mpg
esttab using "table.rtf", cells(t) mtitles onecell nogap ///
stats(N, labels("Observations")) label ///
compress replace
eststo clear
Output:
(1)
.
t
Weight (lbs.) 2.723238
Mileage (mpg) -.5746808
Constant .541018
Observations 74
Question:
Would it be possible to mark every t-value above 0.5 or below 0.5 with an asterisk? (= greater than absolute value 0.5)
Please note: In the specific application case, I can't work with given p-values, and need a custom solution that works with thresholds of t.
Desired outcome:
(1)
.
t
Weight (lbs.) 2.723238*
Mileage (mpg) -.5746808*
Constant .541018*
Observations 74
Crossposting can be found here:
Thank you for your help!
You cannot do this directly with estout but the following works all the same:
sysuse auto, clear
regress price weight mpg
quietly esttab, mtitles onecell nogap stats(N, labels("Observations")) label ///
compress replace star staraux
matrix A = r(coefs)
matrix A = A[1...,2]
svmat A
generate A2 = "*" if abs(A1) >= 0.5
generate A4 = string(A1) + A2
local names : rownames A
generate A3 = ""
forvalues i = 1 / `: word count `names'' {
replace A3 = `"`: word `i' of `names''"' in `i'
}
list A3 A4 if !missing(A3)
+---------------------+
| A3 A4 |
|---------------------|
1. | weight 2.723238* |
2. | mpg -.5746808* |
3. | _cons .541018* |
+---------------------+
preserve
keep if !missing(A3)
export delimited A3 A4 using table.txt, delimiter(" ") novarnames
restore
You will have to do some more gymnastics to get the variable labels etc.
I have a dataset that looks like this:
I would like to create a table that groups by area and shows the total amount for the area both as a percentage of total amount and as a raw number, as well as the percent of the total number of records/observations per area and total number of records/observations as a raw number.
The code below works to generate a table of raw numbers but does not the show percent of total:
tabstat amount, by(county) stat(sum count)
There isn't a canned command for doing what you want. You will have to program the table yourself.
Here's a quick example using auto.dta:
. sysuse auto, clear
(1978 Automobile Data)
. tabstat price, by(foreign) stat(sum count)
Summary for variables: price
by categories of: foreign (Car type)
foreign | sum N
---------+--------------------
Domestic | 315766 52
Foreign | 140463 22
---------+--------------------
Total | 456229 74
------------------------------
You can do the calculations and save the raw numbers in variables as follows:
. generate total_obs = _N
. display total_obs
74
. count if foreign == 0
52
. generate total_domestic_obs = r(N)
. count if foreign == 1
22
. generate total_foreign_obs = r(N)
. egen total_domestic_price = total(price) if foreign == 0
. sort total_domestic_price
. local tdp = total_domestic_price
. display total_domestic_price
315766
. egen total_foreign_price = total(price) if foreign == 1
. sort total_foreign_price
. local tfp = total_foreign_price
. display total_foreign_price
140463
. generate total_price = `tdp' + `tfp'
. display total_price
456229
And for the percentages:
. generate pct_domestic_price = (`tdp' / total_price) * 100
. display pct_domestic_price
69.212173
. generate pct_foreign_price = (`tfp' / total_price) * 100
. display pct_foreign_price
30.787828
EDIT:
Here's a more automated way to do the above without having to specify individual values:
program define foo
syntax varlist(min=1 max=1), by(string)
generate total_obs = _N
display total_obs
quietly levelsof `by', local(nlevels)
foreach x of local nlevels {
count if `by' == `x'
quietly generate total_`by'`x'_obs = r(N)
quietly egen total_`by'`x'_`varlist' = total(`varlist') if `by' == `x'
sort total_`by'`x'_`varlist'
local tvar`x' = total_`by'`x'_`varlist'
local tvarall `tvarall' `tvar`x'' +
display total_`by'`x'_`varlist'
}
quietly generate total_`varlist' = `tvarall' 0
display total_`varlist'
foreach x of local nlevels {
quietly generate pct_`by'`x'_`varlist' = (`tvar`x'' / total_`varlist') * 100
display pct_`by'`x'_`varlist'
}
end
The results are identical:
. foo price, by(foreign)
74
52
315766
22
140463
456229
69.212173
30.787828
You will obviously need to format the results in a table of your liking.
Here's another approach. I stole #Pearly Spencer's example. It could be generalised to a command. The main message I want to convey is that list is useful for tabulations and other reports, with just usually some obligation to calculate what you want to show beforehand.
. sysuse auto, clear
(1978 Automobile Data)
. preserve
. collapse (sum) total=price (count) obs=price, by(foreign)
. egen pc2 = pc(total)
. egen pc1 = pc(obs)
. char pc2[varname] "%"
. char pc1[varname] "%"
. format pc* %2.1f
. list foreign obs pc1 total pc2 , subvarname noobs sum(obs pc1 total pc2)
+-----------------------------------------+
| foreign obs % total % |
|-----------------------------------------|
| Domestic 52 70.3 315766 69.2 |
| Foreign 22 29.7 140463 30.8 |
|-----------------------------------------|
Sum | 74 100.0 456229 100.0 |
+-----------------------------------------+
. restore
EDIT Here's an essay in egen with similar flavour but leaving the original data in place and new variables also available for export or graphics.
. sysuse auto, clear
(1978 Automobile Data)
. egen total = sum(price), by(foreign)
. egen obs = count(price), by(total)
. egen tag = tag(foreign)
. egen pc2 = pc(total) if tag
(72 missing values generated)
. egen pc1 = pc(obs) if tag
(72 missing values generated)
. char pc2[varname] "%"
. char pc1[varname] "%"
. format pc* %2.1f
. list foreign obs pc1 total pc2 if tag, subvarname noobs sum(obs pc1 total pc2)
+-----------------------------------------+
| foreign obs % total % |
|-----------------------------------------|
| Domestic 52 70.3 315766 69.2 |
| Foreign 22 29.7 140463 30.8 |
|-----------------------------------------|
Sum | 74 100.0 456229 100.0 |
+-----------------------------------------+
I have a simple question about the distinct command in Stata.
When using with a by prefix, can it return a one dimension matrix of r(N)?
For example:
sysuse auto,clear
bysort foreign: distinct rep78
Can I store a [2,1] matrix, with each row representing the number of distinct values of rep78?
The manual seems to suggest that it only stores the number of distinct values of the last by value.
You can easily create your own wrapper for that:
sysuse auto,clear
sort foreign
levelsof foreign, local(foreign_levels)
local number_of_foreign_levels : word count `foreign_levels'
matrix distinct_mat = J(`number_of_foreign_levels', 1, 0)
forvalues i = 1 / `number_of_foreign_levels' {
quietly distinct rep78 if foreign == `i' - 1
matrix distinct_mat[`i', 1] = r(ndistinct)
}
matrix list distinct_mat
distinct_mat[2,1]
c1
r1 5
r2 3
Note that the number of distinct observations is stored in r(ndistinct), not r(N).
Here is another way to get numbers of distinct values into a matrix.
. sysuse auto
(1978 Automobile Data)
. egen tag = tag(foreign rep78)
. tab foreign if tag, matcell(foo)
Car type | Freq. Percent Cum.
------------+-----------------------------------
Domestic | 5 62.50 62.50
Foreign | 3 37.50 100.00
------------+-----------------------------------
Total | 8 100.00
I am creating a summary statistics table using the community-contributed command estout.
The code looks like this:
sysuse auto, clear
eststo clear
eststo: estpost ttest price mpg weight headroom trunk if rep78 ==3, by(foreign)
eststo: estpost ttest price mpg weight headroom trunk if rep78 ==4, by(foreign)
estout, cells("mu_1 mu_2 b(star)")
The result looks as follows:
--------------------------------------------------------------------------------------------
est1 est2
mu_1 mu_2 b mu_1 mu_2 b
--------------------------------------------------------------------------------------------
price 6607.074 4828.667 1778.407 5881.556 6261.444 -379.8889
mpg 19 23.33333 -4.333333 18.44444 24.88889 -6.444444**
weight 3442.222 2010 1432.222*** 3532.222 2207.778 1324.444***
headroom 3.222222 2.666667 .5555556 3.444444 2.5 .9444444*
trunk 15.59259 12.33333 3.259259 16.66667 10.33333 6.333333**
--------------------------------------------------------------------------------------------
I would like to know how I could stack est1 and est2 on top of each other.
The command estout cannot automatically stack results from stored estimates. Consequently, the use of eststo is redundant. In this case, the easiest way to obtain the desired output is to simply create two matrices with the results and stack one on top of the other.
For example:
sysuse auto, clear
matrix A = J(5, 3, .)
local i 0
foreach var of varlist price mpg weight headroom trunk {
local ++i
ttest `var' if rep78 == 3, by(foreign)
matrix A[`i', 1] = r(mu_1)
matrix A[`i', 2] = r(mu_2)
matrix A[`i', 3] = r(mu_1) - r(mu_2)
local matnamesA `matnamesA' "rep78==3:`var'"
}
matrix rownames A = `matnamesA'
matrix B = J(5, 3, .)
local i 0
foreach var of varlist price mpg weight headroom trunk {
local ++i
ttest `var' if rep78 == 4, by(foreign)
matrix B[`i', 1] = r(mu_1)
matrix B[`i', 2] = r(mu_2)
matrix B[`i', 3] = r(mu_1) - r(mu_2)
local matnamesB `matnamesB' "rep78==4:`var'"
}
matrix rownames B = `matnamesB'
matrix C = A \ B
esttab matrix(C), nomtitles collabels("mu_1" "mu_2" "diff")
---------------------------------------------------
mu_1 mu_2 diff
---------------------------------------------------
rep78==3
price 6607.074 4828.667 1778.407
mpg 19 23.33333 -4.333333
weight 3442.222 2010 1432.222
headroom 3.222222 2.666667 .5555556
trunk 15.59259 12.33333 3.259259
---------------------------------------------------
rep78==4
price 5881.556 6261.444 -379.8889
mpg 18.44444 24.88889 -6.444444
weight 3532.222 2207.778 1324.444
headroom 3.444444 2.5 .9444444
trunk 16.66667 10.33333 6.333333
---------------------------------------------------
I want to store results from ordinary least squares (OLS) regressions in Stata within a double loop.
Here is the structure of my code:
foreach i2 of numlist 1 2 3{
foreach i3 of numlist 1 2 3 4{
quiet: eststo: reg dep covariates, robust
}
}
The end goal is to have a table in Excel composed by twelve rows (one for each model) and seven columns (number of observations, estimated constant, five estimated coefficients).
Any suggestion on how can I do this?
Such a table can be created simply by using the community-contributed command esttab:
sysuse auto, clear
eststo clear
eststo m1: quietly regress price weight
eststo m2: quietly regress price weight mpg
quietly esttab
matrix A = r(coefs)'
matrix C = r(stats)'
tokenize "`: rownames A'"
forvalues i = 1 / `=rowsof(A)' {
if strmatch("``i''", "*b*") matrix B = nullmat(B) \ A[`i', 1...]
}
matrix C = B , C
matrix rownames C = "Model 1" "Model 2"
Result:
esttab matrix(C) using table.csv, eqlabels(none) mlabels(none) varlabels("Model 1" "Model 2")
----------------------------------------------------------------
weight mpg _cons N
----------------------------------------------------------------
Model 1 2.044063 -6.707353 74
Model 2 1.746559 -49.51222 1946.069 74
----------------------------------------------------------------