Is there a code for including a p-value to test for normality of variables? - stata

I'd like to include a statistic in my summary statistics table within Stata, with the summarize command. Is there any possibility or other convenient way to include p-values (of the normality test) of the variables included? Would be very helpful. I'm using Stata 17.

There are many tests for normality, and several are included in Stata. The Shapiro-Wilk test is a modern classic, and quite popular. The more recent Doornik-Hansen test has been calibrated over a wide range of situations. Here's a token example:
. sysuse auto, clear
(1978 automobile data)
. swilk mpg
Shapiro–Wilk W test for normal data
    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
         mpg |         74    0.94821      3.335     2.627    0.00430
. mvtest normality mpg
Test for multivariate normality
Doornik-Hansen chi2(2) = 12.366 Prob>chi2 = 0.0021
The P-value is typically returned as an r-class value, so read the documentation for each command, and try return list after running each.
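As a minimal sketch of that advice, the loop below runs swilk on a few variables and displays the stored results. It assumes that swilk leaves its p-value in r(p) and that mvtest normality leaves the Doornik-Hansen p-value in r(p_dh); confirm both with return list in your version of Stata. The choice of variables (price, mpg, weight) is only illustrative.

```stata
sysuse auto, clear

* Shapiro-Wilk: one line per variable, using the stored results
foreach v of varlist price mpg weight {
    quietly swilk `v'
    display %-10s "`v'" "  W = " %7.5f r(W) "  p = " %7.5f r(p)
}

* Doornik-Hansen: p-value assumed to be stored as r(p_dh)
quietly mvtest normality mpg
display "Doornik-Hansen p-value for mpg = " %6.4f r(p_dh)
```

Once the p-values are available as stored results, they can be copied into a matrix or passed on to whichever table-building command you prefer.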

Stepwise and time & individual/country fixed effects panels in Stata with appropriate s.e. adjustment

I want to run stepwise on a linear probability model with time and individual fixed effects in a panel dataset, but stepwise does not support panel estimators out of the box. The workaround is to run xtdata y x, fe followed by reg y x, r. However, the resulting standard errors are too small. One attempt to solve this problem can be found here: http://www.stata.com/statalist/archive/2006-07/msg00629.html but my panel is highly unbalanced (I have a different number of observations for different variables). I also don't understand how stepwise would carry this adjustment through its iterations with different variable lists. Since stepwise bases its decision rules on the p-values, this is quite crucial.
Reproducible Example:
webuse nlswork, clear
* unbalance the panel a bit
replace tenure = . if year < 73
replace hours = . if hours == 24 | hours == 38
replace tenure = . if idcode == 3 | idcode == 12 | idcode == 19
* this is what I want to reproduce without xtreg, to use it in stepwise
xtreg ln_wage tenure hours union i.year, fe vce(robust)
eststo xtregit
* year dummies to keep
xi i.year
xtdata ln_wage tenure hours union _Iyear*, fe clear
* regression on transformed data
reg ln_wage tenure hours union _Iyear*, vce(hc3)
eststo regit
esttab xtregit regit
As you can see, the estimates are fine but I need to adjust the standard errors. Also, I need to do that in such a way that stepwise understands in its iterations when the number of variables changes, for example. Any help on how to proceed?

Stopping at the variable before a specified variable in a varlist

I'm stuck on a tricky data management question, which I need to do in Stata. I'm using version 13.1.
I have more than 40 datasets I need to work on using a subset of variables that is different in each dataset. I can't include the data or specific analysis I'm doing for proprietary reasons but will try to include examples and code.
I have a set of datasets, A-Z. Each has a set of questions, Q1 through Q200. I need to do an analysis that includes a varlist entry on each dataset that excludes the last few questions (which deal with background info). I know this background info starts with a certain question (e.g. "MALE / FEMALE") although the actual number for that question varies by dataset.
Here's what I have done so far:
foreach X in A B C D E F {
    use `X'_YEAR.dta, clear
    lookfor "MALE/FEMALE"
    local torename = r(varlist)
    rename `torename' MF
    ANALYSIS Q1 - MF
}
That works but the problem is I'm including the variable that's actually the beginning of where I should start excluding. I know that I can save the varlist as a macro and then use the placement in the macro to exclude, for example, the seventh variable.
However, I'm stuck on taking that a step further - using this as an entry in the varlist to stop at the variable MF. Something like ANALYSIS Q1 - (MF - 1).
Does anyone know if something like that is possible?
I've searched for this issue on this site and Google and haven't found a good solution.
Apologies if this is a simple issue I've missed.
Here's one approach building on your code.
. sysuse auto.dta, clear
(1978 Automobile Data)
. quiet describe, varlist
. local vars `r(varlist)'
. display "vars - `vars'"
vars - make price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign
. lookfor "Circle"
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------
turn            int     %8.0g                 Turn Circle (ft.)
. local stopvar `r(varlist)'
. display "stopvar - `stopvar'"
stopvar - turn
. local myvars
. foreach var in `vars' {
  2.     if "`var'" == "`stopvar'" continue, break
  3.     local myvars `myvars' `var'
  4. }
. display "myvars - `myvars'"
myvars - make price mpg rep78 headroom trunk weight length
And then just use `myvars' wherever you need the list of analysis variables. Alternatively, if your variable list always starts with Q1, you can change the local within the loop to
local lastvar `var'
and use
Q1-`lastvar'
for the list of analysis variables.
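The lastvar alternative can be sketched on the auto data as follows. This is an illustration only: price stands in for Q1, "Circle" stands in for the "MALE/FEMALE" search text, and it assumes lookfor matches exactly one variable (if it matched several, r(varlist) would hold more than one name and the comparison would never succeed).

```stata
sysuse auto, clear

* All variables in dataset order
quietly describe, varlist
local vars `r(varlist)'

* The variable at which the excluded block starts
quietly lookfor "Circle"
local stopvar `r(varlist)'

* Remember the variable immediately before the stop variable
local lastvar
foreach var of local vars {
    if "`var'" == "`stopvar'" continue, break
    local lastvar `var'
}

display "last analysis variable: `lastvar'"
summarize price-`lastvar'
```

The hyphenated range price-`lastvar' relies on dataset order, so it only selects the intended variables if nothing has been reordered between the loop and the analysis.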

Quantile Regression with Quantiles based on independent variable

I am attempting to run a quantile regression on monthly observations (of mutual fund characteristics). What I would like to do is distribute my observations in quintiles for each month (my dataset comprises 99 months). I want to base the quintiles on a variable (lagged fund size i.e. Total Net Assets) that will be later employed as an independent variable to explain fund performance.
What I already tried to do is use the qreg command, but that uses quantiles based on the dependent variable not the independent variable that is needed.
Moreover I tried to use the xtile command to create the quintiles; however, the by: command is not supported.
. by Date: xtile QLagTNA= LagTNA, nq(5)
xtile may not be combined with by
r(190);
Is there a (combination of) command(s) which saves me from creating quintiles manually on a month-by-month basis?
Statistical comments first before getting to your question, which has two Stata answers at least.
Quantile regression is defined by prediction of quantiles of the response (what you call the dependent variable). You may or may not want to do that, but using quantile-based groups for predictors does not itself make a regression a quantile regression.
Quantiles (here quintiles) are values that divide a variable into bands of defined frequency. Here you want the 0, 20, 40, 60, 80, 100% points. The bands, intervals or groups themselves are not best called quantiles, although many statistically-minded people would know what you mean.
What you propose seems common in economics and business, but it is still degrading the information in the data.
All that said, you could always write a loop using forval, something like this
egen group = group(Date)
su group, meanonly
gen QLagTNA = .
quietly forval d = 1/`r(max)' {
    xtile work = LagTNA if group == `d', nq(5)
    replace QLagTNA = work if group == `d'
    drop work
}
For more, see this link
But you will probably prefer to download a user-written egen function [correct term here] to do this
ssc inst egenmore
h egenmore
The function you want is xtile().
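A sketch of how that function might be called, using the auto data so that it can be run as is. The stand-ins are assumptions for illustration: foreign plays the role of Date and weight the role of LagTNA. The check on _gxtile.ado assumes that is the file in which egenmore ships its xtile() function, and the install step needs an internet connection.

```stata
sysuse auto, clear

* Install egenmore only if its xtile() function is not already available
capture which _gxtile
if _rc ssc install egenmore

* Quintile groups of weight, computed separately within each level of foreign
egen Qweight = xtile(weight), by(foreign) nquantiles(5)
tabulate foreign Qweight
```

With your own data the equivalent call would be along the lines of egen QLagTNA = xtile(LagTNA), by(Date) nquantiles(5).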

Emulate Stata 8 clustered bootstrapped regression

I'm trying to store a series of scalars alongside the coefficients of a bootstrapped regression model. The code below looks like the example from the Stata [P]rogramming manual for postfile, which is apparently intended for use with such procedures.
The problem is with the // commented lines, which fail to work. More specifically, the problem seems to be that the syntax below worked in Stata 8 but fails to work in Stata 9+ after some change in the bootstrap procedure.
cap pr drop bsreg
pr de bsreg
    reg mpg weight gear_ratio
    predict yhat
    qui sum yhat
    // sca mu = r(mean)
    // post sim (mu)
end
sysuse auto, clear
postfile sim mu using results , replace
bootstrap, cluster(foreign) reps(5) seed(6112): bsreg
postclose sim
use results, clear
Adding version 8 to the code did not solve the issue. Would anyone know what is wrong with this procedure, and how to fix it for execution in Stata 9+? The problem has been raised in the past and more recently, but without finding an answer.
Sorry for the long description, it's a long problem.
I've presented the issue as if it's a programming one because I'm using this code to replicate some health inequalities research. It's necessary to bootstrap the entire procedure, not just the reg model. I have some quibbles with the methodology, but nothing that would stop me from replicating the analysis.
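Adding noisily to the bootstrap showed a problem with the predict command: yhat is created the first time bsreg runs, so every later call fails because the variable already exists. Here's a fix using a tempvar macro, which gives each call its own temporary variable.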
cap pr drop bsreg
pr de bsreg
    reg mpg weight gear_ratio
    tempvar yhat
    predict `yhat'
    qui sum `yhat'
    sca mu = r(mean)
    post sim (mu)
end
sysuse auto, clear
postfile sim mu using results , replace
bootstrap, cluster(foreign) reps(5) seed(6112): bsreg
postclose sim
use results, clear
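As a possible alternative to postfile, and not what the original code does, the program can be declared rclass and hand its scalar back to bootstrap directly. In this sketch the names bsreg2 and bsresults are hypothetical, chosen so as not to overwrite the program and file used above.

```stata
capture program drop bsreg2
program define bsreg2, rclass
    regress mpg weight gear_ratio
    tempvar yhat
    predict `yhat'
    summarize `yhat', meanonly
    * Return the mean prediction so that bootstrap can collect it
    return scalar mu = r(mean)
end

sysuse auto, clear
bootstrap mu = r(mu), cluster(foreign) reps(5) seed(6112) ///
    saving(bsresults, replace): bsreg2

* One observation per replication, holding the collected statistic
use bsresults, clear
list
```

Here bootstrap does the bookkeeping itself, so no postfile handle has to be opened or closed around the call.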

Stata: Hiding command lines

. sysuse auto, clear
(1978 Automobile Data)
. di "I am getting some summary statistics for PRICE"
I am getting some summary statistics for PRICE
. su price
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
.
end of do-file
I want to hide the command lines, and show only the results as follows:
I am getting some summary statistics for PRICE
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
How can I do this? Thanks.
The answer from user1493368 is correct, but writing code like that is tedious and error-prone for more complicated examples. Another answer is just to learn how to write Stata programs! Put this in a do-file editor window and run it
program myprog
    qui sysuse auto, clear
    di "I am getting some summary statistics for PRICE"
    su price
end
Then type interactively
myprog
As in practice one makes lots of little mistakes, a very first line such as
capture program drop myprog
is a good idea.
This really is prominently and well documented: start with the later chapters in [U].
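Putting those pieces together, here is a minimal sketch of a slightly more general version. The syntax line is an addition for illustration, not part of the program above: it lets the same program print its message for whichever numeric variables are named when it is called.

```stata
capture program drop myprog
program myprog
    * Accept one or more numeric variables from the caller
    syntax varlist(numeric)
    foreach v of local varlist {
        display "I am getting some summary statistics for `v'"
        summarize `v'
    }
end

sysuse auto, clear
myprog price mpg
```

Because the commands inside a program are not echoed, only the displayed text and the summarize tables appear in the Results window.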
Try this: the output text file (quiet_noise.log) will contain only the lines you want.
quietly {
    log using quiet_noise.log, text replace
    sysuse auto
    noisily: di "I am getting some summary statistics for PRICE"
    noisily: su price
    log close
}
Commenting Stata output becomes a problem, especially when you want to share your log files, and your question reflects that very well.
As Nick Cox has nicely explained, writing a program to display the text is a very good idea. However, hard-coding text in a program comes at a cost: you cannot reuse that program with other variables. For example, if you write a program to run a regression on given variables and comment on the findings inside it, the comments will not apply when you run it on other variables. In other words, writing comments about a particular finding makes the program less usable. As a result, you end up writing a program for each analysis, which is not that appealing.
So what is my suggestion? Use the MarkDoc package to comment your results.
In MarkDoc (ssc install markdoc) you can write comments using Markdown / HTML /LaTeX and have it exported to a dynamic document within Stata. In your example it would be as follows:
qui log using example, replace
sysuse auto, clear
/***
Writing comments in Stata logfiles
==================================
I am getting some summary statistics for PRICE
***/
summarize price
qui log c
markdoc example, replace export(pdf)
And MarkDoc will produce a PDF for you that has interpreted your comments as Markdown. In addition to pdf, you can convert the same log file to other formats such as docx, html, tex, Open Office odt, slide, and also epub.
The PDF and HTML formats will also have a syntax highlighter for Stata commands, using Statax Syntax Highlighter.