Why is medcouple in Stata not returning any answer? - stata

I have imported my csv file from https://www.kaggle.com/datasets/rupakroy/online-payments-fraud-detection-dataset into Stata.
I did the following steps:
ssc install medcouple
medcouple(amount)
But I receive no output.
Note that the variable amount has 6,362,620 observations.

You've not presented a reproducible problem. Here is a demonstration that medcouple can work.
. sysuse auto, clear
(1978 automobile data)
. ssc inst medcouple
checking medcouple consistency and verifying not already installed...
installing into C:\Users\Laptop\ado\plus\...
installation complete.
. medcouple mpg
MEDCOUPLE
---------
The medcouple is: .25
However, my guess is that you are still waiting for an answer. medcouple just crawls very slowly with a dataset that size.
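One way to check that the command itself works on your data is to try it on a random subsample first. The sketch below assumes the CSV is already in memory and contains the variable amount; the seed and the subsample size of 10,000 are arbitrary choices, and the medcouple of a subsample only approximates that of the full variable.

```stata
* Assumes the Kaggle data are already in memory and contain -amount-.
* The seed and the subsample size are arbitrary, illustrative choices.
preserve
set seed 12345
sample 10000, count      // keep a random 10,000 observations
medcouple amount
restore                  // bring back the full dataset
```

If this returns promptly, the full run is probably just slow rather than broken.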

Related

Is there a code for including a p-value to test for normality of variables?

I'd like to include a statistic in my summary statistics table within Stata, with the summarize command. Is there any possibility or other convenient way to include p-values (of the normality test) of the variables included? Would be very helpful. I'm using Stata 17.
There are many tests for normality, and several are included in Stata. The Shapiro-Wilk test is a modern classic, and quite popular. The more recent Doornik-Hansen test has been calibrated over a wide range of situations. Here's a token example:
. sysuse auto, clear
(1978 automobile data)
. swilk mpg
                   Shapiro–Wilk W test for normal data
    Variable |        Obs       W           V         z       Prob>z
-------------+------------------------------------------------------
         mpg |         74    0.94821      3.335     2.627    0.00430
. mvtest normality mpg
Test for multivariate normality
Doornik-Hansen chi2(2) = 12.366 Prob>chi2 = 0.0021
The P-value is typically returned as an r-class value, so read the documentation for each command, and try return list after running each.
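As a minimal sketch of picking up the returned P-value, the loop below runs swilk on a few variables from the auto data and displays r(W) and r(p) for each. The choice of variables and the display layout are only illustrative; check return list after other commands for the names they use.

```stata
sysuse auto, clear

* swilk leaves its results in r(); r(W) is the statistic, r(p) the P-value
foreach v of varlist price mpg weight {
    quietly swilk `v'
    display %-12s "`v'" "W = " %6.4f r(W) "   p = " %6.4f r(p)
}
```

From here the values could be stored in scalars or a matrix and placed in whatever table you are building.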

How to shade recession periods in Stata?

I'm trying to shade recession period as a backdrop to a lineplot, using a downloaded NBER dummy.
I use this line of code:
twoway area usrec year
and get this linked graph:
How can I make the graph use bars rather than spikes? (Variable is a dummy, it is 4 or zero)
Nick's comment is really the answer to this question.
Here is an example from help nbercycles by Christopher Baum.
First install nbercycles and freduse (for the MWE data) from SSC:
ssc install nbercycles
ssc install freduse
Then load the data, generate a monthly date, and tsset it:
freduse MPRIME, clear
generate ym = mofd(daten)
tsset ym, monthly
Then plot the prime rates and shade the recessions:
nbercycles MPRIME if tin(1970m1,1990m1), ///
file(nber2.do) replace

Emulate Stata 8 clustered bootstrapped regression

I'm trying to store a series of scalars along the coefficients of a bootstrapped regression model. The code below looks like the example from the Stata [P]rogramming manual for postfile, which is apparently intended for use with such procedures.
The problem is with the // commented lines, which fail to work. More specifically, the problem seems to be that the syntax below worked in Stata 8 but fails to work in Stata 9+ after some change in the bootstrap procedure.
cap pr drop bsreg
pr de bsreg
reg mpg weight gear_ratio
predict yhat
qui sum yhat
// sca mu = r(mean)
// post sim (mu)
end
sysuse auto, clear
postfile sim mu using results , replace
bootstrap, cluster(foreign) reps(5) seed(6112): bsreg
postclose sim
use results, clear
Adding version 8 to the code did not solve the issue. Would anyone know what is wrong with this procedure, and how to fix it for execution in Stata 9+? The problem has been raised in the past and more recently, but without finding an answer.
Sorry for the long description, it's a long problem.
I've presented the issue as if it's a programming one because I'm using this code to replicate some health inequalities research. It's necessary to bootstrap the entire procedure, not just the reg model. I have some quibbles with the methodology, but nothing that would stop me from replicating the analysis.
Adding noisily to the bootstrap showed a problem with the predict command. Here's a fix using a tempvar macro.
cap pr drop bsreg
pr de bsreg
reg mpg weight gear_ratio
tempvar yhat
predict `yhat'
qui sum `yhat'
sca mu = r(mean)
post sim (mu)
end
sysuse auto, clear
postfile sim mu using results , replace
bootstrap, cluster(foreign) reps(5) seed(6112): bsreg
postclose sim
use results, clear
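An alternative that avoids postfile altogether is to have the program return the scalar and let bootstrap collect it. This is only a sketch of that approach, not the original Stata 8 method: the program name bsmean is made up here, and saving() writes the replicates to results.dta in place of the posted file.

```stata
capture program drop bsmean
program define bsmean, rclass
    regress mpg weight gear_ratio
    tempvar yhat                      // temporary name, dropped automatically
    predict `yhat'
    quietly summarize `yhat'
    return scalar mu = r(mean)        // hand the statistic back to bootstrap
end

sysuse auto, clear
bootstrap mu = r(mu), cluster(foreign) reps(5) seed(6112) ///
    saving(results, replace): bsmean
use results, clear
```

Further scalars can be added to the program and listed after mu = r(mu) in the same way.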

Stata: Hiding command lines

. sysuse auto, clear
(1978 Automobile Data)
. di "I am getting some summary statistics for PRICE"
I am getting some summary statistics for PRICE
. su price
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
.
end of do-file
I want to hide the command lines, and show only the results as follows:
I am getting some summary statistics for PRICE
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       price |        74    6165.257    2949.496       3291      15906
How can I do this? Thanks.
The answer from user1493368 is correct, but writing code like that is tedious and error-prone for more complicated examples. Another answer is just to learn how to write Stata programs! Put this in a do-file editor window and run it
program myprog
qui sysuse auto, clear
di "I am getting some summary statistics for PRICE"
su price
end
Then type interactively
myprog
As in practice one makes lots of little mistakes, a very first line such as
capture program drop myprog
is a good idea.
This really is prominently and well documented: start with the later chapters in [U].
Try this. The log file (quiet_noise.log) will contain just the output you want.
quietly {
log using quiet_noise.log, text replace
sysuse auto
noisily: di "I am getting some summary statistics for PRICE"
noisily: su price
log close
}
Commenting Stata output, especially when you want to share your log files, becomes a problem, as your question shows.
As Nick Cox has explained, writing a program to display the text is a very good idea. However, including text in a program comes at a cost: you cannot reuse that program with other variables. For example, if you write a program that runs a regression on given variables and comments on the findings, those comments will not apply to any other variables. In other words, writing comments about particular findings makes the program less reusable. As a result, you end up writing a program for each analysis, which is not appealing.
So what is my suggestion? Use the MarkDoc package to comment your results.
In MarkDoc (ssc install markdoc) you can write comments using Markdown, HTML, or LaTeX and have them exported to a dynamic document within Stata. In your example it would be as follows:
qui log using example, replace
sysuse auto, clear
/***
Writing comments in Stata logfiles
==================================
I am getting some summary statistics for PRICE
***/
summarize price
qui log c
markdoc example, replace export(pdf)
And MarkDoc will produce a PDF for you that has interpreted your comments as Markdown. In addition to pdf, you can convert the same log file to other formats such as docx, html, tex, Open Office odt, slide, and also epub.
The PDF and HTML formats will also have a syntax highlighter for Stata commands, using Statax Syntax Highlighter.

How to export tabulations

I have a small project where I need to tabulate a dataset with frequencies in various ways and export those tables in a large Excel sheet. Unfortunately, copy and paste truncates text-labels and causes lots of other issues for us.
Is there a way to save/export the result into a CSV or Excel format?
That is, something similar to the write.table command in R, which I can't install at work.
Update 1:
The Stata FAQ provided three solutions that would work for us: http://www.stata.com/support/faqs/data-management/copying-tables/. Shortly after pointing us to the FAQ, Stata support sent a follow-up email with a link to tabout, and its tutorial displayed some truly beautiful tabulations.
We've made some progress with tabout, though we are not sure yet whether it will do everything we need. So far, creating tabulations with tabout D7 using test.xls works nicely, although without the proper alignment of labels and the like that you would get when generating LaTeX.
Update 2:
OK, so lots of tables weren't as straightforward as with tabulate and the by command in combination - some programming was required (not done at current Stata skill-level). The lack of native support for just exporting any result out is a real pain!
outreg is not going to work, as it only works with estimation (regression-like) results. xml_tab can probably produce anything you like (findit xml_tab to install). Obviously, you can export excel your data, although if you need frequency tables, you would probably want to collapse (count) ..., by(varlist) your data first. (I hate collapse, though, because I think it is a poor idea to have to destroy and reload your data; this is one case where R's concept of objects comes in handier than Stata's idea of having only one dataset in memory at a time.)
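A minimal sketch of the collapse-then-export route, using the auto data as a stand-in for yours. The file name freq_table.xlsx and the new variable name freq are made up for illustration; preserve and restore are there so that the data in memory are not lost.

```stata
sysuse auto, clear

preserve                                   // protect the data in memory
* (count) counts nonmissing values of price within each group
collapse (count) freq = price, by(foreign rep78)
export excel using "freq_table.xlsx", firstrow(variables) replace
restore                                    // original data are back
```

Each row of the sheet is then one combination of the by() variables with its frequency.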
When I want to send tabulated output anywhere, whether from tabulate, regress, or clogit, I always close the current log file and begin a new one, not in the .smcl format but with a .log suffix. This is handy because I usually want to keep a lot of the values that clogit returns.
Something along the lines of:
* close any open log; capture suppresses the error if none is open
capture log close
log using NAMEOFOUTPUT.log
* run your command here, for example tab, reg, or clogit
log close
Your tabulated results from whichever command will then be in that .log file.
Could outreg be a solution?
http://www.kellogg.northwestern.edu/rc/stata-outreg.htm
Since the above will only do regression tables, estout is a good alternative, and I believe its estpost command creates tables for tabulations:
http://repec.org/bocode/e/estout/estpost.html
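As a rough sketch, assuming the estout package is installed (ssc install estout), a one-way table could be posted and written to CSV like this. The file name rep78_freq.csv is made up, and the cells() layout is just one reasonable choice.

```stata
* Assumes the estout package is installed: ssc install estout
sysuse auto, clear

* estpost tabulate posts frequencies in e(b), with e(pct) and e(cumpct)
estpost tabulate rep78
esttab using "rep78_freq.csv", cells("b pct(fmt(2)) cumpct(fmt(2))") ///
    nonumber nomtitle noobs csv replace
```

The resulting file opens directly in Excel.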
For one-way frequency tables, the fre module can be quite handy too. Output can be written to a tab-delimited table or to LaTeX.
sysuse auto, clear
fre rep78
rep78 -- Repair Record 1978
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   1     |          2       2.70       2.90       2.90
        2     |          8      10.81      11.59      14.49
        3     |         30      40.54      43.48      57.97
        4     |         18      24.32      26.09      84.06
        5     |         11      14.86      15.94     100.00
        Total |         69      93.24     100.00
Missing .     |          5       6.76
Total         |         74     100.00
-----------------------------------------------------------
Download and more info on SSC:
http://ideas.repec.org/c/boc/bocode/s456835.html