Combine two plots in one graph using ciplot - stata

I would like to plot the means and confidence intervals of two variables in one graph. I used ciplot to do this for one variable, but the code does not work for two.
On the internet I found that you could combine the plots as follows:
ciplot relative_ambition12 relative_ambition22, by(quota)
However, if I run this I get the error:
no observations found
At the same time both of the following do produce graphs:
ciplot relative_ambition12, by(quota)
ciplot relative_ambition22, by(quota)
Does anyone know how I can combine these two graphs into one?

The community-contributed command ciplot expects to work on the same set of observations for all variables specified in varlist.
For example, the following works:
. sysuse auto, clear
. generate price2 = price + 500
. ciplot price price2, by(foreign)
However, the following does not:
. replace price2 = . if foreign == 1
. ciplot price price2, by(foreign)
no observations
r(2000);
Each variable can still be graphed on its own (i.e., when only one variable is specified).
When you have different sets of observations, you can use the inclusive option to produce the desired output to the extent possible:
. ciplot price price2, by(foreign) inclusive
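One quick way to see why the combined call fails is to count, within each by() group, the observations that are nonmissing on every variable, since that is the sample ciplot uses by default. The sketch below reuses the auto example. For the question's data the analogous check would be tabulate quota if !missing(relative_ambition12, relative_ambition22); a quota group that is absent from that table is one for which ciplot finds no observations.

```stata
sysuse auto, clear
generate price2 = price + 500
replace price2 = . if foreign == 1
* By default ciplot uses observations nonmissing on every variable in varlist;
* this table shows how many such observations each by() group has
tabulate foreign if !missing(price, price2)
* the same check for one group: Foreign has no complete observations
count if foreign == 1 & !missing(price, price2)
```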

Related

Skewness in Stata

I have tried many different combinations of sktest and sadly nothing works.
I was almost certain that sktest would work in combination with by, but it doesn't.
The issue is: I have a binary variable gender (male = 0, female = 1), and I want to measure the skewness of the variable returns separately for males and females. Can you please advise?
I was hoping for a result similar to what we get when we run e.g. by gender: summarize returns
Different questions are bundled together here.
Testing
If you want to run sktest for different groups, you can just repeat the command
sysuse auto, clear
sktest price if foreign == 1
sktest price if foreign == 0
or write your own wrapper program to do the same. sktest in essence shows P-values but no summary measures.
Or do something like this:
preserve
statsby , by(foreign) : sktest price
list
restore
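The "repeat the command" route can also be automated by looping over the distinct values of the grouping variable, which is about the simplest form the wrapper could take. This is a minimal sketch with the auto data; for the question you would substitute gender for foreign and returns for price. The local macro name groups is just an illustrative choice.

```stata
sysuse auto, clear
* levelsof stores the distinct values of foreign (0 and 1) in a local macro
levelsof foreign, local(groups)
foreach g of local groups {
    display as text _newline "foreign == `g'"
    sktest price if foreign == `g'
}
```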
Measuring
If you want to see (moment-based) skewness measures, you can just repeat summarize, detail for each group:
bysort foreign: summarize price, detail
A more selective wrapper, moments, is already available on SSC:
moments price, by(foreign)
----------------------------------------------------------------------
    Group |  n       mean         SD   skewness   kurtosis
----------+-----------------------------------------------------------
 Domestic | 52   6072.423   3097.104      1.778      5.090
  Foreign | 22   6384.682   2621.915      1.215      3.555
----------------------------------------------------------------------
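If you would rather have the by-group measures as a dataset, for example to export them, statsby can collect the results that summarize, detail leaves behind in r(). A minimal sketch with the auto data; the new variable names n, skew and kurt are arbitrary, and preserve/restore brings the original data back afterwards.

```stata
sysuse auto, clear
preserve
* summarize, detail leaves r(N), r(skewness) and r(kurtosis);
* statsby collects them into one observation per group of foreign
statsby n=r(N) skew=r(skewness) kurt=r(kurtosis), by(foreign): summarize price, detail
list, noobs
restore
```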
Warnings
Stata uses one estimator for moment-based skewness. There are others.
There are many ways to measure skewness. The others mentioned in Section 7 of this paper are not a complete list; perhaps the most important omission is L-skewness (see lmoments from SSC).

Unable to display statistics in the graph note() parameter

I want to display the total number of observations in the graph note().
I tried the following:
note(count)
However, this just displays the literal word "count".
I also tried to create a local macro, but I am having difficulty just initializing it.
While I can do the following:
. local N = 100
. di `N'
100
I can't seem to do:
. local N = count
count not found
The total number of observations is stored in _N.
sysuse auto, clear
display _N
74
So the following works for me:
local N = _N
twoway scatter mpg price, note(Total no of observations: `N')
The total number of observations is kept in _N but it is not necessarily the number of observations used in a graph.
The command count displays a result and leaves a saved result, the number counted, in its wake as r(N). This is documented both in the help for count and in the manual entry.
Hence you can verify that this sequence leaves the note "74 observations" in the resulting graph.
. sysuse auto, clear
(1978 Automobile Data)
. count if mpg < .
74
. histogram mpg, note(`r(N)' observations)
(bin=8, start=12, width=3.625)
Note that no r-class command should intervene here between count and your use of its result. r-class saved results, like any other saved results, are overwritten easily. In many circumstances you are well advised, as you did, to store the result in a local macro, say by
. local N = r(N)
immediately after the count command and then refer to that later in the note().
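This method is more general than using _N: count with an if qualifier counts only the observations that satisfy a condition (for example, nonmissing values on the graphed variables), and count by itself returns the total number of observations when that is directly what you want.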
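The difference between _N and a count of the graphed sample is easy to see with a variable that has missing values. In the auto data rep78 has 5 missing values, so a scatter plot of price against rep78 uses 69 of the 74 observations. A sketch; the wording of the note is just an example.

```stata
sysuse auto, clear
* _N is the number of observations in memory (74 in the auto data)
display _N
* only observations nonmissing on both plotted variables appear in the graph
quietly count if !missing(price, rep78)
local N = r(N)
scatter price rep78, note("`N' of `=_N' observations plotted")
```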
Combining the other answers, I ultimately did:
count
local N = r(N)
count if male
local N_male = r(N)
count if !male
local N_female = r(N)
...
note("N = `N'" " `N_male' (Male)" " `N_female' (Female)")
But I still can't get commas to appear as thousands separators (e.g., 1,234,567).
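One way to get thousands separators is to format the number with a %fc display format before it goes into the note, using the extended macro function : display. A sketch under that approach: the width 15 is an arbitrary choice large enough for millions, and strtrim() removes the leading blanks that a fixed width produces. The auto data have only 74 observations, so a made-up large number is also formatted to show the commas.

```stata
sysuse auto, clear
quietly count
* %15.0fc: no decimals, commas as thousands separators
local N : display %15.0fc r(N)
local N = strtrim("`N'")
histogram mpg, note("N = `N'")

* the same recipe applied to a larger (made-up) number
local big : display %15.0fc 1234567
local big = strtrim("`big'")
display "`big'"
```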

Stata: Combine table command with ttest and output latex

For regression output, I usually use a combination of eststo to store estimates, estadd to add the R2 and additional tests, and then esttab to output the lot.
I need to do the same with the table command. I need the mean, median and N for a variable across three by variables and would like to add stars for the result of a ttest==1 on the mean and signtest==1 on the median. I have three by variables, so I've been using table to collate the mean, median and N, which I'm calling like the following pseudo-code:
sysuse auto,clear
table foreign rep78 , ///
contents(mean price median price n price) format(%9.2f)
ttest price==1, by(foreign rep78)
signtest price=1, by(foreign rep78)
I've tried esttab and estpost to no avail. I've also looked at tabstat, tablemat and summarize as alternatives to table, but they don't allow three by variables.
How can I create this table, add the stars for the ttest and signtest p-values and output the full table?
The main point in your question seems to be producing a LaTeX table. However, you show "pseudo-code" that looks very much like Stata code, except that it is illegal.
In particular, ttest allows only one variable in its by() option. But notice that ttest also allows the by: prefix (in fact, you can use both). They serve different purposes: by() names the variable defining the two groups to be compared, whereas the by: prefix repeats the test separately for each group. signtest, on the other hand, does not allow a by() option, but it does allow the by: prefix. So you should probably clarify what you want to do before creating the table.
If you are trying to use the by: prefix in both cases and afterwards produce a table, you can create a grouping variable and put the commands in a loop. That way, you can try tabulating the saved results for each group using the ESTOUT module (by Ben Jann, available from SSC). Something like:
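sysuse auto, clear
set more off
keep price foreign rep78
* create a grouping variable: one value per combination of foreign and rep78
egen grp = group(foreign rep78)
* number of groups (8 in the auto data) instead of hard-coding it
summarize grp, meanonly
local G = r(max)
* tests by group
forvalues i = 1/`G' {
    ttest price == 1 if grp == `i'
    signtest price = 1 if grp == `i'
    *<complete with estout syntax>
}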
See help by, help egen (the group function), help estout and help saved results.
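Instead of filling in estout syntax inside the loop, one alternative is to collect N, mean, median and the two p-values into a matrix, one row per group, and export that. A minimal sketch; the matrix name R, its column names, and the use of r(p_2) (which we believe is the two-sided p-value that signtest leaves behind; check return list after signtest) are assumptions. As far as we know, esttab/estout can tabulate a matrix through its matrix() syntax, and stars could then be added by comparing the p-value columns with your chosen thresholds.

```stata
sysuse auto, clear
keep price foreign rep78
egen grp = group(foreign rep78)
summarize grp, meanonly
local G = r(max)
* one row per group: N, mean, median, t-test p-value, sign-test p-value
matrix R = J(`G', 5, .)
matrix colnames R = N mean median p_ttest p_sign
forvalues i = 1/`G' {
    quietly summarize price if grp == `i', detail
    matrix R[`i', 1] = r(N)
    matrix R[`i', 2] = r(mean)
    matrix R[`i', 3] = r(p50)
    * one-sample t test of mean == 1; r(p) is the two-sided p-value
    quietly ttest price == 1 if grp == `i'
    matrix R[`i', 4] = r(p)
    * sign test of median == 1 (assumed stored result r(p_2))
    quietly signtest price = 1 if grp == `i'
    matrix R[`i', 5] = r(p_2)
}
matrix list R, format(%9.3f)
```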

Is there a way to get past the "too many values" error in Stata when using tabulate?

I am trying to generate frequencies for a variable in Stata conditional on categories of another variable.
This other categorical variable has about 790,000 observations for the category I am interested in.
Stata's limits of 12,000 rows for one-way tables and 1,200 rows for two-way tables make this impossible.
Every time I run tab x if y==<category of interest> I get the following error:
too many values
r(134);
I installed the bigtab package, and although it gives me tables, it cannot be used with by or to run statistical tests.
Is there a work around for this?
It seems silly that Stata should have this arbitrary limit when SAS and even SPSS can run the exact same operation without trouble.
To some it might seem silly, or at least puzzling, that people want tables with more than 12000 rows, as there must be a better way to display results or answer the question that is in mind.
That said, the limits of tabulate are hard-wired. But you can reproduce whatever you want to show by other means. So, for one-way frequencies
. bysort rowvar : gen freq = _N
. by rowvar : gen tag = _n == 1
. gsort -freq rowvar
. list rowvar freq if tag, noobs
and for two-way frequencies
. bysort rowvar colvar : gen freq = _N
. by rowvar colvar : gen tag = _n == 1
. gsort -freq rowvar colvar
. list rowvar colvar freq if tag, noobs
A similar approach, with more bells and whistles, is implemented in the community-contributed command groups (SSC). An even simpler approach in many ways is to collapse or contract the dataset and then list it.
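As an illustration of the contract route, here is a sketch with the auto data. contract replaces the data in memory, so save or preserve your data first. For the question's case the analogous call would be something like contract x if y == <category of interest>, freq(freq). The frequency variable name n is arbitrary.

```stata
sysuse auto, clear
* contract keeps one observation per distinct combination of foreign and rep78,
* with its frequency in n; nomiss drops observations with missing values
contract foreign rep78, freq(n) nomiss
* largest groups first
gsort -n foreign rep78
list, noobs
```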
To flag the general strategy here:
Produce what you want as new variables.
Select just one observation from each group if there are multiple observations.
Use list, not tabulate.
UPDATE
OP asked
. bysort rowvar : gen freq = _N
OP: This generates the freq variable for the last count of every individual value in my rowvar
Me: No. The freq variable is the count of observations for every distinct value of rowvar.
. by rowvar : gen tag = _n == 1
OP: This generates the tag variable for the first count of every unique observation in rowvar.
Me: Correct, provided you say "distinct", not "unique". Unique values occur once only.
. gsort -freq rowvar
OP: This sorts freq and rowvar in descending order
Me: It sorts freq in descending order and rowvar in ascending order within blocks of constant freq.
. list rowvar freq if tag, noobs
OP: What does if do here?
Me: That one is left as an exercise.
Use the command bigtab. (You have to install the package first: run ssc install bigtab.) For help type h bigtab.

Stata: How to top-code a variable

Using the auto.dta data, I want to top-code the price variable at the median of the top X%. For example, X could be 3, 4, etc. How can I do this in Stata?
In answering your question, I assume that you want to replace all values in the top X% (say, the top 10%) with a single cut-off value, here the 90th percentile.
Here is the sample code:
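sysuse auto, clear
* _pctile (not pctile) leaves the cut points in r(r1), ..., r(r9)
_pctile price, nq(10)
local p90 = r(r9)
display `p90'
generate newprice = price
* top-code: values above the 90th percentile are set to it
replace newprice = `p90' if newprice > `p90' & !missing(newprice)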
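The question itself asks for top-coding at the median of the top X% rather than at a percentile. A sketch of that variant, assuming X = 3 as an example and that "top X%" means values above the (100 - X)th percentile as computed by _pctile; the variable price_tc and the locals X, cut and topmed are hypothetical names. With only 74 observations, the top 3% of price is just a couple of cars, so the result is mostly illustrative.

```stata
sysuse auto, clear
* X: percent of the distribution to top-code (example value)
local X = 3
* cut point: the (100 - X)th percentile of price
_pctile price, percentiles(`=100 - `X'')
local cut = r(r1)
* median of the values above the cut point
quietly summarize price if price > `cut' & !missing(price), detail
local topmed = r(p50)
* replace every value above the cut point with that median
generate price_tc = price
replace price_tc = `topmed' if price > `cut' & !missing(price)
```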