In Stata, the following code produces a contour plot:
sysuse auto, clear
twoway (contour headroom mpg price), legend(on)
The plot looks as follows:
How do I change the legend to automatically format its labels as 1.0, 2.0 etc.?
I would like to do this without having to manually create labels and pass them to the legend option.
In my actual application, I'm trying to get this to work for a map that comes from the spmap command, but it's difficult to provide a minimal working example with that command.
It looks like you want to re-format the numbering in the clegend of the contour plot (as opposed to the 'traditional' legend).
In this case, you just need to specify the zlabel option and the required format:
sysuse auto, clear
twoway contour headroom mpg price, zlabel(#5, format(%2.1f))
This will produce the desired output:
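If the automatically chosen tick values are not the ones you want, zlabel() should also accept an explicit list of values in place of #5. The following is only a sketch: the range 2(1)5 is an assumption picked to fit headroom in the auto data, and %3.1f is used because labels such as 2.0 are three characters wide.

```stata
sysuse auto, clear
* label the contour legend at 2.0, 3.0, 4.0 and 5.0 (values chosen for illustration)
twoway contour headroom mpg price, zlabel(2(1)5, format(%3.1f))
```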
I have used the by option of the histogram command to group my two categorical variables:
sysuse auto, clear
hist price, percent by(foreign rep78)
I would like to graph 8 histograms together on the same plot, with the following layout:
XXXX
XXXX
However, as the above graph shows, Stata places the histograms by default as follows:
XXX
XXX
XX
How can I achieve a 2x4 orientation of histograms?
I have played around with the aspectratio() option, but this changes the aspect ratio of the individual graphs, not of the whole plot.
You need to specify the cols() sub-option in by():
sysuse auto, clear
histogram price, percent by(foreign rep78, cols(4))
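The by() option also has a rows() sub-option, so the same layout can be requested by fixing the number of rows instead. A minimal sketch, assuming the eight foreign-by-rep78 groups of the auto data:

```stata
sysuse auto, clear
* two rows of panels; with 8 groups this gives the 2x4 arrangement
histogram price, percent by(foreign rep78, rows(2))
```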
I work with Stata and I have math grades for two different groups: A and B.
I want to see the gap that exists between both groups in each decile. In addition I want to do a box plot of this gap for each decile (I want to have 10 box plots, one for each decile which shows the gap between group grades).
What I first did was to compute the deciles using xtile for both groups:
xtile decileA= mat if group==1, nq(10)
xtile decileB= mat if group==0, nq(10)
However, groups A and B do not have the same number of observations nor the same distribution. I thought of computing quantiles for each decile and group and subtracting them to get the difference in each decile at each quartile to create the boxplot. But I do not know how to proceed afterwards to create the graph, and given that I have a different number of observations in each group decile I do not know if it is correct to proceed this way.
If I try to use the pctile command and compute the difference at each decile, I lose all the variance in the data inside each decile. I only get median differences and not all the quantiles I want.
Example:
pctile decileA= mat if group==1, nq(10)
pctile decileB= mat if group==0, nq(10)
gen qdiff= decileA- decileB if _n<10
gen qtau=_n/10 if _n<10
graph box qdiff, over(qtau)
Is there a way to produce the graph I have in mind?
Cross-posted on Statalist.
There is certainly a way to accomplish what you want with a bit of effort, but if the goal is to make a comparison between the two groups at each decile with some notion of variability, you can easily get that from a simultaneous quantile regression and the SEs that it produces:
sysuse auto, clear
sqreg price i.foreign, quantile(.1 .2 .3 .4 .5 .6 .7 .8 .9)
margins, dydx(foreign) ///
predict(outcome(q10)) ///
predict(outcome(q20)) ///
predict(outcome(q30)) ///
predict(outcome(q40)) ///
predict(outcome(q50)) ///
predict(outcome(q60)) ///
predict(outcome(q70)) ///
predict(outcome(q80)) ///
predict(outcome(q90)) ///
post
marginsplot, yline(0) xlab(, grid) ylab(#10, grid angle(90))
This yields a graph showing that foreign origin is associated with a higher price at the higher deciles, with the exception of the top decile, though probably none of the differences is significant here given how much the CIs overlap:
You can even conduct formal hypothesis tests that the effects are equal like this:
. test _b[1.foreign:9._predict] = _b[1.foreign:8._predict]
( 1) - [1.foreign]8._predict + [1.foreign]9._predict = 0
chi2( 1) = 3.72
Prob > chi2 = 0.0537
With 74 cars, we cannot reject the hypothesis that the effects at the 80th and 90th percentiles are the same, even though the point estimates have opposite signs and similar magnitudes.
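A similar comparison can be made directly on the sqreg coefficients, before margins, post replaces the stored estimates. This is a sketch rather than a reproduction of the output above: the seed and reps(100) are arbitrary choices, and because sqreg bootstraps its standard errors the test statistic will not match the one shown.

```stata
sysuse auto, clear
set seed 12345                         // arbitrary seed so the bootstrap is repeatable
sqreg price i.foreign, quantile(.8 .9) reps(100)
* equations are named after the quantiles: q80 and q90
test [q80]1.foreign = [q90]1.foreign
* point estimate and CI for the difference between the two quantile effects
lincom [q90]1.foreign - [q80]1.foreign
```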
I am running a regression with two sets of fixed effects, country and year (this is macroeconomic data). Since I am using xtreg, one set is hidden automatically, but the other enters as a variable:
xtreg fiveyearyg taxratio i.year if taxratiocut == 1, i(wbcode1) fe cluster(wbcode1)
estimates store yi
I am running a number of these and I want to graph the coefficients for taxratio from each. But when I store the data, it stores both the taxratio coefficient, and the 50+ coefficients for the year fixed effects.
After a lot of searching, I cannot find any way to store (or recall) just part of the regression output, the one coefficient (with SEs) that I care about. Does anyone know a way to do that?
Here is how you can do that:
webuse grunfeld,clear
qui xtreg mvalue invest i.year,fe cluster(company)
//e(b) stores coefficient matrix and e(V) stores variance-covariance matrix. For details type: ereturn list after running the model
//let's say you want to extract only the coefficient on invest
mat coef_matrix=e(b)
scalar coef_invest=coef_matrix[1,1]
dis coef_invest
1.7178414
//to extract the se of the coefficient on invest
mat var_matrix=e(V)
mat diag_var_matrix=vecdiag(var_matrix) //diagonal elements are variances and the standard errors are square roots of these variances
matmap diag_var_matrix se_matrix , m(sqrt(#)) //you need to install matmap using ssc install matmap; you will get an error if a variance is negative
scalar se_invest=se_matrix[1,1]
dis se_invest
.14082153
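If you would rather not install matmap, the same standard error can be taken from e(V) with built-in functions only. A minimal sketch, assuming the column for invest is looked up by name with colnumb() instead of being hard-coded as position 1; the matrix and scalar names are illustrative.

```stata
webuse grunfeld, clear
quietly xtreg mvalue invest i.year, fe cluster(company)
matrix b = e(b)
matrix V = e(V)
* find the column that belongs to invest instead of assuming it is the first
local j = colnumb(b, "invest")
scalar coef_invest = b[1, `j']
* the standard error is the square root of the matching diagonal element of e(V)
scalar se_invest = sqrt(V[`j', `j'])
display coef_invest
display se_invest
```

Looking the column up by name keeps the code working if the order of the regressors changes.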
Accessing coefficients is as easy as calling _b[varname]; analogously the corresponding standard errors: _se[varname].
An example:
webuse grunfeld, clear
qui xtreg mvalue invest i.year,fe cluster(company)
// coef for invest
display _b[invest]
// std error for invest
display _se[invest]
// displayed results in matrix
matrix list r(table)
For multiple-equation models use [eqno]_b[varname] where the preceding bracket contains an equation number.
More detail can be found in [U] 13.5 Accessing coefficients and standard errors.
Starting with Stata 12, estimation commands also store results in r() [and not just e()]. Notice I listed r(table), which contains most results displayed by the estimation command xtreg.
You show interest in plotting coefficients, so you should read about the user-written command coefplot. Run ssc install coefplot to download it and help coefplot to get started. It has many options.
Edit
A complete example that plots only coefficients for invest (leaving out those for year), using coefplot, and based on conditional regressions is:
clear
set more off
webuse grunfeld
xtreg mvalue invest i.year if time <= 10,fe cluster(company)
estimates store before10
xtreg mvalue invest i.year if time > 10,fe cluster(company)
estimates store after10
coefplot before10 after10, keep(invest)
After running glm I can type matrix list r(table) and see a table of all of my results. If I wish, I can write slopes and SEs to variables, e.g., gen B=_b[x1] or gen se=_se[x1]. However, this does not work with the confidence limits, ll and ul. How can I access them in a similar manner?
I am not sure that the _b[] and _se[] results are associated with r(table); I thought they were derived from e(b) and e(V).
Anyway, since you have r(table), you can just save the results into another matrix, and then use the regular matrix operations to put the lower bounds and upper bounds into new matrices. If for some reason transformation into variables is desired (for example, plotting), there's always -svmat-.
sysuse auto,clear
glm price mpg foreign, f(gaussian)
mat r=r(table)
matrix ll=r["ll",1...]' // see -help matrix extraction-; transposed for svmat
svmat ll,names(ll) // lower bounds are in variable ll1
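If all you need is the limits for one coefficient in a variable, in the same spirit as gen B=_b[x1], they can also be rebuilt from _b[] and _se[]. This sketch assumes the default 95% normal-based intervals that glm reports; the variable names ll_mpg and ul_mpg are illustrative.

```stata
sysuse auto, clear
glm price mpg foreign, family(gaussian)
* critical value for a 95% interval based on the normal distribution
scalar z = invnormal(0.975)
generate ll_mpg = _b[mpg] - z*_se[mpg]
generate ul_mpg = _b[mpg] + z*_se[mpg]
* compare with the ll and ul rows of the stored table
matrix list r(table)
```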
There was a similar question to this before (how to prevent midpoints from extending), but it does not answer my question.
I'm creating a histogram as follows and outputting it to a PNG file:
ods graphics on / imagename = "histoOne" imagefmt = png reset=index border=off width=4in;
ods select where=(_name_ ? 'Histogr');
proc univariate data=myData noprint; *(WHERE=(sumStake < 250));
Title1;
var sumStake;
histogram sumStake / name='histogr' vminor=4 grid lgrid=34 endpoints=0 to 250 by 20 cfill=red;
*Omit the inset, because the stats refer to the reduced dataset;
INSET n (comma11.0) mean (5.2) median (5.2) std='Std Dev'(5.2) max='Max' (5.2) / pos = ne
header = 'Summary Statistics' cfill = ywh;
run;
ods graphics off;
I want to display both the histogram and the summary statistics inset. However, the data is so skewed that it makes no sense to show the maximum value for sumStake on the X-axis. I want to cap the X-axis at 250.
SAS keeps extending the ENDPOINTS value. How can I suppress this?
I don't want to use the (WHERE=(sumStake < 250)); filter as the count, mean, median and max in the inset will be based on the reduced sample, rather than the entire sample and will make no sense.
You may need to change your data in some fashion, or do the graph in a different way. Histograms in SAS don't allow much mucking about with the data in this fashion; you have to do it ahead of time. Histograms are meant largely to show how your data are distributed, so it's a bit counterintuitive to 'hide' part of that distribution. I understand why you want to, but it is not exactly the primary purpose of a histogram, which is why the functionality isn't there in SAS.
I don't think PROC UNIVARIATE gives you any ability to control this, so you may lose the inset. You can control the axis range explicitly in PROC SGPLOT histograms (with the XAXIS statement), but they don't have the same kind of inset; you could probably build something, but not as simply. It will also still make the oversized bins, and won't reallocate the records that fall into them.
Another option, particularly if you're making the inset separately anyway, would be to do the SGPLOT histogram (or bar chart) with data you've 'fixed' (right censored) and calculate the inset data separately (on the uncensored data).