I'm using SAS to plot a histogram with a kernel density estimate. The documentation says we can set the parameter c, "the standardized bandwidth for a number that is greater than 0 and less than or equal to 100," but I cannot find the default value used to create the following plot.
Does someone have an idea? Thanks!
SGPLOT minimizes the Asymptotic Mean Integrated Square Error (AMISE) for the kernel density function. According to the documentation for PROC UNIVARIATE, which can also do KDE:
By default, the procedure uses the AMISE method to compute kernel density estimates.
PROC UNIVARIATE documentation
We can confirm that they both have the same default by comparing the output.
proc univariate data=sashelp.cars;
var horsepower;
histogram / kernel;
run;
In the log, we find:
NOTE: The normal kernel estimate for c=0.7852 has a bandwidth of 21.035 and an AMISE of 392E-7.
Let's plot them together and compare the values.
proc sgplot data=sashelp.cars;
density horsepower/TYPE=KERNEL;
density horsepower/TYPE=KERNEL(c=0.7852);
ods output sgplot=sgplot; /* name the output data set explicitly so the DATA step below can read it */
run;
data diff;
set sgplot;
abs_diff = abs(KERNEL_Horsepower____Y - KERNEL_Horsepower_C_0_7852____Y);
run;
proc univariate data=diff;
var abs_diff;
run;
The mean absolute difference across all plotted points is 1.65x10^-9, and the largest is 6.76x10^-9, which is essentially zero. The small differences arise because the c value reported in the log is rounded, whereas proc sgplot uses the full-precision value. You can get a higher-precision estimate with the outkernel= option in proc univariate.
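For reference, the c in the log and the bandwidth are tied together. As I understand the PROC UNIVARIATE documentation, the bandwidth is lambda = c * Q * n^(-1/5), where Q is the sample interquartile range and n is the number of observations. Below is a minimal Python sketch of that arithmetic. The values Q = 90 and n = 428 for horsepower in sashelp.cars are my assumption (they do not appear in the log above), and the function name is purely illustrative.

```python
def kde_bandwidth(c, iqr, n):
    """Bandwidth from the standardized bandwidth c: lambda = c * Q * n**(-1/5)."""
    if not 0 < c <= 100:
        raise ValueError("c must be greater than 0 and at most 100")
    return c * iqr * n ** (-1 / 5)

# c comes from the log NOTE above; iqr and n are ASSUMED values for
# sashelp.cars horsepower and should be checked against your own data.
bandwidth = kde_bandwidth(c=0.7852, iqr=90, n=428)
print(round(bandwidth, 3))
```

With those assumed inputs the result should land within rounding of the bandwidth of 21.035 reported in the NOTE, which is a quick sanity check that c and the bandwidth describe the same estimate.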
I want to create a two-panel graph using proc sgpanel in SAS 9.4. The y-axis should be the same for the two panels, but I want the two x-axes to have different values. Can that be done using SGPANEL?
Best regards,
Shaifali
Yes, use the UNISCALE option on your PANELBY statement to specify which axes are kept uniform across cells: COLUMN, ROW, or ALL. The default is ALL, which is not what you want. In SGPANEL the y-axes are the row axes, so request that only the row axes are uniform; the x-axes (column axes) are then scaled separately.
panelby yourSpecifications / uniscale = row;
I have created two graphs using density functions with the following code, but I don't know how to calculate the overlap coefficient in SAS:
proc sgplot data=combined_cohort;
density S_exposed / legendlabel='exposed' lineattrs=(pattern=solid);
density S_unexposed / legendlabel='unexposed' lineattrs=(pattern=solid);
keylegend / location=inside position=topright across=1;
xaxis display=(nolabel);
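One common definition of the overlap coefficient is the area under the pointwise minimum of the two density curves. Since the DENSITY statement fits a normal density by default, a hedged Python sketch of that calculation follows. The means and standard deviations are made-up placeholders (not taken from combined_cohort), and the function names are illustrative only.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def overlap_coefficient(pdf_a, pdf_b, lo, hi, steps=20000):
    """Trapezoid-rule integral of min(pdf_a, pdf_b) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # half weight at the endpoints
        total += weight * min(pdf_a(x), pdf_b(x))
    return total * h

# Placeholder parameters (hypothetical, for illustration only).
exposed = lambda x: normal_pdf(x, mean=0.0, sd=1.0)
unexposed = lambda x: normal_pdf(x, mean=1.0, sd=1.0)
ovl = overlap_coefficient(exposed, unexposed, lo=-10.0, hi=11.0)
print(round(ovl, 4))
```

The integration range should be wide enough to cover essentially all of both curves. The same idea could be applied in SAS to density values exported on a common grid.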
I am running a regression with two categories of fixed effects (country and year; this is macroeconomic data). Since I am using xtreg, one is automatically hidden, but the other enters as a variable:
xtreg fiveyearyg taxratio i.year if taxratiocut == 1, i(wbcode1) fe cluster(wbcode1)
estimates store yi
I am running a number of these and I want to graph the coefficients for taxratio from each. But when I store the data, it stores both the taxratio coefficient, and the 50+ coefficients for the year fixed effects.
After a lot of searching, I cannot find any way to store (or recall) just part of the regression output, the one coefficient (with SEs) that I care about. Does anyone know a way to do that?
Here is how you can do that:
webuse grunfeld,clear
qui xtreg mvalue invest i.year,fe cluster(company)
//e(b) stores coefficient matrix and e(V) stores variance-covariance matrix. For details type: ereturn list after running the model
//let's say you want to extract only the coefficient on invest
mat coef_matrix=e(b)
scalar coef_invest=coef_matrix[1,1]
dis coef_invest
1.7178414
//to extract se of the coefficient on invest
mat var_matrix=e(V)
mat diag_var_matrix=vecdiag(var_matrix) //diagonal elements are variances and the standard errors are square roots of these variances
matmap diag_var_matrix se_matrix , m(sqrt(#)) //you need to install matmap using ssc install matmap, you will get error if variance is negative
scalar se_invest=se_matrix[1,1]
dis se_invest
.14082153
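The matrix steps above boil down to taking the square root of each diagonal element of the variance-covariance matrix. Here is a small language-neutral sketch in Python, using a made-up 2x2 matrix rather than the actual e(V), with hypothetical names:

```python
import math

def standard_errors(vcov):
    """Square roots of the diagonal of a variance-covariance matrix."""
    ses = []
    for i, row in enumerate(vcov):
        variance = row[i]
        if variance < 0:
            raise ValueError(f"negative variance at position {i}")
        ses.append(math.sqrt(variance))
    return ses

# Toy matrix (hypothetical values, NOT the e(V) from the model above).
toy_vcov = [[0.04, 0.01],
            [0.01, 0.25]]
print(standard_errors(toy_vcov))
```

The first element of the returned list plays the role of se_matrix[1,1] in the Stata code.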
Accessing coefficients is as easy as calling _b[varname]; analogously the corresponding standard errors: _se[varname].
An example:
webuse grunfeld, clear
qui xtreg mvalue invest i.year,fe cluster(company)
// coef for invest
display _b[invest]
// std error for invest
display _se[invest]
// displayed results in matrix
matrix list r(table)
For multiple-equation models use [eqno]_b[varname] where the preceding bracket contains an equation number.
More detail can be found in [U] 13.5 Accessing coefficients and standard errors.
Starting with Stata 12, estimation commands also store results in r() [and not just e()]. Notice I listed r(table), which contains most results displayed by the estimation command xtreg.
You show interest in plotting coefficients, so you should read up on the user-written command coefplot. Run ssc install coefplot to download and help coefplot to get started. It has many options.
Edit
A complete example that plots only coefficients for invest (leaving out those for year), using coefplot, and based on conditional regressions is:
clear
set more off
webuse grunfeld
xtreg mvalue invest i.year if time <= 10,fe cluster(company)
estimates store before10
xtreg mvalue invest i.year if time > 10,fe cluster(company)
estimates store after10
coefplot before10 after10, keep(invest)
I am running a logistic regression in SAS. SAS outputs odds ratio estimates with a point estimate and 95% confidence limits. How can I get the standard error in the odds ratio output (not in the parameter estimates), together with the point estimate and 95% confidence limits?
proc logistic data=data1;
class rank / param=ref;
model admit = gre gpa rank;
run;
This is the output for the odds ratio estimates. How can I make SAS add the standard error of each odds ratio estimate?
Odds Ratio Estimates

Effect          Point Estimate    95% Wald Confidence Limits
GRE                      1.002         1.000         1.004
GPA                      2.235         1.166         4.282
RANK 1 vs 4              4.718         2.080        10.701
RANK 2 vs 4              2.401         1.170         4.927
RANK 3 vs 4              1.235         0.572         2.668
Thanks in advance for any help
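One way to get at this from the numbers already in the table: Wald limits are computed on the log-odds scale, so the standard error of the log odds ratio can be recovered from the limits, and an approximate standard error for the odds ratio itself follows from the delta method, SE(OR) ≈ OR x SE(log OR). A hedged Python sketch of that arithmetic for the GPA row is below. The function names are illustrative, and it assumes the limits are symmetric 95% Wald limits on the log scale.

```python
import math
from statistics import NormalDist

def se_log_odds_ratio(lower, upper, level=0.95):
    """Standard error of log(OR) recovered from Wald confidence limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (math.log(upper) - math.log(lower)) / (2 * z)

def se_odds_ratio(point, lower, upper, level=0.95):
    """Delta-method approximation: SE(OR) = OR * SE(log OR)."""
    return point * se_log_odds_ratio(lower, upper, level)

# GPA row from the Odds Ratio Estimates table above.
se_log = se_log_odds_ratio(1.166, 4.282)
se_or = se_odds_ratio(2.235, 1.166, 4.282)
print(round(se_log, 3), round(se_or, 3))
```

The recovered log-scale value should agree, up to rounding of the printed limits, with the standard error shown for GPA in the Parameter Estimates table.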
There was a similar question before (how to prevent midpoints from extending), but it does not answer my question.
I'm creating a histogram as follows and outputting it to a PNG file:
ods graphics on / imagename = "histoOne" imagefmt = png reset=index border=off width=4in;
ods select where=(_name_ ? 'Histogr');
proc univariate data=myData noprint; *(WHERE=(sumStake < 250));
Title1;
var sumStake;
histogram sumStake / name='histogr' vminor=4 grid lgrid=34 endpoints=0 to 250 by 20 cfill=red;
*Omit the inset, because the stats refer to the reduced dataset;
INSET n (comma11.0) mean (5.2) median (5.2) std='Std Dev'(5.2) max='Max' (5.2) / pos = ne
header = 'Summary Statistics' cfill = ywh;
run;
ods graphics off;
I want to display both the histogram and the summary statistics inset. However, the data is so skewed that it makes no sense to show the maximum value for sumStake on the X-axis. I want to cap the X-axis at 250.
SAS keeps extending the ENDPOINTS value. How can I suppress this?
I don't want to use the (WHERE=(sumStake < 250)); filter as the count, mean, median and max in the inset will be based on the reduced sample, rather than the entire sample and will make no sense.
You may need to change your data in some fashion, or do the graph in a different way. Histograms in SAS don't allow much mucking about with the data in this fashion; you have to do it ahead of time. Histograms are meant largely for showing how your data falls out, so it's a bit counterintuitive to 'hide' some of the data fallout - I understand why you want to, but it is not exactly the primary purpose of histograms, hence why the functionality isn't there in SAS.
I don't think in any event that PROC UNIVARIATE gives you any ability to control this, so you may lose the inset. You can control the axis range explicitly in PROC SGPLOT histograms (with the XAXIS statement), but they don't have the same kind of inset; you could probably build something similar, but not as simply. It will also still make the oversized bins, and won't reallocate those over-binned records.
Another option, particularly if you're making the inset separately anyway, would be to do the SGPLOT histogram (or bar chart) with data you've 'fixed' (right censored) and calculate the inset data separately (on the uncensored data).
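To make that last option concrete, here is a rough sketch of the logic in Python rather than SAS: cap the values that feed the plot, but compute the inset statistics on the untouched data. The sample values and function names are made up for illustration.

```python
from statistics import mean, median, stdev

def right_censor(values, cap):
    """Replace every value above cap with cap, leaving the rest unchanged."""
    return [min(v, cap) for v in values]

def summary(values):
    """Statistics for the inset, computed on whatever sample is passed in."""
    return {"n": len(values), "mean": mean(values), "median": median(values),
            "std": stdev(values), "max": max(values)}

# Made-up, heavily skewed sample standing in for sumStake (hypothetical data).
sum_stake = [5, 12, 18, 40, 75, 120, 260, 900, 4000]
plot_values = right_censor(sum_stake, cap=250)  # feed these to the histogram
inset_stats = summary(sum_stake)                # inset uses the full sample
print(max(plot_values), inset_stats["max"])
```

Note that censoring piles every capped record into the last bin, so that bin should be labeled accordingly (for example "250+").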