Drawing a histogram and boxplot in SAS

I wrote the following code in SAS, but I did not get the result I expected.
The histogram is drawn in grey, and the range of the data is not what I specified. What is the problem?
I also got the following warning: WARNING: The MIDPOINTS= list was extended to accommodate the data
And how can I change the color?
axis1 order=(0 to 100000 by 50000);
axis2 order=(0 to 100 by 5);
run;
proc capability data=HW2 noprint;
histogram Mvisits/midpoints=0 to 98000 by 10000
haxis=axis1
cfill=blue;
run;
.......................................
I have the same problem with the boxplot. For example, I got the following plot and wanted to change the distances on the axis so that I could see the plot better, but I could not.

The code below is for proc univariate rather than proc capability. I do not have access to SAS/QC to test, but the user guide shows very similar syntax for the histogram statements, so hopefully you'll be able to translate it back.
It looks like your problem with the colour is due to your output system. Your graphs are probably delivered via ODS Graphics, in which case the cfill option does not apply (see here, and note the Traditional Graphics tag).
To change the colour of the histogram bars in ODS output you can use proc template:
proc template;
define style styles.testStyle;
parent = styles.htmlblue;
style GraphDataDefault /
color = green;
end;
run;
ods listing style = styles.testStyle;
proc univariate data = sashelp.cars;
histogram mpg_city;
run;
An example explaining this can be found here.
Alternatively you can use proc sgplot to create a histogram with more control of the colour as follows:
proc sgplot data = sashelp.cars;
histogram mpg_city / fillattrs = (color = red);
run;
As to your question about truncating the histogram: it doesn't really make a great deal of sense to ignore the extreme values, as it will give you an erroneous image of the distribution, which somewhat defeats the purpose of the histogram. That said, you can achieve what you are asking for with a bit of a hack:
data tempData;
set sashelp.cars;
tempClass = 1;
run;
proc univariate data = tempData noprint;
class tempClass;
histogram mpg_city / maxnbin = 5 endpoints = 0 to 25 by 5;
run;
In the above, a dummy class variable tempClass is created and then comparative histograms are requested using the class statement. The maxnbin= option limits the number of bins displayed, but only in a comparative histogram.
Your other option is to exclude (or cap) your extreme points before creating the histogram, but this will lead to slightly erroneous frequency counts/percentages/bar heights.
data tempData;
set sashelp.cars;
mpg_city = min(mpg_city, 20);
run;
proc univariate data = tempData noprint;
histogram mpg_city / endpoints = 0 to 25 by 5;
run;
This is a possible approach to the original question (untested, as I have neither SAS/QC nor the data):
proc capability data = HW2 noprint;
histogram Mvisits /
midpoints = 0 to 300000 by 10000
noplot
outhistogram = histData;
run;
proc sgplot data = histData;
vbar _MIDPT_ /
response = _OBSPCT_
fillattrs = (color = blue);
where _MIDPT_ <= 100000;
run;

Related

I need help printing multiple confidence intervals in sas

I am being asked to provide summary statistics, including the corresponding confidence interval (CI) with its width, for the population mean. I need to print the 85%, 90% and 99% intervals. I know I can use either proc univariate or proc means to return one interval of my choice, but how do you print all three in a table? Also, could someone explain the difference between proc univariate, proc means and proc sql, and when each is used?
This is what I did and it only printed 85% confidence.
proc means data = mydata n mean clm alpha = 0.01 alpha =0.1 alpha = 0.15;
var variable;
RUN;
To put all three values in one table you can execute your step three times and put the results in one table by using an append step.
For shorter code and easier usage you can define a macro for this purpose.
%macro clm_val(TAB=, VARIABLE=, CONF=);
proc means
data = &TAB. n mean clm
alpha = &CONF.;
ods output summary=result;
var &VARIABLE.;
run;
data result;
length conf $8;
format conf_interval percentn8.0;
conf="&CONF.";
conf_interval=1-&CONF.;
set result;
run;
proc append data = result
base = all_results;
run;
%mend;
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.01);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.1);
%clm_val(TAB=sashelp.class, VARIABLE=age, CONF=0.15);
The resulting table looks like this:

How do I adjust bins to endpoint instead of midpoint in proc sgpanel

I've got a panel of three histograms and I've been able to figure out how to tweak all of the formatting except for one thing: getting the ticks to be the endpoints for the bins, instead of the midpoints.
I know that in 'proc univariate,' one can use an 'endpoints=' option in the histogram statement.
However, I cannot find a similar statement in the documentation for 'proc sgpanel'
Here is my code:
ods graphics on;
title "Baseline";
proc sgpanel data=baseline;
panelby scrp_cohort2 / rows=3 layout=rowlattice;
histogram pt_eq5d3l_health_state / boundary=lower group=scrp_cohort2;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 10);
run;
ods graphics off;
Specify a colaxis offsetmin and offsetmax that are 1/2 the bin width (as fraction).
Example:
Three SGPANEL runs to compare and contrast. The final one is the one you want.
data have;
call streaminit(2021);
do panel = 1 to 3;
do _n_ = 1 to 100 + rand('integer',50);
id + 1;
group = rand('integer',3);
do time = 0 to 10;
status = rand('integer',0,100);
output;
end;
end;
end;
stop;
run;
ods html file='gfx.html';
ods graphics on/ height=400 width=500;
title "Baseline";
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group;
where time=0;
run;
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 20);
run;
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group;
where time=0;
colaxis grid values=(0 to 100 by 10)
offsetmin=0.05
offsetmax=0.05
;
run;
ods graphics off;
ods html close;
The issue here is that you're trying to manipulate a histogram, which is not a discrete-values chart, even though it looks like one. For example, VBAR would offer a discreteoffset option that would let you do exactly what you ask.
However, a histogram graphs continuous values on an x/y axis, just in a particular way that ends up looking sort of like a bar chart. So it won't let you move the labels around, because they're not just labels - they're fixed positions on the axis, around which the histogram is collapsing points.
Unfortunately, the endpoints option isn't available for PROC SGPANEL, which of course would be how you'd ideally solve this issue. You have a couple of options for what would work, depending on what you want to do exactly and what your data look like.
First, you can simply summarize your data using proc univariate or whatever works best, and then use vbar to graph the (now discrete) data. You can get a histogram dataset out of proc univariate easily enough (with ODS OUTPUT or the OUTHISTOGRAM= option), using a BY statement for your group/panel values, and then you can graph that with VBAR in SGPANEL.
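A rough sketch of this first option, using the `have` data from the other answer. The dataset names `have_t0` and `hist` are made up for illustration, the `group=` variable is ignored to keep it short, and `_MINPT_` / `_OBSPCT_` are the variables that OUTHISTOGRAM= writes when ENDPOINTS= is used. Treat it as a starting point rather than a finished solution:

```sas
/* Keep the baseline rows and sort them for BY-group processing */
proc sort data=have(where=(time=0)) out=have_t0;
  by panel;
run;

/* Bin the data per panel; NOPLOT suppresses the histogram itself */
proc univariate data=have_t0 noprint;
  by panel;
  histogram status / endpoints=0 to 100 by 10
                     outhistogram=hist
                     noplot;
run;

/* Plot the pre-binned percentages as a discrete bar chart,   */
/* with each bar labelled by the lower endpoint of its bin    */
proc sgpanel data=hist;
  panelby panel / rows=3 layout=rowlattice;
  vbar _minpt_ / response=_obspct_;
  colaxis label="status (lower endpoint of bin)";
  rowaxis label="Percent";
run;
```

Because the bars are now discrete categories, the tick labels are whatever values sit in `_MINPT_`, so they show bin endpoints rather than midpoints.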
Second, you can make some adjustments to how things are done in SGPANEL, which might be enough for your needs. Look at the following graph, using Richard's example data.
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / boundary=lower group=group binstart=-5 binwidth=10;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 10) ;
run;
What it does is start the bins at -5, instead of at 0, but the colaxis is still starting at zero. That's now accurately doing what you want, I think - except that 0 itself ends up in the -5 bar, which you might not want. The bins are now centered at 5/15/25/35/etc., which is hopefully what you do want. If you do have 0 in your data, you may be able to use options to move where 0 is bucketed (but it would affect all of the other exact endpoints also).
This is what that looks like with the 0's removed. If there are actual 0's, then you would have a bar to the left of the plot area, though.
Here is the same thing but with 0's in it, which you'll note means a bar to the left of 0.
This is a similar plot but with 0's allowed, and with boundary=upper which moves all of the exactly-on-bin-boundaries to the upper bin (so 0 goes to the 0-10 bin). Note the other changes - and there is now a 100-110 bar which contains the 100 values.
Code for the latter chart (earlier chart is same but boundary=lower):
title "Baseline";
proc sgpanel data=have;
panelby panel / rows=3 layout=rowlattice;
histogram status / group=group binstart=-5 binwidth=10 boundary=upper;
where time=0;
colaxis min=0 max=100 grid values=(0 to 100 by 10) ;
run;

Only output the ROC curve in SAS

I am looking to create a pdf with 4 nice graphs for different analysis. My question is, how do I output only the ROC curve for my logistic regression?
I use the following code
TITLE2 JUSTIFY=CENTER "Rank ordering characteristic curve (ROC)";
ODS GRAPHICS ON;
PROC LOGISTIC
DATA = input
plots(only)=(roc(id=obs))
;
MODEL y
(Event = '1')= x
/
SELECTION=NONE
LINK=LOGIT;
RUN;
QUIT;
ODS GRAPHICS OFF;
and a dummy dataset can be imagined using this
DATA HAVE;
DO I = 1 TO 100;
Y = RAND('integer',0,1);
x = ranuni(i);
output;
end;
run;
Thanks
EDIT: just to be explicit, I'm looking to output just a plot of the ROC curve and nothing else, i.e. none of the tables containing Somers' D etc.
ODS SELECT ROCCURVE;
ODS SELECT allows you to control the output and include only the tables/output you want.
You can wrap your code in ODS TRACE ON and ODS TRACE OFF to find out what the table name is, or check the documentation.
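Put together, the two steps might look like the sketch below. It uses the dummy `have` dataset from the question and a trimmed-down MODEL statement. The first run only exists to print the output object names to the log.

```sas
ods graphics on;

/* Step 1: list the names of all output objects in the log */
ods trace on;
proc logistic data=have plots(only)=roc;
  model y(event='1') = x;
run;
ods trace off;

/* Step 2: keep only the ROC plot and suppress every table */
ods select ROCCurve;
proc logistic data=have plots(only)=roc;
  model y(event='1') = x;
run;
```

The ODS SELECT list applies to the next procedure step only, so it has to be repeated before each PROC LOGISTIC call whose output should be filtered.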

By group controlling line colors/where clause

I want to plot Y by X plot where I group by year, but color code year based on different variable (dry). So each year shows as separate line but dry=1 years plot one color and dry=0 years plot different color. I actually figured one option (yeah!) which is below. But this doesn't give me much control.
Is there a way to put a where clause in the series statement to select specific categories so that I can specifically assign a color (or other format)? Or is there another way? This would be analogous to R where one can use multiple line statements for different subsets of data.
Thanks!!
This code works.
proc sgplot data = tmp;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year grouplc=dry groupmc=dry;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;
Use an Attribute map, see the documentation
You can use the DRY variable to set the specific colours. For each year, assign the colour using the DRY variable in a data step.
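proc sort data=tmp out=attr_data nodupkey; by year; run;
data attrs;
set attr_data;
length value $12;
id='year';
/* VALUE is required in a discrete attribute map and must match the group value (assumes YEAR has no custom format) */
value=cats(year);
if dry=0 then linecolor='green';
if dry=1 then linecolor='red';
keep id value linecolor;
run;
Then add dattrmap=attrs to the PROC SGPLOT statement and attrid=year to the SERIES statement options.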
ods graphics / attrpriority=none;
proc sgplot data = tmp dattrmap=attrs;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year grouplc=dry groupmc=dry attrid=year;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;
Note that I tested and edited this post so it should work now.

How to construct histograms with unequal class widths in SAS?

I am trying to create histograms in SAS with the help of proc univariate, but it gives me histograms with equal class widths. Suppose I want a histogram with the first class interval from 1 to 10 and the second class interval from 10 to 100.
I tried using-
proc univariate data=sasdata1.dataone;
var sum;
histogram sum/ midpoints=0 to 10 by 10 10 to 100 by 90 ;run;
But this does not work. What is the correct way of doing this?
You can't do it with UNIVARIATE as far as I know, but any of the SGPLOT/GPLOT/etc. procedures will work; just bin your data into a categorical variable and VBAR that variable.
If you're okay with frequencies (not percents), this would work:
data test;
set sashelp.class;
do _t = 1 to floor(ranuni(7)*20);
age=age+floor(ranuni(7)*10);
output;
end;
run;
proc format;
value agerange
low-12 = "Pre-Teen"
13-14 = "Early Teen"
15-18 = "Teen"
19-21 = "Young Adult"
22-high = "Adult";
quit;
ods graphics on;
ods preferences;
proc sgplot data=test;
format age agerange.;
vbar age;
run;
I believe if you need percents, you'd want to PROC FREQ or TABULATE your data first and then SGPLOT (or GPLOT) the results.
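A minimal sketch of that percent version, reusing the `test` dataset and the `agerange.` format from above. The output dataset name `age_pct` is made up. PROC FREQ groups by the formatted values, and its OUT= dataset carries COUNT and PERCENT columns:

```sas
/* Summarise by the formatted age ranges */
proc freq data=test noprint;
  format age agerange.;
  tables age / out=age_pct;
run;

/* Plot the pre-computed percentages, one bar per age range */
proc sgplot data=age_pct;
  format age agerange.;
  vbar age / response=percent;
  yaxis label="Percent";
run;
```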
I did find a macro that can be used to create histograms with unequal endpoints.
The code can be found in the NESUG 2008 proceedings