Binned bar chart using SAS

I'm trying to make a bar chart using SAS. I have a data set containing many salaries and I'd like to show a bar chart of their frequencies. I've made this:
ODS GRAPHICS ON;
PROC FREQ DATA=WORKERS.SORT ORDER=INTERNAL;
TABLES salaries / NOCUM SCORES=TABLE plots(only)=freq;
RUN;
ODS GRAPHICS OFF;
It works, but the x-axis now shows every one of the hundreds of distinct salary values. I'd like to group the salaries into intervals (say, 20 of them) so that the chart is more readable, but I can't find out how to do it. I've also tried this:
PROC CHART DATA=WORK.SORT;
vbar salaries;
RUN;
but that's a text representation of the chart, so I can't use it.

You can create a format and apply the format to the variable you want to group into buckets. Here's an example:
proc format ;
value myfmt
low - 13 = '13 and Under'
14 - high = '14 and Above';
run;
ODS GRAPHICS ON;
PROC FREQ DATA=sashelp.class ORDER=INTERNAL;
format age myfmt.;
TABLES age / NOCUM SCORES=TABLE plots(only)=freq;
RUN;
ODS GRAPHICS OFF;
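Applied to a continuous variable such as salary, the same idea might look like the sketch below. The cut points and the format name salfmt are made up for illustration, and MSRP from sashelp.cars stands in for the salaries variable, since the original data set isn't available.

```sas
/* Hypothetical cut points; adjust them to the range of your salaries. */
/* The -< operator excludes the upper bound, so ranges do not overlap. */
proc format;
  value salfmt
    low   -< 20000 = 'Under 20,000'
    20000 -< 40000 = '20,000 to under 40,000'
    40000 -< 60000 = '40,000 to under 60,000'
    60000 -  high  = '60,000 and above';
run;

ods graphics on;
/* MSRP is only a stand-in for the salaries variable */
proc freq data=sashelp.cars order=internal;
  format msrp salfmt.;
  tables msrp / nocum plots(only)=freqplot;
run;
ods graphics off;
```

ORDER=INTERNAL keeps the bars sorted by the underlying numeric values rather than alphabetically by label.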

Use PROC UNIVARIATE with the HISTOGRAM statement. http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_univariate_toc.htm
ods html;
proc univariate data=sashelp.cars noprint;
var msrp;
histogram;
run;
There are options for specifying bin size:
ods html;
proc univariate data=sashelp.cars noprint;
var msrp;
histogram / midpoints=30000 to 180000 by 30000;
run;
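If it is more natural to think in terms of bin edges than bin centers, the HISTOGRAM statement also accepts ENDPOINTS=. A minimal sketch, assuming a range of 0 to 200000 is wide enough to cover every MSRP value in sashelp.cars:

```sas
proc univariate data=sashelp.cars noprint;
  var msrp;
  /* ten bins, each 20000 wide, defined by their edges */
  histogram / endpoints=0 to 200000 by 20000;
run;
```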

And just for completeness, I'll add another solution in case you want more control over the chart's appearance. Using the Graphics Template Language you can create some very nice looking charts.
The proc template statement defines how the chart will look. The sgrender runs the chart against the specified dataset. There are all kinds of options that are best explored in the online doc: http://support.sas.com/documentation/cdl/en/grstatgraph/65377/HTML/default/viewer.htm#p1sxw5gidyzrygn1ibkzfmc5c93m.htm
I've just taken the sample they provided and added the / nbins=20 option to have it automatically group into 20 bins. It also has options for start and end bin, bin size, etc.
proc template;
define statgraph histogram;
begingraph;
entrytitle "Histogram of Vehicle Weights";
layout overlay /
xaxisopts=(label="Vehicle Weight (LBS)")
yaxisopts=(griddisplay=on);
histogram weight / nbins=20;
endlayout;
endgraph;
end;
run;
proc sgrender data=sashelp.cars template=histogram;
run;
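If a full template is more than you need, PROC SGPLOT can draw a similar chart in a few lines. This is only a rough equivalent of the template above, not a replacement for it:

```sas
proc sgplot data=sashelp.cars;
  title "Histogram of Vehicle Weights";
  /* NBINS= sets the number of bins; BINWIDTH= and BINSTART= are alternatives */
  histogram weight / nbins=20;
  xaxis label="Vehicle Weight (LBS)";
  yaxis grid;
run;
title;
```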

Related

Overlay the average trend on group by trends using Proc sgplot

I want to create a line graph that includes the overall trend of a disease rate and the specific trends for males and females. I use the following code to create the trends by group. How can I add the average trend to this line graph? Thanks for your help.
proc sgplot data=have ;
vline year/response=disease_rate group=sex stat=mean datalabel=disease_rate ;
yaxis values=(0,1) label="Percentage";
run;
Here's an example of summarizing it and then displaying it on the graph. There is more than one way to do this, though; this is just one.
data have;
set sashelp.heart(in=a);
year=round(2021-ageAtStart, 10);
disease_rate= status="Dead";
run;
proc means data=have mean noprint;
class sex year;
types sex sex*year;
var disease_rate;
output out=summary_stats mean=average_value;
run;
proc sort data=summary_stats;
by sex year;
run;
data graph_data;
/* _type_=2 is sex alone; _type_=3 is sex*year. RENAME= needs parentheses. */
merge summary_stats(where=(_type_=2) rename=(average_value=mean_sex))
summary_stats(where=(_type_=3) rename=(average_value=mean_sex_year));
by sex;
format mean_sex: percent12.1;
run;
proc sgplot data=graph_data ;
*where year > 1990;
vline year/response=mean_sex_year group=sex stat=mean datalabel=mean_sex_year ;
vline year/response=mean_sex group=sex stat=mean datalabel=mean_sex ;
run;
Use series instead of vline so that you can overlay a regression on top of it to get an average trend line. For example:
proc sql;
create table have as
select date
, region
, sum(sale) as sale
from sashelp.pricedata
group by region, date
order by region, date
;
quit;
proc sgplot data=have;
series x=date y=sale / group=region;
reg x=date y=sale / group=region;
xaxis fitpolicy=rotatethin;
run;
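Another way to get an overall line next to the per-sex lines is to append the overall yearly mean as one more group before plotting. The sketch below rebuilds the sashelp.heart stand-in data under a new name; heart_rates, trend, rate and the group label 'All' are names chosen for illustration only.

```sas
/* Stand-in data, derived the same way as in the first answer */
data heart_rates;
  set sashelp.heart;
  year = round(2021 - ageAtStart, 10);
  disease_rate = (status = "Dead");
run;

proc sql;
  create table trend as
  /* one row per year and sex */
  select year, sex, mean(disease_rate) as rate
    from heart_rates
    group by year, sex
  union all
  /* one row per year for everyone, stored as an extra group */
  select year, 'All' as sex, mean(disease_rate) as rate
    from heart_rates
    group by year;
quit;

/* SERIES connects points in data order, so sort by year within group */
proc sort data=trend;
  by sex year;
run;

proc sgplot data=trend;
  format rate percent8.1;
  series x=year y=rate / group=sex markers;
  yaxis label="Percentage" min=0 max=1;
run;
```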

Output distribution charts in a pdf format? I only need the chart from a proc univariate of all the variables in a table

How can I output distribution charts in a pdf format? I only need the chart from a proc univariate of all the variables in a table - not any additional metrics.
ods pdf file="aaaa.pdf
TITLE 'Summary of Weight Variable (in pounds)';
PROC UNIVARIATE DATA = sashelp.class NOPRINT;
HISTOGRAM _all_ / NORMAL;
RUN;
ods pdf close
You use ODS SELECT _chartname_ to limit the output to what you want. You need to remove the NOPRINT option though or no output is generated to display regardless of destination.
It looks like univariate produces: CDFPlot, Histogram, PPplot, Probplot, QQplot so assuming you want just the histogram add the following line to your code:
ods select histogram;
Full code:
ods pdf file="aaaa.pdf";
ods select histogram;
TITLE 'Summary of Weight Variable (in pounds)';
PROC UNIVARIATE DATA = sashelp.class ;
HISTOGRAM _all_ / NORMAL;
RUN;
ods pdf close;
Add it before or within your PROC UNIVARIATE.
PS. You're missing a semicolon on your ODS PDF statements at the top and bottom.
A good blog post from the SAS website on this is available here.
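If you aren't sure what an output object is called, ODS TRACE writes the name of every object a procedure produces to the log. A small sketch, using the height variable from sashelp.class as an example:

```sas
ods trace on;                /* log the name of each output object */
proc univariate data=sashelp.class;
  histogram height / normal;
run;
ods trace off;
```

The names listed in the log are the ones you can pass to ODS SELECT.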

How to clear "Results" from Proc Univariate to show only a specific table

I've been using the UNIVARIATE procedure in order to get the p-values from a series of distributions (lognormal, exponential, gamma) and have run into the following problem:
I am using the following code to get the p-values of the goodness of fit tests for each of the distributions:
ods select all/*ParameterEstimates GoodnessOfFit*/;
proc univariate data=results.Parametros_Prueba_1;
var Monto_1.;
histogram /
lognormal (l=1 color=red SHAPE=&ParamLOGN2_1 SCALE=&ParamLOGN1_1)
gamma (l=1 color=red SHAPE=&ParamGAM1_1 SCALE=&ParamGAM2_1)
exponential (l=2 SCALE=&ParamEXP1_1);
ods output GoodnessOfFit=results.Goodness_1;
run;
proc print data=results.Goodness_1;
After running the previous code I get the "Results" which gives me the histogram graphic and other descriptive information about the tests. I am looking for a way to get this "Results" print to show only the last part corresponding to the "proc print" added on the last line.
Thanks in advance!
If you want no output to the screen (results window) from PROC UNIVARIATE, then the simplest answer is:
ods select none;
proc univariate ... ;
run;
ods select all;
proc print ... ;
run;
ods select none; tells ODS to not make any ODS output whatsoever. You'll still get your ODS OUTPUT though as that comes in afterwards.
ods select none;
proc univariate data=sashelp.class;
var height;
histogram name='univhist' /
lognormal (l=1 color=red )
gamma (l=1 color=red )
exponential (l=2 );
ods output GoodnessOfFit=Goodness_1;
run;
ods select all;
proc print data=Goodness_1;
run;
Now, you'll note you don't get your histogram; that one is harder. It unfortunately changes its name every time you run it, and even if you use the NAME= option, that'll only work the first time it's run. You need to use PROC GREPLAY to delete it.
proc greplay nofs igout=work.gseg;
delete 'univhist';
run; quit;
(Assuming UNIVHIST is the name you assign it.)

Output Winsorized dataset

A user here gave me the following code (SAS: PROC UNIVARIATE: Output trimmed mean to dataset) to calculate and output a winsorized mean to a dataset:
proc sort data=sashelp.class out=have;
by sex;
run;
ods trace on;
PROC UNIVARIATE DATA=have trimmed=0.05;
VAR age;
by sex;
ods output TrimmedMeans=trimmedMeans;
run;
ods trace off;
How can I output a new version of the sashelp.class dataset with ALL the observations for age winsorized, rather than calculating a winsorized mean by sex? I don't want to winsorize at the category level, as I would be censoring data that is an outlier in that category and not necessarily an outlier in the entire dataset.
Could you add a variable with a constant value and group by it?
This should solve the grouping issue.
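If the goal is a data set of winsorized values rather than a single winsorized mean, one possible approach is to compute the cutoffs once over the whole data set and then clamp each observation to them. This sketch assumes percentile-based cutoffs at 5% and 95%, which is not exactly what the TRIMMED= or WINSORIZED= options of PROC UNIVARIATE compute; cutoffs, class_winsorized and age_w are illustrative names.

```sas
/* Overall 5th and 95th percentiles of age, with no BY statement */
proc univariate data=sashelp.class noprint;
  var age;
  output out=cutoffs pctlpts=5 95 pctlpre=p;   /* creates p5 and p95 */
run;

data class_winsorized;
  /* read the single cutoff row once; its values are kept on every row */
  if _n_ = 1 then set cutoffs;
  set sashelp.class;
  /* pull values below p5 up to p5 and values above p95 down to p95 */
  age_w = min(max(age, p5), p95);
run;
```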

How to construct histograms with unequal class widths in SAS?

I am trying to create histograms in SAS with the help of PROC UNIVARIATE, but it gives me histograms with equal class widths. Suppose I want a histogram with the first class interval from 1 to 10 and the second class interval from 10 to 100.
I tried using-
proc univariate data=sasdata1.dataone;
var sum;
histogram sum/ midpoints=0 to 10 by 10 10 to 100 by 90 ;run;
But this does not work. What is the correct way of doing this?
You can't do it with UNIVARIATE as far as I know, but any of the SGPLOT/GPLOT/etc. procedures will work; just bin your data into a categorical variable and VBAR that variable.
If you're okay with frequencies (not percents), this would work:
data test;
set sashelp.class;
do _t = 1 to floor(ranuni(7)*20);
age=age+floor(ranuni(7)*10);
output;
end;
run;
proc format;
value agerange
low-12 = "Pre-Teen"
13-14 = "Early Teen"
15-18 = "Teen"
19-21 = "Young Adult"
22-high = "Adult";
run;
ods graphics on;
ods preferences;
proc sgplot data=test;
format age agerange.;
vbar age;
run;
I believe if you need percents, you'd want to PROC FREQ or TABULATE your data first and then SGPLOT (or GPLOT) the results.
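A sketch of that percent route, assuming a format like the one above (redefined here under the illustrative name agegrp so the block runs on its own against sashelp.class):

```sas
proc format;
  value agegrp
    low-12  = "Pre-Teen"
    13-14   = "Early Teen"
    15-high = "Teen";
run;

/* PROC FREQ groups by formatted value; OUT= holds COUNT and PERCENT */
proc freq data=sashelp.class noprint;
  format age agegrp.;
  tables age / out=age_pct;
run;

proc sgplot data=age_pct;
  format age agegrp.;
  vbar age / response=percent;
  yaxis label="Percent";
run;
```

Because the bins are defined by the format, they can be as unequal in width as you like; the bars are still drawn with equal widths, so this is a bar chart of binned data rather than a true histogram.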
I did find a macro that can be used to create histograms with unequal endpoints.
The code can be found in the NESUG 2008 proceedings